system-plan/PHASES/FASE_2_PROCESAMIENTO_IA.md

# PHASE 2: AI PROCESSING

**Complexity:** Complex
**Estimated duration:** 3-5 days
**Priority:** HIGH

---

## OBJECTIVE

Deploy GRACE on RunPod to process:
- ASR (Speech-to-Text)
- OCR (image to text)
- TTS (Text-to-Speech)
- Embeddings (semantic vectorization)
- Face Detection
- Avatar Generation

---

## PREREQUISITES

- [x] PHASE 1 completed
- [ ] RunPod account with credits
- [ ] RunPod API key
- [ ] Docker Hub or a private registry for images

---

## STEP 2.1: Prepare the Docker image for RunPod

### Handler structure

The file `grace/runpod/handler.py` is already implemented and supports:

| Module | Model | VRAM |
|--------|--------|------|
| ASR_ENGINE | Faster Whisper Large V3 | ~4GB |
| OCR_CORE | GOT-OCR 2.0 | ~8GB |
| TTS | XTTS-v2 | ~4GB |
| FACE_VECTOR | InsightFace Buffalo L | ~2GB |
| EMBEDDINGS | BGE-Large | ~2GB |
| AVATAR_GEN | SDXL Base 1.0 | ~8GB |
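
One way to picture the routing behind this table is a lookup from `routing.module` to a per-module function. The sketch below is not the contents of `grace/runpod/handler.py`: the `dispatch` and `_stub` names, the `ERROR` status code, and the `message` field are assumptions made for illustration, and the module functions are stubs that load no model.

```python
from typing import Any, Callable, Dict

ModuleFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def _stub(name: str) -> ModuleFn:
    """Build a placeholder module function (hypothetical, runs no model)."""
    def run(job_input: Dict[str, Any]) -> Dict[str, Any]:
        return {"schema": f"{name.lower()}_stub", "data": {}}
    return run


# Module names are taken from the table above; the bodies are stubs.
MODULES: Dict[str, ModuleFn] = {
    name: _stub(name)
    for name in ("ASR_ENGINE", "OCR_CORE", "TTS",
                 "FACE_VECTOR", "EMBEDDINGS", "AVATAR_GEN")
}


def dispatch(job: Dict[str, Any]) -> Dict[str, Any]:
    """Route a job to the module named in input.routing.module."""
    job_input = job.get("input", {})
    module = job_input.get("routing", {}).get("module")
    run = MODULES.get(module)
    if run is None:
        # "ERROR" and "message" are assumed names, not a documented contract.
        return {
            "contract_version": "2.1",
            "status": {"code": "ERROR", "message": f"unknown module: {module!r}"},
        }
    return {
        "contract_version": "2.1",
        "status": {"code": "SUCCESS"},
        "result": run(job_input),
    }
```

In a RunPod worker, a function of this shape would typically be registered with `runpod.serverless.start({"handler": dispatch})`.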

### Optimized Dockerfile

```dockerfile
# grace/runpod/Dockerfile
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

WORKDIR /app

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/models
ENV HF_HOME=/app/models

# System dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libgl1-mesa-glx \
    && rm -rf /var/lib/apt/lists/*

# Base Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Preload models to reduce cold start
# Note: a volume mounted over /app/models at runtime hides files baked into that path
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('large-v3', device='cpu', compute_type='int8')"
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-large-en-v1.5')"

# Handler code
COPY handler.py .

# Start the RunPod handler
CMD ["python", "-u", "handler.py"]
```

### requirements.txt

```
runpod>=1.3.0
torch>=2.1.0
transformers>=4.36.0
faster-whisper>=0.10.0
TTS>=0.22.0
sentence-transformers>=2.2.2
insightface>=0.7.3
onnxruntime-gpu>=1.16.0
diffusers>=0.24.0
accelerate>=0.25.0
safetensors>=0.4.0
Pillow>=10.0.0
opencv-python-headless>=4.8.0
numpy>=1.24.0
boto3>=1.34.0
```

---

## STEP 2.2: Build and push the image

### Option A: Docker Hub

```bash
# On a machine with Docker
cd grace/runpod

# Build
docker build -t tzzr/grace-gpu:v1.0 .

# Log in to Docker Hub
docker login

# Push
docker push tzzr/grace-gpu:v1.0
```

### Option B: RunPod Registry

```bash
# Use the RunPod CLI
runpodctl build --name grace-gpu --tag v1.0 .
```

---

## STEP 2.3: Create a template in RunPod

### Via Dashboard

1. Go to RunPod → Templates → Create Template
2. Configure:
   - **Name**: GRACE-GPU
   - **Container Image**: tzzr/grace-gpu:v1.0
   - **Container Disk**: 50GB
   - **Volume Disk**: 100GB (for models)
   - **Volume Mount Path**: /app/models
   - **Expose HTTP Ports**: 8000
   - **Expose TCP Ports**: (empty)

### Via API

```bash
RUNPOD_API_KEY="<your_api_key>"

curl -X POST "https://api.runpod.io/graphql" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "query": "mutation { saveTemplate(input: { name: \"GRACE-GPU\", imageName: \"tzzr/grace-gpu:v1.0\", dockerArgs: \"\", containerDiskInGb: 50, volumeInGb: 100, volumeMountPath: \"/app/models\", ports: \"8000/http\", isServerless: true }) { id name } }"
  }'
```
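
Hand-escaping the quotes inside `-d` is easy to get wrong. As a minimal sketch, the same mutation can be assembled in Python and serialized with `json.dumps`. The function name `build_save_template_payload` is hypothetical; the field names and values are copied from the curl call above, and nothing is sent over the network here.

```python
import json


def build_save_template_payload(name, image, container_disk_gb,
                                volume_gb, mount_path, ports):
    """Return the JSON-serializable body for the saveTemplate mutation."""
    # json.dumps yields a double-quoted, escaped literal, which is also a
    # valid GraphQL string for plain ASCII values like these.
    fields = (
        f"name: {json.dumps(name)}, "
        f"imageName: {json.dumps(image)}, "
        'dockerArgs: "", '
        f"containerDiskInGb: {container_disk_gb}, "
        f"volumeInGb: {volume_gb}, "
        f"volumeMountPath: {json.dumps(mount_path)}, "
        f"ports: {json.dumps(ports)}, "
        "isServerless: true"
    )
    query = f"mutation {{ saveTemplate(input: {{ {fields} }}) {{ id name }} }}"
    return {"query": query}


payload = build_save_template_payload(
    "GRACE-GPU", "tzzr/grace-gpu:v1.0", 50, 100, "/app/models", "8000/http"
)
body = json.dumps(payload)  # string suitable for the HTTP request body
```

The resulting `body` could be posted to the same GraphQL URL with the same headers as the curl example.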

---

## STEP 2.4: Create a serverless endpoint

### Via Dashboard

1. Go to Serverless → Create Endpoint
2. Configure:
   - **Name**: GRACE-Endpoint
   - **Template**: GRACE-GPU
   - **GPU Type**: RTX 4090 (24GB VRAM)
   - **Min Workers**: 0
   - **Max Workers**: 3
   - **Idle Timeout**: 5 seconds
   - **Flash Boot**: Enabled

### Recommended configuration per module

| Module | Minimum GPU | Recommended GPU |
|--------|------------|-----------------|
| ASR_ENGINE | RTX 3080 (10GB) | RTX 4090 (24GB) |
| OCR_CORE | RTX 3090 (24GB) | RTX 4090 (24GB) |
| TTS | RTX 3080 (10GB) | RTX 4090 (24GB) |
| FACE_VECTOR | RTX 3060 (8GB) | RTX 4090 (24GB) |
| EMBEDDINGS | RTX 3060 (8GB) | RTX 4090 (24GB) |
| AVATAR_GEN | RTX 3090 (24GB) | RTX 4090 (24GB) |
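
The per-module VRAM estimates from the handler table earlier in this document add up to roughly 28GB, which is more than the 24GB of an RTX 4090, so all six models cannot be assumed to stay resident at once on a single worker. The sketch below checks whether a given set of modules fits. The 2GB headroom is an assumption, and how the real handler loads or evicts models is not specified here.

```python
# Approximate VRAM per module in GB, copied from the handler table.
VRAM_GB = {
    "ASR_ENGINE": 4,
    "OCR_CORE": 8,
    "TTS": 4,
    "FACE_VECTOR": 2,
    "EMBEDDINGS": 2,
    "AVATAR_GEN": 8,
}


def fits(modules, gpu_gb, headroom_gb=2):
    """Return True if the combined estimate still leaves headroom_gb free."""
    return sum(VRAM_GB[m] for m in modules) + headroom_gb <= gpu_gb


total_gb = sum(VRAM_GB.values())              # 4 + 8 + 4 + 2 + 2 + 8 = 28
all_fit = fits(VRAM_GB, gpu_gb=24)            # False: 28 + 2 > 24
without_avatar = fits(
    [m for m in VRAM_GB if m != "AVATAR_GEN"], gpu_gb=24
)                                             # True: 20 + 2 <= 24
```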

---

## STEP 2.5: Test the endpoint

### Test ASR_ENGINE

```bash
ENDPOINT_ID="<your_endpoint_id>"
RUNPOD_API_KEY="<your_api_key>"

# Create a test audio file (or use an existing one)
# ffmpeg -f lavfi -i "sine=frequency=440:duration=3" -ar 16000 test.wav

# Encode as base64
AUDIO_B64=$(base64 -w0 test.wav)

# Send the request
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "profile": "LITE",
      "envelope": {
        "trace_id": "test-asr-001"
      },
      "routing": {
        "module": "ASR_ENGINE"
      },
      "payload": {
        "type": "audio",
        "encoding": "base64",
        "content": "'$AUDIO_B64'"
      },
      "context": {
        "lang": "es"
      }
    }
  }'
```

### Expected response

```json
{
  "id": "...",
  "status": "COMPLETED",
  "output": {
    "contract_version": "2.1",
    "status": {"code": "SUCCESS"},
    "result": {
      "schema": "asr_output_v1",
      "data": {
        "text": "...",
        "language_detected": "es",
        "duration_seconds": 3.0,
        "segments": [...]
      }
    },
    "quality": {
      "confidence": 0.95
    }
  }
}
```
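
A caller should check both status levels in this response: the job status reported by RunPod and the `status.code` reported by the module. The helper below is a minimal sketch that follows the response shape shown above; the name `extract_asr_text` and the choice of `RuntimeError` are illustrative assumptions.

```python
from typing import Any, Dict


def extract_asr_text(response: Dict[str, Any]) -> str:
    """Return the transcript, or raise if either status level is not OK."""
    # Outer level: job status reported by the endpoint.
    if response.get("status") != "COMPLETED":
        raise RuntimeError(f"job not completed: {response.get('status')}")
    # Inner level: status code reported by the module itself.
    output = response.get("output") or {}
    code = (output.get("status") or {}).get("code")
    if code != "SUCCESS":
        raise RuntimeError(f"module reported status code: {code}")
    return output["result"]["data"]["text"]


# Hand-written sample in the documented shape, not real endpoint output.
sample = {
    "id": "job-1",
    "status": "COMPLETED",
    "output": {
        "contract_version": "2.1",
        "status": {"code": "SUCCESS"},
        "result": {"schema": "asr_output_v1", "data": {"text": "hola"}},
    },
}
text = extract_asr_text(sample)
```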

### Test OCR_CORE

```bash
# Test image
IMAGE_B64=$(base64 -w0 test_image.png)

curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "routing": {"module": "OCR_CORE"},
      "payload": {
        "type": "image",
        "encoding": "base64",
        "content": "'$IMAGE_B64'"
      }
    }
  }'
```

---

## STEP 2.6: Document the endpoint

### Save in credentials

````bash
# On ARCHITECT, update the credentials repo
cd /tmp && rm -rf credentials
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" \
  git clone ssh://git@localhost:2222/tzzr/credentials.git

cd credentials

cat >> inventario/08-gpu-runpod.md << 'EOF'

## GRACE Endpoint (Updated 2025-12-24)

| Parameter | Value |
|-----------|-------|
| Endpoint ID | <endpoint_id> |
| Template | GRACE-GPU v1.0 |
| GPU | RTX 4090 |
| Max Workers | 3 |
| Idle Timeout | 5s |

### Available modules

- ASR_ENGINE (Whisper Large V3)
- OCR_CORE (GOT-OCR 2.0)
- TTS (XTTS-v2)
- FACE_VECTOR (InsightFace)
- EMBEDDINGS (BGE-Large)
- AVATAR_GEN (SDXL)

### Usage example

```bash
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -d '{"input": {...}}'
```
EOF

git add -A
git commit -m "Document GRACE RunPod endpoint"
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" git push origin main
````

---

## STEP 2.7: Integrate with DECK

### Create a GRACE client in DECK

```python
# /opt/deck/grace_client.py

import os
import base64
import requests
from typing import Dict, Any, Optional

class GraceClient:
    def __init__(self):
        self.endpoint_id = os.getenv("GRACE_ENDPOINT_ID")
        self.api_key = os.getenv("RUNPOD_API_KEY")
        self.base_url = f"https://api.runpod.ai/v2/{self.endpoint_id}"

    def call(self, module: str, content: bytes, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Call a GRACE module and return the raw JSON response."""
        payload = {
            "input": {
                "contract_version": "2.1",
                "routing": {"module": module},
                "payload": {
                    "type": self._get_type(module),
                    "encoding": "base64",
                    "content": base64.b64encode(content).decode()
                },
                "context": context or {}
            }
        }

        response = requests.post(
            f"{self.base_url}/runsync",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json=payload,
            timeout=120
        )

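        # Fail fast on HTTP errors (e.g. 401, 5xx) instead of parsing an error body
        response.raise_for_status()
        return response.json()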

    def _get_type(self, module: str) -> str:
        types = {
            "ASR_ENGINE": "audio",
            "OCR_CORE": "image",
            "TTS": "text",
            "FACE_VECTOR": "image",
            "EMBEDDINGS": "text",
            "AVATAR_GEN": "text"
        }
        return types.get(module, "binary")

    def transcribe(self, audio_bytes: bytes, lang: str = "es") -> str:
        """Convenience method for ASR; returns the transcript or an empty string."""
        result = self.call("ASR_ENGINE", audio_bytes, {"lang": lang})
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def ocr(self, image_bytes: bytes) -> str:
        """Convenience method for OCR; returns the recognized text or an empty string."""
        result = self.call("OCR_CORE", image_bytes)
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def embed(self, text: str) -> list:
        """Convenience method for embeddings; returns the vector or an empty list."""
        result = self.call("EMBEDDINGS", text.encode(), {})
        return result.get("output", {}).get("result", {}).get("data", {}).get("vector", [])
```
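
A synchronous call may return before the job has finished, for example during a cold start, in which case the client needs to poll for the result. The sketch below shows one way to do that, assuming the job states `IN_QUEUE` and `IN_PROGRESS` mean "not finished yet" and that the caller supplies a `fetch_status` function (hypothetical) that returns the current job document for a job id. It is not part of the `GraceClient` above.

```python
import time
from typing import Any, Callable, Dict

# Assumed set of unfinished job states.
PENDING = {"IN_QUEUE", "IN_PROGRESS"}


def wait_for_job(first: Dict[str, Any],
                 fetch_status: Callable[[str], Dict[str, Any]],
                 interval_s: float = 2.0,
                 max_polls: int = 60) -> Dict[str, Any]:
    """Poll until the job leaves the pending states or max_polls is reached."""
    result = first
    polls = 0
    while result.get("status") in PENDING:
        if polls >= max_polls:
            raise TimeoutError(f"job {result.get('id')} still pending")
        time.sleep(interval_s)
        result = fetch_status(result["id"])
        polls += 1
    return result


# Simulated status replies so the sketch runs without network access.
replies = iter([
    {"id": "j1", "status": "IN_PROGRESS"},
    {"id": "j1", "status": "COMPLETED", "output": {}},
])
final = wait_for_job({"id": "j1", "status": "IN_QUEUE"},
                     lambda job_id: next(replies), interval_s=0)
```

With the client above, `fetch_status` could be a small function that issues a GET to `{base_url}/status/{job_id}` with the same `Authorization` header; that route is assumed here and should be checked against the RunPod documentation.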

---

## FINAL CHECKLIST FOR PHASE 2

- [ ] 2.1 - Dockerfile prepared
- [ ] 2.2 - Image pushed to registry
- [ ] 2.3 - Template created in RunPod
- [ ] 2.4 - Serverless endpoint configured
- [ ] 2.5 - Tests passing (ASR, OCR, etc.)
- [ ] 2.6 - Credentials documented
- [ ] 2.7 - Client integrated into DECK

---

## SUCCESS METRICS

| Metric | Expected value |
|--------|----------------|
| Cold start | < 60s |
| Warm ASR (30s audio) | < 10s |
| Warm OCR (image) | < 5s |
| Availability | > 99% |
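
These targets can be checked mechanically once latency samples have been collected. The sketch below compares the worst sample per metric against the targets in the table. The metric keys and the sample numbers are made up for illustration and are not measurements of the endpoint.

```python
# Latency targets in seconds, taken from the table above.
TARGETS_S = {
    "cold_start": 60.0,
    "warm_asr_30s_audio": 10.0,
    "warm_ocr_image": 5.0,
}


def check_latencies(measured):
    """Map each metric to True when its worst sample is under the target."""
    return {
        name: max(samples) < TARGETS_S[name]
        for name, samples in measured.items()
    }


def availability(ok_requests, total_requests):
    """Fraction of requests that succeeded (0.0 when nothing was sent)."""
    return ok_requests / total_requests if total_requests else 0.0


# Illustrative inputs only.
report = check_latencies({
    "cold_start": [42.0, 58.5],
    "warm_ocr_image": [3.1, 6.0],
})
meets_availability = availability(995, 1000) > 0.99
```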

---

## ESTIMATED COSTS

| Usage | GPU | Cost/hour | Monthly estimate |
|-------|-----|-----------|------------------|
| Low (10 req/day) | RTX 4090 | $0.69 | ~$5-10 |
| Medium (100 req/day) | RTX 4090 | $0.69 | ~$30-50 |
| High (1000 req/day) | RTX 4090 | $0.69 | ~$100-200 |
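
The monthly figures depend on how many GPU seconds each request is billed for, which the table does not state. The sketch below makes that assumption explicit: `billed_seconds_per_request` is a hypothetical parameter that would cover execution time plus any cold start and idle time, and the 30-day month is also an assumption.

```python
def monthly_cost(requests_per_day, billed_seconds_per_request,
                 rate_per_hour=0.69, days=30):
    """Estimate the monthly cost in USD from per-request billed GPU seconds."""
    gpu_hours = requests_per_day * days * billed_seconds_per_request / 3600
    return gpu_hours * rate_per_hour


# 100 req/day at an assumed 60 billed seconds each:
# 100 * 30 * 60 / 3600 = 50 GPU hours, 50 * 0.69 = 34.50 USD
medium = monthly_cost(100, 60)
```

Under that assumption the result lands inside the "Medium" row's range; a different number of billed seconds per request shifts the estimate proportionally.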

---

## NEXT PHASE

Continue with [FASE_3_FLUJO_EMPRESARIAL.md](FASE_3_FLUJO_EMPRESARIAL.md)