# PHASE 2: AI PROCESSING

**Complexity:** Complex
**Estimated duration:** 3-5 days
**Priority:** HIGH

---

## OBJECTIVE

Deploy GRACE on RunPod to handle:

- ASR (speech-to-text)
- OCR (image-to-text)
- TTS (text-to-speech)
- Embeddings (semantic vectorization)
- Face detection
- Avatar generation

---

## PREREQUISITES

- [x] PHASE 1 completed
- [ ] RunPod account with credits
- [ ] RunPod API key
- [ ] Docker Hub or a private registry for images

---

## STEP 2.1: Prepare the Docker image for RunPod

### Handler structure

The file `grace/runpod/handler.py` is already implemented and supports:

| Module | Model | VRAM |
|--------|-------|------|
| ASR_ENGINE | Faster Whisper Large V3 | ~4GB |
| OCR_CORE | GOT-OCR 2.0 | ~8GB |
| TTS | XTTS-v2 | ~4GB |
| FACE_VECTOR | InsightFace Buffalo L | ~2GB |
| EMBEDDINGS | BGE-Large | ~2GB |
| AVATAR_GEN | SDXL Base 1.0 | ~8GB |

### Optimized Dockerfile

```dockerfile
# grace/runpod/Dockerfile
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

WORKDIR /app

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/models
ENV HF_HOME=/app/models

# System dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libgl1-mesa-glx \
    && rm -rf /var/lib/apt/lists/*

# Base Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Pre-download models to reduce cold start
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('large-v3', device='cpu', compute_type='int8')"
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-large-en-v1.5')"

# Handler code
COPY handler.py .
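# (Optional, an assumption not in the original build) The remaining model
# weights (XTTS, InsightFace, SDXL) could also be pre-downloaded at build time
# to cut cold starts further, at the cost of a much larger image. For example,
# using the standard insightface download path:
# RUN python -c "from insightface.app import FaceAnalysis; FaceAnalysis(name='buffalo_l')"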
# Start the RunPod handler
CMD ["python", "-u", "handler.py"]
```

### requirements.txt

```
runpod>=1.3.0
torch>=2.1.0
transformers>=4.36.0
faster-whisper>=0.10.0
TTS>=0.22.0
sentence-transformers>=2.2.2
insightface>=0.7.3
onnxruntime-gpu>=1.16.0
diffusers>=0.24.0
accelerate>=0.25.0
safetensors>=0.4.0
Pillow>=10.0.0
opencv-python-headless>=4.8.0
numpy>=1.24.0
boto3>=1.34.0
```

---

## STEP 2.2: Build and push the image

### Option A: Docker Hub

```bash
# On a machine with Docker
cd grace/runpod

# Build
docker build -t tzzr/grace-gpu:v1.0 .

# Log in to Docker Hub
docker login

# Push
docker push tzzr/grace-gpu:v1.0
```

### Option B: RunPod registry

```bash
# Use the RunPod CLI
runpodctl build --name grace-gpu --tag v1.0 .
```

---

## STEP 2.3: Create a template on RunPod

### Via the dashboard

1. Go to RunPod → Templates → Create Template
2. Configure:
   - **Name**: GRACE-GPU
   - **Container Image**: tzzr/grace-gpu:v1.0
   - **Container Disk**: 50GB
   - **Volume Disk**: 100GB (for models)
   - **Volume Mount Path**: /app/models
   - **Expose HTTP Ports**: 8000
   - **Expose TCP Ports**: (empty)

### Via the API

```bash
RUNPOD_API_KEY=""

curl -X POST "https://api.runpod.io/graphql" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "query": "mutation { saveTemplate(input: { name: \"GRACE-GPU\", imageName: \"tzzr/grace-gpu:v1.0\", dockerArgs: \"\", containerDiskInGb: 50, volumeInGb: 100, volumeMountPath: \"/app/models\", ports: \"8000/http\", isServerless: true }) { id name } }"
  }'
```

---

## STEP 2.4: Create a serverless endpoint

### Via the dashboard

1. Go to Serverless → Create Endpoint
2.
   Configure:
   - **Name**: GRACE-Endpoint
   - **Template**: GRACE-GPU
   - **GPU Type**: RTX 4090 (24GB VRAM)
   - **Min Workers**: 0
   - **Max Workers**: 3
   - **Idle Timeout**: 5 seconds
   - **Flash Boot**: Enabled

### Recommended configuration per module

| Module | Minimum GPU | Recommended GPU |
|--------|-------------|-----------------|
| ASR_ENGINE | RTX 3080 (10GB) | RTX 4090 (24GB) |
| OCR_CORE | RTX 3090 (24GB) | RTX 4090 (24GB) |
| TTS | RTX 3080 (10GB) | RTX 4090 (24GB) |
| FACE_VECTOR | RTX 3060 (8GB) | RTX 4090 (24GB) |
| EMBEDDINGS | RTX 3060 (8GB) | RTX 4090 (24GB) |
| AVATAR_GEN | RTX 3090 (24GB) | RTX 4090 (24GB) |

---

## STEP 2.5: Test the endpoint

### ASR_ENGINE test

```bash
ENDPOINT_ID=""
RUNPOD_API_KEY=""

# Create a test audio file (or use an existing one)
# ffmpeg -f lavfi -i "sine=frequency=440:duration=3" -ar 16000 test.wav

# Encode as base64
AUDIO_B64=$(base64 -w0 test.wav)

# Send the request
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "profile": "LITE",
      "envelope": {"trace_id": "test-asr-001"},
      "routing": {"module": "ASR_ENGINE"},
      "payload": {"type": "audio", "encoding": "base64", "content": "'$AUDIO_B64'"},
      "context": {"lang": "es"}
    }
  }'
```

### Expected response

```json
{
  "id": "...",
  "status": "COMPLETED",
  "output": {
    "contract_version": "2.1",
    "status": {"code": "SUCCESS"},
    "result": {
      "schema": "asr_output_v1",
      "data": {
        "text": "...",
        "language_detected": "es",
        "duration_seconds": 3.0,
        "segments": [...]
      }
    },
    "quality": {"confidence": 0.95}
  }
}
```

### OCR_CORE test

```bash
# Test image
IMAGE_B64=$(base64 -w0 test_image.png)

curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "routing": {"module": "OCR_CORE"},
      "payload": {"type": "image", "encoding": "base64", "content": "'$IMAGE_B64'"}
    }
  }'
```

---

## STEP 2.6: Document the endpoint

### Save it in the credentials repo

```bash
# On ARCHITECT, update the credentials repo
cd /tmp && rm -rf credentials
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" \
  git clone ssh://git@localhost:2222/tzzr/credentials.git
cd credentials

cat >> inventario/08-gpu-runpod.md << 'EOF'

## GRACE Endpoint (updated 2025-12-24)

| Parameter | Value |
|-----------|-------|
| Endpoint ID | |
| Template | GRACE-GPU v1.0 |
| GPU | RTX 4090 |
| Max Workers | 3 |
| Idle Timeout | 5s |

### Available modules

- ASR_ENGINE (Whisper Large V3)
- OCR_CORE (GOT-OCR 2.0)
- TTS (XTTS-v2)
- FACE_VECTOR (InsightFace)
- EMBEDDINGS (BGE-Large)
- AVATAR_GEN (SDXL)

### Usage example

    curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
      -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
      -d '{"input": {...}}'
EOF

git add -A
git commit -m "Document GRACE RunPod endpoint"
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" git push origin main
```

---

## STEP 2.7: Integrate with DECK

### Create a GRACE client in DECK

```python
# /opt/deck/grace_client.py
import os
import base64
import requests
from typing import Dict, Any, Optional


class GraceClient:
    def __init__(self):
        self.endpoint_id = os.getenv("GRACE_ENDPOINT_ID")
        self.api_key = os.getenv("RUNPOD_API_KEY")
        self.base_url = f"https://api.runpod.ai/v2/{self.endpoint_id}"

    def call(self, module: str, content: bytes, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Call a GRACE module."""
        payload = {
            "input": {
                "contract_version": "2.1",
                "routing": {"module": module},
                "payload": {
                    "type": self._get_type(module),
                    "encoding": "base64",
                    "content": base64.b64encode(content).decode()
                },
                "context": context or {}
            }
        }
        response = requests.post(
            f"{self.base_url}/runsync",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json=payload,
            timeout=120
        )
        return response.json()

    def _get_type(self, module: str) -> str:
        types = {
            "ASR_ENGINE": "audio",
            "OCR_CORE": "image",
            "TTS": "text",
            "FACE_VECTOR": "image",
            "EMBEDDINGS": "text",
            "AVATAR_GEN": "text"
        }
        return types.get(module, "binary")

    def transcribe(self, audio_bytes: bytes, lang: str = "es") -> str:
        """Convenience method for ASR."""
        result = self.call("ASR_ENGINE", audio_bytes, {"lang": lang})
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def ocr(self, image_bytes: bytes) -> str:
        """Convenience method for OCR."""
        result = self.call("OCR_CORE", image_bytes)
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def embed(self, text: str) -> list:
        """Convenience method for embeddings."""
        result = self.call("EMBEDDINGS", text.encode(), {})
        return result.get("output", {}).get("result", {}).get("data", {}).get("vector", [])
```

---

## PHASE 2 FINAL CHECKLIST

- [ ] 2.1 - Dockerfile prepared
- [ ] 2.2 - Image pushed to the registry
- [ ] 2.3 - Template created on RunPod
- [ ] 2.4 - Serverless endpoint configured
- [ ] 2.5 - Tests passing (ASR, OCR, etc.)
- [ ] 2.6 - Credentials documented
- [ ] 2.7 - Client integrated into DECK

---

## SUCCESS METRICS

| Metric | Target |
|--------|--------|
| Cold start | < 60s |
| Warm ASR (30s audio) | < 10s |
| Warm OCR (one image) | < 5s |
| Availability | > 99% |

---

## ESTIMATED COSTS

| Usage | GPU | Cost/hour | Monthly estimate |
|-------|-----|-----------|------------------|
| Low (10 req/day) | RTX 4090 | $0.69 | ~$5-10 |
| Medium (100 req/day) | RTX 4090 | $0.69 | ~$30-50 |
| High (1000 req/day) | RTX 4090 | $0.69 | ~$100-200 |

---

## NEXT PHASE

Continue with [FASE_3_FLUJO_EMPRESARIAL.md](FASE_3_FLUJO_EMPRESARIAL.md)
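As a closing sanity check for this phase, the contract-2.1 request body used by the endpoint tests and the DECK client can be built and validated offline, before spending GPU credits. The sketch below mirrors what `GraceClient.call` sends; `build_grace_request` and `MODULE_TYPES` are illustrative names introduced here, not part of the GRACE codebase:

```python
import base64
from typing import Any, Dict, Optional

# Payload type expected by each GRACE module (mirrors GraceClient._get_type).
MODULE_TYPES = {
    "ASR_ENGINE": "audio",
    "OCR_CORE": "image",
    "TTS": "text",
    "FACE_VECTOR": "image",
    "EMBEDDINGS": "text",
    "AVATAR_GEN": "text",
}


def build_grace_request(
    module: str, content: bytes, context: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build a contract-2.1 request body for the RunPod /runsync endpoint."""
    if module not in MODULE_TYPES:
        raise ValueError(f"unknown GRACE module: {module}")
    return {
        "input": {
            "contract_version": "2.1",
            "routing": {"module": module},
            "payload": {
                "type": MODULE_TYPES[module],
                "encoding": "base64",
                "content": base64.b64encode(content).decode(),
            },
            "context": context or {},
        }
    }


req = build_grace_request("ASR_ENGINE", b"RIFF....WAVE", {"lang": "es"})
print(req["input"]["payload"]["type"])  # audio
```

Keeping this builder pure (no network calls) makes it easy to unit-test the contract in DECK's CI and catch module/type mismatches before they reach a paid worker.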