- ARCHITECTURE.md: actual state of the 23 repos
- IMPLEMENTATION_PLAN.md: 7 implementation phases
- PHASES/: detailed scripts for each phase

Audit result: 5 repos implemented, 4 repos partial, 14 repos documentation only
# PHASE 2: AI PROCESSING

**Complexity:** High · **Estimated duration:** 3-5 days · **Priority:** HIGH
## OBJECTIVE

Deploy GRACE on RunPod to handle:

- ASR (speech-to-text)
- OCR (image-to-text)
- TTS (text-to-speech)
- Embeddings (semantic vectorization)
- Face detection
- Avatar generation
## PREREQUISITES

- PHASE 1 completed
- RunPod account with credits
- RunPod API key
- Docker Hub or a private registry for images
## STEP 2.1: Prepare the Docker image for RunPod

### Handler structure

The file grace/runpod/handler.py is already implemented and supports:

| Module | Model | VRAM |
|---|---|---|
| ASR_ENGINE | Faster Whisper Large V3 | ~4GB |
| OCR_CORE | GOT-OCR 2.0 | ~8GB |
| TTS | XTTS-v2 | ~4GB |
| FACE_VECTOR | InsightFace Buffalo L | ~2GB |
| EMBEDDINGS | BGE-Large | ~2GB |
| AVATAR_GEN | SDXL Base 1.0 | ~8GB |
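To illustrate the routing contract these modules share, here is a hypothetical skeleton of the dispatch logic. It is not the actual grace/runpod/handler.py: only the module names and the contract fields come from this document; the stubbed-out body is an assumption.

```python
# Illustrative skeleton of the routing logic in grace/runpod/handler.py.
# Module names match the table above; model loading/inference is stubbed out.

VALID_MODULES = {"ASR_ENGINE", "OCR_CORE", "TTS",
                 "FACE_VECTOR", "EMBEDDINGS", "AVATAR_GEN"}

def handler(job: dict) -> dict:
    """Entry point RunPod calls for each job; returns a contract 2.1 reply."""
    inp = job.get("input", {})
    module = inp.get("routing", {}).get("module")
    if module not in VALID_MODULES:
        return {"contract_version": "2.1",
                "status": {"code": "ERROR",
                           "message": f"unknown module: {module}"}}
    # The real handler lazy-loads the model for `module` and runs inference
    # on the base64-decoded payload here.
    return {"contract_version": "2.1",
            "status": {"code": "SUCCESS"},
            "result": {"schema": f"{module.lower()}_output_v1", "data": {}}}

# In the real file the handler is registered with the RunPod SDK:
# runpod.serverless.start({"handler": handler})
```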
### Optimized Dockerfile

```dockerfile
# grace/runpod/Dockerfile
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

WORKDIR /app

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/models
ENV HF_HOME=/app/models

# System dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libgl1-mesa-glx \
    && rm -rf /var/lib/apt/lists/*

# Base Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Pre-download models to reduce cold start
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('large-v3', device='cpu', compute_type='int8')"
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-large-en-v1.5')"

# Handler code
COPY handler.py .

# Start the RunPod handler
CMD ["python", "-u", "handler.py"]
```
### requirements.txt

```text
runpod>=1.3.0
torch>=2.1.0
transformers>=4.36.0
faster-whisper>=0.10.0
TTS>=0.22.0
sentence-transformers>=2.2.2
insightface>=0.7.3
onnxruntime-gpu>=1.16.0
diffusers>=0.24.0
accelerate>=0.25.0
safetensors>=0.4.0
Pillow>=10.0.0
opencv-python-headless>=4.8.0
numpy>=1.24.0
boto3>=1.34.0
```
## STEP 2.2: Build and push the image

### Option A: Docker Hub

```bash
# On a machine with Docker
cd grace/runpod

# Build
docker build -t tzzr/grace-gpu:v1.0 .

# Log in to Docker Hub
docker login

# Push
docker push tzzr/grace-gpu:v1.0
```

### Option B: RunPod registry

```bash
# Use the RunPod CLI
runpodctl build --name grace-gpu --tag v1.0 .
```
## STEP 2.3: Create a template in RunPod

### Via Dashboard

- Go to RunPod → Templates → Create Template
- Configure:
  - Name: GRACE-GPU
  - Container Image: tzzr/grace-gpu:v1.0
  - Container Disk: 50GB
  - Volume Disk: 100GB (for models)
  - Volume Mount Path: /app/models
  - Expose HTTP Ports: 8000
  - Expose TCP Ports: (empty)

### Via API

```bash
RUNPOD_API_KEY="<your_api_key>"

curl -X POST "https://api.runpod.io/graphql" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "query": "mutation { saveTemplate(input: { name: \"GRACE-GPU\", imageName: \"tzzr/grace-gpu:v1.0\", dockerArgs: \"\", containerDiskInGb: 50, volumeInGb: 100, volumeMountPath: \"/app/models\", ports: \"8000/http\", isServerless: true }) { id name } }"
  }'
```
## STEP 2.4: Create the serverless endpoint

### Via Dashboard

- Go to Serverless → Create Endpoint
- Configure:
  - Name: GRACE-Endpoint
  - Template: GRACE-GPU
  - GPU Type: RTX 4090 (24GB VRAM)
  - Min Workers: 0
  - Max Workers: 3
  - Idle Timeout: 5 seconds
  - Flash Boot: Enabled

### Recommended configuration per module

| Module | Minimum GPU | Recommended GPU |
|---|---|---|
| ASR_ENGINE | RTX 3080 (10GB) | RTX 4090 (24GB) |
| OCR_CORE | RTX 3090 (24GB) | RTX 4090 (24GB) |
| TTS | RTX 3080 (10GB) | RTX 4090 (24GB) |
| FACE_VECTOR | RTX 3060 (8GB) | RTX 4090 (24GB) |
| EMBEDDINGS | RTX 3060 (8GB) | RTX 4090 (24GB) |
| AVATAR_GEN | RTX 3090 (24GB) | RTX 4090 (24GB) |
## STEP 2.5: Test the endpoint

### ASR_ENGINE test

```bash
ENDPOINT_ID="<your_endpoint_id>"
RUNPOD_API_KEY="<your_api_key>"

# Create a test audio file (or use an existing one)
# ffmpeg -f lavfi -i "sine=frequency=440:duration=3" -ar 16000 test.wav

# Base64-encode it
AUDIO_B64=$(base64 -w0 test.wav)

# Send the request
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "profile": "LITE",
      "envelope": {
        "trace_id": "test-asr-001"
      },
      "routing": {
        "module": "ASR_ENGINE"
      },
      "payload": {
        "type": "audio",
        "encoding": "base64",
        "content": "'$AUDIO_B64'"
      },
      "context": {
        "lang": "es"
      }
    }
  }'
```
### Expected response

```json
{
  "id": "...",
  "status": "COMPLETED",
  "output": {
    "contract_version": "2.1",
    "status": {"code": "SUCCESS"},
    "result": {
      "schema": "asr_output_v1",
      "data": {
        "text": "...",
        "language_detected": "es",
        "duration_seconds": 3.0,
        "segments": [...]
      }
    },
    "quality": {
      "confidence": 0.95
    }
  }
}
```
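Client code should check both the job status and the module's own status code before reading the data. A minimal helper for unpacking this response shape (field names as in the contract above; the helper itself is illustrative):

```python
# Unpack a /runsync reply following the contract shown above.

def extract_text(resp: dict) -> str:
    """Return the transcribed text, or raise if the job did not succeed."""
    if resp.get("status") != "COMPLETED":
        raise RuntimeError(f"job status: {resp.get('status')}")
    out = resp.get("output", {})
    if out.get("status", {}).get("code") != "SUCCESS":
        raise RuntimeError(f"module error: {out.get('status')}")
    return out["result"]["data"].get("text", "")
```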
### OCR_CORE test

```bash
# Test image
IMAGE_B64=$(base64 -w0 test_image.png)

curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "routing": {"module": "OCR_CORE"},
      "payload": {
        "type": "image",
        "encoding": "base64",
        "content": "'$IMAGE_B64'"
      }
    }
  }'
```
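`/runsync` holds the HTTP connection open until the job finishes, which is awkward for long audio or avatar jobs. RunPod also exposes an asynchronous pair: `POST /run` returns a job id, and `GET /status/<job_id>` reports its state. A sketch of the polling loop (the terminal states are RunPod's; the helper and timings are assumptions):

```python
import time
from typing import Callable

# Terminal job states reported by RunPod's /status endpoint.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}

def wait_for_job(fetch_status: Callable[[], dict],
                 poll_seconds: float = 2.0,
                 timeout_seconds: float = 600.0) -> dict:
    """Poll a submitted job until it reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("RunPod job did not reach a terminal state in time")

# Wired up with requests it would look roughly like:
# job = requests.post(f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run", ...).json()
# final = wait_for_job(lambda: requests.get(
#     f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job['id']}",
#     headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"}).json())
```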
## STEP 2.6: Document the endpoint

### Save in credentials

````bash
# On ARCHITECT, update the credentials repo
cd /tmp && rm -rf credentials
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" \
  git clone ssh://git@localhost:2222/tzzr/credentials.git
cd credentials

cat >> inventario/08-gpu-runpod.md << 'EOF'
## GRACE Endpoint (updated 2025-12-24)

| Parameter | Value |
|-----------|-------|
| Endpoint ID | <endpoint_id> |
| Template | GRACE-GPU v1.0 |
| GPU | RTX 4090 |
| Max Workers | 3 |
| Idle Timeout | 5s |

### Available modules

- ASR_ENGINE (Whisper Large V3)
- OCR_CORE (GOT-OCR 2.0)
- TTS (XTTS-v2)
- FACE_VECTOR (InsightFace)
- EMBEDDINGS (BGE-Large)
- AVATAR_GEN (SDXL)

### Usage example

```bash
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -d '{"input": {...}}'
```
EOF

git add -A
git commit -m "Document GRACE RunPod endpoint"
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" git push origin main
````
---
## STEP 2.7: Integrate with DECK

### Create the GRACE client in DECK
```python
# /opt/deck/grace_client.py
import os
import base64
import requests
from typing import Dict, Any


class GraceClient:
    def __init__(self):
        self.endpoint_id = os.getenv("GRACE_ENDPOINT_ID")
        self.api_key = os.getenv("RUNPOD_API_KEY")
        self.base_url = f"https://api.runpod.ai/v2/{self.endpoint_id}"

    def call(self, module: str, content: bytes, context: Dict = None) -> Dict[str, Any]:
        """Call a GRACE module."""
        payload = {
            "input": {
                "contract_version": "2.1",
                "routing": {"module": module},
                "payload": {
                    "type": self._get_type(module),
                    "encoding": "base64",
                    "content": base64.b64encode(content).decode()
                },
                "context": context or {}
            }
        }
        response = requests.post(
            f"{self.base_url}/runsync",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json=payload,
            timeout=120
        )
        return response.json()

    def _get_type(self, module: str) -> str:
        types = {
            "ASR_ENGINE": "audio",
            "OCR_CORE": "image",
            "TTS": "text",
            "FACE_VECTOR": "image",
            "EMBEDDINGS": "text",
            "AVATAR_GEN": "text"
        }
        return types.get(module, "binary")

    def transcribe(self, audio_bytes: bytes, lang: str = "es") -> str:
        """Convenience method for ASR."""
        result = self.call("ASR_ENGINE", audio_bytes, {"lang": lang})
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def ocr(self, image_bytes: bytes) -> str:
        """Convenience method for OCR."""
        result = self.call("OCR_CORE", image_bytes)
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def embed(self, text: str) -> list:
        """Convenience method for embeddings."""
        result = self.call("EMBEDDINGS", text.encode(), {})
        return result.get("output", {}).get("result", {}).get("data", {}).get("vector", [])
```
## PHASE 2 FINAL CHECKLIST

- 2.1 - Dockerfile prepared
- 2.2 - Image pushed to a registry
- 2.3 - Template created in RunPod
- 2.4 - Serverless endpoint configured
- 2.5 - Tests passing (ASR, OCR, etc.)
- 2.6 - Credentials documented
- 2.7 - Client integrated in DECK
## SUCCESS METRICS

| Metric | Target |
|---|---|
| Cold start | < 60s |
| Warm ASR (30s audio) | < 10s |
| Warm OCR (one image) | < 5s |
| Availability | > 99% |
## ESTIMATED COSTS

| Usage | GPU | Cost/hour | Monthly estimate |
|---|---|---|---|
| Low (10 req/day) | RTX 4090 | $0.69 | ~$5-10 |
| Medium (100 req/day) | RTX 4090 | $0.69 | ~$30-50 |
| High (1000 req/day) | RTX 4090 | $0.69 | ~$100-200 |
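With Min Workers set to 0 you pay only for billed GPU-seconds, so the table can be sanity-checked with simple arithmetic. The seconds-per-request figure below is an assumption (warm inference plus amortized cold starts), not a measured value:

```python
# Sanity-check the cost table: with Min Workers = 0 you pay only for billed
# GPU-seconds. seconds_per_request is an assumed figure covering warm
# inference plus amortized cold starts.

def monthly_cost(requests_per_day: int, seconds_per_request: float,
                 usd_per_hour: float = 0.69, days: int = 30) -> float:
    """Billed GPU-hours per month times the hourly rate."""
    gpu_hours = requests_per_day * seconds_per_request * days / 3600
    return gpu_hours * usd_per_hour

# Medium tier: 100 req/day at ~60 s billed each lands inside the ~$30-50 row.
print(round(monthly_cost(100, 60), 2))  # 34.5
```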
## NEXT PHASE

Continue with FASE_3_FLUJO_EMPRESARIAL.md