# PHASE 2: AI PROCESSING

**Complexity:** High
**Estimated duration:** 3-5 days
**Priority:** HIGH

---

## OBJECTIVE

Deploy GRACE on RunPod to process:
- ASR (speech-to-text)
- OCR (image-to-text)
- TTS (text-to-speech)
- Embeddings (semantic vectorization)
- Face detection
- Avatar generation

---

## PREREQUISITES

- [x] PHASE 1 completed
- [ ] RunPod account with credits
- [ ] RunPod API key
- [ ] Docker Hub or a private registry for images

---

## STEP 2.1: Prepare the Docker image for RunPod

### Handler structure

The file `grace/runpod/handler.py` is already implemented and supports:

| Module | Model | VRAM |
|--------|-------|------|
| ASR_ENGINE | Faster Whisper Large V3 | ~4GB |
| OCR_CORE | GOT-OCR 2.0 | ~8GB |
| TTS | XTTS-v2 | ~4GB |
| FACE_VECTOR | InsightFace Buffalo L | ~2GB |
| EMBEDDINGS | BGE-Large | ~2GB |
| AVATAR_GEN | SDXL Base 1.0 | ~8GB |
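
The source of `handler.py` is not reproduced here, so the following is only a minimal sketch of how a handler could dispatch on `routing.module`. The processor functions, the `PROCESSORS` registry, and the `ERROR` status code are hypothetical names used for illustration; only the envelope fields (`contract_version`, `routing`, `payload`, `context`) come from the request examples in this document.

```python
from typing import Any, Callable, Dict

Processor = Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]]


def run_asr(payload: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stub: a real processor would decode the audio and run the model.
    return {"schema": "asr_output_v1", "data": {"text": "", "language_detected": context.get("lang")}}


def run_ocr(payload: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stub: a real processor would decode the image and run OCR.
    return {"schema": "ocr_output_v1", "data": {"text": ""}}


# Hypothetical registry mapping module names to processors.
PROCESSORS: Dict[str, Processor] = {
    "ASR_ENGINE": run_asr,
    "OCR_CORE": run_ocr,
}


def handler(event: Dict[str, Any]) -> Dict[str, Any]:
    """Route a request to the processor named in routing.module."""
    request = event.get("input", {})
    module = request.get("routing", {}).get("module")
    processor = PROCESSORS.get(module)
    if processor is None:
        # The "ERROR" code is an assumption; the real contract may differ.
        return {
            "contract_version": "2.1",
            "status": {"code": "ERROR", "message": f"Unknown module: {module}"},
        }
    result = processor(request.get("payload", {}), request.get("context", {}))
    return {"contract_version": "2.1", "status": {"code": "SUCCESS"}, "result": result}


# In a worker image, the function would be registered with the RunPod SDK:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

A registry like this keeps each module's loading and inference code isolated, so an unknown module name produces a structured error instead of an exception.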

### Optimized Dockerfile

```dockerfile
# grace/runpod/Dockerfile
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

WORKDIR /app

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/models
ENV HF_HOME=/app/models

# System dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libgl1-mesa-glx \
    && rm -rf /var/lib/apt/lists/*

# Base Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Preload models to reduce cold start
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('large-v3', device='cpu', compute_type='int8')"
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-large-en-v1.5')"

# Handler code
COPY handler.py .

# Start the RunPod handler
CMD ["python", "-u", "handler.py"]
```

### requirements.txt

```
runpod>=1.3.0
torch>=2.1.0
transformers>=4.36.0
faster-whisper>=0.10.0
TTS>=0.22.0
sentence-transformers>=2.2.2
insightface>=0.7.3
onnxruntime-gpu>=1.16.0
diffusers>=0.24.0
accelerate>=0.25.0
safetensors>=0.4.0
Pillow>=10.0.0
opencv-python-headless>=4.8.0
numpy>=1.24.0
boto3>=1.34.0
```

---

## STEP 2.2: Build and push the image

### Option A: Docker Hub

```bash
# On a machine with Docker
cd grace/runpod

# Build
docker build -t tzzr/grace-gpu:v1.0 .

# Log in to Docker Hub
docker login

# Push
docker push tzzr/grace-gpu:v1.0
```

### Option B: RunPod Registry

```bash
# Use the RunPod CLI
# Check `runpodctl --help` first: available subcommands and flags vary by CLI version
runpodctl build --name grace-gpu --tag v1.0 .
```

---

## STEP 2.3: Create the template in RunPod

### Via Dashboard

1. Go to RunPod → Templates → Create Template
2. Configure:
   - **Name**: GRACE-GPU
   - **Container Image**: tzzr/grace-gpu:v1.0
   - **Container Disk**: 50GB
   - **Volume Disk**: 100GB (for models)
   - **Volume Mount Path**: /app/models
   - **Expose HTTP Ports**: 8000
   - **Expose TCP Ports**: (empty)

Note: `/app/models` is also the directory where the Dockerfile preloads models (`HF_HOME`). Depending on how the volume is mounted, it may hide the files baked into the image, so confirm after the first start that the preloaded models are still visible.

### Via API

```bash
RUNPOD_API_KEY="<your_api_key>"

curl -X POST "https://api.runpod.io/graphql" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "query": "mutation { saveTemplate(input: { name: \"GRACE-GPU\", imageName: \"tzzr/grace-gpu:v1.0\", dockerArgs: \"\", containerDiskInGb: 50, volumeInGb: 100, volumeMountPath: \"/app/models\", ports: \"8000/http\", isServerless: true }) { id name } }"
  }'
```
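
The nested escaping in the `curl` body is easy to get wrong by hand. As a sketch, the same request body can be assembled in Python, letting `json.dumps` handle the quoting. The function name `build_save_template_body` is hypothetical; the mutation fields are copied from the `curl` example, and using JSON string escaping for GraphQL string literals is an assumption that holds for plain ASCII values like these.

```python
import json


def build_save_template_body(
    name: str,
    image: str,
    container_disk_gb: int,
    volume_gb: int,
    mount_path: str,
    ports: str,
) -> str:
    """Return the JSON request body for the saveTemplate mutation."""
    # json.dumps() on each string yields a quoted, escaped literal.
    mutation = (
        "mutation { saveTemplate(input: { "
        f"name: {json.dumps(name)}, "
        f"imageName: {json.dumps(image)}, "
        'dockerArgs: "", '
        f"containerDiskInGb: {container_disk_gb}, "
        f"volumeInGb: {volume_gb}, "
        f"volumeMountPath: {json.dumps(mount_path)}, "
        f"ports: {json.dumps(ports)}, "
        "isServerless: true }) { id name } }"
    )
    # The outer json.dumps() escapes the inner quotes for the HTTP body.
    return json.dumps({"query": mutation})


body = build_save_template_body(
    "GRACE-GPU", "tzzr/grace-gpu:v1.0", 50, 100, "/app/models", "8000/http"
)
```

The resulting `body` string can be sent with any HTTP client using the same URL and headers as the `curl` call.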

---

## STEP 2.4: Create the serverless endpoint

### Via Dashboard

1. Go to Serverless → Create Endpoint
2. Configure:
   - **Name**: GRACE-Endpoint
   - **Template**: GRACE-GPU
   - **GPU Type**: RTX 4090 (24GB VRAM)
   - **Min Workers**: 0
   - **Max Workers**: 3
   - **Idle Timeout**: 5 seconds
   - **Flash Boot**: Enabled

### Recommended configuration per module

| Module | Minimum GPU | Recommended GPU |
|--------|-------------|-----------------|
| ASR_ENGINE | RTX 3080 (10GB) | RTX 4090 (24GB) |
| OCR_CORE | RTX 3090 (24GB) | RTX 4090 (24GB) |
| TTS | RTX 3080 (10GB) | RTX 4090 (24GB) |
| FACE_VECTOR | RTX 3060 (8GB) | RTX 4090 (24GB) |
| EMBEDDINGS | RTX 3060 (8GB) | RTX 4090 (24GB) |
| AVATAR_GEN | RTX 3090 (24GB) | RTX 4090 (24GB) |
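
As a quick sanity check on GPU sizing, the approximate VRAM figures from the module table in STEP 2.1 can be summed. This sketch assumes those figures are additive when several models are resident at once and reserves a hypothetical 2GB of headroom for activations and the CUDA context; both are assumptions, not measurements.

```python
from typing import Iterable

# Approximate VRAM per module in GB, taken from the table in STEP 2.1.
VRAM_GB = {
    "ASR_ENGINE": 4,
    "OCR_CORE": 8,
    "TTS": 4,
    "FACE_VECTOR": 2,
    "EMBEDDINGS": 2,
    "AVATAR_GEN": 8,
}


def fits(modules: Iterable[str], gpu_vram_gb: int, headroom_gb: int = 2) -> bool:
    """Return True if the listed modules fit on the GPU with headroom to spare."""
    needed = sum(VRAM_GB[m] for m in modules)
    return needed + headroom_gb <= gpu_vram_gb


total_vram = sum(VRAM_GB.values())       # all six modules together
all_fit_on_4090 = fits(VRAM_GB, 24)      # every module resident on a 24GB card
speech_stack_fits = fits(["ASR_ENGINE", "OCR_CORE", "TTS"], 24)
```

Under these assumptions all six models together need about 28GB, more than a 24GB card offers, so a single worker would have to load models on demand rather than keep every module resident.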

---

## STEP 2.5: Test the endpoint

### ASR_ENGINE test

```bash
ENDPOINT_ID="<your_endpoint_id>"
RUNPOD_API_KEY="<your_api_key>"

# Create a test audio file (or use an existing one)
# ffmpeg -f lavfi -i "sine=frequency=440:duration=3" -ar 16000 test.wav

# Encode as base64
AUDIO_B64=$(base64 -w0 test.wav)

# Send the request
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "profile": "LITE",
      "envelope": {
        "trace_id": "test-asr-001"
      },
      "routing": {
        "module": "ASR_ENGINE"
      },
      "payload": {
        "type": "audio",
        "encoding": "base64",
        "content": "'"$AUDIO_B64"'"
      },
      "context": {
        "lang": "es"
      }
    }
  }'
```
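
If `ffmpeg` is not available, an equivalent test tone can be produced with the Python standard library alone. This is a sketch: the function name `write_sine_wav` is hypothetical, and the defaults mirror the commented `ffmpeg` command (440 Hz, 3 seconds, 16 kHz), written as mono 16-bit PCM.

```python
import math
import struct
import wave


def write_sine_wav(
    path: str,
    frequency_hz: float = 440.0,
    duration_s: float = 3.0,
    sample_rate: int = 16000,
) -> int:
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count."""
    n_frames = int(duration_s * sample_rate)
    amplitude = 0.5 * 32767  # half of full scale to avoid clipping
    frames = bytearray()
    for i in range(n_frames):
        sample = int(amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling `write_sine_wav("test.wav")` produces a file that can be passed to the `base64` step above. A pure tone contains no speech, so it only exercises the request path, not transcription quality.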

### Expected response

```json
{
  "id": "...",
  "status": "COMPLETED",
  "output": {
    "contract_version": "2.1",
    "status": {"code": "SUCCESS"},
    "result": {
      "schema": "asr_output_v1",
      "data": {
        "text": "...",
        "language_detected": "es",
        "duration_seconds": 3.0,
        "segments": [...]
      }
    },
    "quality": {
      "confidence": 0.95
    }
  }
}
```
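
A test script can check the response shape instead of relying on a visual comparison. The sketch below assumes the response follows the structure shown above; `GraceError` and `extract_data` are hypothetical names, and treating anything other than `COMPLETED` and `SUCCESS` as a failure is an assumption about the contract.

```python
from typing import Any, Dict


class GraceError(RuntimeError):
    """Raised when a GRACE response does not indicate success."""


def extract_data(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return output.result.data, or raise GraceError if the job did not succeed."""
    job_status = response.get("status")
    if job_status != "COMPLETED":
        raise GraceError(f"Job not completed (status={job_status!r})")
    output = response.get("output") or {}
    code = (output.get("status") or {}).get("code")
    if code != "SUCCESS":
        raise GraceError(f"Module reported failure (code={code!r})")
    try:
        return output["result"]["data"]
    except (KeyError, TypeError) as exc:
        raise GraceError("Response is missing output.result.data") from exc
```

Raising on failure distinguishes "the module returned empty text" from "the request never ran", which a chain of `.get()` calls with empty defaults would hide.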

### OCR_CORE test

```bash
# Test image
IMAGE_B64=$(base64 -w0 test_image.png)

curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "routing": {"module": "OCR_CORE"},
      "payload": {
        "type": "image",
        "encoding": "base64",
        "content": "'"$IMAGE_B64"'"
      }
    }
  }'
```

---

## STEP 2.6: Document the endpoint

### Save to credentials

````bash
# On ARCHITECT, update the credentials repo
cd /tmp && rm -rf credentials
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" \
  git clone ssh://git@localhost:2222/tzzr/credentials.git

cd credentials

cat >> inventario/08-gpu-runpod.md << 'EOF'

## GRACE Endpoint (Updated 2025-12-24)

| Parameter | Value |
|-----------|-------|
| Endpoint ID | <endpoint_id> |
| Template | GRACE-GPU v1.0 |
| GPU | RTX 4090 |
| Max Workers | 3 |
| Idle Timeout | 5s |

### Available modules

- ASR_ENGINE (Whisper Large V3)
- OCR_CORE (GOT-OCR 2.0)
- TTS (XTTS-v2)
- FACE_VECTOR (InsightFace)
- EMBEDDINGS (BGE-Large)
- AVATAR_GEN (SDXL)

### Usage example

```bash
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -d '{"input": {...}}'
```
EOF

git add -A
git commit -m "Document GRACE RunPod endpoint"
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" git push origin main
````

---

## STEP 2.7: Integrate with DECK

### Create the GRACE client in DECK

```python
# /opt/deck/grace_client.py

import base64
import os
from typing import Any, Dict, Optional

import requests


class GraceClient:
    def __init__(self):
        self.endpoint_id = os.getenv("GRACE_ENDPOINT_ID")
        self.api_key = os.getenv("RUNPOD_API_KEY")
        # Without this check the URL would silently become ".../v2/None"
        if not self.endpoint_id or not self.api_key:
            raise RuntimeError("GRACE_ENDPOINT_ID and RUNPOD_API_KEY must be set")
        self.base_url = f"https://api.runpod.ai/v2/{self.endpoint_id}"

    def call(
        self, module: str, content: bytes, context: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        """Call a GRACE module and return the raw RunPod response."""
        payload = {
            "input": {
                "contract_version": "2.1",
                "routing": {"module": module},
                "payload": {
                    "type": self._get_type(module),
                    "encoding": "base64",
                    "content": base64.b64encode(content).decode(),
                },
                "context": context or {},
            }
        }

        response = requests.post(
            f"{self.base_url}/runsync",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json=payload,
            timeout=120,
        )
        # Fail loudly on HTTP errors instead of parsing an error body
        response.raise_for_status()
        return response.json()

    def _get_type(self, module: str) -> str:
        """Map a module name to the payload type it expects."""
        types = {
            "ASR_ENGINE": "audio",
            "OCR_CORE": "image",
            "TTS": "text",
            "FACE_VECTOR": "image",
            "EMBEDDINGS": "text",
            "AVATAR_GEN": "text",
        }
        return types.get(module, "binary")

    def transcribe(self, audio_bytes: bytes, lang: str = "es") -> str:
        """Convenience method for ASR."""
        result = self.call("ASR_ENGINE", audio_bytes, {"lang": lang})
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def ocr(self, image_bytes: bytes) -> str:
        """Convenience method for OCR."""
        result = self.call("OCR_CORE", image_bytes)
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def embed(self, text: str) -> list:
        """Convenience method for embeddings."""
        result = self.call("EMBEDDINGS", text.encode(), {})
        return result.get("output", {}).get("result", {}).get("data", {}).get("vector", [])
```
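
With `Min Workers` at 0, a request may arrive while a worker is still cold-starting, and `/runsync` can return before the job has finished. One way to handle this is to poll the job status until it reaches a terminal state. The sketch below is an assumption-laden illustration: `wait_for_job` is a hypothetical helper, the status names should be verified against the RunPod documentation, and `fetch_status` is an injected callable (for example, one that performs a GET on `{base_url}/status/{job_id}` with the same Authorization header).

```python
import time
from typing import Any, Callable, Dict

# Status names assumed to be terminal; verify against the RunPod docs.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll `fetch_status` until the job is terminal or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"Job still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `fetch_status`, `sleep`, and `clock` keeps the helper independent of any HTTP library and makes it testable without network access.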

---

## PHASE 2 FINAL CHECKLIST

- [ ] 2.1 - Dockerfile prepared
- [ ] 2.2 - Image pushed to registry
- [ ] 2.3 - Template created in RunPod
- [ ] 2.4 - Serverless endpoint configured
- [ ] 2.5 - Tests passing (ASR, OCR, etc.)
- [ ] 2.6 - Credentials documented
- [ ] 2.7 - Client integrated in DECK

---

## SUCCESS METRICS

| Metric | Expected Value |
|--------|----------------|
| Cold start | < 60s |
| Warm ASR (30s audio) | < 10s |
| Warm OCR (image) | < 5s |
| Availability | > 99% |
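
These targets can be checked mechanically once latencies have been measured. The following sketch compares measurements against the thresholds in the table; the metric keys and the `evaluate` function are hypothetical names, and how the latencies and success counts are collected is left open.

```python
from typing import Dict

# Upper bounds in seconds, taken from the metrics table.
THRESHOLDS_S = {
    "cold_start": 60.0,
    "warm_asr_30s_audio": 10.0,
    "warm_ocr": 5.0,
}
MIN_AVAILABILITY = 0.99


def evaluate(latencies_s: Dict[str, float], successes: int, total: int) -> Dict[str, bool]:
    """Return pass/fail per metric for the measurements provided."""
    report = {
        name: latencies_s[name] < limit
        for name, limit in THRESHOLDS_S.items()
        if name in latencies_s
    }
    # Availability is the fraction of requests that succeeded.
    report["availability"] = total > 0 and successes / total > MIN_AVAILABILITY
    return report


report = evaluate({"cold_start": 45.0, "warm_asr_30s_audio": 12.0}, successes=995, total=1000)
```

In this example the cold start passes, the warm ASR latency fails its 10-second target, and availability passes at 99.5%.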

---

## ESTIMATED COSTS

| Usage | GPU | Cost/hour | Estimated monthly cost |
|-------|-----|-----------|------------------------|
| Low (10 req/day) | RTX 4090 | $0.69 | ~$5-10 |
| Medium (100 req/day) | RTX 4090 | $0.69 | ~$30-50 |
| High (1000 req/day) | RTX 4090 | $0.69 | ~$100-200 |
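
The arithmetic behind such an estimate can be sketched as billed GPU time multiplied by the hourly rate. The function `monthly_cost` is hypothetical, and the billed seconds per request is an assumption that should include execution time plus any cold-start and idle time the platform charges for; the source does not break its ranges down.

```python
def monthly_cost(
    requests_per_day: int,
    billed_seconds_per_request: float,
    rate_per_hour: float = 0.69,
    days: int = 30,
) -> float:
    """Estimate monthly cost in USD from billed GPU seconds."""
    billed_hours = requests_per_day * billed_seconds_per_request * days / 3600
    return billed_hours * rate_per_hour


# 100 requests/day at an assumed 10 billed seconds each
medium_usage = monthly_cost(100, 10.0)
```

With 10 billed seconds per request, 100 requests per day comes to about 8.3 GPU hours per month. The ranges in the table are higher than execution time alone would give, which is consistent with cold starts and idle time also being billed.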

---

## NEXT PHASE

Continue with [FASE_3_FLUJO_EMPRESARIAL.md](FASE_3_FLUJO_EMPRESARIAL.md)