
# PHASE 2: AI PROCESSING

**Complexity:** Complex · **Estimated duration:** 3-5 days · **Priority:** HIGH


## OBJECTIVE

Deploy GRACE on RunPod to handle:

- ASR (speech-to-text)
- OCR (images to text)
- TTS (text-to-speech)
- Embeddings (semantic vectorization)
- Face detection
- Avatar generation

## PREREQUISITES

- PHASE 1 completed
- RunPod account with credits
- RunPod API key
- Docker Hub or a private registry for images

## STEP 2.1: Prepare the Docker image for RunPod

### Handler structure

The file `grace/runpod/handler.py` is already implemented and supports:

| Module | Model | VRAM |
|--------|-------|------|
| ASR_ENGINE | Faster Whisper Large V3 | ~4GB |
| OCR_CORE | GOT-OCR 2.0 | ~8GB |
| TTS | XTTS-v2 | ~4GB |
| FACE_VECTOR | InsightFace Buffalo L | ~2GB |
| EMBEDDINGS | BGE-Large | ~2GB |
| AVATAR_GEN | SDXL Base 1.0 | ~8GB |
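
For reference, the dispatch pattern such a handler follows looks roughly like the sketch below. This is not the contents of `handler.py`; the `run_asr` stub and the error shape are illustrative, while `runpod.serverless.start` is the standard entrypoint of the `runpod` SDK:

```python
# Minimal sketch of a RunPod serverless handler with module routing.
# run_asr is a hypothetical stand-in for the real module code.
import runpod

def run_asr(payload: dict, context: dict) -> dict:
    # Placeholder: the real module decodes payload["content"] and runs Whisper.
    return {"schema": "asr_output_v1", "data": {"text": "..."}}

MODULES = {"ASR_ENGINE": run_asr}  # OCR_CORE, TTS, ... registered the same way

def handler(event):
    """Route a contract-2.1 request to the requested GRACE module."""
    inp = event["input"]
    module = inp.get("routing", {}).get("module")
    fn = MODULES.get(module)
    if fn is None:
        return {"contract_version": "2.1",
                "status": {"code": "ERROR", "message": f"unknown module: {module}"}}
    return {"contract_version": "2.1",
            "status": {"code": "SUCCESS"},
            "result": fn(inp.get("payload", {}), inp.get("context", {}))}

runpod.serverless.start({"handler": handler})
```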

### Optimized Dockerfile

```dockerfile
# grace/runpod/Dockerfile
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

WORKDIR /app

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/app/models
ENV HF_HOME=/app/models

# System dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    libsm6 \
    libxext6 \
    libgl1-mesa-glx \
    && rm -rf /var/lib/apt/lists/*

# Base Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Pre-download models to reduce cold start
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('large-v3', device='cpu', compute_type='int8')"
RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-large-en-v1.5')"

# Handler code
COPY handler.py .

# Start the RunPod handler
CMD ["python", "-u", "handler.py"]
```

### requirements.txt

```text
runpod>=1.3.0
torch>=2.1.0
transformers>=4.36.0
faster-whisper>=0.10.0
TTS>=0.22.0
sentence-transformers>=2.2.2
insightface>=0.7.3
onnxruntime-gpu>=1.16.0
diffusers>=0.24.0
accelerate>=0.25.0
safetensors>=0.4.0
Pillow>=10.0.0
opencv-python-headless>=4.8.0
numpy>=1.24.0
boto3>=1.34.0
```

## STEP 2.2: Build and push the image

### Option A: Docker Hub

```bash
# On a machine with Docker
cd grace/runpod

# Build
docker build -t tzzr/grace-gpu:v1.0 .

# Log in to Docker Hub
docker login

# Push
docker push tzzr/grace-gpu:v1.0
```

### Option B: RunPod Registry

```bash
# Using the RunPod CLI
runpodctl build --name grace-gpu --tag v1.0 .
```

## STEP 2.3: Create a template on RunPod

### Via Dashboard

1. Go to RunPod → Templates → Create Template
2. Configure:
   - Name: GRACE-GPU
   - Container Image: tzzr/grace-gpu:v1.0
   - Container Disk: 50GB
   - Volume Disk: 100GB (for models)
   - Volume Mount Path: /app/models
   - Expose HTTP Ports: 8000
   - Expose TCP Ports: (empty)

### Via API

```bash
RUNPOD_API_KEY="<your_api_key>"

curl -X POST "https://api.runpod.io/graphql" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "query": "mutation { saveTemplate(input: { name: \"GRACE-GPU\", imageName: \"tzzr/grace-gpu:v1.0\", dockerArgs: \"\", containerDiskInGb: 50, volumeInGb: 100, volumeMountPath: \"/app/models\", ports: \"8000/http\", isServerless: true }) { id name } }"
  }'
```
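
The same mutation can also be sent from Python. A small sketch using `requests` (the GraphQL query string is the one above, unchanged):

```python
import os
import requests

# Sketch: send the saveTemplate mutation shown above via the GraphQL API.
query = """mutation { saveTemplate(input: { name: "GRACE-GPU",
  imageName: "tzzr/grace-gpu:v1.0", dockerArgs: "", containerDiskInGb: 50,
  volumeInGb: 100, volumeMountPath: "/app/models", ports: "8000/http",
  isServerless: true }) { id name } }"""

resp = requests.post(
    "https://api.runpod.io/graphql",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={"query": query},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect {"data": {"saveTemplate": {"id": ..., "name": "GRACE-GPU"}}}
```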

## STEP 2.4: Create the serverless endpoint

### Via Dashboard

1. Go to Serverless → Create Endpoint
2. Configure (see the health-check sketch after this list):
   - Name: GRACE-Endpoint
   - Template: GRACE-GPU
   - GPU Type: RTX 4090 (24GB VRAM)
   - Min Workers: 0
   - Max Workers: 3
   - Idle Timeout: 5 seconds
   - Flash Boot: Enabled
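
Once the endpoint exists, it is worth confirming it responds before wiring anything else to it. A quick sketch, assuming the per-endpoint `/health` route that RunPod's serverless API exposes and a `GRACE_ENDPOINT_ID` environment variable:

```python
import os
import requests

# Sketch: query the per-endpoint health route to see worker/job counts.
endpoint_id = os.environ["GRACE_ENDPOINT_ID"]  # assumed env var name
resp = requests.get(
    f"https://api.runpod.ai/v2/{endpoint_id}/health",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    timeout=15,
)
resp.raise_for_status()
print(resp.json())  # typically reports idle/running workers and queued jobs
```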

### Recommended configuration per module

| Module | Minimum GPU | Recommended GPU |
|--------|-------------|-----------------|
| ASR_ENGINE | RTX 3080 (10GB) | RTX 4090 (24GB) |
| OCR_CORE | RTX 3090 (24GB) | RTX 4090 (24GB) |
| TTS | RTX 3080 (10GB) | RTX 4090 (24GB) |
| FACE_VECTOR | RTX 3060 (8GB) | RTX 4090 (24GB) |
| EMBEDDINGS | RTX 3060 (8GB) | RTX 4090 (24GB) |
| AVATAR_GEN | RTX 3090 (24GB) | RTX 4090 (24GB) |

## STEP 2.5: Test the endpoint

### Test ASR_ENGINE

```bash
ENDPOINT_ID="<your_endpoint_id>"
RUNPOD_API_KEY="<your_api_key>"

# Create a test audio file (or use an existing one)
# ffmpeg -f lavfi -i "sine=frequency=440:duration=3" -ar 16000 test.wav

# Encode as base64
AUDIO_B64=$(base64 -w0 test.wav)

# Send the request
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "profile": "LITE",
      "envelope": {
        "trace_id": "test-asr-001"
      },
      "routing": {
        "module": "ASR_ENGINE"
      },
      "payload": {
        "type": "audio",
        "encoding": "base64",
        "content": "'$AUDIO_B64'"
      },
      "context": {
        "lang": "es"
      }
    }
  }'
```

### Expected response

```json
{
  "id": "...",
  "status": "COMPLETED",
  "output": {
    "contract_version": "2.1",
    "status": {"code": "SUCCESS"},
    "result": {
      "schema": "asr_output_v1",
      "data": {
        "text": "...",
        "language_detected": "es",
        "duration_seconds": 3.0,
        "segments": [...]
      }
    },
    "quality": {
      "confidence": 0.95
    }
  }
}
```
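
`runsync` blocks until the job finishes, which is fine for short clips. For longer audio it is usually safer to submit via `/run` and poll `/status/<job_id>` instead; a sketch of that flow (same request payload as above, `GRACE_ENDPOINT_ID` is an assumed env var):

```python
import os
import time
import requests

# Sketch: asynchronous submit-and-poll flow against the serverless API.
BASE = f"https://api.runpod.ai/v2/{os.environ['GRACE_ENDPOINT_ID']}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

def run_async(payload: dict, poll_seconds: float = 2.0) -> dict:
    """Submit a job with /run, then poll /status until it reaches a terminal state."""
    job = requests.post(f"{BASE}/run", headers=HEADERS,
                        json={"input": payload}, timeout=30).json()
    job_id = job["id"]
    while True:
        status = requests.get(f"{BASE}/status/{job_id}",
                              headers=HEADERS, timeout=30).json()
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status
        time.sleep(poll_seconds)
```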

### Test OCR_CORE

```bash
# Test image
IMAGE_B64=$(base64 -w0 test_image.png)

curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -d '{
    "input": {
      "contract_version": "2.1",
      "routing": {"module": "OCR_CORE"},
      "payload": {
        "type": "image",
        "encoding": "base64",
        "content": "'$IMAGE_B64'"
      }
    }
  }'
```

## STEP 2.6: Document the endpoint

### Save to the credentials repo

````bash
# On ARCHITECT, update the credentials repo
cd /tmp && rm -rf credentials
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" \
  git clone ssh://git@localhost:2222/tzzr/credentials.git

cd credentials

cat >> inventario/08-gpu-runpod.md << 'EOF'

## GRACE Endpoint (updated 2025-12-24)

| Parameter | Value |
|-----------|-------|
| Endpoint ID | <endpoint_id> |
| Template | GRACE-GPU v1.0 |
| GPU | RTX 4090 |
| Max Workers | 3 |
| Idle Timeout | 5s |

### Available modules

- ASR_ENGINE (Whisper Large V3)
- OCR_CORE (GOT-OCR 2.0)
- TTS (XTTS-v2)
- FACE_VECTOR (InsightFace)
- EMBEDDINGS (BGE-Large)
- AVATAR_GEN (SDXL)

### Usage example

```bash
curl -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -d '{"input": {...}}'
```
EOF

git add -A
git commit -m "Document GRACE RunPod endpoint"
GIT_SSH_COMMAND="ssh -i /home/orchestrator/.ssh/tzzr -p 2222" git push origin main
````


---

## STEP 2.7: Integrate with DECK

### Create a GRACE client in DECK

```python
# /opt/deck/grace_client.py

import os
import base64
import requests
from typing import Dict, Any, Optional

class GraceClient:
    def __init__(self):
        self.endpoint_id = os.getenv("GRACE_ENDPOINT_ID")
        self.api_key = os.getenv("RUNPOD_API_KEY")
        self.base_url = f"https://api.runpod.ai/v2/{self.endpoint_id}"

    def call(self, module: str, content: bytes, context: Optional[Dict] = None) -> Dict[str, Any]:
        """Call a GRACE module."""
        payload = {
            "input": {
                "contract_version": "2.1",
                "routing": {"module": module},
                "payload": {
                    "type": self._get_type(module),
                    "encoding": "base64",
                    "content": base64.b64encode(content).decode()
                },
                "context": context or {}
            }
        }

        response = requests.post(
            f"{self.base_url}/runsync",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json=payload,
            timeout=120
        )
        response.raise_for_status()  # surface HTTP errors instead of failing in .json()
        return response.json()

    def _get_type(self, module: str) -> str:
        """Map each module to the payload type it expects."""
        types = {
            "ASR_ENGINE": "audio",
            "OCR_CORE": "image",
            "TTS": "text",
            "FACE_VECTOR": "image",
            "EMBEDDINGS": "text",
            "AVATAR_GEN": "text"
        }
        return types.get(module, "binary")

    def transcribe(self, audio_bytes: bytes, lang: str = "es") -> str:
        """Convenience method for ASR."""
        result = self.call("ASR_ENGINE", audio_bytes, {"lang": lang})
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def ocr(self, image_bytes: bytes) -> str:
        """Convenience method for OCR."""
        result = self.call("OCR_CORE", image_bytes)
        return result.get("output", {}).get("result", {}).get("data", {}).get("text", "")

    def embed(self, text: str) -> list:
        """Convenience method for embeddings."""
        result = self.call("EMBEDDINGS", text.encode(), {})
        return result.get("output", {}).get("result", {}).get("data", {}).get("vector", [])
```
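
A quick usage sketch of the client (file name and text are illustrative):

```python
from grace_client import GraceClient

client = GraceClient()

# Transcribe an audio file
with open("meeting.wav", "rb") as f:
    print(client.transcribe(f.read(), lang="es"))

# Embed a text snippet
vector = client.embed("GRACE deployment notes")
print(len(vector))
```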

## PHASE 2 FINAL CHECKLIST

- 2.1 - Dockerfile prepared
- 2.2 - Image pushed to the registry
- 2.3 - Template created on RunPod
- 2.4 - Serverless endpoint configured
- 2.5 - Tests passing (ASR, OCR, etc.)
- 2.6 - Credentials documented
- 2.7 - Client integrated into DECK

## SUCCESS METRICS

| Metric | Expected value |
|--------|----------------|
| Cold start | < 60s |
| Warm ASR (30s audio) | < 10s |
| Warm OCR (image) | < 5s |
| Availability | > 99% |

## ESTIMATED COSTS

| Usage | GPU | Cost/hour | Monthly estimate |
|-------|-----|-----------|------------------|
| Low (10 req/day) | RTX 4090 | $0.69 | ~$5-10 |
| Medium (100 req/day) | RTX 4090 | $0.69 | ~$30-50 |
| High (1000 req/day) | RTX 4090 | $0.69 | ~$100-200 |
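
These figures are rough. A back-of-the-envelope sketch of the arithmetic behind them, assuming billing per GPU-second while a worker is active; every parameter here is an assumption, not RunPod's actual billing rules:

```python
# Rough cost model: all defaults are assumptions for illustration only.
def monthly_cost_usd(req_per_day: float,
                     secs_per_req: float = 10.0,    # warm processing time
                     cold_start_frac: float = 0.3,  # share of requests hitting a cold worker
                     cold_start_secs: float = 60.0,
                     usd_per_hour: float = 0.69) -> float:
    active_secs = req_per_day * (secs_per_req + cold_start_frac * cold_start_secs)
    return active_secs / 3600 * usd_per_hour * 30

# Real bills also depend on idle timeout, payload size, and worker scaling.
print(f"${monthly_cost_usd(100):.2f}/month")
```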

## NEXT PHASE

Continue with FASE_3_FLUJO_EMPRESARIAL.md.