# TZZR Backup and Recovery

**Version:** 5.0
**Date:** 2024-12-24

---

## Current State

### Existing Backups

| System | Backup | Destination | Frequency | Status |
|--------|--------|-------------|-----------|--------|
| Gitea | Yes | R2 | Manual | Operational |
| PostgreSQL ARCHITECT | No | - | - | **CRITICAL** |
| PostgreSQL DECK | No | - | - | **CRITICAL** |
| PostgreSQL CORP | No | - | - | **CRITICAL** |
| PostgreSQL HST | No | - | - | **CRITICAL** |
| R2 buckets | Built-in | R2 | Automatic | Operational |

---

## Proposed Backup Plan

### PostgreSQL - Daily Backup

```bash
#!/bin/bash
# /opt/scripts/backup_postgres.sh

# Abort on any error; pipefail makes a failed pg_dump fail the whole pipeline
# instead of uploading a truncated dump
set -eo pipefail

DATE=$(date +%F)
BACKUP_DIR="/tmp/pg_backup"

# Load R2 credentials
source /home/orchestrator/orchestrator/.env
export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="$R2_SECRET_KEY"

R2_ENDPOINT="https://7dedae6030f5554d99d37e98a5232996.r2.cloudflarestorage.com"

mkdir -p "$BACKUP_DIR"

# Back up ARCHITECT
echo "Backing up ARCHITECT..."
sudo -u postgres pg_dump architect | gzip > "$BACKUP_DIR/architect_$DATE.sql.gz"
aws s3 cp "$BACKUP_DIR/architect_$DATE.sql.gz" s3://architect/backups/postgres/ \
    --endpoint-url "$R2_ENDPOINT"

# Local cleanup
rm -rf "$BACKUP_DIR"

echo "Backup completed: $DATE"
```

### Cron Configuration

```bash
# /etc/cron.d/tzzr-backup
# Daily backup at 3:00 AM
0 3 * * * orchestrator /opt/scripts/backup_postgres.sh >> /var/log/tzzr-backup.log 2>&1
```

---

## Backup per Server

The commands below assume that `R2_ENDPOINT` and the R2 credentials are already exported in the shell, as in the daily backup script.

### ARCHITECT (69.62.126.110)

```bash
# Database: architect
sudo -u postgres pg_dump architect | gzip > architect_$(date +%F).sql.gz

# Upload to R2
aws s3 cp architect_$(date +%F).sql.gz s3://architect/backups/postgres/ \
    --endpoint-url "$R2_ENDPOINT"
```

### DECK (72.62.1.113)

```bash
# Database: tzzr
ssh deck 'sudo -u postgres pg_dump tzzr | gzip' > deck_tzzr_$(date +%F).sql.gz

# Upload to R2
aws s3 cp deck_tzzr_$(date +%F).sql.gz s3://architect/backups/deck/ \
    --endpoint-url "$R2_ENDPOINT"
```

### CORP (92.112.181.188)

```bash
# Database: corp
ssh corp 'sudo -u postgres pg_dump corp | gzip' > corp_$(date +%F).sql.gz

# Upload to R2
aws s3 cp corp_$(date +%F).sql.gz s3://architect/backups/corp/ \
    --endpoint-url "$R2_ENDPOINT"
```

### HST (72.62.2.84)

```bash
# Database: hst_images
ssh hst 'sudo -u postgres pg_dump hst_images | gzip' > hst_$(date +%F).sql.gz

# Upload to R2
aws s3 cp hst_$(date +%F).sql.gz s3://architect/backups/hst/ \
    --endpoint-url "$R2_ENDPOINT"
```
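
The four per-server snippets differ only in database name, file prefix, and R2 folder, so the naming can be kept in one place. The following is a minimal sketch of that idea, not part of the existing scripts: the function names `server_spec` and `backup_url` are hypothetical, while the server names, databases, and paths are the ones listed above.

```bash
#!/bin/bash
# Sketch: derive the R2 object URL for each server from a single table.

# Map a server to "database file_prefix r2_folder" (values from this document).
server_spec() {
    case "$1" in
        architect) echo "architect architect postgres" ;;
        deck)      echo "tzzr deck_tzzr deck" ;;
        corp)      echo "corp corp corp" ;;
        hst)       echo "hst_images hst hst" ;;
        *)         return 1 ;;   # unknown server
    esac
}

# Print the R2 URL for server $1 and date $2 (YYYY-MM-DD).
backup_url() {
    local spec _db file folder
    spec=$(server_spec "$1") || return 1
    read -r _db file folder <<EOF
$spec
EOF
    echo "s3://architect/backups/$folder/${file}_$2.sql.gz"
}

# Example: list today's expected backup objects for every server.
for server in architect deck corp hst; do
    backup_url "$server" "$(date +%F)"
done
```

A wrapper script could loop over the same table to run `pg_dump` on each host, which would keep file names and R2 folders consistent with the structure described in this document.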

---

## Gitea Backup

### Manual Backup

```bash
# On ARCHITECT
DUMP="gitea-dump-$(date +%F_%H%M).zip"

# Write the dump to a known path, because `docker cp` does not expand wildcards.
# If `gitea dump` refuses to run as root, add `-u git` to `docker exec`.
docker exec gitea bash -c "gitea dump -c /data/gitea/conf/app.ini -f /tmp/$DUMP"
docker cp "gitea:/tmp/$DUMP" ./

# Upload to R2
aws s3 cp "$DUMP" s3://architect/backups/gitea/ \
    --endpoint-url "$R2_ENDPOINT"
```

### Automatic Backup

```bash
#!/bin/bash
# /opt/scripts/backup_gitea.sh

# Abort on the first failing step so a failed dump is never uploaded
set -e

DATE=$(date +%F_%H%M)

# Create the dump (no -t: cron provides no TTY)
docker exec gitea bash -c "gitea dump -c /data/gitea/conf/app.ini -f /tmp/gitea-dump-$DATE.zip"

# Copy it out of the container
docker cp "gitea:/tmp/gitea-dump-$DATE.zip" /tmp/

# Upload to R2
source /home/orchestrator/orchestrator/.env
export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="$R2_SECRET_KEY"

aws s3 cp "/tmp/gitea-dump-$DATE.zip" s3://architect/backups/gitea/ \
    --endpoint-url https://7dedae6030f5554d99d37e98a5232996.r2.cloudflarestorage.com

# Cleanup
rm "/tmp/gitea-dump-$DATE.zip"
docker exec gitea rm "/tmp/gitea-dump-$DATE.zip"
```

---

## Backup Structure in R2

```
s3://architect/backups/
├── postgres/
│   ├── architect_2024-12-24.sql.gz
│   ├── architect_2024-12-23.sql.gz
│   └── ...
├── deck/
│   ├── deck_tzzr_2024-12-24.sql.gz
│   └── ...
├── corp/
│   ├── corp_2024-12-24.sql.gz
│   └── ...
├── hst/
│   ├── hst_2024-12-24.sql.gz
│   └── ...
└── gitea/
    ├── gitea-dump-2024-12-24_0300.zip
    └── ...
```

---

## Backup Retention

### Proposed Policy

| Type | Retention | Notes |
|------|-----------|-------|
| Daily | 7 days | Last 7 backups |
| Weekly | 4 weeks | Sundays |
| Monthly | 12 months | First day of the month |
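
To make the policy concrete, the sketch below classifies a backup by its date and decides whether it is still inside its retention window. It is only an illustration under stated assumptions: the function names `retention_class` and `should_keep` are hypothetical, GNU `date -d` is assumed, 12 months is approximated as 365 days, and a backup that is both a Sunday and the first of the month is treated as monthly.

```bash
#!/bin/bash
# Sketch of the proposed retention policy (daily / weekly / monthly).

# Classify a backup date (YYYY-MM-DD) as monthly, weekly, or daily.
retention_class() {
    local day dow
    day=${1##*-}                      # day of month, e.g. "01"
    dow=$(date -u -d "$1" +%u)        # 1 = Monday ... 7 = Sunday
    if [ "$day" = "01" ]; then
        echo monthly
    elif [ "$dow" = "7" ]; then
        echo weekly
    else
        echo daily
    fi
}

# Succeed (exit 0) if a backup dated $1 should still be kept on day $2.
should_keep() {
    local age_days limit
    age_days=$(( ( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ) / 86400 ))
    case "$(retention_class "$1")" in
        monthly) limit=365 ;;   # assumption: 12 months ~ 365 days
        weekly)  limit=28 ;;    # 4 weeks
        *)       limit=7 ;;     # last 7 daily backups (ages 0 to 6)
    esac
    [ "$age_days" -lt "$limit" ]
}
```

For example, `should_keep 2024-12-22 2024-12-31` succeeds because 2024-12-22 is a Sunday and falls under the 4-week window, while `should_keep 2024-12-23 2024-12-31` fails because a plain daily backup expires after 7 days.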

### Cleanup Script

```bash
#!/bin/bash
# /opt/scripts/cleanup_backups.sh

set -e

source /home/orchestrator/orchestrator/.env
export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="$R2_SECRET_KEY"

R2_ENDPOINT="https://7dedae6030f5554d99d37e98a5232996.r2.cloudflarestorage.com"

# Delete backups older than 30 days
# (preferably implement this with R2 lifecycle rules)
CUTOFF=$(date -d "30 days ago" +%s)

# `aws s3 ls` prints: date, time, size, file name
aws s3 ls s3://architect/backups/postgres/ --endpoint-url "$R2_ENDPOINT" | \
while read -r createDate _time _size fileName; do
    # Skip folder ("PRE") lines, which have no fourth column
    [ -n "$fileName" ] || continue

    # Compare dates and delete if older than 30 days
    if [ "$(date -d "$createDate" +%s)" -lt "$CUTOFF" ]; then
        aws s3 rm "s3://architect/backups/postgres/$fileName" \
            --endpoint-url "$R2_ENDPOINT"
    fi
done
```

---

## Recovery

### PostgreSQL - Restore a Database

```bash
# Download the backup
aws s3 cp s3://architect/backups/postgres/architect_2024-12-24.sql.gz . \
    --endpoint-url "$R2_ENDPOINT"

# Decompress
gunzip architect_2024-12-24.sql.gz

# Restore (WARNING: this modifies the existing database. The dump is taken
# without --clean, so existing objects are not dropped first and may conflict.)
sudo -u postgres psql -d architect < architect_2024-12-24.sql
```

### PostgreSQL - Restore to a New Database

```bash
# Create a new database
sudo -u postgres createdb architect_restored

# Restore
gunzip -c architect_2024-12-24.sql.gz | sudo -u postgres psql -d architect_restored
```
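
Before restoring, it can help to confirm that the downloaded dump is not truncated. The sketch below is one possible check, assuming a plain-format dump as produced by the backup commands in this document: plain-format `pg_dump` output normally ends with a `PostgreSQL database dump complete` comment, so a missing marker suggests an interrupted dump. The function name `dump_is_complete` is hypothetical.

```bash
#!/bin/bash
# Sketch: sanity-check a gzipped plain-format pg_dump file before restoring.

# Succeed only if $1 is a valid gzip file whose last lines contain the
# completion marker written at the end of a plain-format dump.
dump_is_complete() {
    gunzip -t "$1" 2>/dev/null || return 1
    gunzip -c "$1" | tail -n 10 | grep -q 'PostgreSQL database dump complete'
}
```

A restore could then be guarded with `dump_is_complete architect_2024-12-24.sql.gz && gunzip -c architect_2024-12-24.sql.gz | sudo -u postgres psql -d architect_restored`.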

### Gitea - Restore

```bash
# Download the backup
aws s3 cp s3://architect/backups/gitea/gitea-dump-2024-12-24_0300.zip . \
    --endpoint-url "$R2_ENDPOINT"

# Copy into the container
docker cp gitea-dump-2024-12-24_0300.zip gitea:/tmp/

# Unpack (the container must be running: `docker exec` fails on a stopped one)
docker exec gitea bash -c "cd /tmp && unzip gitea-dump-2024-12-24_0300.zip"
# Follow the Gitea restore instructions for the unpacked files

# Restart Gitea so it picks up the restored data
docker restart gitea
```

---

## Disaster Recovery Plan

### Scenario 1: Loss of ARCHITECT

1. Provision a new VPS with the same IP (if possible)
2. Install Ubuntu 22.04
3. Configure the orchestrator user
4. Reinstall Docker and services
5. Restore PostgreSQL from R2
6. Restore Gitea from R2
7. Verify connectivity with DECK/CORP/HST

### Scenario 2: Loss of DECK

1. Provision a new VPS
2. Restore PostgreSQL (tzzr) from backup
3. Reinstall CLARA, ALFRED
4. Reinstall Mailcow (requires a separate backup)
5. Update DNS if the IP changed

### Scenario 3: Loss of CORP

1. Provision a new VPS
2. Restore PostgreSQL (corp) from backup
3. Reinstall MARGARET, JARED, MASON, FELDMAN
4. Reinstall Odoo, Nextcloud
5. Enable UFW (new server)

### Scenario 4: Loss of R2

**UNLIKELY** - Cloudflare has multi-region redundancy.

Mitigation: monthly backup to a second provider (AWS S3 Glacier).

---

## Backup Verification

### Monthly Test

```bash
#!/bin/bash
# /opt/scripts/verify_backup.sh

# Load R2 credentials and endpoint (same values as the backup script)
source /home/orchestrator/orchestrator/.env
export AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="$R2_SECRET_KEY"

R2_ENDPOINT="https://7dedae6030f5554d99d37e98a5232996.r2.cloudflarestorage.com"

# Download the latest backup
LATEST=$(aws s3 ls s3://architect/backups/postgres/ --endpoint-url "$R2_ENDPOINT" | \
    sort | tail -1 | awk '{print $4}')

aws s3 cp "s3://architect/backups/postgres/$LATEST" /tmp/ \
    --endpoint-url "$R2_ENDPOINT"

# Verify integrity
if gunzip -t "/tmp/$LATEST"; then
    echo "Valid backup: $LATEST"
else
    echo "ERROR: Corrupt backup: $LATEST"
    # Send alert
fi

rm "/tmp/$LATEST"
```

### Verification Checklist

- [ ] PostgreSQL ARCHITECT backup exists (< 24h)
- [ ] PostgreSQL DECK backup exists (< 24h)
- [ ] PostgreSQL CORP backup exists (< 24h)
- [ ] PostgreSQL HST backup exists (< 24h)
- [ ] Gitea backup exists (< 7d)
- [ ] Integrity verified (gunzip -t)
- [ ] Restore test successful (monthly)
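
The age checks in this list can be automated. The sketch below compares the timestamp printed in the first two columns of an `aws s3 ls` line with a maximum age in hours. The function name `is_fresh` is hypothetical, GNU `date -d` is assumed, and the timestamp is interpreted as UTC, which may need adjusting if the CLI prints local time.

```bash
#!/bin/bash
# Sketch: check that a backup timestamp is recent enough.

# Succeed if timestamp $1 ("YYYY-MM-DD HH:MM:SS") is less than $2 hours
# older than epoch $3 (defaults to the current time).
is_fresh() {
    local ts now
    ts=$(date -u -d "$1" +%s) || return 2   # unparseable timestamp
    now=${3:-$(date -u +%s)}
    [ $(( now - ts )) -lt $(( $2 * 3600 )) ]
}
```

For instance, `is_fresh "2024-12-24 03:00:05" 24` covers the PostgreSQL items, and `is_fresh "2024-12-24 03:00:05" 168` covers the 7-day Gitea item.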

---

## Alerts

### ntfy Configuration

```bash
# Notify if the backup fails.
# Note: with `set -e` active, the script exits before reaching this check,
# so it must run where the failing command's status is still available.
if [ $? -ne 0 ]; then
    curl -d "Backup FAILED: $DATE" ntfy.sh/tzzr-alerts
fi
```
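
Because a bare `$?` check is easy to bypass, one option is to wrap the whole backup command so that any non-zero exit triggers the alert. The following is a minimal sketch of that approach, not the existing setup: the function names `notify` and `run_or_alert` are hypothetical, and the ntfy topic is the one used above.

```bash
#!/bin/bash
# Sketch: run a command and send an ntfy alert if it fails.

# Send an alert message to the ntfy topic used in this document.
notify() {
    curl -fsS -d "$1" ntfy.sh/tzzr-alerts
}

# Run the given command; on failure, alert and return the same exit status.
run_or_alert() {
    local status=0
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        # A failed alert must not hide the original exit status
        notify "Backup FAILED (exit $status): $*" || true
    fi
    return "$status"
}
```

The cron entry could then call a small wrapper that runs `run_or_alert /opt/scripts/backup_postgres.sh`, so failures are reported even when the backup script itself aborts early.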

### Monitoring

```bash
# Check the most recent backups
aws s3 ls s3://architect/backups/postgres/ --endpoint-url "$R2_ENDPOINT" | \
    sort | tail -5
```