Docker Deployment¶
Run AI-SRE using Docker Compose on a single node. This deployment is suitable for pilots, small teams, and environments where Kubernetes is not available.
Architecture¶
Docker Compose runs two services: the ingestion server (FastAPI) and the Slack bot. Both share the same .env configuration and a persistent volume for the SQLite database.
graph TB
    subgraph Docker["Docker Compose"]
        ING[ingestion<br/>FastAPI :8888]
        SB[slack_bot<br/>Socket Mode]
        VOL[(ai_sre_data<br/>volume)]
        ING --> VOL
        SB --> VOL
    end
    MON[Alert Sources] -->|:8888| ING
    SB -->|Socket Mode| SLACK[Slack API]
    ING -->|K8s API| K8S[Kubernetes]
Prerequisites¶
| Requirement | Version |
|---|---|
| Docker | 24+ |
| Docker Compose | v2 (bundled with Docker Desktop) |
| LLM API key | Anthropic or OpenAI |
Quick Start¶
# Clone the repository
git clone https://github.com/aabhat-ai/AI-SRE.git
cd AI-SRE
# Configure environment
cp .env.example .env
# Edit .env -- at minimum set ANTHROPIC_API_KEY or OPENAI_API_KEY
# Start services
docker compose -f deploy/docker-compose.yml up -d
# Verify
curl http://localhost:8888/health
# {"status":"ok"}
# Seed demo data
curl -s -X POST http://localhost:8888/demo/seed | python -m json.tool
# Open the operator console (macOS; use xdg-open on Linux)
open http://localhost:8888/console
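Because `up -d` returns as soon as the containers are started, the first health request can arrive before the server is listening. A minimal sketch of a polling helper follows, assuming curl is installed on the host. The function name `wait_for_health` and the `HEALTH_POLL_INTERVAL` variable are hypothetical and not part of AI-SRE.

```shell
# Hypothetical helper (not shipped with AI-SRE): poll the health endpoint
# until it answers with a 2xx status or the attempt budget runs out.
wait_for_health() {
  local url="${1:-http://localhost:8888/health}"
  local attempts="${2:-30}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; -sS keeps output quiet
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${HEALTH_POLL_INTERVAL:-2}"
  done
  return 1
}

# Usage: wait_for_health && echo "ingestion is healthy"
```

With the defaults this waits up to roughly a minute before giving up, which may need tuning for slower hosts.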
Services¶
| Service | Port | Command | Health Check |
|---|---|---|---|
| ingestion | 8888 | python -m src.ingestion.server | GET /health every 30s |
| slack_bot | -- | python -m src.slack_bot.app | Depends on ingestion health |
The Slack bot runs only when Slack credentials (SLACK_BOT_TOKEN, SLACK_APP_TOKEN) are configured in .env. Without them, the container starts but the bot process exits gracefully.
Configuration¶
Docker Compose reads all configuration from your .env file. The compose file passes through every variable defined in .env.example with sensible defaults.
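Since the server needs at least one LLM key, it can help to check .env before starting the stack. The sketch below assumes plain KEY=value lines without quoting or `export` prefixes. The function name `has_llm_key` is hypothetical and not part of AI-SRE.

```shell
# Hypothetical pre-flight check (not part of AI-SRE): succeed only if the
# env file sets a non-empty ANTHROPIC_API_KEY or OPENAI_API_KEY.
# Assumes simple KEY=value lines; commented-out lines are ignored because
# the pattern is anchored to the start of the line.
has_llm_key() {
  local env_file="${1:-.env}"
  [ -f "$env_file" ] || return 1
  grep -Eq '^(ANTHROPIC_API_KEY|OPENAI_API_KEY)=.+' "$env_file"
}

# Usage: has_llm_key .env || echo "set ANTHROPIC_API_KEY or OPENAI_API_KEY" >&2
```

This only checks that a value is present; it does not verify that the key is valid.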
Key Overrides¶
# Change the exposed port
INGESTION_PORT=9090
# Switch to PostgreSQL (see below)
DATABASE_URL=postgresql+asyncpg://ai_sre:changeme@postgres:5432/ai_sre
# Enable autonomy
AUTONOMY_ENABLED=true
DEFAULT_DRY_RUN=false
Volumes¶
| Volume | Mount Point | Purpose |
|---|---|---|
| ai_sre_data | /app/data | Persistent SQLite database |
| ~/.kube/config (optional) | /root/.kube/config (read-only) | Kubeconfig for K8s actions |
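The ai_sre_data volume persists across container restarts and recreations. To reset the database, stop the stack and remove its volumes with docker compose -f deploy/docker-compose.yml down -v. This removes every named volume declared in the compose file and deletes all stored data.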
Adding PostgreSQL¶
For production use, add PostgreSQL to the Docker Compose stack. Create or modify deploy/docker-compose.yml:
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ai_sre
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: ai_sre
    volumes:
      - pg_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai_sre"]
      interval: 10s
      timeout: 5s
      retries: 5
  ingestion:
    # ... existing config ...
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql+asyncpg://ai_sre:changeme@postgres:5432/ai_sre
volumes:
  pg_data:
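Update .env so that DATABASE_URL points at the new postgres service:
DATABASE_URL=postgresql+asyncpg://ai_sre:changeme@postgres:5432/ai_sre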
Operations¶
View Logs¶
# All services
docker compose -f deploy/docker-compose.yml logs -f
# Ingestion server only
docker compose -f deploy/docker-compose.yml logs -f ingestion
# Slack bot only
docker compose -f deploy/docker-compose.yml logs -f slack_bot
Restart¶
# Restart all services
docker compose -f deploy/docker-compose.yml restart
# Restart a single service
docker compose -f deploy/docker-compose.yml restart ingestion
Stop¶
# Stop services (preserve volumes)
docker compose -f deploy/docker-compose.yml down
# Stop and remove volumes (deletes data)
docker compose -f deploy/docker-compose.yml down -v
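Rebuild After Code Changes¶
# Rebuild the images, then recreate the containers
docker compose -f deploy/docker-compose.yml build
docker compose -f deploy/docker-compose.yml up -d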
Building a Custom Image¶
The Dockerfile is located at deploy/Dockerfile. To build a custom image:
# Build
docker build -t ai-sre:custom -f deploy/Dockerfile .
# Run standalone
docker run -d --name ai-sre \
--env-file .env \
-p 8888:8888 \
-v ai_sre_data:/app/data \
ai-sre:custom
Reverse Proxy¶
For production, place the ingestion service behind a reverse proxy for TLS termination.
Traefik (Docker-Native)¶
Add Traefik labels to the ingestion service:
services:
  ingestion:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.ai-sre.rule=Host(`ai-sre.example.com`)"
      - "traefik.http.routers.ai-sre.tls.certresolver=letsencrypt"
      - "traefik.http.services.ai-sre.loadbalancer.server.port=8888"
Nginx¶
See the Bare Metal deployment guide for Nginx configuration -- it works the same way when proxying to the Docker container's port.
Monitoring¶
Health Checks¶
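Docker Compose uses the built-in health check on the ingestion service, which requests GET /health every 30s. The slack_bot service waits for ingestion to report healthy before it starts.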
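To see the result of that check from the host, the health state Docker records for the container can be read with docker inspect. A minimal sketch follows, assuming the compose file path used elsewhere in this guide. The function name `ingestion_health` is hypothetical and not part of AI-SRE.

```shell
# Hypothetical helper (not part of AI-SRE): print the health status Docker
# has recorded for the ingestion container (starting, healthy, or unhealthy).
ingestion_health() {
  local compose_file="${1:-deploy/docker-compose.yml}"
  local container_id
  container_id="$(docker compose -f "$compose_file" ps -q ingestion)"
  # No container ID means the service is not running
  [ -n "$container_id" ] || return 1
  docker inspect --format '{{.State.Health.Status}}' "$container_id"
}

# Usage: ingestion_health
```

The same status also appears in the STATUS column of docker compose ps.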
Prometheus Metrics¶
Scrape metrics from the Docker container:
# prometheus.yml
scrape_configs:
  - job_name: ai-sre
    static_configs:
      - targets: ['host.docker.internal:8888']
If Prometheus runs in the same Docker network, target the service by its Compose name instead: ingestion:8888.
Backup¶
SQLite (Default)¶
# Copy the database file from the volume
docker cp $(docker compose -f deploy/docker-compose.yml ps -q ingestion):/app/data/ai_sre.db ./backup.db
PostgreSQL¶
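When the stack uses the postgres service from the Adding PostgreSQL section, a logical dump with pg_dump is one way to take a backup. The sketch below assumes the service name (postgres), user (ai_sre), and database (ai_sre) from that section, and that local connections inside the container need no password, as is the default in the official postgres image. The function name `backup_postgres` is hypothetical and not part of AI-SRE.

```shell
# Hypothetical helper (not part of AI-SRE): dump the ai_sre database from
# the postgres service to a SQL file on the host.
# Assumes the service name, user, and database from the PostgreSQL section.
backup_postgres() {
  local out_file="${1:-backup.sql}"
  local compose_file="${2:-deploy/docker-compose.yml}"
  # -T disables TTY allocation so the dump can be redirected cleanly
  docker compose -f "$compose_file" exec -T postgres \
    pg_dump -U ai_sre ai_sre > "$out_file"
}

# Usage: backup_postgres "backup-$(date +%F).sql"
```

If pg_dump fails partway, the output file may be incomplete, so check the exit status before relying on the dump.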
Upgrading¶
cd AI-SRE
# Pull latest code
git pull origin main
# Rebuild and restart
docker compose -f deploy/docker-compose.yml build
docker compose -f deploy/docker-compose.yml up -d
# Run migrations (PostgreSQL only)
docker compose -f deploy/docker-compose.yml exec ingestion \
alembic upgrade head
Troubleshooting¶
| Symptom | Cause | Fix |
|---|---|---|
| Container exits immediately | Missing API keys | Check .env has ANTHROPIC_API_KEY or OPENAI_API_KEY |
| Port 8888 already in use | Another service on same port | Change INGESTION_PORT in .env or map to a different host port |
| Slack bot not connecting | Missing Slack tokens | Set SLACK_BOT_TOKEN and SLACK_APP_TOKEN in .env |
| K8s actions fail | Kubeconfig not mounted | Set KUBECONFIG path and ensure the file is readable inside the container |
| Database locked | Concurrent SQLite writes | Switch to PostgreSQL for multi-process deployments |