Docker Deployment

Run AI-SRE using Docker Compose on a single node. This deployment is suitable for pilots, small teams, and environments where Kubernetes is not available.


Architecture

Docker Compose runs two services: the ingestion server (FastAPI) and the Slack bot. Both share the same .env configuration and a persistent volume for the SQLite database.

graph TB
    subgraph Docker["Docker Compose"]
        ING[ingestion<br/>FastAPI :8888]
        SB[slack_bot<br/>Socket Mode]
        VOL[(ai_sre_data<br/>volume)]
        ING --> VOL
        SB --> VOL
    end
    MON[Alert Sources] -->|:8888| ING
    SB -->|Socket Mode| SLACK[Slack API]
    ING -->|K8s API| K8S[Kubernetes]

Prerequisites

| Requirement | Version |
|---|---|
| Docker | 24+ |
| Docker Compose | v2 (bundled with Docker Desktop) |
| LLM API key | Anthropic or OpenAI |

Quick Start

# Clone the repository
git clone https://github.com/aabhat-ai/AI-SRE.git
cd AI-SRE

# Configure environment
cp .env.example .env
# Edit .env -- at minimum set ANTHROPIC_API_KEY or OPENAI_API_KEY

# Start services
docker compose -f deploy/docker-compose.yml up -d

# Verify
curl http://localhost:8888/health
# {"status":"ok"}

# Seed demo data
curl -s -X POST http://localhost:8888/demo/seed | python3 -m json.tool

# Open operator console (macOS; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:8888/console
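The ingestion server may need a few seconds after up -d before /health responds, so a single curl can fail on a slow start. The sketch below polls the endpoint instead. The name wait_for_health and its default of 30 attempts with a 2 second delay are assumptions for illustration, not part of AI-SRE.

```shell
# Poll a health endpoint until it returns HTTP 2xx or the attempts run out.
# wait_for_health is a hypothetical helper; the defaults are illustrative only.
wait_for_health() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after ${attempts} attempts" >&2
  return 1
}
```

A typical call would be wait_for_health http://localhost:8888/health, run right after docker compose up -d and before seeding demo data.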

Services

| Service | Port | Command | Health Check |
|---|---|---|---|
| ingestion | 8888 | python -m src.ingestion.server | GET /health every 30s |
| slack_bot | -- | python -m src.slack_bot.app | Depends on ingestion health |

The Slack bot connects only when Slack credentials (SLACK_BOT_TOKEN, SLACK_APP_TOKEN) are configured in .env. Without them, the slack_bot container still starts, but the bot process exits gracefully.
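Because a missing token produces a graceful exit rather than an error, it can help to check .env before starting the stack. The following is a minimal sketch: slack_tokens_set is a hypothetical helper, and it assumes plain KEY=value lines (no export prefix, no quoting).

```shell
# Return 0 only if both Slack tokens have a non-empty value in the env file.
# slack_tokens_set is a hypothetical helper, not part of AI-SRE.
slack_tokens_set() {
  env_file="${1:-.env}"
  for key in SLACK_BOT_TOKEN SLACK_APP_TOKEN; do
    # Require at least one non-space, non-comment character after "KEY="
    if ! grep -Eq "^${key}=[^[:space:]#]" "$env_file"; then
      echo "missing or empty: ${key}" >&2
      return 1
    fi
  done
  return 0
}
```

Running slack_tokens_set .env prints the first missing key to stderr and returns a non-zero status, which makes it easy to gate a deploy script on it. A value written as empty quotes would still pass this simple check.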


Configuration

Docker Compose reads all configuration from your .env file. The compose file passes through every variable defined in .env.example with sensible defaults.

Key Overrides

# Change the exposed port
INGESTION_PORT=9090

# Switch to PostgreSQL (see below)
DATABASE_URL=postgresql+asyncpg://ai_sre:changeme@postgres:5432/ai_sre

# Enable autonomy
AUTONOMY_ENABLED=true
DEFAULT_DRY_RUN=false
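Editing .env by hand works fine. For scripted setups, overrides like the ones above could be applied with a small helper. This is a sketch under assumptions: set_env_var is a hypothetical function, not part of AI-SRE, and it handles only simple KEY=value lines.

```shell
# Set KEY=VALUE in an env file, replacing any existing assignment of KEY.
# set_env_var is a hypothetical helper; it does not handle quoting or "export".
set_env_var() {
  env_file="$1"
  key="$2"
  value="$3"
  tmp_file="$(mktemp)"
  # Keep every line except existing assignments of this key.
  # grep exits non-zero when nothing is left to print, hence "|| true".
  grep -v "^${key}=" "$env_file" > "$tmp_file" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
  # Write back in place so the env file keeps its permissions.
  cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}
```

For example, set_env_var .env INGESTION_PORT 9090 followed by docker compose -f deploy/docker-compose.yml up -d. A plain restart generally reuses the existing container configuration, so up -d is the safer way to apply changed values.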

Volumes

| Volume | Mount Point | Purpose |
|---|---|---|
| ai_sre_data | /app/data | Persistent SQLite database |
| ~/.kube/config (optional) | /root/.kube/config (read-only) | Kubeconfig for K8s actions |

The ai_sre_data volume persists across container restarts and recreations. To reset the database, remove the volume. Note that down -v removes every named volume declared in the compose file, not only ai_sre_data:

docker compose -f deploy/docker-compose.yml down -v

Adding PostgreSQL

For production use, add a postgres service to deploy/docker-compose.yml and make the ingestion service wait until it is healthy:

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ai_sre
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: ai_sre
    volumes:
      - pg_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai_sre"]
      interval: 10s
      timeout: 5s
      retries: 5

  ingestion:
    # ... existing config ...
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql+asyncpg://ai_sre:changeme@postgres:5432/ai_sre

volumes:
  pg_data:

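Update .env to match, replacing the placeholder password changeme in both the compose file and the connection string: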

DATABASE_URL=postgresql+asyncpg://ai_sre:changeme@postgres:5432/ai_sre

Operations

View Logs

# All services
docker compose -f deploy/docker-compose.yml logs -f

# Ingestion server only
docker compose -f deploy/docker-compose.yml logs -f ingestion

# Slack bot only
docker compose -f deploy/docker-compose.yml logs -f slack_bot

Restart

# Restart all services
docker compose -f deploy/docker-compose.yml restart

# Restart a single service
docker compose -f deploy/docker-compose.yml restart ingestion

Stop

# Stop services (preserve volumes)
docker compose -f deploy/docker-compose.yml down

# Stop and remove volumes (deletes data)
docker compose -f deploy/docker-compose.yml down -v

Rebuild After Code Changes

docker compose -f deploy/docker-compose.yml build
docker compose -f deploy/docker-compose.yml up -d

Building a Custom Image

The Dockerfile is located at deploy/Dockerfile. To build a custom image:

# Build
docker build -t ai-sre:custom -f deploy/Dockerfile .

# Run standalone
docker run -d --name ai-sre \
  --env-file .env \
  -p 8888:8888 \
  -v ai_sre_data:/app/data \
  ai-sre:custom

Reverse Proxy

For production, place the ingestion service behind a reverse proxy for TLS termination.

Traefik (Docker-Native)

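Add Traefik labels to the ingestion service. This assumes a Traefik instance is already running on the same Docker host with a certificate resolver named letsencrypt: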

services:
  ingestion:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.ai-sre.rule=Host(`ai-sre.example.com`)"
      - "traefik.http.routers.ai-sre.tls.certresolver=letsencrypt"
      - "traefik.http.services.ai-sre.loadbalancer.server.port=8888"

Nginx

See the Bare Metal deployment guide for Nginx configuration -- it works the same way when proxying to the Docker container's port.


Monitoring

Health Checks

The ingestion service defines a health check (GET /health every 30s). Its current state (healthy or unhealthy) appears in the STATUS column of docker compose ps:

# Check service health
docker compose -f deploy/docker-compose.yml ps

Prometheus Metrics

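If Prometheus runs outside the Compose network, scrape the published host port. The host.docker.internal name resolves out of the box on Docker Desktop; on Linux it usually needs an extra host entry mapped to host-gateway: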

# prometheus.yml
scrape_configs:
  - job_name: ai-sre
    static_configs:
      - targets: ['host.docker.internal:8888']

Or if Prometheus is in the same Docker network:

scrape_configs:
  - job_name: ai-sre
    static_configs:
      - targets: ['ingestion:8888']

Backup

SQLite (Default)

# Copy the database file out of the ingestion container (the volume is mounted at /app/data)
docker cp $(docker compose -f deploy/docker-compose.yml ps -q ingestion):/app/data/ai_sre.db ./backup.db

PostgreSQL

# -T disables TTY allocation so the redirected dump is written unmodified
docker compose -f deploy/docker-compose.yml exec -T postgres \
  pg_dump -U ai_sre ai_sre > backup.sql
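Both commands above write to a fixed file name, so each run overwrites the previous backup. The sketch below wraps them with timestamped file names. The function names and the naming scheme are assumptions for illustration. Copying a SQLite file while the server is writing to it can yield an inconsistent copy, so stopping the ingestion service first is the cautious option.

```shell
# Hypothetical backup helpers; paths and service names follow this guide.
AI_SRE_COMPOSE_FILE="deploy/docker-compose.yml"

# Print a timestamped file name such as ai_sre-20250101-120000.db
backup_name() {
  printf '%s-%s.%s\n' "$1" "$(date +%Y%m%d-%H%M%S)" "$2"
}

# Copy the SQLite file out of the ingestion container.
backup_sqlite() {
  out="$(backup_name ai_sre db)"
  cid="$(docker compose -f "$AI_SRE_COMPOSE_FILE" ps -q ingestion)"
  if [ -z "$cid" ]; then
    echo "ingestion container not found" >&2
    return 1
  fi
  docker cp "${cid}:/app/data/ai_sre.db" "./${out}" && echo "$out"
}

# Dump the PostgreSQL database; -T keeps the redirected output free of TTY effects.
backup_postgres() {
  out="$(backup_name ai_sre sql)"
  docker compose -f "$AI_SRE_COMPOSE_FILE" exec -T postgres \
    pg_dump -U ai_sre ai_sre > "$out" && echo "$out"
}
```

Each function prints the name of the file it wrote, so a cron job could call backup_sqlite or backup_postgres and log the result.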

Upgrading

cd AI-SRE

# Pull latest code
git pull origin main

# Rebuild and restart
docker compose -f deploy/docker-compose.yml build
docker compose -f deploy/docker-compose.yml up -d

# Run migrations (PostgreSQL only)
docker compose -f deploy/docker-compose.yml exec ingestion \
  alembic upgrade head
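The migration step applies only to PostgreSQL, so an upgrade script needs to know which database is configured. The following sketch decides this by inspecting DATABASE_URL in .env. Both function names are hypothetical, and the check is an assumption: it will not notice a DATABASE_URL that is set only in the compose file.

```shell
# Return 0 if the env file points DATABASE_URL at PostgreSQL.
# uses_postgres and upgrade_ai_sre are hypothetical helpers, not part of AI-SRE.
uses_postgres() {
  env_file="${1:-.env}"
  grep -Eq '^DATABASE_URL=postgresql' "$env_file"
}

# Pull, rebuild, restart, and run migrations only when PostgreSQL is configured.
upgrade_ai_sre() {
  compose_file="deploy/docker-compose.yml"
  git pull origin main || return 1
  docker compose -f "$compose_file" build || return 1
  docker compose -f "$compose_file" up -d || return 1
  if uses_postgres .env; then
    docker compose -f "$compose_file" exec -T ingestion alembic upgrade head
  else
    echo "DATABASE_URL in .env is not PostgreSQL; skipping migrations"
  fi
}
```

Run upgrade_ai_sre from the repository root. Taking a backup beforehand is a sensible precaution, since migrations change the schema in place.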

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Container exits immediately | Missing API keys | Check .env has ANTHROPIC_API_KEY or OPENAI_API_KEY |
| Port 8888 already in use | Another service on same port | Change INGESTION_PORT in .env or map to a different host port |
| Slack bot not connecting | Missing Slack tokens | Set SLACK_BOT_TOKEN and SLACK_APP_TOKEN in .env |
| K8s actions fail | Kubeconfig not mounted | Set KUBECONFIG path and ensure the file is readable inside the container |
| Database locked | Concurrent SQLite writes | Switch to PostgreSQL for multi-process deployments |