
Quickstart

Get AI-SRE running in under 5 minutes. This guide covers three deployment paths: bare metal with SQLite (fastest for evaluation), Kubernetes via Minikube, and Docker Compose.

Prerequisites

| Requirement | Bare Metal | Docker | Kubernetes |
| --- | --- | --- | --- |
| Python 3.11+ | Required | -- | -- |
| Docker 24+ | -- | Required | -- |
| Minikube + Helm | -- | -- | Required |
| LLM API key (Anthropic or OpenAI) | Required | Required | Required |

You need at least one LLM API key. AI-SRE supports Anthropic Claude (ANTHROPIC_API_KEY), OpenAI GPT (OPENAI_API_KEY), and self-hosted Ollama for air-gapped environments.


Option 1: Bare Metal with SQLite

This is the fastest path to a working AI-SRE instance. It uses SQLite for storage (created automatically) and a mock log provider so you do not need any external services.

Clone and Install

git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install with development dependencies
pip install -e ".[dev]"

Configure

# Copy the example configuration
cp .env.example .env

Edit .env and set at least one LLM provider key:

# Choose one:
ANTHROPIC_API_KEY=sk-ant-your-key-here
# or
OPENAI_API_KEY=sk-your-key-here

# Use mock log provider for offline development
LOG_PROVIDER=mock

# SQLite database (created automatically in data/)
DATABASE_URL=sqlite+aiosqlite:///./data/ai_sre.db
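
Provider selection follows from which key is set. The sketch below shows one way a client-side check could pick a provider from these variables before starting the server; the helper name and precedence order are illustrative, not part of AI-SRE's API -- only the variable names come from the .env example above.

```python
import os

def pick_llm_provider(env=os.environ):
    """Return (provider, key) for the first configured LLM key.

    The precedence order here is an assumption for illustration;
    only the variable names come from the .env example above.
    """
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic", env["ANTHROPIC_API_KEY"]
    if env.get("OPENAI_API_KEY"):
        return "openai", env["OPENAI_API_KEY"]
    raise RuntimeError("Set ANTHROPIC_API_KEY or OPENAI_API_KEY in .env")

if __name__ == "__main__":
    provider, _ = pick_llm_provider({"OPENAI_API_KEY": "sk-demo"})
    print(provider)
```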

Run

# Start the server (uses mock log provider, SQLite, port 8888)
make run

Or equivalently:

python -m src.ingestion.server

The server starts on http://localhost:8888.

Verify

Open a second terminal:

# Health check
curl http://localhost:8888/health
# => {"status":"ok"}

# Seed demo data (creates sample incidents, actions, deploys)
make seed
# or: curl -s -X POST http://localhost:8888/demo/seed | python -m json.tool

# List incidents
curl http://localhost:8888/incidents
# => {"incidents":[...],"count":...}

# Open the operator console in your browser
open http://localhost:8888/console

Option 2: Kubernetes via Minikube

This path deploys AI-SRE into a local Kubernetes cluster with full RBAC for cluster actions. The Helm chart handles service accounts, cluster roles, and persistence.

Prerequisites

# Ensure minikube and helm are installed
minikube version
helm version

Deploy

git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE

# Start minikube if not already running
minikube start

# Build the Docker image inside minikube and deploy via Helm
make minikube-deploy

This single command:

  1. Builds the Docker image inside the minikube Docker daemon
  2. Creates the ai-sre namespace
  3. Applies Custom Resource Definitions (CRDs)
  4. Installs the Helm chart with minikube-specific values

Access

# Get the service URL
make minikube-url

# View logs
make minikube-logs

Set LLM API Key

The minikube values file reads LLM credentials from a Kubernetes secret. Create it before deploying (or create it now and upgrade the release afterwards):

kubectl create secret generic ai-sre-secrets \
  --namespace ai-sre \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here

Then reference it in the Helm install:

helm upgrade --install ai-sre deploy/helm/ai-sre \
  --namespace ai-sre \
  --set existingSecret=ai-sre-secrets
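
For repeated installs it can be cleaner to keep the override in a values file instead of a `--set` flag. The fragment below is equivalent to the command above; `existingSecret` is the only chart value shown in this guide, so treat the file as a sketch rather than the chart's full schema.

```yaml
# my-values.yaml -- equivalent to --set existingSecret=ai-sre-secrets
existingSecret: ai-sre-secrets
```

Then install with: helm upgrade --install ai-sre deploy/helm/ai-sre --namespace ai-sre -f my-values.yaml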

Clean Up

make minikube-delete

Option 3: Docker Compose

Docker Compose runs two services on a single node: the ingestion server and the Slack bot.

Deploy

git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE

# Configure
cp .env.example .env
# Edit .env with your API keys

# Start services
docker compose -f deploy/docker-compose.yml up -d

# View logs
docker compose -f deploy/docker-compose.yml logs -f ingestion

Services

| Service | Port | Description |
| --- | --- | --- |
| ingestion | 8888 | FastAPI server with all API endpoints |
| slack_bot | -- | Slack bot (requires Slack credentials in .env) |

Stop

docker compose -f deploy/docker-compose.yml down

See It Working: Run the Demo

Once the server is running (any deployment method), run the interactive demo:

# Quick 30-second demo
make demo

# Auto-healing demo (shows the full alert-to-resolution loop)
make demo-autohealing

# Customer POC demo
make demo-customer

The demo seeds sample incidents and walks through the platform capabilities: alert ingestion, AI diagnosis, action execution, and postmortem generation.


Send Your First Alert

With the server running, send a test alert:

curl -X POST http://localhost:8888/webhook \
  -H "Content-Type: application/json" \
  -H "X-Source: webhook" \
  -d '{
    "title": "High CPU on payments-api",
    "description": "CPU usage > 90% for 5 minutes",
    "severity": "critical",
    "source": "prometheus",
    "namespace": "production",
    "pod": "payments-api-7d8f9c6b5-x2k4m",
    "deployment": "payments-api",
    "service": "payments-api"
  }'
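
The same alert can be sent from Python using only the standard library. The field names, endpoint, and X-Source header below are taken from the curl example above; the helper functions themselves are a sketch, not part of AI-SRE.

```python
import json
import urllib.request

# Field names mirror the curl payload in this guide.
ALERT_FIELDS = {"title", "description", "severity", "source",
                "namespace", "pod", "deployment", "service"}

def build_alert(**fields):
    """Assemble a webhook payload, rejecting unknown field names."""
    unknown = set(fields) - ALERT_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return fields

def send_alert(payload, base_url="http://localhost:8888"):
    """POST the alert to the /webhook endpoint shown above."""
    req = urllib.request.Request(
        f"{base_url}/webhook",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-Source": "webhook"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    alert = build_alert(title="High CPU on payments-api",
                        severity="critical", source="prometheus")
    print(json.dumps(alert))
```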

Then request an AI diagnosis, using an incident ID from the /incidents listing:

curl http://localhost:8888/incidents/<incident_id>/diagnosis

The diagnosis returns a root cause analysis with a confidence score, Kubernetes context, log excerpts, matched playbooks, similar incidents, and suggested actions. Each action is wrapped with policy engine metadata indicating whether it can be auto-executed, requires approval, or is advisory-only.
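
A client consuming the diagnosis might split suggested actions by that policy metadata. The field names below (`suggested_actions`, `policy`, `mode`) are illustrative guesses at the response shape, not the documented schema -- check the actual /diagnosis response before relying on them.

```python
def partition_actions(diagnosis):
    """Group suggested actions by policy mode.

    NOTE: 'suggested_actions' and the 'policy'/'mode' keys are assumed
    names for illustration; verify against the real response schema.
    """
    buckets = {"auto": [], "approval": [], "advisory": []}
    for action in diagnosis.get("suggested_actions", []):
        mode = action.get("policy", {}).get("mode", "advisory")
        buckets.setdefault(mode, []).append(action)
    return buckets
```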


Next Steps