Quickstart¶
Get AI-SRE running in under 5 minutes. This guide covers three deployment paths: bare metal with SQLite (fastest for evaluation), Kubernetes via Minikube, and Docker Compose.
Prerequisites¶
| Requirement | Bare Metal | Docker | Kubernetes |
|---|---|---|---|
| Python 3.11+ | Required | -- | -- |
| Docker 24+ | -- | Required | -- |
| Minikube + Helm | -- | -- | Required |
| LLM API key (Anthropic or OpenAI) | Required | Required | Required |
You need at least one LLM API key. AI-SRE supports Anthropic Claude (ANTHROPIC_API_KEY), OpenAI GPT (OPENAI_API_KEY), and self-hosted Ollama for air-gapped environments.
Option 1: Bare Metal with SQLite¶
This is the fastest path to a working AI-SRE instance. It uses SQLite for storage (created automatically) and a mock log provider so you do not need any external services.
Clone and Install¶
git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install with development dependencies
pip install -e ".[dev]"
Configure¶
Copy .env.example to .env, then set at minimum one LLM provider key:
# Choose one:
ANTHROPIC_API_KEY=sk-ant-your-key-here
# or
OPENAI_API_KEY=sk-your-key-here
# Use mock log provider for offline development
LOG_PROVIDER=mock
# SQLite database (created automatically in data/)
DATABASE_URL=sqlite+aiosqlite:///./data/ai_sre.db
Run¶
Start the server from the repository root. It listens on http://localhost:8888.
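A minimal sketch of the launch, assuming a `make run` target backed by uvicorn (the target name and the module path are assumptions; check the Makefile for the exact command):

```shell
# Option 1: via a Makefile target (target name is an assumption)
make run

# Option 2: invoke uvicorn directly (module path is an assumption)
uvicorn ai_sre.server:app --host 0.0.0.0 --port 8888
```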
Verify¶
Open a second terminal:
# Health check
curl http://localhost:8888/health
# => {"status":"ok"}
# Seed demo data (creates sample incidents, actions, deploys)
make seed
# or: curl -s -X POST http://localhost:8888/demo/seed | python -m json.tool
# List incidents
curl http://localhost:8888/incidents
# => {"incidents":[...],"count":...}
# Open the operator console in your browser
open http://localhost:8888/console
Option 2: Kubernetes via Minikube¶
This path deploys AI-SRE into a local Kubernetes cluster with full RBAC for cluster actions. The Helm chart handles service accounts, cluster roles, and persistence.
Prerequisites¶
Minikube and Helm installed, plus an LLM API key (see the prerequisites table at the top of this guide).
Deploy¶
git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE
# Start minikube if not already running
minikube start
# Build the Docker image inside minikube and deploy via Helm
make minikube-deploy
This single command:
- Builds the Docker image inside the minikube Docker daemon
- Creates the ai-sre namespace
- Applies Custom Resource Definitions (CRDs)
- Installs the Helm chart with minikube-specific values
Access¶
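One way to reach the API from your workstation, assuming the service is named ai-sre in the ai-sre namespace (verify with kubectl get svc -n ai-sre):

```shell
# Forward the in-cluster service to localhost (service name is an assumption)
kubectl port-forward -n ai-sre svc/ai-sre 8888:8888

# Alternatively, ask minikube for a reachable URL for the service
minikube service ai-sre -n ai-sre --url
```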
Set LLM API Key¶
The minikube values file expects a Kubernetes secret. Create one before deploying:
kubectl create secret generic ai-sre-secrets \
--namespace ai-sre \
--from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here
Then reference it in the Helm install:
helm upgrade --install ai-sre deploy/helm/ai-sre \
--namespace ai-sre \
--set existingSecret=ai-sre-secrets
Clean Up¶
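To tear everything down, assuming the ai-sre release name and namespace used in the Helm command above:

```shell
# Remove the Helm release and its namespace
helm uninstall ai-sre --namespace ai-sre
kubectl delete namespace ai-sre

# Optionally stop the minikube cluster
minikube stop
```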
Option 3: Docker Compose¶
Docker Compose runs two services on a single node: the ingestion server and the Slack bot.
Deploy¶
git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE
# Configure
cp .env.example .env
# Edit .env with your API keys
# Start services
docker compose -f deploy/docker-compose.yml up -d
# View logs
docker compose -f deploy/docker-compose.yml logs -f ingestion
Services¶
| Service | Port | Description |
|---|---|---|
| ingestion | 8888 | FastAPI server with all API endpoints |
| slack_bot | -- | Slack bot (requires Slack credentials in .env) |
Stop¶
docker compose -f deploy/docker-compose.yml down
See It Working: Run the Demo¶
Once the server is running (any deployment method), run the interactive demo:
# Quick 30-second demo
make demo
# Auto-healing demo (shows the full alert-to-resolution loop)
make demo-autohealing
# Customer POC demo
make demo-customer
The demo seeds sample incidents and walks through the platform capabilities: alert ingestion, AI diagnosis, action execution, and postmortem generation.
Send Your First Alert¶
With the server running, send a test alert:
curl -X POST http://localhost:8888/webhook \
-H "Content-Type: application/json" \
-H "X-Source: webhook" \
-d '{
"title": "High CPU on payments-api",
"description": "CPU usage > 90% for 5 minutes",
"severity": "critical",
"source": "prometheus",
"namespace": "production",
"pod": "payments-api-7d8f9c6b5-x2k4m",
"deployment": "payments-api",
"service": "payments-api"
}'
Then request AI diagnosis:
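A sketch of the diagnosis request, assuming a POST endpoint keyed by an incident ID returned from the incidents list (the exact route is an assumption; see the API Reference):

```shell
# Replace <incident-id> with an ID from GET /incidents (endpoint path is an assumption)
curl -s -X POST http://localhost:8888/incidents/<incident-id>/diagnose | python -m json.tool
```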
The diagnosis returns a root cause analysis with confidence score, Kubernetes context, log excerpts, matched playbooks, similar incidents, and suggested actions -- each wrapped with policy engine metadata indicating whether it can be auto-executed, requires approval, or is advisory-only.
Next Steps¶
- Configuration Reference -- All environment variables documented
- Module Catalog -- Overview of all 54 platform modules
- API Reference -- Complete endpoint documentation with examples
- Deployment Guide -- Production deployment options