Quickstart¶
Get AI-SRE running in under 5 minutes. This guide covers three deployment paths: bare metal with SQLite (fastest for evaluation), Kubernetes via Minikube, and Docker Compose.
Prerequisites¶
| Requirement | Bare Metal | Docker | Kubernetes |
|---|---|---|---|
| Python 3.11+ | Required | -- | -- |
| Docker 24+ | -- | Required | -- |
| Minikube + Helm | -- | -- | Required |
| LLM API key (Anthropic or OpenAI) | Required | Required | Required |
You need at least one LLM API key. AI-SRE supports Anthropic Claude (ANTHROPIC_API_KEY), OpenAI GPT (OPENAI_API_KEY), and self-hosted Ollama for air-gapped environments.
Option 1: Bare Metal with SQLite¶
This is the fastest path to a working AI-SRE instance. It uses SQLite for storage (created automatically) and a mock log provider so you do not need any external services.
Clone and Install¶
git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install with development dependencies
pip install -e ".[dev]"
Configure¶
Copy .env.example to .env, then set at minimum one LLM provider key:
# Choose one:
ANTHROPIC_API_KEY=sk-ant-your-key-here
# or
OPENAI_API_KEY=sk-your-key-here
# Use mock log provider for offline development
LOG_PROVIDER=mock
# SQLite database (created automatically in data/)
DATABASE_URL=sqlite+aiosqlite:///./data/ai_sre.db
Run¶
Start the server from the repository root. It listens on http://localhost:8888.
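A minimal sketch of the launch, assuming a `make run` target backed by uvicorn (the target name and the module path are assumptions; check the Makefile for the exact command):

```shell
# Option 1: via a Makefile target (target name is an assumption)
make run

# Option 2: invoke uvicorn directly (module path is an assumption)
uvicorn ai_sre.server:app --host 0.0.0.0 --port 8888
```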
Verify¶
Open a second terminal:
# Health check
curl http://localhost:8888/health
# => {"status":"ok"}
# Seed demo data (creates sample incidents, actions, deploys)
make seed
# or: curl -s -X POST http://localhost:8888/demo/seed | python -m json.tool
# List incidents
curl http://localhost:8888/incidents
# => {"incidents":[...],"count":...}
# Open the operator console in your browser
open http://localhost:8888/console
Option 2: Kubernetes via Minikube¶
This path deploys AI-SRE into a local Kubernetes cluster with full RBAC for cluster actions. The Helm chart handles service accounts, cluster roles, and persistence.
Prerequisites¶
Minikube and Helm installed, plus an LLM API key (see the prerequisites table at the top of this guide).
Deploy¶
git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE
# Start minikube if not already running
minikube start
# Build the Docker image inside minikube and deploy via Helm
make minikube-deploy
This single command:
- Builds the Docker image inside the minikube Docker daemon
- Creates the ai-sre namespace
- Applies Custom Resource Definitions (CRDs)
- Installs the Helm chart with minikube-specific values
Access¶
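One way to reach the API from your workstation, assuming the service is named ai-sre in the ai-sre namespace (verify with kubectl get svc -n ai-sre):

```shell
# Forward the in-cluster service to localhost (service name is an assumption)
kubectl port-forward -n ai-sre svc/ai-sre 8888:8888

# Alternatively, ask minikube for a reachable URL for the service
minikube service ai-sre -n ai-sre --url
```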
Set LLM API Key¶
The minikube values file expects a Kubernetes secret. Create one before deploying:
kubectl create secret generic ai-sre-secrets \
--namespace ai-sre \
--from-literal=ANTHROPIC_API_KEY=sk-ant-your-key-here
Then reference it in the Helm install:
helm upgrade --install ai-sre deploy/helm/ai-sre \
--namespace ai-sre \
--set existingSecret=ai-sre-secrets
Clean Up¶
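To tear everything down, assuming the ai-sre release name and namespace used in the Helm command above:

```shell
# Remove the Helm release and its namespace
helm uninstall ai-sre --namespace ai-sre
kubectl delete namespace ai-sre

# Optionally stop the minikube cluster
minikube stop
```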
Option 3: Docker Compose¶
Docker Compose runs two services on a single node: the ingestion server and the Slack bot.
Deploy¶
git clone https://github.com/aabhat-ai/AI-SRE.git && cd AI-SRE
# Configure
cp .env.example .env
# Edit .env with your API keys
# Start services
docker compose -f deploy/docker-compose.yml up -d
# View logs
docker compose -f deploy/docker-compose.yml logs -f ingestion
Services¶
| Service | Port | Description |
|---|---|---|
| ingestion | 8888 | FastAPI server with all API endpoints |
| slack_bot | -- | Slack bot (requires Slack credentials in .env) |
Stop¶
docker compose -f deploy/docker-compose.yml down
See It Working: Run the Demo¶
Once the server is running (any deployment method), run the interactive demo:
# Quick 30-second demo
make demo
# Auto-healing demo (shows the full alert-to-resolution loop)
make demo-autohealing
# Customer POC demo
make demo-customer
The demo seeds sample incidents and walks through the platform capabilities: alert ingestion, AI diagnosis, action execution, and postmortem generation.
Send Your First Alert¶
With the server running, send a test alert:
curl -X POST http://localhost:8888/webhook \
-H "Content-Type: application/json" \
-H "X-Source: webhook" \
-d '{
"title": "High CPU on payments-api",
"description": "CPU usage > 90% for 5 minutes",
"severity": "critical",
"source": "prometheus",
"namespace": "production",
"pod": "payments-api-7d8f9c6b5-x2k4m",
"deployment": "payments-api",
"service": "payments-api"
}'
Then request AI diagnosis:
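A sketch of the diagnosis request, assuming a POST endpoint keyed by an incident ID returned from the incidents list (the exact route is an assumption; see the API Reference):

```shell
# Replace <incident-id> with an ID from GET /incidents (endpoint path is an assumption)
curl -s -X POST http://localhost:8888/incidents/<incident-id>/diagnose | python -m json.tool
```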
The diagnosis returns a root cause analysis with confidence score, Kubernetes context, log excerpts, matched playbooks, similar incidents, and suggested actions -- each wrapped with policy engine metadata indicating whether it can be auto-executed, requires approval, or is advisory-only.
Next Steps¶
- Configuration Reference -- All environment variables documented
- Module Catalog -- Overview of all 54 platform modules
- API Reference -- Complete endpoint documentation with examples
- Deployment Guide -- Production deployment options