On-Premises Deployment¶
Deploy AI-SRE in a customer's own data center or private cloud environment. This guide covers air-gapped installations, self-hosted LLM configuration, network requirements, and operational handoff.
Overview¶
On-premises deployment is designed for organizations that cannot send data to external cloud services. AI-SRE supports fully air-gapped operation with self-hosted LLM inference (via Ollama), on-premise log backends, and local Kubernetes clusters.
graph TB
subgraph Customer["Customer Data Center"]
subgraph K8S["Kubernetes Cluster"]
AISRE[AI-SRE<br/>Helm Chart]
PG[(PostgreSQL)]
AISRE --> PG
end
subgraph Infra["Infrastructure"]
OLL[Ollama Server<br/>Self-hosted LLM]
LOKI[Grafana Loki<br/>Log Backend]
PROM[Prometheus<br/>Monitoring]
end
subgraph Monitoring["Alert Sources"]
AM[Alertmanager]
DD[Datadog Agent]
GF[Grafana]
end
AISRE --> OLL
AISRE --> LOKI
PROM --> AISRE
Monitoring -->|webhooks| AISRE
end
style Customer fill:#1a1a2e,color:#fff
Deployment Profile¶
Set the deployment profile to `on_prem` to enable on-premise-specific behavior:
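For example, in `.env` (the same value can also be passed to Helm as `config.DEPLOYMENT_PROFILE`, as in the install command later in this guide):

```
DEPLOYMENT_PROFILE=on_prem
```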
This setting:
- Disables features that require outbound internet access (unless explicitly configured)
- Adjusts default safety posture for enterprise environments
- Enables local-first storage and caching strategies
Prerequisites¶
Hardware Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| AI-SRE server | 2 vCPU, 2 GB RAM | 4 vCPU, 4 GB RAM |
| PostgreSQL | 2 vCPU, 4 GB RAM, 50 GB SSD | 4 vCPU, 8 GB RAM, 100 GB SSD |
| Ollama (self-hosted LLM) | 4 vCPU, 16 GB RAM | 8 vCPU, 32 GB RAM, GPU recommended |
| Storage | 20 GB for application | 100 GB for logs and database growth |
Software Requirements¶
| Software | Version | Notes |
|---|---|---|
| Kubernetes | 1.27+ | Or Docker for single-node deployment |
| Helm | 3.12+ | For Kubernetes deployment |
| PostgreSQL | 14+ | Required for production (SQLite not recommended for on-prem) |
| Ollama | Latest | For self-hosted LLM inference |
Network Requirements¶
AI-SRE needs the following network connectivity within the customer environment:
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| AI-SRE | PostgreSQL | 5432 | TCP | Database |
| AI-SRE | Ollama | 11434 | TCP | LLM inference |
| AI-SRE | Kubernetes API | 6443 | TCP | Cluster inspection and remediation |
| AI-SRE | Loki / Elasticsearch | 3100 / 9200 | TCP | Log fetching |
| Alert sources | AI-SRE | 8888 | TCP | Webhook ingestion |
| AI-SRE | Slack / Teams (optional) | 443 | TCP | Notifications (requires outbound internet) |
Air-gapped operation
For fully air-gapped environments, AI-SRE requires no outbound internet access when using Ollama for LLM inference and an on-premise log backend. Slack and Teams notifications require outbound HTTPS -- use email (SMTP to an internal relay) for air-gapped notification delivery.
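Before installing, it can help to verify that the AI-SRE host can reach each internal dependency. A minimal sketch using `nc`; the hostnames are the examples used throughout this guide, so substitute your own:

```shell
#!/bin/sh
# Check reachability of each internal dependency (example hostnames)
for target in postgres.internal:5432 ollama-server.internal:11434 loki.internal:3100; do
  host=${target%%:*}   # everything before the last colon
  port=${target##*:}   # everything after the last colon
  if nc -z -w 3 "$host" "$port"; then
    echo "OK   $host:$port"
  else
    echo "FAIL $host:$port"
  fi
done
```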
Self-Hosted LLM with Ollama¶
Install Ollama¶
On the LLM server:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model (requires internet during setup)
ollama pull llama3.2
# For air-gapped: transfer the model files manually
# Models are stored in ~/.ollama/models/
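Once installed, a quick way to confirm the server is up and the model is present (assumes Ollama's default port, 11434):

```shell
# Confirm the Ollama daemon is serving; /api/tags lists pulled models as JSON
curl -s http://localhost:11434/api/tags
# Same information via the CLI
ollama list
```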
Configure AI-SRE¶
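Point AI-SRE at the Ollama server via environment variables; a minimal `.env` fragment (the hostname is the example used throughout this guide):

```
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama-server.internal:11434
OLLAMA_MODEL=llama3.2
```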
Model Recommendations¶
| Model | Parameters | RAM Required | Quality | Speed |
|---|---|---|---|---|
| `llama3.2` | 3B | 4 GB | Good for basic diagnosis | Fast |
| `llama3.1` | 8B | 8 GB | Better reasoning | Moderate |
| `llama3.1:70b` | 70B | 48 GB | Best quality | Slow (GPU recommended) |
| `codellama` | 7B | 8 GB | Good for config analysis | Moderate |
| `mixtral` | 8x7B | 32 GB | Strong reasoning | Moderate |
GPU acceleration
For production on-prem deployments, a GPU (NVIDIA with CUDA support) significantly improves LLM inference speed. Ollama automatically uses GPU when available.
Installation Steps¶
Step 1: Prepare the Environment¶
# On the deployment server
mkdir -p /opt/ai-sre
cd /opt/ai-sre
# For air-gapped: transfer the repository tarball
# On a connected machine: git archive --format=tar.gz HEAD > ai-sre.tar.gz
# Transfer ai-sre.tar.gz to the server
tar xzf ai-sre.tar.gz
Step 2: Install Dependencies¶
# Create namespace
kubectl create namespace ai-sre
# Create secrets
kubectl create secret generic ai-sre-secrets \
--namespace ai-sre \
--from-literal=DATABASE_URL="postgresql+asyncpg://ai_sre:password@postgres.internal:5432/ai_sre"
# Install via Helm
helm install ai-sre deploy/helm/ai-sre \
--namespace ai-sre \
--set existingSecret=ai-sre-secrets \
--set config.DEPLOYMENT_PROFILE=on_prem \
--set config.LLM_PROVIDER=ollama \
--set config.OLLAMA_BASE_URL=http://ollama-server.internal:11434 \
--set config.OLLAMA_MODEL=llama3.2 \
--set config.LOG_PROVIDER=loki \
--set config.LOKI_URL=http://loki.internal:3100 \
--set config.DEFAULT_DRY_RUN=true \
--set config.APPROVAL_REQUIRED=true
Step 3: Configure for On-Prem¶
Create or edit .env with the on-premise configuration:
# Deployment profile
DEPLOYMENT_PROFILE=on_prem
WORKSPACE_ID=customer-prod
WORKSPACE_NAME="Customer Production"
# Self-hosted LLM
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama-server.internal:11434
OLLAMA_MODEL=llama3.2
# Production database
DATABASE_URL=postgresql+asyncpg://ai_sre:password@postgres.internal:5432/ai_sre
# On-premise log backend
LOG_PROVIDER=loki
LOKI_URL=http://loki.internal:3100
# Kubernetes (in-cluster or explicit kubeconfig)
ACTION_PROVIDER=kubernetes
ALLOWED_NAMESPACES=production,staging
MAX_SCALE_REPLICAS=10
# Safety (conservative defaults for customer environments)
DEFAULT_DRY_RUN=true
APPROVAL_REQUIRED=true
AUTONOMY_ENABLED=false
AUTONOMOUS_ACTIONS=restart_pod
# API security
AI_SRE_API_KEYS=customer-admin-key-001,customer-ops-key-001
AI_SRE_RATE_LIMIT=120
AI_SRE_CORS_ORIGINS=https://internal-console.customer.com
# Notifications (internal SMTP relay for air-gapped)
SMTP_HOST=smtp-relay.internal
SMTP_PORT=25
SMTP_USER=
SMTP_PASSWORD=
ALERT_EMAIL_TO=oncall@customer.com
Step 4: Connect Alert Sources¶
Configure the customer's monitoring tools to send webhooks to AI-SRE:
# Alertmanager example (alertmanager.yml)
receivers:
- name: ai-sre
webhook_configs:
- url: http://ai-sre.ai-sre.svc.cluster.local:8888/webhook
http_config:
headers:
X-Source: alertmanager
X-API-Key: customer-ops-key-001
send_resolved: true
Step 5: Verify¶
# Health check
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/health
# Platform overview
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/platform/overview
# Seed demo data for validation
curl -X POST -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/demo/seed
# Check adapters are configured
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/adapters
Air-Gapped Installation¶
For environments with no internet access:
Prepare Offline Packages¶
On a connected machine:
# Clone the repo
git clone https://github.com/aabhat-ai/AI-SRE.git
cd AI-SRE
# Download Python wheels
pip download -d wheels/ ".[dev]"
# Download Docker images
docker pull python:3.12-slim
docker pull postgres:16-alpine
docker save python:3.12-slim postgres:16-alpine > images.tar
# Download Ollama model
ollama pull llama3.2
# Copy from ~/.ollama/models/ to a transfer medium
# Package everything
tar czf ai-sre-offline.tar.gz \
AI-SRE/ wheels/ images.tar
Install Offline¶
On the air-gapped server:
tar xzf ai-sre-offline.tar.gz
# Load Docker images
docker load < images.tar
# Install Python packages from local wheels
cd AI-SRE
python -m venv .venv
source .venv/bin/activate
pip install --no-index --find-links=../wheels/ -e ".[dev]"
# Copy Ollama models to ~/.ollama/models/
Security Hardening¶
Network Policies¶
Restrict AI-SRE's network access in Kubernetes:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: ai-sre-network-policy
namespace: ai-sre
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: ai-sre
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
purpose: monitoring
ports:
- port: 8888
egress:
- to:
- podSelector:
matchLabels:
app: postgres
ports:
- port: 5432
- to:
- podSelector:
matchLabels:
app: ollama
ports:
- port: 11434
- to:
- podSelector: {}
ports:
- port: 6443 # K8s API
API Key Rotation¶
Rotate API keys regularly:
- Add the new key to `AI_SRE_API_KEYS` alongside the old one
- Update all clients to use the new key
- Remove the old key from `AI_SRE_API_KEYS`
- Restart the service
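On Kubernetes, the rotation can be sketched with `helm upgrade`, assuming the chart exposes the key list as `config.AI_SRE_API_KEYS` (matching the install command earlier in this guide; key values are illustrative):

```shell
# 1. Add the new key alongside the old one and roll the pods
helm upgrade ai-sre deploy/helm/ai-sre --namespace ai-sre --reuse-values \
  --set config.AI_SRE_API_KEYS="customer-admin-key-001,customer-admin-key-002"

# 2. After all clients switch to the new key, remove the old one
helm upgrade ai-sre deploy/helm/ai-sre --namespace ai-sre --reuse-values \
  --set config.AI_SRE_API_KEYS="customer-admin-key-002"
```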
Audit Trail¶
All actions are logged in the timeline. Export audit data regularly:
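A sketch of a periodic export via the HTTP API; the `/timeline` path below is an assumption, not a documented endpoint, so confirm the actual route in your version's API reference:

```shell
# NOTE: the /timeline path is hypothetical; substitute the real audit/export endpoint
curl -H "X-API-Key: customer-admin-key-001" \
  http://ai-sre.internal:8888/timeline > "audit-$(date +%F).json"
```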
Operational Handoff Checklist¶
Use this checklist when handing off an on-prem deployment to the customer's operations team:
- AI-SRE server running and healthy (`GET /health`)
- PostgreSQL database accessible and backed up
- Ollama server running with the required model loaded
- Alert sources configured and sending webhooks
- API keys configured and distributed to operators
- RBAC roles assigned (admin, operator, viewer)
- Notification channels configured (email, Slack, or Teams)
- Namespace allowlist configured (`ALLOWED_NAMESPACES`)
- Safety defaults verified (`DEFAULT_DRY_RUN=true`, `APPROVAL_REQUIRED=true`)
- Prometheus scraping the `/metrics` endpoint
- Backup schedule configured for PostgreSQL
- Runbook: How to restart AI-SRE
- Runbook: How to rotate API keys
- Runbook: How to upgrade AI-SRE
- Runbook: How to add new alert sources
- Escalation contacts documented
Upgrading On-Prem Installations¶
With Internet Access¶
cd /opt/ai-sre
git pull origin main
# Kubernetes
helm upgrade ai-sre deploy/helm/ai-sre \
--namespace ai-sre \
--reuse-values
# Bare metal
source .venv/bin/activate
pip install -e ".[dev]"
make db-migrate
sudo systemctl restart ai-sre
Air-Gapped¶
- Prepare a new offline package on a connected machine (same process as initial install)
- Transfer to the air-gapped environment
- Apply the update:
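The apply step mirrors the initial offline install; a sketch for a bare-metal host (paths follow the offline-package layout above):

```shell
tar xzf ai-sre-offline.tar.gz
docker load < images.tar

# Reinstall from the bundled wheels, then migrate and restart
cd AI-SRE
source .venv/bin/activate
pip install --no-index --find-links=../wheels/ ".[dev]"
make db-migrate
sudo systemctl restart ai-sre
```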