On-Premises Deployment¶
Deploy AI-SRE in a customer's own data center or private cloud environment. This guide covers air-gapped installations, self-hosted LLM configuration, network requirements, and operational handoff.
Overview¶
On-premises deployment is designed for organizations that cannot send data to external cloud services. AI-SRE supports fully air-gapped operation with self-hosted LLM inference (via Ollama), on-premise log backends, and local Kubernetes clusters.
graph TB
subgraph Customer["Customer Data Center"]
subgraph K8S["Kubernetes Cluster"]
AISRE[AI-SRE<br/>Helm Chart]
PG[(PostgreSQL)]
AISRE --> PG
end
subgraph Infra["Infrastructure"]
OLL[Ollama Server<br/>Self-hosted LLM]
LOKI[Grafana Loki<br/>Log Backend]
PROM[Prometheus<br/>Monitoring]
end
subgraph Monitoring["Alert Sources"]
AM[Alertmanager]
DD[Datadog Agent]
GF[Grafana]
end
AISRE --> OLL
AISRE --> LOKI
PROM --> AISRE
Monitoring -->|webhooks| AISRE
end
style Customer fill:#1a1a2e,color:#fff
Deployment Profile¶
Set the deployment profile to `on_prem` to enable on-premise-specific behavior:
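For example, in `.env` (the same value can also be passed to Helm as `config.DEPLOYMENT_PROFILE`, as in the install command later in this guide):

```
DEPLOYMENT_PROFILE=on_prem
```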
This setting:
- Disables features that require outbound internet access (unless explicitly configured)
- Adjusts default safety posture for enterprise environments
- Enables local-first storage and caching strategies
Prerequisites¶
Hardware Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| AI-SRE server | 2 vCPU, 2 GB RAM | 4 vCPU, 4 GB RAM |
| PostgreSQL | 2 vCPU, 4 GB RAM, 50 GB SSD | 4 vCPU, 8 GB RAM, 100 GB SSD |
| Ollama (self-hosted LLM) | 4 vCPU, 16 GB RAM | 8 vCPU, 32 GB RAM, GPU recommended |
| Storage | 20 GB for application | 100 GB for logs and database growth |
Software Requirements¶
| Software | Version | Notes |
|---|---|---|
| Kubernetes | 1.27+ | Or Docker for single-node deployment |
| Helm | 3.12+ | For Kubernetes deployment |
| PostgreSQL | 14+ | Required for production (SQLite not recommended for on-prem) |
| Ollama | Latest | For self-hosted LLM inference |
Network Requirements¶
AI-SRE needs the following network connectivity within the customer environment:
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| AI-SRE | PostgreSQL | 5432 | TCP | Database |
| AI-SRE | Ollama | 11434 | TCP | LLM inference |
| AI-SRE | Kubernetes API | 6443 | TCP | Cluster inspection and remediation |
| AI-SRE | Loki / Elasticsearch | 3100 / 9200 | TCP | Log fetching |
| Alert sources | AI-SRE | 8888 | TCP | Webhook ingestion |
| AI-SRE | Slack / Teams (optional) | 443 | TCP | Notifications (requires outbound internet) |
Air-gapped operation
For fully air-gapped environments, AI-SRE requires no outbound internet access when using Ollama for LLM inference and an on-premise log backend. Slack and Teams notifications require outbound HTTPS -- use email (SMTP to an internal relay) for air-gapped notification delivery.
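Before installing, it can help to verify that the AI-SRE host can reach each internal dependency. A minimal sketch using `nc`; the hostnames are the examples used throughout this guide, so substitute your own:

```shell
#!/bin/sh
# Check reachability of each internal dependency (example hostnames)
for target in postgres.internal:5432 ollama-server.internal:11434 loki.internal:3100; do
  host=${target%%:*}   # everything before the last colon
  port=${target##*:}   # everything after the last colon
  if nc -z -w 3 "$host" "$port"; then
    echo "OK   $host:$port"
  else
    echo "FAIL $host:$port"
  fi
done
```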
Self-Hosted LLM with Ollama¶
Install Ollama¶
On the LLM server:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model (requires internet during setup)
ollama pull llama3.2
# For air-gapped: transfer the model files manually
# Models are stored in ~/.ollama/models/
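Once installed, a quick way to confirm the server is up and the model is present (assumes Ollama's default port, 11434):

```shell
# Confirm the Ollama daemon is serving; /api/tags lists pulled models as JSON
curl -s http://localhost:11434/api/tags
# Same information via the CLI
ollama list
```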
Configure AI-SRE¶
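Point AI-SRE at the Ollama server via environment variables; a minimal `.env` fragment (the hostname is the example used throughout this guide):

```
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama-server.internal:11434
OLLAMA_MODEL=llama3.2
```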
Model Recommendations¶
| Model | Parameters | RAM Required | Quality | Speed |
|---|---|---|---|---|
| `llama3.2` | 3B | 4 GB | Good for basic diagnosis | Fast |
| `llama3.1` | 8B | 8 GB | Better reasoning | Moderate |
| `llama3.1:70b` | 70B | 48 GB | Best quality | Slow (GPU recommended) |
| `codellama` | 7B | 8 GB | Good for config analysis | Moderate |
| `mixtral` | 8x7B | 32 GB | Strong reasoning | Moderate |
GPU acceleration
For production on-prem deployments, a GPU (NVIDIA with CUDA support) significantly improves LLM inference speed. Ollama automatically uses GPU when available.
Installation Steps¶
Step 1: Prepare the Environment¶
# On the deployment server
mkdir -p /opt/ai-sre
cd /opt/ai-sre
# For air-gapped: transfer the repository tarball
# On a connected machine: git archive --format=tar.gz HEAD > ai-sre.tar.gz
# Transfer ai-sre.tar.gz to the server
tar xzf ai-sre.tar.gz
Step 2: Install Dependencies¶
# Create namespace
kubectl create namespace ai-sre
# Create secrets
kubectl create secret generic ai-sre-secrets \
--namespace ai-sre \
--from-literal=DATABASE_URL="postgresql+asyncpg://ai_sre:password@postgres.internal:5432/ai_sre"
# Install via Helm
helm install ai-sre deploy/helm/ai-sre \
--namespace ai-sre \
--set existingSecret=ai-sre-secrets \
--set config.DEPLOYMENT_PROFILE=on_prem \
--set config.LLM_PROVIDER=ollama \
--set config.OLLAMA_BASE_URL=http://ollama-server.internal:11434 \
--set config.OLLAMA_MODEL=llama3.2 \
--set config.LOG_PROVIDER=loki \
--set config.LOKI_URL=http://loki.internal:3100 \
--set config.DEFAULT_DRY_RUN=true \
--set config.APPROVAL_REQUIRED=true
Step 3: Configure for On-Prem¶
Create or edit .env with the on-premise configuration:
# Deployment profile
DEPLOYMENT_PROFILE=on_prem
WORKSPACE_ID=customer-prod
WORKSPACE_NAME="Customer Production"
# Self-hosted LLM
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama-server.internal:11434
OLLAMA_MODEL=llama3.2
# Production database
DATABASE_URL=postgresql+asyncpg://ai_sre:password@postgres.internal:5432/ai_sre
# On-premise log backend
LOG_PROVIDER=loki
LOKI_URL=http://loki.internal:3100
# Kubernetes (in-cluster or explicit kubeconfig)
ACTION_PROVIDER=kubernetes
ALLOWED_NAMESPACES=production,staging
MAX_SCALE_REPLICAS=10
# Safety (conservative defaults for customer environments)
DEFAULT_DRY_RUN=true
APPROVAL_REQUIRED=true
AUTONOMY_ENABLED=false
AUTONOMOUS_ACTIONS=restart_pod
# API security
AI_SRE_API_KEYS=customer-admin-key-001,customer-ops-key-001
AI_SRE_RATE_LIMIT=120
AI_SRE_CORS_ORIGINS=https://internal-console.customer.com
# Notifications (internal SMTP relay for air-gapped)
SMTP_HOST=smtp-relay.internal
SMTP_PORT=25
SMTP_USER=
SMTP_PASSWORD=
ALERT_EMAIL_TO=oncall@customer.com
Step 4: Connect Alert Sources¶
Configure the customer's monitoring tools to send webhooks to AI-SRE:
# Alertmanager example (alertmanager.yml)
receivers:
- name: ai-sre
webhook_configs:
- url: http://ai-sre.ai-sre.svc.cluster.local:8888/webhook
http_config:
headers:
X-Source: alertmanager
X-API-Key: customer-ops-key-001
send_resolved: true
Step 5: Verify¶
# Health check
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/health
# Platform overview
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/platform/overview
# Seed demo data for validation
curl -X POST -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/demo/seed
# Check adapters are configured
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/adapters
Air-Gapped Installation¶
For environments with no internet access:
Prepare Offline Packages¶
On a connected machine:
# Clone the repo
git clone https://github.com/aabhat-ai/AI-SRE.git
cd AI-SRE
# Download Python wheels
pip download -d wheels/ ".[dev]"
# Download Docker images
docker pull python:3.12-slim
docker pull postgres:16-alpine
docker save python:3.12-slim postgres:16-alpine > images.tar
# Download Ollama model
ollama pull llama3.2
# Copy from ~/.ollama/models/ to a transfer medium
# Package everything
tar czf ai-sre-offline.tar.gz \
AI-SRE/ wheels/ images.tar
Install Offline¶
On the air-gapped server:
tar xzf ai-sre-offline.tar.gz
# Load Docker images
docker load < images.tar
# Install Python packages from local wheels
cd AI-SRE
python -m venv .venv
source .venv/bin/activate
pip install --no-index --find-links=../wheels/ -e ".[dev]"
# Copy Ollama models to ~/.ollama/models/
Security Hardening¶
Network Policies¶
Restrict AI-SRE's network access in Kubernetes:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: ai-sre-network-policy
namespace: ai-sre
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: ai-sre
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
purpose: monitoring
ports:
- port: 8888
egress:
- to:
- podSelector:
matchLabels:
app: postgres
ports:
- port: 5432
- to:
- podSelector:
matchLabels:
app: ollama
ports:
- port: 11434
- to:
- podSelector: {}
ports:
- port: 6443 # K8s API
API Key Rotation¶
Rotate API keys regularly:
- Add the new key to `AI_SRE_API_KEYS` alongside the old one
- Update all clients to use the new key
- Remove the old key from `AI_SRE_API_KEYS`
- Restart the service
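On Kubernetes, the rotation can be sketched with `helm upgrade`, assuming the chart exposes the key list as `config.AI_SRE_API_KEYS` (matching the install command earlier in this guide; key values are illustrative):

```shell
# 1. Add the new key alongside the old one and roll the pods
helm upgrade ai-sre deploy/helm/ai-sre --namespace ai-sre --reuse-values \
  --set config.AI_SRE_API_KEYS="customer-admin-key-001,customer-admin-key-002"

# 2. After all clients switch to the new key, remove the old one
helm upgrade ai-sre deploy/helm/ai-sre --namespace ai-sre --reuse-values \
  --set config.AI_SRE_API_KEYS="customer-admin-key-002"
```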
Audit Trail¶
All actions are logged in the timeline. Export audit data regularly:
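A sketch of a periodic export via the HTTP API; the `/timeline` path below is an assumption, not a documented endpoint, so confirm the actual route in your version's API reference:

```shell
# NOTE: the /timeline path is hypothetical; substitute the real audit/export endpoint
curl -H "X-API-Key: customer-admin-key-001" \
  http://ai-sre.internal:8888/timeline > "audit-$(date +%F).json"
```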
Operational Handoff Checklist¶
Use this checklist when handing off an on-prem deployment to the customer's operations team:
- AI-SRE server running and healthy (`GET /health`)
- PostgreSQL database accessible and backed up
- Ollama server running with the required model loaded
- Alert sources configured and sending webhooks
- API keys configured and distributed to operators
- RBAC roles assigned (admin, operator, viewer)
- Notification channels configured (email, Slack, or Teams)
- Namespace allowlist configured (`ALLOWED_NAMESPACES`)
- Safety defaults verified (`DEFAULT_DRY_RUN=true`, `APPROVAL_REQUIRED=true`)
- Prometheus scraping the `/metrics` endpoint
- Backup schedule configured for PostgreSQL
- Runbook: How to restart AI-SRE
- Runbook: How to rotate API keys
- Runbook: How to upgrade AI-SRE
- Runbook: How to add new alert sources
- Escalation contacts documented
Upgrading On-Prem Installations¶
With Internet Access¶
cd /opt/ai-sre
git pull origin main
# Kubernetes
helm upgrade ai-sre deploy/helm/ai-sre \
--namespace ai-sre \
--reuse-values
# Bare metal
source .venv/bin/activate
pip install -e ".[dev]"
make db-migrate
sudo systemctl restart ai-sre
Air-Gapped¶
- Prepare a new offline package on a connected machine (same process as initial install)
- Transfer to the air-gapped environment
- Apply the update:
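The apply step mirrors the initial offline install; a sketch for a bare-metal host (paths follow the offline-package layout above):

```shell
tar xzf ai-sre-offline.tar.gz
docker load < images.tar

# Reinstall from the bundled wheels, then migrate and restart
cd AI-SRE
source .venv/bin/activate
pip install --no-index --find-links=../wheels/ ".[dev]"
make db-migrate
sudo systemctl restart ai-sre
```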