On-Premises Deployment

Deploy AI-SRE in a customer's own data center or private cloud environment. This guide covers air-gapped installations, self-hosted LLM configuration, network requirements, and operational handoff.


Overview

On-premises deployment is designed for organizations that cannot send data to external cloud services. AI-SRE supports fully air-gapped operation with self-hosted LLM inference (via Ollama), on-premise log backends, and local Kubernetes clusters.

graph TB
    subgraph Customer["Customer Data Center"]
        subgraph K8S["Kubernetes Cluster"]
            AISRE[AI-SRE<br/>Helm Chart]
            PG[(PostgreSQL)]
            AISRE --> PG
        end

        subgraph Infra["Infrastructure"]
            OLL[Ollama Server<br/>Self-hosted LLM]
            LOKI[Grafana Loki<br/>Log Backend]
            PROM[Prometheus<br/>Monitoring]
        end

        subgraph Monitoring["Alert Sources"]
            AM[Alertmanager]
            DD[Datadog Agent]
            GF[Grafana]
        end

        AISRE --> OLL
        AISRE --> LOKI
        PROM --> AISRE
        Monitoring -->|webhooks| AISRE
    end

    style Customer fill:#1a1a2e,color:#fff

Deployment Profile

Set the deployment profile to on_prem to enable on-premise-specific behavior:

DEPLOYMENT_PROFILE=on_prem

This setting:

  • Disables features that require outbound internet access (unless explicitly configured)
  • Adjusts default safety posture for enterprise environments
  • Enables local-first storage and caching strategies
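
As an illustration of the kind of gating this profile drives, the sketch below shows how outbound-internet features might be disabled under `on_prem` unless explicitly opted in. The helper name and `ENABLE_*` variables are hypothetical, not part of the AI-SRE codebase:

```python
# Hypothetical sketch: profile-based gating of outbound features.
# OUTBOUND_FEATURES, is_feature_enabled, and ENABLE_* are illustrative names.
OUTBOUND_FEATURES = {"slack", "teams"}  # require internet egress

def is_feature_enabled(feature: str, env: dict) -> bool:
    """Under on_prem, outbound features are off unless explicitly configured."""
    profile = env.get("DEPLOYMENT_PROFILE", "cloud")
    explicitly_on = env.get(f"ENABLE_{feature.upper()}") == "true"
    if profile == "on_prem" and feature in OUTBOUND_FEATURES:
        return explicitly_on
    return True

print(is_feature_enabled("slack", {"DEPLOYMENT_PROFILE": "on_prem"}))  # → False
print(is_feature_enabled("slack", {"DEPLOYMENT_PROFILE": "on_prem",
                                   "ENABLE_SLACK": "true"}))           # → True
```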

Prerequisites

Hardware Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| AI-SRE server | 2 vCPU, 2 GB RAM | 4 vCPU, 4 GB RAM |
| PostgreSQL | 2 vCPU, 4 GB RAM, 50 GB SSD | 4 vCPU, 8 GB RAM, 100 GB SSD |
| Ollama (self-hosted LLM) | 4 vCPU, 16 GB RAM | 8 vCPU, 32 GB RAM, GPU recommended |
| Storage | 20 GB for application | 100 GB for logs and database growth |

Software Requirements

| Software | Version | Notes |
| --- | --- | --- |
| Kubernetes | 1.27+ | Or Docker for single-node deployment |
| Helm | 3.12+ | For Kubernetes deployment |
| PostgreSQL | 14+ | Required for production (SQLite not recommended for on-prem) |
| Ollama | Latest | For self-hosted LLM inference |

Network Requirements

AI-SRE needs the following network connectivity within the customer environment:

| Source | Destination | Port | Protocol | Purpose |
| --- | --- | --- | --- | --- |
| AI-SRE | PostgreSQL | 5432 | TCP | Database |
| AI-SRE | Ollama | 11434 | TCP | LLM inference |
| AI-SRE | Kubernetes API | 6443 | TCP | Cluster inspection and remediation |
| AI-SRE | Loki / Elasticsearch | 3100 / 9200 | TCP | Log fetching |
| Alert sources | AI-SRE | 8888 | TCP | Webhook ingestion |
| AI-SRE | Slack / Teams (optional) | 443 | TCP | Notifications (requires outbound internet) |

Air-gapped operation

For fully air-gapped environments, AI-SRE requires no outbound internet access when using Ollama for LLM inference and an on-premise log backend. Slack and Teams notifications require outbound HTTPS; use email (SMTP to an internal relay) for air-gapped notification delivery.


Self-Hosted LLM with Ollama

Install Ollama

On the LLM server:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (requires internet during setup)
ollama pull llama3.2

# For air-gapped: transfer the model files manually
# Models are stored in ~/.ollama/models/

Configure AI-SRE

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama-server.internal:11434
OLLAMA_MODEL=llama3.2

Model Recommendations

| Model | Parameters | RAM Required | Quality | Speed |
| --- | --- | --- | --- | --- |
| llama3.2 | 3B | 4 GB | Good for basic diagnosis | Fast |
| llama3.1 | 8B | 8 GB | Better reasoning | Moderate |
| llama3.1:70b | 70B | 48 GB | Best quality | Slow (GPU recommended) |
| codellama | 7B | 8 GB | Good for config analysis | Moderate |
| mixtral | 8x7B | 32 GB | Strong reasoning | Moderate |
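
The RAM column can be cross-checked with a rough rule of thumb: a 4-bit-quantized model (Ollama's default quantization) needs roughly params × ~4.5 bits of weight memory, plus KV cache and runtime overhead. A sketch, where the bit width and overhead factors are approximations rather than exact Ollama figures:

```python
def estimated_ram_gb(params_billion: float,
                     bits_per_param: float = 4.5,   # ~q4 quantization incl. scales
                     overhead_factor: float = 1.2,  # KV cache, activations
                     runtime_gb: float = 1.5) -> float:
    """Rough RAM estimate for serving a quantized model."""
    weights_gb = params_billion * bits_per_param / 8
    return weights_gb * overhead_factor + runtime_gb

for name, size_b in [("llama3.2", 3), ("llama3.1", 8), ("llama3.1:70b", 70)]:
    print(f"{name}: ~{estimated_ram_gb(size_b):.0f} GB")
```

The estimates land close to the table above; treat them as sizing guidance only, since context length and concurrency change the KV-cache term.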

GPU acceleration

For production on-prem deployments, a GPU (NVIDIA with CUDA support) significantly improves LLM inference speed. Ollama automatically uses GPU when available.


Installation Steps

Step 1: Prepare the Environment

# On the deployment server
mkdir -p /opt/ai-sre
cd /opt/ai-sre

# For air-gapped: transfer the repository tarball
# On a connected machine: git archive --format=tar.gz HEAD > ai-sre.tar.gz
# Transfer ai-sre.tar.gz to the server
tar xzf ai-sre.tar.gz

Step 2: Install Dependencies

Choose one of the following deployment methods.

Option A: Kubernetes (Helm)

# Create namespace
kubectl create namespace ai-sre

# Create secrets
kubectl create secret generic ai-sre-secrets \
  --namespace ai-sre \
  --from-literal=DATABASE_URL="postgresql+asyncpg://ai_sre:password@postgres.internal:5432/ai_sre"

# Install via Helm
helm install ai-sre deploy/helm/ai-sre \
  --namespace ai-sre \
  --set existingSecret=ai-sre-secrets \
  --set config.DEPLOYMENT_PROFILE=on_prem \
  --set config.LLM_PROVIDER=ollama \
  --set config.OLLAMA_BASE_URL=http://ollama-server.internal:11434 \
  --set config.OLLAMA_MODEL=llama3.2 \
  --set config.LOG_PROVIDER=loki \
  --set config.LOKI_URL=http://loki.internal:3100 \
  --set config.DEFAULT_DRY_RUN=true \
  --set config.APPROVAL_REQUIRED=true

Option B: Docker Compose

cp .env.example .env
# Edit .env with on-prem configuration (see below)

docker compose -f deploy/docker-compose.yml up -d

Option C: Bare Metal

python -m venv .venv
source .venv/bin/activate

# For air-gapped: install from a local wheel cache
pip install --no-index --find-links=/opt/ai-sre/wheels -e ".[dev]"

# Or with internet access:
pip install -e ".[dev]"

cp .env.example .env
# Edit .env with on-prem configuration
make run

Step 3: Configure for On-Prem

Create or edit .env with the on-premise configuration:

# Deployment profile
DEPLOYMENT_PROFILE=on_prem
WORKSPACE_ID=customer-prod
WORKSPACE_NAME="Customer Production"

# Self-hosted LLM
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://ollama-server.internal:11434
OLLAMA_MODEL=llama3.2

# Production database
DATABASE_URL=postgresql+asyncpg://ai_sre:password@postgres.internal:5432/ai_sre

# On-premise log backend
LOG_PROVIDER=loki
LOKI_URL=http://loki.internal:3100

# Kubernetes (in-cluster or explicit kubeconfig)
ACTION_PROVIDER=kubernetes
ALLOWED_NAMESPACES=production,staging
MAX_SCALE_REPLICAS=10

# Safety (conservative defaults for customer environments)
DEFAULT_DRY_RUN=true
APPROVAL_REQUIRED=true
AUTONOMY_ENABLED=false
AUTONOMOUS_ACTIONS=restart_pod

# API security
AI_SRE_API_KEYS=customer-admin-key-001,customer-ops-key-001
AI_SRE_RATE_LIMIT=120
AI_SRE_CORS_ORIGINS=https://internal-console.customer.com

# Notifications (internal SMTP relay for air-gapped)
SMTP_HOST=smtp-relay.internal
SMTP_PORT=25
SMTP_USER=
SMTP_PASSWORD=
ALERT_EMAIL_TO=oncall@customer.com
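
Before starting the service, it can help to verify the on-prem configuration is complete. The sketch below is a standalone pre-flight check, not a built-in AI-SRE command; the variable names come from this guide:

```python
# Illustrative pre-flight check for the on-prem configuration above.
REQUIRED_ON_PREM = [
    "DEPLOYMENT_PROFILE",
    "LLM_PROVIDER",
    "OLLAMA_BASE_URL",
    "DATABASE_URL",
    "LOG_PROVIDER",
]

def parse_env_file(path: str) -> dict:
    """Parse simple KEY=value lines, skipping comments and blanks."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

def missing_vars(env: dict) -> list:
    """Return configuration problems for an on-prem deployment."""
    problems = [k for k in REQUIRED_ON_PREM if not env.get(k)]
    if "sqlite" in env.get("DATABASE_URL", ""):
        problems.append("DATABASE_URL (PostgreSQL required for on-prem)")
    return problems

print(missing_vars({"DEPLOYMENT_PROFILE": "on_prem", "LLM_PROVIDER": "ollama"}))
# → ['OLLAMA_BASE_URL', 'DATABASE_URL', 'LOG_PROVIDER']
```

Run it against the generated `.env` (for example, `missing_vars(parse_env_file(".env"))`) before the first start.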

Step 4: Connect Alert Sources

Configure the customer's monitoring tools to send webhooks to AI-SRE:

# Alertmanager example (alertmanager.yml)
receivers:
  - name: ai-sre
    webhook_configs:
      - url: http://ai-sre.ai-sre.svc.cluster.local:8888/webhook
        # Custom headers use http_headers (requires a recent Alertmanager)
        http_config:
          http_headers:
            X-Source:
              values: ["alertmanager"]
            X-API-Key:
              values: ["customer-ops-key-001"]
        send_resolved: true
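
When testing this integration, it helps to know what Alertmanager actually sends: a JSON body in its documented v4 webhook format. The sketch below builds a minimal payload of that shape and extracts the fields an ingestion endpoint typically uses; `extract_alerts` is illustrative, not AI-SRE code:

```python
import json

# Minimal Alertmanager v4 webhook payload (documented format); values are examples.
payload = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "ai-sre",
    "groupLabels": {"alertname": "HighErrorRate"},
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighErrorRate", "namespace": "production",
                       "severity": "critical"},
            "annotations": {"summary": "5xx rate above 5% for 10m"},
            "startsAt": "2024-01-01T00:00:00Z",
        }
    ],
})

def extract_alerts(body: str) -> list:
    """Illustrative: pull out the fields a webhook ingester typically cares about."""
    data = json.loads(body)
    return [
        {
            "name": a["labels"].get("alertname", "unknown"),
            "severity": a["labels"].get("severity", "info"),
            "namespace": a["labels"].get("namespace"),
            "summary": a["annotations"].get("summary", ""),
            "firing": a["status"] == "firing",
        }
        for a in data.get("alerts", [])
    ]

print(extract_alerts(payload)[0]["name"])  # → HighErrorRate
```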

Step 5: Verify

# Health check
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/health

# Platform overview
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/platform/overview

# Seed demo data for validation
curl -X POST -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/demo/seed

# Check adapters are configured
curl -H "X-API-Key: customer-admin-key-001" http://ai-sre.internal:8888/adapters

Air-Gapped Installation

For environments with no internet access:

Prepare Offline Packages

On a connected machine:

# Clone the repo
git clone https://github.com/aabhat-ai/AI-SRE.git
cd AI-SRE

# Download Python wheels into a sibling directory (run on a machine that matches
# the target server's platform and Python version)
pip download -d ../wheels/ ".[dev]"
cd ..

# Download Docker images
docker pull python:3.12-slim
docker pull postgres:16-alpine
docker save python:3.12-slim postgres:16-alpine > images.tar

# Download Ollama model
ollama pull llama3.2
# Copy from ~/.ollama/models/ to a transfer medium

# Package everything
tar czf ai-sre-offline.tar.gz \
  AI-SRE/ wheels/ images.tar
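
Media transfers into air-gapped environments can silently corrupt large archives, so generate a checksum before transfer and verify it on the other side (`sha256sum` works equally well where available). A sketch using only the standard library; the file name is the bundle produced above:

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large archives don't load into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# Usage sketch:
#   on the connected machine:   record sha256_file("ai-sre-offline.tar.gz")
#   on the air-gapped server:   recompute and compare; a mismatch means the
#                               archive was corrupted in transfer
```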

Install Offline

On the air-gapped server:

tar xzf ai-sre-offline.tar.gz

# Load Docker images
docker load < images.tar

# Install Python packages from local wheels
cd AI-SRE
python -m venv .venv
source .venv/bin/activate
pip install --no-index --find-links=../wheels/ -e ".[dev]"

# Copy Ollama models to ~/.ollama/models/

Security Hardening

Network Policies

Restrict AI-SRE's network access in Kubernetes:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-sre-network-policy
  namespace: ai-sre
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ai-sre
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              purpose: monitoring
      ports:
        - port: 8888
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: ollama
      ports:
        - port: 11434
    # Kubernetes API server is not a pod, so no pod selector applies; this rule
    # allows any destination on 6443 (tighten with an ipBlock if the API server
    # address is known)
    - ports:
        - port: 6443
    # Allow DNS so in-cluster and internal hostnames still resolve
    - to:
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

API Key Rotation

Rotate API keys regularly:

  1. Add the new key to AI_SRE_API_KEYS alongside the old one
  2. Update all clients to use the new key
  3. Remove the old key from AI_SRE_API_KEYS
  4. Restart the service
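
Step 1 works without downtime because AI_SRE_API_KEYS accepts a comma-separated list, so both keys validate during the transition. A sketch of the check an API layer might perform (illustrative, not AI-SRE's actual code), using constant-time comparison:

```python
import hmac

def is_valid_key(presented: str, configured: str) -> bool:
    """configured is the comma-separated AI_SRE_API_KEYS value."""
    keys = [k.strip() for k in configured.split(",") if k.strip()]
    # compare_digest avoids leaking key prefixes through response timing
    return any(hmac.compare_digest(presented, k) for k in keys)

# During rotation, old and new keys are both accepted:
rotation_window = "customer-admin-key-001,customer-admin-key-002"
print(is_valid_key("customer-admin-key-001", rotation_window))  # → True
print(is_valid_key("customer-admin-key-002", rotation_window))  # → True
print(is_valid_key("stale-key", rotation_window))               # → False
```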

Audit Trail

All actions are logged in the timeline. Export audit data regularly:

curl -H "X-API-Key: admin-key" http://ai-sre.internal:8888/metrics/export

Operational Handoff Checklist

Use this checklist when handing off an on-prem deployment to the customer's operations team:

  • AI-SRE server running and healthy (GET /health)
  • PostgreSQL database accessible and backed up
  • Ollama server running with the required model loaded
  • Alert sources configured and sending webhooks
  • API keys configured and distributed to operators
  • RBAC roles assigned (admin, operator, viewer)
  • Notification channels configured (email, Slack, or Teams)
  • Namespace allowlist configured (ALLOWED_NAMESPACES)
  • Safety defaults verified (DEFAULT_DRY_RUN=true, APPROVAL_REQUIRED=true)
  • Prometheus scraping /metrics endpoint
  • Backup schedule configured for PostgreSQL
  • Runbook: How to restart AI-SRE
  • Runbook: How to rotate API keys
  • Runbook: How to upgrade AI-SRE
  • Runbook: How to add new alert sources
  • Escalation contacts documented

Upgrading On-Prem Installations

With Internet Access

cd /opt/ai-sre
git pull origin main

# Kubernetes
helm upgrade ai-sre deploy/helm/ai-sre \
  --namespace ai-sre \
  --reuse-values

# Bare metal
source .venv/bin/activate
pip install -e ".[dev]"
make db-migrate
sudo systemctl restart ai-sre

Air-Gapped

  1. Prepare a new offline package on a connected machine (same process as initial install)
  2. Transfer to the air-gapped environment
  3. Apply the update:
# Kubernetes
docker load < images.tar
helm upgrade ai-sre deploy/helm/ai-sre \
  --namespace ai-sre \
  --reuse-values

# Bare metal
pip install --no-index --find-links=../wheels/ -e ".[dev]"
make db-migrate
sudo systemctl restart ai-sre