
# Kubernetes Orchestration & Auto-Scaling

Production-grade Kubernetes deployment with horizontal, vertical, and event-driven auto-scaling for both the control plane and agent runtime.

## Cluster Architecture

Lobstack runs on a dedicated Kubernetes cluster with strict namespace isolation. The control plane (API server) and agent runtime (individual AI agent pods) are separated into distinct namespaces with independent scaling policies, resource quotas, and security contexts.

**Namespace Layout**

```text
lobstack-control-plane    # Lobstack API (Next.js) — 3-20 replicas
lobstack-agents           # Agent pods (gVisor sandbox) — 0-100+ pods
lobstack-vault            # HashiCorp Vault HA cluster — 3-5 replicas
lobstack-monitoring       # Prometheus, Falco, audit collection
lobstack-ingress          # Istio ingress gateway
istio-system              # Istio control plane (istiod)
```
> 💡 **Pod Security Standards**
>
> All Lobstack namespaces enforce the `restricted` Pod Security Standard, the most restrictive level. It requires non-root users, dropped capabilities, seccomp profiles, and read-only root filesystems.
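Pod Security Standards are enforced through namespace labels consumed by the built-in Pod Security admission controller. As a sketch, the `lobstack-agents` namespace would carry labels along these lines (the label keys and values are the standard Kubernetes ones; the exact manifest is an assumption):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: lobstack-agents
  labels:
    # Reject any pod that violates the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface violations via audit logs and kubectl warnings
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```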

## Control Plane Deployment

The Lobstack API runs as a multi-replica Deployment with anti-affinity rules to spread pods across availability zones. This ensures no single zone failure can take down the platform.

| Property | Value | Purpose |
| --- | --- | --- |
| Min Replicas | 3 | Always-on availability across zones |
| Max Replicas | 20 (50 in production) | Handle traffic spikes |
| Strategy | RollingUpdate (maxSurge: 1, maxUnavailable: 0) | Zero-downtime deploys |
| PDB | minAvailable: 2 | Survive node drains and upgrades |
| Topology Spread | maxSkew: 1 per zone | Even distribution across AZs |
| Startup Probe | 5s interval, 12 failures | 60s grace period for cold starts |
| Security Context | runAsNonRoot, readOnlyRootFilesystem, drop ALL | Minimal attack surface |
**Control Plane Resource Allocation**

```yaml
resources:
  requests:
    cpu: 250m        # Guaranteed baseline
    memory: 512Mi
  limits:
    cpu: "1"         # Burst up to 1 vCPU
    memory: 1Gi

# Production override:
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "2"
    memory: 2Gi
```
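The disruption-budget and zone-spread settings from the table above could be expressed roughly as follows. This is an illustrative sketch: the object name and the `app: lobstack-api` label selector are assumptions, not taken from the real manifests.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lobstack-api
spec:
  minAvailable: 2              # survive node drains and upgrades
  selector:
    matchLabels:
      app: lobstack-api
---
# Inside the Deployment's pod template spec:
topologySpreadConstraints:
  - maxSkew: 1                 # at most 1 pod difference between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: lobstack-api
```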

## Agent Runtime

Each AI agent runs in its own isolated Kubernetes pod with a gVisor sandbox runtime. Agents are created dynamically when a user provisions a new agent and destroyed on teardown. The orchestrator manages the full lifecycle:

- 🚀 **Dynamic Pod Creation:** The K8s orchestrator creates a dedicated pod + ClusterIP service per agent, with Vault-injected secrets.
- 🛡️ **gVisor Sandbox:** Every agent pod runs with `RuntimeClass: gvisor` (runsc handler), an application-level kernel that intercepts syscalls.
- 📦 **Resource Isolation:** CPU/memory limits are enforced per tier (starter: 1 vCPU/2GB → enterprise: 8 vCPU/16GB), and a ResourceQuota caps the namespace as a whole.
- 🔒 **Network Isolation:** NetworkPolicies prevent inter-agent communication. Each pod can only reach the Lobstack API and external AI APIs.
- 💾 **Ephemeral Workspace:** The agent workspace is an emptyDir volume with a per-tier size limit (5GB → 50GB). Data lives only as long as the pod.

| Tier | CPU Request | CPU Limit | Memory (request → limit) | Workspace |
| --- | --- | --- | --- | --- |
| Starter | 250m | 1 vCPU | 512Mi → 2Gi | 5 Gi |
| Standard | 500m | 2 vCPU | 1Gi → 4Gi | 10 Gi |
| Performance | 1 vCPU | 4 vCPU | 2Gi → 8Gi | 20 Gi |
| Enterprise | 2 vCPU | 8 vCPU | 4Gi → 16Gi | 50 Gi |
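Putting these pieces together, a starter-tier agent pod would look roughly like the sketch below. The pod name, image, and mount path are assumptions for illustration; the runtime class, security context, resources, and emptyDir size limit follow the settings described above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: agent-example            # hypothetical name; real pods are named by the orchestrator
  namespace: lobstack-agents
spec:
  runtimeClassName: gvisor       # runsc handler intercepts syscalls
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: agent
      image: lobstack/agent:latest   # hypothetical image reference
      resources:
        requests: { cpu: 250m, memory: 512Mi }   # starter tier
        limits:   { cpu: "1",  memory: 2Gi }
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: workspace
          mountPath: /workspace
  volumes:
    - name: workspace
      emptyDir:
        sizeLimit: 5Gi           # starter-tier workspace cap
```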

## Auto-Scaling

Lobstack uses three layers of auto-scaling to handle variable load efficiently — from steady-state traffic to sudden spikes in agent provisioning.

### Horizontal Pod Autoscaler (HPA)

The Lobstack API scales horizontally based on CPU utilization, memory utilization, and HTTP request rate.

**HPA Configuration**

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lobstack-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # Scale up at 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80    # Scale up at 80% memory
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"       # Scale up at 100 RPS/pod
```
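The `autoscaling/v2` API also supports a `behavior` stanza to damp flapping between scale-up and scale-down. The configuration above doesn't show one, but a typical addition would look like the following sketch (the specific windows and policies here are assumptions, not Lobstack's actual values):

```yaml
# Appended to the HPA spec:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react to spikes immediately
    policies:
      - type: Percent
        value: 100                    # at most double the replica count
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 min of sustained low load
    policies:
      - type: Pods
        value: 2                      # remove at most 2 pods per minute
        periodSeconds: 60
```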

### KEDA Event-Driven Scaling

Agent pods are scaled by KEDA based on the number of pending provisioning requests in the database. When users request new agents, KEDA detects the queue depth and spins up pods proactively.

**KEDA ScaledObject**

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
  scaleTargetRef:
    name: agent-pool
  minReplicaCount: 0       # Scale to zero when idle
  maxReplicaCount: 100     # 200 in production
  pollingInterval: 15      # Check every 15 seconds
  cooldownPeriod: 120      # Wait 2 min before scaling down
  triggers:
    - type: postgresql
      metadata:
        query: "SELECT COUNT(*) FROM agent_instances WHERE status = 'provisioning'"
        targetQueryValue: "1"    # 1 pod per pending request
    - type: cpu
      metadata:
        value: "75"              # Also scale on CPU pressure
```

### Vertical Pod Autoscaler (VPA)

VPA right-sizes resource requests based on actual usage patterns. It monitors CPU and memory consumption over time and adjusts requests to eliminate waste while preventing OOM kills.
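A minimal VPA object targeting the API deployment might look like this sketch; the `updateMode` and the min/max bounds are assumptions, not the actual Lobstack settings:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: lobstack-api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lobstack-api
  updatePolicy:
    updateMode: "Auto"      # evict and re-create pods with updated requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed: { cpu: 100m, memory: 256Mi }
        maxAllowed: { cpu: "2",  memory: 2Gi }
```

One design note: when VPA runs alongside an HPA that scales on the same CPU/memory metrics, the two can fight each other; setting `updateMode: "Initial"` or `"Off"` (recommendation-only) on the VPA is the usual way to avoid that conflict.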

### Cluster Autoscaler

When pods can't be scheduled due to insufficient node capacity, the cluster autoscaler provisions new nodes from the cloud provider. It uses a least-waste expander strategy and scales down idle nodes after 5 minutes.

| Parameter | Value | Description |
| --- | --- | --- |
| `scale-down-unneeded-time` | 5 minutes | How long a node must be idle before removal |
| `scale-down-utilization-threshold` | 0.5 | Nodes below 50% utilization are candidates for removal |
| `max-node-provision-time` | 10 minutes | Timeout for a new node to become ready |
| `balance-similar-node-groups` | true | Even distribution across node pools |
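These parameters map directly onto cluster-autoscaler command-line flags. The flag names below are the standard upstream ones; pairing them with the table's values (plus the least-waste expander mentioned above) is our reading, not a copy of the actual deployment:

```yaml
# args on the cluster-autoscaler container
args:
  - --expander=least-waste
  - --scale-down-unneeded-time=5m
  - --scale-down-utilization-threshold=0.5
  - --max-node-provision-time=10m
  - --balance-similar-node-groups=true
```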

## Node Pool Architecture

The cluster uses dedicated node pools for different workload types, ensuring agents don't compete for resources with the control plane.

| Node Pool | Server Type | Count | Purpose |
| --- | --- | --- | --- |
| Control Plane | cpx31 (4 vCPU, 8 GB) | 3 | K8s API server, etcd, scheduler |
| API Workers | cpx21 (3 vCPU, 4 GB) | 3+ | Lobstack API, dashboard serving |
| Agent Workers | cpx41 (8 vCPU, 16 GB) | 5+ | gVisor-enabled agent pods |

**Terraform managed**

All node pools are provisioned via Terraform modules at `infra/terraform/modules/k8s-cluster/`. Changes to cluster size are made through `terraform plan` and `terraform apply`.