Kubernetes Orchestration & Auto-Scaling
Production-grade Kubernetes deployment with horizontal, vertical, and event-driven auto-scaling for both the control plane and agent runtime.
Cluster Architecture
Lobstack runs on a dedicated Kubernetes cluster with strict namespace isolation. The control plane (API server) and agent runtime (individual AI agent pods) are separated into distinct namespaces with independent scaling policies, resource quotas, and security contexts.
```
lobstack-control-plane   # Lobstack API (Next.js) — 3-20 replicas
lobstack-agents          # Agent pods (gVisor sandbox) — 0-100+ pods
lobstack-vault           # HashiCorp Vault HA cluster — 3-5 replicas
lobstack-monitoring      # Prometheus, Falco, audit collection
lobstack-ingress         # Istio ingress gateway
istio-system             # Istio control plane (istiod)
```
Control Plane Deployment
The Lobstack API runs as a multi-replica Deployment with anti-affinity rules to spread pods across availability zones. This ensures no single zone failure can take down the platform.
| Property | Value | Purpose |
|---|---|---|
| Min Replicas | 3 | Always-on availability across zones |
| Max Replicas | 20 (50 in production) | Handle traffic spikes |
| Strategy | RollingUpdate (maxSurge: 1, maxUnavailable: 0) | Zero-downtime deploys |
| PDB | minAvailable: 2 | Survive node drains and upgrades |
| Topology Spread | maxSkew: 1 per zone | Even distribution across AZs |
| Startup Probe | 5s interval, 12 failures | 60s grace period for cold starts |
| Security Context | runAsNonRoot, readOnlyRootFilesystem, drop ALL | Minimal attack surface |
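The settings in the table above can be sketched as the following manifests. This is illustrative, not the actual deployment: the names, labels (`app: lobstack-api`), image, probe path, and port are assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lobstack-api
  namespace: lobstack-control-plane
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: lobstack-api
  template:
    metadata:
      labels:
        app: lobstack-api
    spec:
      # Spread replicas evenly across availability zones
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: lobstack-api
      securityContext:
        runAsNonRoot: true
      containers:
        - name: api
          image: lobstack/api:latest        # Illustrative image name
          securityContext:
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          startupProbe:
            httpGet: { path: /healthz, port: 3000 }  # Assumed path/port
            periodSeconds: 5
            failureThreshold: 12            # 5s x 12 = 60s grace period
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lobstack-api
  namespace: lobstack-control-plane
spec:
  minAvailable: 2        # Survive node drains and upgrades
  selector:
    matchLabels:
      app: lobstack-api
```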
```yaml
resources:
  requests:
    cpu: 250m        # Guaranteed baseline
    memory: 512Mi
  limits:
    cpu: "1"         # Burst up to 1 vCPU
    memory: 1Gi

# Production override:
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "2"
    memory: 2Gi
```

Agent Runtime
Each AI agent runs in its own isolated Kubernetes pod with a gVisor sandbox runtime. Agents are created dynamically when a user provisions a new agent and destroyed on teardown. The orchestrator manages the full lifecycle:
Dynamic Pod Creation
The K8s orchestrator creates a dedicated pod + ClusterIP service per agent with Vault-injected secrets.
gVisor Sandbox
Every agent pod runs with RuntimeClass: gvisor (runsc handler) — an application-level kernel that intercepts syscalls.
Resource Isolation
CPU/memory limits enforced per tier (starter: 1 vCPU/2GB → enterprise: 8 vCPU/16GB). ResourceQuota caps the namespace.
Network Isolation
NetworkPolicies prevent inter-agent communication. Each pod can only reach the Lobstack API and external AI APIs.
Ephemeral Workspace
Agent workspace is an emptyDir volume with size limits per tier (5GB → 50GB). Data is ephemeral to the pod lifecycle.
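A default-deny egress policy of the kind described above might look like the sketch below. The policy name and CIDR exclusions are assumptions; a real deployment would also need a rule admitting traffic to the Lobstack API's namespace.

```yaml
# Illustrative NetworkPolicy: agent pods may reach DNS and the public
# internet (for external AI APIs) but not other pods in the cluster.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress
  namespace: lobstack-agents
spec:
  podSelector: {}          # Applies to every agent pod
  policyTypes: ["Egress"]
  egress:
    - to:                  # Allow DNS resolution
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
    - to:                  # Allow external traffic, block private ranges
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
```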
| Tier | CPU Request | CPU Limit | Memory (request → limit) | Workspace |
|---|---|---|---|---|
| Starter | 250m | 1 vCPU | 512Mi → 2Gi | 5 Gi |
| Standard | 500m | 2 vCPU | 1Gi → 4Gi | 10 Gi |
| Performance | 1 vCPU | 4 vCPU | 2Gi → 8Gi | 20 Gi |
| Enterprise | 2 vCPU | 8 vCPU | 4Gi → 16Gi | 50 Gi |
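Putting the pieces together, a single agent pod at the Standard tier might look like the following sketch. The pod name and image are illustrative; the resource and workspace values come from the tier table above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: agent-7f3a              # Illustrative; real names are generated
  namespace: lobstack-agents
spec:
  runtimeClassName: gvisor      # runsc handler intercepts syscalls
  containers:
    - name: agent
      image: lobstack/agent-runtime:latest   # Assumed image name
      resources:
        requests:
          cpu: 500m             # Standard tier
          memory: 1Gi
        limits:
          cpu: "2"
          memory: 4Gi
      volumeMounts:
        - name: workspace
          mountPath: /workspace
  volumes:
    - name: workspace
      emptyDir:
        sizeLimit: 10Gi         # Ephemeral; destroyed with the pod
```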
Auto-Scaling
Lobstack uses three layers of auto-scaling to handle variable load efficiently — from steady-state traffic to sudden spikes in agent provisioning.
Horizontal Pod Autoscaler (HPA)
The Lobstack API scales horizontally based on CPU utilization, memory utilization, and HTTP request rate.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lobstack-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lobstack-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # Scale up at 70% CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80     # Scale up at 80% memory
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"        # Scale up at 100 RPS/pod
```

KEDA Event-Driven Scaling
Agent pods are scaled by KEDA based on the number of pending provisioning requests in the database. When users request new agents, KEDA detects the queue depth and spins up pods proactively.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: agent-pool
spec:
  scaleTargetRef:
    name: agent-pool
  minReplicaCount: 0      # Scale to zero when idle
  maxReplicaCount: 100    # 200 in production
  pollingInterval: 15     # Check every 15 seconds
  cooldownPeriod: 120     # Wait 2 min before scaling down
  triggers:
    - type: postgresql
      metadata:
        # Connection settings (host, dbName, credentials) omitted here
        query: "SELECT COUNT(*) FROM agent_instances WHERE status = 'provisioning'"
        targetQueryValue: "1"     # 1 pod per pending request
    - type: cpu
      metricType: Utilization
      metadata:
        value: "75"               # Also scale on CPU pressure
```

Vertical Pod Autoscaler (VPA)
VPA right-sizes resource requests based on actual usage patterns. It monitors CPU and memory consumption over time and adjusts requests to eliminate waste while preventing OOM kills.
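A VPA object implementing this behavior could look like the sketch below; the `updateMode` and the min/max bounds are assumptions, not the actual production values.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: lobstack-api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lobstack-api
  updatePolicy:
    updateMode: "Auto"      # Evict and recreate pods with new requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed: { cpu: 250m, memory: 512Mi }   # Assumed floor
        maxAllowed: { cpu: "2", memory: 2Gi }      # Assumed ceiling
```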
Cluster Autoscaler
When pods can't be scheduled due to insufficient node capacity, the cluster autoscaler provisions new nodes from the cloud provider. It uses a least-waste expander strategy and scales down idle nodes after 5 minutes.
| Parameter | Value | Description |
|---|---|---|
| scale-down-unneeded-time | 5 minutes | How long a node must be idle before removal |
| scale-down-utilization-threshold | 0.5 | Nodes below 50% utilization are candidates for removal |
| max-node-provision-time | 10 minutes | Timeout for new node to become ready |
| balance-similar-node-groups | true | Even distribution across node pools |
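The parameters in the table map directly onto cluster-autoscaler command-line flags; a sketch of how they would typically appear in the autoscaler container's args:

```yaml
command:
  - ./cluster-autoscaler
  - --expander=least-waste
  - --scale-down-unneeded-time=5m
  - --scale-down-utilization-threshold=0.5
  - --max-node-provision-time=10m
  - --balance-similar-node-groups=true
```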
Node Pool Architecture
The cluster uses dedicated node pools for different workload types, ensuring agents don't compete for resources with the control plane.
| Node Pool | Server Type | Count | Purpose |
|---|---|---|---|
| Control Plane | cpx31 (4 vCPU, 8 GB) | 3 | K8s API server, etcd, scheduler |
| API Workers | cpx21 (3 vCPU, 4 GB) | 3+ | Lobstack API, dashboard serving |
| Agent Workers | cpx41 (8 vCPU, 16 GB) | 5+ | gVisor-enabled agent pods |
Terraform managed: node pools and cluster configuration are defined in `infra/terraform/modules/k8s-cluster/`. Changes to cluster size are made through `terraform plan` and `terraform apply`.