This guide covers a production (or evaluation) Nebula install on on-premises Kubernetes — bare-metal clusters, VMware Tanzu, OpenStack, or any CNCF-conformant cluster without a public cloud identity layer (no IRSA, no Workload Identity). Secrets are managed inline or via a private vault. Storage is local-path, Longhorn, or TopoLVM.

Prereqs

Cluster

  • Kubernetes 1.26+ (matches the chart’s kubeVersion minimum)
  • kubectl access with permission to create namespaces, Deployments, StatefulSets, PVCs, and Ingresses

Addons + controllers

| Component | Purpose | Notes |
| --- | --- | --- |
| ingress-nginx | HTTP/HTTPS ingress | kubernetes.github.io/ingress-nginx |
| cert-manager | TLS from Let’s Encrypt or internal CA | cert-manager.io/docs |
| Local storage provisioner | PVCs for graph-engine, compactor, Postgres, RabbitMQ | local-path (k3s default), Longhorn, or TopoLVM |
| External Secrets Operator (optional) | Sync from HashiCorp Vault or other backend | Only needed if you have a private secrets store |
Storage class: the chart defaults to the cluster’s default storage class when storageClass.name is empty. For on-prem clusters that ship with local-path (k3s, RKE2), leave storageClass.name unset. For Longhorn or TopoLVM, set storageClass.name to the provisioner’s class name (e.g. longhorn or topolvm-provisioner).
TLS: if your cluster is internal-only and you have a corporate CA, configure cert-manager with a ClusterIssuer backed by your CA’s private key. For clusters with internet access, use the standard Let’s Encrypt ACME ClusterIssuer.
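For the corporate-CA case, a CA-backed ClusterIssuer can be sketched as below. The issuer name corporate-ca and the Secret name corporate-ca-key-pair are placeholders introduced here, not names the chart expects. The Secret is assumed to be a TLS Secret holding the CA certificate and key, stored in cert-manager's cluster resource namespace (cert-manager by default).

```shell
# Sketch: print a CA-backed ClusterIssuer manifest.
# "corporate-ca" and "corporate-ca-key-pair" are placeholder names (assumptions).
render_ca_cluster_issuer() {
  cat <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: corporate-ca
spec:
  ca:
    # TLS Secret with tls.crt / tls.key for the CA, stored in the
    # cert-manager namespace (the default cluster resource namespace).
    secretName: corporate-ca-key-pair
EOF
}

render_ca_cluster_issuer
# To apply: render_ca_cluster_issuer | kubectl apply -f -
```

Printing the manifest first makes it easy to review or commit it before applying it to the cluster.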

Postgres

For evaluation, the chart ships a single-replica Postgres StatefulSet (postgres.mode: bundled). This is safe for testing but not for production: the bundled StatefulSet has no HA, no automated backup, and no streaming replication. For production, use an external PostgreSQL 16 server.
If you have an empty external server and admin credentials, the bundle can create the Nebula/Hatchet roles, logical databases, required extensions, and chart credential Secrets:
./nebula-enterprise postgres provision \
  --namespace nebula \
  --admin-url "postgresql://postgres@pg.example.internal:5432/postgres?sslmode=require"
Set the PGPASSWORD environment variable to the admin role's password. Pass --nebula-host / --hatchet-host only when application pods should connect through a hostname different from the --admin-url host.
If your platform team provisions databases separately, mirror the same contract:
  • distinct Nebula and Hatchet users
  • distinct logical databases
  • required extensions in the Nebula database
  • a Hatchet database_url Secret whose value is already URL-encoded
pg_cron also requires shared_preload_libraries=pg_cron and cron.database_name set to the Nebula database name before bootstrap.
Then run the read-only verifier before pointing postgres.mode: external at it:
./nebula-enterprise postgres verify \
  --namespace nebula \
  --admin-url "postgresql://postgres@pg.example.internal:5432/postgres?sslmode=require"
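The "already URL-encoded" requirement matters when the Hatchet password contains characters such as @, : or /. A minimal POSIX shell sketch of percent-encoding follows. The urlencode helper, the hatchet user and database names, the sample password, and the host are illustrative assumptions; take the actual Secret layout from the chart, not from this example.

```shell
# Percent-encode a string byte by byte, leaving RFC 3986 unreserved
# characters (A-Z a-z 0-9 - . _ ~) untouched.
urlencode() {
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' '\n\n' | while read -r hex; do
    [ -n "$hex" ] || continue
    case "$hex" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        # Unreserved byte: emit the literal character via an octal escape.
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Anything else: emit %XX with uppercase hex digits.
        printf '%%%s' "$hex" | tr 'a-f' 'A-F' ;;
    esac
  done
}

pw='p@ss:w/rd'   # sample password (assumption)
printf 'postgresql://hatchet:%s@pg.example.internal:5432/hatchet?sslmode=require\n' \
  "$(urlencode "$pw")"
# prints: postgresql://hatchet:p%40ss%3Aw%2Frd@pg.example.internal:5432/hatchet?sslmode=require
```

Only the password (and, if needed, the username) should be encoded; encoding the whole URL would also escape the separators that the URL parser relies on.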

Install

1. Load images from the bundle

tar -xzf nebula-enterprise-<version>.tar.gz
cd nebula-enterprise-<version>/
sha256sum -c checksums.txt
docker load -i images.tar
For an air-gapped cluster with a private registry, retag and push to your internal registry:
REGISTRY=registry.corp.example.com

docker tag nebula:enterprise-<version>              "${REGISTRY}/nebula/nebula-runtime:<version>"
docker tag nebula-graph-engine:enterprise-<version> "${REGISTRY}/nebula/graph-engine:<version>"
docker tag nebula-postgres:enterprise-<version>     "${REGISTRY}/nebula/postgres:<version>"
docker push "${REGISTRY}/nebula/nebula-runtime:<version>"
docker push "${REGISTRY}/nebula/graph-engine:<version>"
docker push "${REGISTRY}/nebula/postgres:<version>"
For third-party images (Hatchet, Hatchet Postgres, RabbitMQ, busybox), push to the same registry:
docker tag ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.79.0 "${REGISTRY}/hatchet-engine:v0.79.0"
docker tag ghcr.io/hatchet-dev/hatchet/hatchet-admin:v0.79.0  "${REGISTRY}/hatchet-admin:v0.79.0"
docker tag ghcr.io/hatchet-dev/hatchet/hatchet-migrate:v0.79.0 "${REGISTRY}/hatchet-migrate:v0.79.0"
docker tag docker.io/library/postgres:16.6                         "${REGISTRY}/hatchet-postgres:16.6"
docker tag rabbitmq:3.13.7-management                           "${REGISTRY}/rabbitmq:3.13.7-management"
docker tag public.ecr.aws/docker/library/busybox:1.37.0       "${REGISTRY}/busybox:1.37.0"
docker push "${REGISTRY}/hatchet-engine:v0.79.0"
docker push "${REGISTRY}/hatchet-admin:v0.79.0"
docker push "${REGISTRY}/hatchet-migrate:v0.79.0"
docker push "${REGISTRY}/hatchet-postgres:16.6"
docker push "${REGISTRY}/rabbitmq:3.13.7-management"
docker push "${REGISTRY}/busybox:1.37.0"
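The tag-and-push pairs above follow one pattern, so they can also be driven from a list. The sketch below is a dry run by default and only prints the commands. DRY_RUN and mirror_images are names introduced here for illustration; the image list repeats the third-party references from the commands above.

```shell
REGISTRY="${REGISTRY:-registry.corp.example.com}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print only, 0 = run docker

# Reads "source-ref target-name:tag" pairs on stdin, one pair per line.
mirror_images() {
  while read -r src dst; do
    [ -n "$src" ] || continue
    if [ "$DRY_RUN" = "1" ]; then
      echo "docker tag $src ${REGISTRY}/$dst"
      echo "docker push ${REGISTRY}/$dst"
    else
      # Stop at the first failure so a partial mirror is noticed.
      docker tag "$src" "${REGISTRY}/$dst" || return 1
      docker push "${REGISTRY}/$dst" || return 1
    fi
  done
}

mirror_images <<'EOF'
ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.79.0 hatchet-engine:v0.79.0
ghcr.io/hatchet-dev/hatchet/hatchet-admin:v0.79.0 hatchet-admin:v0.79.0
ghcr.io/hatchet-dev/hatchet/hatchet-migrate:v0.79.0 hatchet-migrate:v0.79.0
docker.io/library/postgres:16.6 hatchet-postgres:16.6
rabbitmq:3.13.7-management rabbitmq:3.13.7-management
public.ecr.aws/docker/library/busybox:1.37.0 busybox:1.37.0
EOF
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 to tag and push.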
Then set the mirrored repositories in your values file:
image:
  hatchetEngine:
    repository: hatchet-engine
  hatchetAdmin:
    repository: hatchet-admin
  hatchetMigrate:
    repository: hatchet-migrate
  hatchetPostgres:
    repository: hatchet-postgres
  rabbitmq:
    repository: rabbitmq
  busybox:
    repository: busybox

2. Provision secrets

Option A: inline Kubernetes Secrets (simplest, not recommended for production)
Use secrets.backend: raw and put secret values directly in your values file:
secrets:
  backend: raw
  values:
    OPENAI_API_KEY: "sk-..."
    NEBULA_SECRET_KEY: "<random 32 bytes hex>"
    NEBULA_SERVICE_API_KEY: "<random 32 bytes hex>"
    NEBULA_WEBHOOK_HMAC_SECRET: "<random 32 bytes hex>"
    NEBULA_JWT_PRIVATE_KEY_PEM: |
      -----BEGIN PRIVATE KEY-----
      ...
      -----END PRIVATE KEY-----
    NEBULA_JWT_KID: "<stable per-deployment value>"
    NEBULA_JWT_RETIRED_PUBLIC_KEYS_JSON: "[]"
    NEBULA_INTERNAL_WAKE_TOKEN: "<random 32 bytes hex>"
    NEBULA_VECTOR_BUILD_HATCHET_TRIGGER_TOKEN: "<random 32 bytes hex>"
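The <random 32 bytes hex> placeholders above can be filled from any cryptographically secure source. One sketch, using only /dev/urandom and POSIX tools (openssl rand -hex 32 is an equivalent alternative), is shown below. The rand_hex helper name is illustrative, and the JWT private key is deliberately left out because its key algorithm is not specified here.

```shell
# Print N random bytes as lowercase hex (default 32 bytes = 64 hex characters).
rand_hex() {
  od -An -v -N "${1:-32}" -tx1 /dev/urandom | tr -d ' \n'
  echo
}

# Emit one fresh value per key, in a form that can be pasted under secrets.values.
for name in NEBULA_SECRET_KEY NEBULA_SERVICE_API_KEY NEBULA_WEBHOOK_HMAC_SECRET \
            NEBULA_INTERNAL_WAKE_TOKEN NEBULA_VECTOR_BUILD_HATCHET_TRIGGER_TOKEN; do
  printf '%s: "%s"\n' "$name" "$(rand_hex 32)"
done
```

Each key gets its own value; reusing one random string across several secrets would defeat the purpose of separating them.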
NEBULA_JWT_RETIRED_PUBLIC_KEYS_JSON can stay [] on a fresh install. Populate it only during JWT signing-key rotation; see Service authentication.
Option B: HashiCorp Vault via ESO
Install ESO, configure a ClusterSecretStore pointing at your Vault instance, then use secrets.backend: eso-vault:
secrets:
  backend: eso-vault
  esoVault:
    secretStoreRef:
      name: vault-backend
      kind: ClusterSecretStore
    vaultPath: secret/data/nebula
    refreshInterval: 5m
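The vault-backend store referenced above has to exist before anything can sync from it. A minimal sketch of such a ClusterSecretStore follows, assuming a KV v2 engine mounted at secret (consistent with the secret/data/nebula path) and Vault's Kubernetes auth method. The Vault URL, the role nebula-eso, and the service account are placeholders, and newer ESO releases serve this resource as external-secrets.io/v1 rather than v1beta1.

```shell
# Sketch: print a ClusterSecretStore for HashiCorp Vault.
# Server URL, role, and service account below are assumptions, not chart values.
render_vault_store() {
  cat <<'EOF'
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "https://vault.corp.example.com:8200"
      path: "secret"        # KV mount, not the full secret/data/... path
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "nebula-eso"
          serviceAccountRef:
            name: "external-secrets"
            namespace: "external-secrets"
EOF
}

render_vault_store
# To apply: render_vault_store | kubectl apply -f -
```

Whatever auth method you use, the Vault policy bound to it needs read access to the path configured in secrets.esoVault.vaultPath.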

3. Copy + fill the reference values file

The bundle ships helm/examples/onprem/values.yaml with sensible on-prem defaults (bundled Postgres for evaluation, local-path storage, nginx ingress, raw secrets). Copy it, fill in the <placeholder> markers (domain name, object storage endpoint, LLM API base), and save as your-values.yaml.
For a production on-prem install with external Postgres:
  • Set postgres.mode: external and fill in .host, .port, .database, and .credentialsSecret
  • Set hatchetPostgres.mode: external similarly
  • Remove or comment out the bundled Postgres persistence blocks
If you use ./nebula-enterprise postgres provision, set postgres.credentialsSecret: nebula-postgres-credentials and hatchetPostgres.credentialsSecret: hatchet-postgres-credentials unless you passed custom secret names to the helper.

4. Object storage

On-premises S3-compatible object storage options:
  • MinIO (recommended for simplicity): run MinIO alongside the cluster or as a StatefulSet inside it. Set objectStorage.endpoint: http://minio.minio.svc:9000, forcePathStyle: true, and store MinIO root credentials in objectStorage.credentialsSecret.
  • Ceph RGW: configure Ceph’s Rados Gateway. Set the RGW endpoint, region (or empty string), and HMAC credentials.
  • Cloudflare R2 / Wasabi: external but S3-compatible. Set the appropriate endpoint; forcePathStyle depends on the provider.
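For the MinIO option, the values described above might look like the overlay below. This is a sketch under assumptions: forcePathStyle is assumed to nest under objectStorage alongside endpoint, and minio-credentials is a hypothetical Secret name. Check the reference values file for the exact key layout and for the key names the chart expects inside the Secret.

```shell
# Sketch: print a values overlay for in-cluster MinIO.
# Key nesting and the Secret name are assumptions; verify against the chart.
render_minio_values() {
  cat <<'EOF'
objectStorage:
  endpoint: http://minio.minio.svc:9000
  forcePathStyle: true                  # assumed to live under objectStorage
  credentialsSecret: minio-credentials  # hypothetical Secret name
EOF
}

render_minio_values
```

Path-style addressing is needed here because the in-cluster Service name does not resolve per-bucket subdomains.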

5. Install

helm install nebula ./helm/nebula-<version>.tgz \
  -n nebula --create-namespace \
  -f your-values.yaml
The chart runs schema migrations and catalog-apply automatically via a per-revision Job (<release>-nebula-migrations-<revision>); API and worker pods gate startup on an init container that polls public.nebula_release_contract for the install’s release row. releaseContract.releaseId and releaseContract.gitSha are stamped by bundle.sh and consumed automatically.
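Because API and worker pods wait on the migration Job, it is often the first thing to check when pods sit in Init. A small sketch of deriving the Job name from the pattern above and waiting on it follows. It assumes <revision> is the Helm release revision (as listed by helm history); the helper names and the 10-minute timeout are illustrative.

```shell
# Build the per-revision migration Job name: <release>-nebula-migrations-<revision>
migration_job_name() {
  printf '%s-nebula-migrations-%s\n' "$1" "$2"
}

# Block until the migration Job for a given release revision completes.
wait_for_migrations() {
  release="$1"; revision="$2"; namespace="${3:-nebula}"
  kubectl -n "$namespace" wait --for=condition=complete --timeout=10m \
    "job/$(migration_job_name "$release" "$revision")"
}

migration_job_name nebula 1   # prints: nebula-nebula-migrations-1
# Usage after a first install: wait_for_migrations nebula 1
```

If the wait times out, kubectl -n nebula logs job/<name> shows the migration output.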

6. Verify

kubectl -n nebula get pods
kubectl -n nebula get ingress nebula
curl -fsS https://nebula.<your-domain>/v1/health
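Right after an install the health endpoint may fail for a few minutes while migrations run and certificates are issued, so a single curl can be misleading. A minimal retry loop is sketched below; wait_for_health and its default of 30 attempts at 10-second intervals are assumptions chosen for illustration.

```shell
# Poll a URL until it returns a 2xx response or the attempts run out.
wait_for_health() {
  url="$1"; attempts="${2:-30}"; delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt follows.
    if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}

# Usage: wait_for_health "https://nebula.<your-domain>/v1/health"
```

The function returns non-zero on timeout, so it can gate later steps in a deployment script.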

Upgrade

Pull the new bundle, load/push new images, then:
helm upgrade nebula ./helm/nebula-<new-version>.tgz \
  -n nebula \
  -f your-values.yaml

Sizing reference

| Workload | Starter | When to scale |
| --- | --- | --- |
| API | 2 replicas, 1 CPU / 2-4 GB | HPA on CPU >70% sustained |
| Worker | 2 replicas, 2 CPU / 4-8 GB | HPA on queue depth (Hatchet metric) |
| Graph engine | 2 replicas, 2 CPU / 4-8 GB | Manual; restart-sensitive (WAL replay) |
| Compactor | 1 replica, 1 CPU / 2-4 GB | Single-writer; do not scale horizontally |
| RabbitMQ | 1 replica, 8 GB PVC | Single-broker is fine up to ~10k workflows/min |
For an evaluation single-node cluster, reducing to replicas: 1 on all workloads and using postgres.mode: bundled keeps the footprint under 16 GB RAM total.
For production deploys, the bundle ships a shared sizing overlay at helm/examples/_common/production-sizing.yaml (the same overlay used by EKS/AKS/GKE). Stack it before your on-prem values file to get production-shape replicas and resource requests; Helm gives later -f files precedence, so your own values still override the overlay:
helm install nebula ./helm/nebula-<version>.tgz \
  -n nebula --create-namespace \
  -f helm/examples/_common/production-sizing.yaml \
  -f your-values.yaml

Pod Security Admission

The Nebula-built workloads (api, worker, graph-engine, graph-engine-compactor, the migration Job, and the vLLM sub-chart Deployments) comply with the restricted Pod Security Standard out of the box: non-root user, dropped capabilities, seccompProfile: RuntimeDefault, no privilege escalation.
The bundled third-party dependencies (postgres-statefulset with postgres.mode: bundled, hatchet-postgres-statefulset, hatchet-engine-deployment, hatchet-rabbitmq-statefulset) inherit their upstream images’ default security contexts and do not carry the restricted-required fields today. Labeling the release namespace as restricted before validating these pods can reject them at admission time.
Recommended approach:
  1. For production deployments, set postgres.mode: external and hatchetPostgres.mode: external so the bundled StatefulSets are not rendered at all. The chart’s external-mode path doesn’t ship the third-party deps; you bring your own (compliant) Postgres + Hatchet.
  2. If you need the bundled deps for evaluation, label the namespace at baseline rather than restricted (or use PSA’s warn / audit modes to surface the issues without blocking install).
  3. Only enable restricted enforcement after validating each bundled dep’s security context against your cluster’s policy.
# Evaluation-friendly: enforce baseline; warn and audit on restricted violations without blocking.
kubectl label namespace nebula \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
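Before tightening enforcement, a server-side dry run of the label change reports which existing pods would violate the target level without modifying the namespace. The sketch below wraps that kubectl invocation; psa_preview is a helper name introduced here.

```shell
# Server-side dry run: list warnings for pods that would violate the given
# Pod Security level, without actually changing the namespace labels.
psa_preview() {
  kubectl label --dry-run=server --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=${2:-restricted}"
}

# Usage: psa_preview nebula restricted
```

Any warnings printed by the API server name the offending pods and the fields they are missing, which gives a concrete checklist for step 3 above.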
The chart deliberately does not label the namespace itself: Helm’s --create-namespace does not own pre-existing namespaces reliably, and adding namespace ownership to the chart conflicts with operators who manage namespaces separately (GitOps, vCluster, kiosk, etc.).

Prometheus metrics

Pods expose Prometheus-compatible /metrics endpoints and carry prometheus.io/scrape: "true" annotations for clusters that use annotation-based scrape discovery. For clusters running prometheus-operator / kube-prometheus-stack, enable native ServiceMonitor objects:
monitoring:
  serviceMonitor:
    enabled: true
    # Many operator installs key off a `release` label on ServiceMonitors;
    # set it to match your prometheus-operator's serviceMonitorSelector.
    additionalLabels:
      release: kube-prometheus-stack
This is off by default because ServiceMonitor is a monitoring.coreos.com/v1 CRD: rendering it on a cluster without prometheus-operator fails helm install with no matches for kind "ServiceMonitor".
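A quick way to tell whether the flag is safe to enable is to check for the CRD first. The sketch below does that; has_servicemonitor_crd is a helper name introduced here, and the CRD name is the one prometheus-operator registers.

```shell
# Succeeds only if the ServiceMonitor CRD is installed on the cluster.
has_servicemonitor_crd() {
  kubectl get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1
}

if has_servicemonitor_crd; then
  echo "ServiceMonitor CRD found: monitoring.serviceMonitor.enabled=true is safe to set"
else
  echo "ServiceMonitor CRD not found: leave monitoring.serviceMonitor.enabled at its default"
fi
```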

Troubleshooting

PVCs stuck in Pending: check that a storage class exists and is set as default with kubectl get storageclass. If using local-path, the provisioner must be running: kubectl -n local-path-storage get pods. Set storageClass.name in your values file to the exact class name if there is no cluster default.
API pods not Ready on a fresh install: with postgres.mode: bundled, the Postgres StatefulSet must be ready before the API Deployment. Check kubectl -n nebula get pods; the Postgres pod must be Running before API pods reach Ready. The chart renders a readiness probe on the API that retries for 5 minutes, which is usually enough for bundled Postgres to start. If the pod restarts before Postgres is ready, describe the pod for the specific connect error.
TLS certificate not issued: the Let’s Encrypt ACME HTTP-01 challenge requires the domain to be publicly reachable. For internal-only clusters, either use a DNS-01 challenge (configure cert-manager with DNS provider credentials) or provision certificates from a corporate CA ClusterIssuer. The ingress.tls.secretName in your values file must match the Secret named in the Certificate resource's spec.secretName, which is the Secret cert-manager populates.
Graph-engine slow to become Ready after a restart: the graph-engine replays its WAL on startup; the duration scales with segment count and is expected. A single-node cluster that reboots may take 30-120 seconds per replica before graph-engine is fully Ready. Add initialDelaySeconds: 120 to the graph-engine readiness probe via workloads.graphEngine overrides if the default timeouts are too tight for your node restart time.