Prereqs
- Linux host with docker (24+) and docker compose (v2.30+, ideally v2.40+ for healthcheck and profile semantics)
- ~50 GB free disk (graph-engine WAL + compactor scratch + Postgres + RabbitMQ)
- 16 GB RAM minimum, 32 GB recommended for production
- OpenAI, Anthropic, or Azure OpenAI credentials. For air-gapped deploys with no cloud LLM, use the compose.vllm.yaml overlay (below)
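
The version, disk, and memory requirements above can be checked before installing. The following is a minimal sketch, not part of the bundle: `version_ge` and `preflight` are hypothetical helper names, and the comparison relies on GNU `sort -V`.

```shell
# version_ge A B: succeed when dotted version A is >= B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# preflight: check Docker and Compose against the minimums listed above,
# then print free disk and total RAM for manual comparison.
preflight() {
  docker_v=$(docker version --format '{{.Server.Version}}') || return 1
  compose_v=$(docker compose version --short) || return 1
  compose_v=${compose_v#v}   # some releases print a leading "v"

  version_ge "$docker_v" 24 || { echo "docker $docker_v is older than 24" >&2; return 1; }
  version_ge "$compose_v" 2.30 || { echo "compose $compose_v is older than 2.30" >&2; return 1; }

  # df -Pk reports available space in 1 KiB blocks (column 4).
  df -Pk . | awk 'NR == 2 { printf "free disk: %d GB\n", $4 / 1048576 }'
  # MemTotal in /proc/meminfo is in kB and reads slightly below installed RAM.
  awk '/^MemTotal:/ { printf "total RAM: %d GB\n", $2 / 1048576 }' /proc/meminfo
}
```

Run `preflight` from the directory that will hold the deployment so that the disk figure reflects the right filesystem.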
enterprise/bootstrap.sh auto-detects which in-stack services to skip based on the *_HOST values in your .env.enterprise.
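
To preview which *_HOST values are set before running bootstrap.sh, something like the following could be used. It only lists non-empty `*_HOST` assignments; treating a non-empty value as "externalized" is an assumption about how bootstrap.sh decides, and `list_external_hosts` is a hypothetical helper.

```shell
# list_external_hosts ENV_FILE: print every NAME_HOST=value line whose value
# is non-empty. Commented-out and empty assignments are ignored.
list_external_hosts() {
  grep -E '^[A-Z0-9_]+_HOST=.+' "$1"
}
```

For example, `list_external_hosts ./env/.env.enterprise` prints one line per host variable that has a value, and exits non-zero when none are set.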
Install
vector, pg_partman, pg_cron), and applies the catalog contract. End-to-end first boot takes 2-4 minutes; subsequent re-runs are a no-op if everything is already healthy.
What runs in the stack
| Service | Role | Externalizable? |
|---|---|---|
| nebula | API + ingest pipeline | - |
| nebula-worker | Hatchet worker pool | - |
| graph-engine | Rust graph + vector store | - |
| postgres | Nebula application database | Yes → RDS |
| hatchet-postgres | Hatchet workflow database | Yes → RDS (same instance, separate DB) |
| minio + minio-init | S3-compatible object storage | Yes → S3 |
| hatchet-engine + hatchet-dashboard | Hatchet control plane | No |
| hatchet-rabbitmq | Hatchet task queue | No |
Air-gapped: local LLM with vLLM
For deployments with no internet egress, stack the compose.vllm.yaml overlay on top of compose.enterprise.yaml by passing --overlay to bootstrap.sh:
Going through bootstrap.sh (rather than calling docker compose -f compose.enterprise.yaml -f compose.vllm.yaml up -d directly) is required so that the in-stack postgres, hatchet-postgres, and minio services activate via COMPOSE_PROFILES. Those services are profile-gated and won’t start under a raw docker compose up.
The overlay runs vLLM for completions and TEI for embeddings, then flips every Nebula service to NEBULA_CONFIG_NAME=onprem_local. Completion calls route to http://vllm-instruction:8000/v1; embedding calls route to http://tei-embedding:8000/v1. No OPENAI_API_KEY required.
GPU prereqs: NVIDIA Container Toolkit installed on the host, and enough GPU capacity for the default Qwen3.5 completion model plus the Qwen3 embedding model.
Upgrade
Stopping the stack
COMPOSE_PROJECT_NAME from env/.env.enterprise (default
nebula_enterprise). Use a unique project name for separate deployments on
the same host; keep it stable when upgrading a deployment that should reuse
existing data. If an older bundle created unprefixed volumes such as
postgres_data, bootstrap.sh fails before starting services. Remove those
volumes for a fresh deployment, or manually copy them into the matching
project-prefixed volumes only after verifying the source deployment and secrets
belong to this upgrade:
postgres:16.6 is included in images.tar; if you are running from a source
checkout instead of a bundle, pull it first. Include any other legacy volumes
that bootstrap reports.
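
As an illustration of the manual copy described above, here is a hedged sketch that copies one legacy volume into its project-prefixed counterpart using a throwaway postgres:16.6 container. It is not necessarily the command shipped with the bundle: `prefixed_volume` and `copy_legacy_volume` are hypothetical helpers, and the sketch assumes the stack is stopped and that Compose names volumes `<project>_<volume>`.

```shell
# prefixed_volume PROJECT VOLUME: the name Compose gives a project-scoped volume.
prefixed_volume() {
  printf '%s_%s\n' "$1" "$2"
}

# copy_legacy_volume PROJECT VOLUME: copy the unprefixed VOLUME into
# PROJECT_VOLUME. The source is mounted read-only; cp -a preserves ownership
# and permissions, which Postgres data directories depend on.
copy_legacy_volume() {
  dst=$(prefixed_volume "$1" "$2")
  docker volume create "$dst" >/dev/null || return 1
  docker run --rm \
    -v "$2":/from:ro \
    -v "$dst":/to \
    postgres:16.6 sh -c 'cp -a /from/. /to/'
}
```

A call such as `copy_legacy_volume nebula_enterprise postgres_data` would then be repeated for each legacy volume that bootstrap reports.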
Troubleshooting
bootstrap.sh fails on 'NEBULA_POSTGRES_PASSWORD must be set'
enterprise/generate-secrets.sh didn’t run, or env/.env.enterprise is missing the secrets section. Re-run ./enterprise/generate-secrets.sh ./env/.env.enterprise and try again. The script refuses to overwrite an existing file, so delete it first if you intend to regenerate (warning: this rotates every secret).
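
Before deleting anything, it may help to confirm whether the variable is really missing. A small sketch, assuming the env file uses plain `NAME=value` lines; `env_has` is a hypothetical helper.

```shell
# env_has ENV_FILE NAME: succeed only if NAME is assigned a non-empty value.
env_has() {
  grep -Eq "^$2=.+" "$1"
}
```

For example: `env_has ./env/.env.enterprise NEBULA_POSTGRES_PASSWORD || echo "secret missing"`.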
catalog-bootstrap fails with 'pgvector extension not available'
The bundled Postgres image carries vector, pg_partman, and pg_cron, so this only happens if you’ve pointed at an external Postgres missing required extensions. On RDS/Aurora, make all three extensions available to the master user; if your parameter group restricts extension installs with rds.allowed_extensions, include them there. Set shared_preload_libraries=pg_cron and cron.database_name to the Nebula database name before bootstrap.
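
One way to see which of the three extensions an external Postgres is missing is to compare its `pg_available_extensions` catalog against the required list. This is a sketch only: `missing_extensions` is a hypothetical helper and `NEBULA_DB_URL` stands in for whatever connection string you use.

```shell
# missing_extensions: read available extension names on stdin (one per line)
# and print each required extension that is absent.
missing_extensions() {
  available=$(cat)
  for ext in vector pg_partman pg_cron; do
    printf '%s\n' "$available" | grep -qx "$ext" || printf '%s\n' "$ext"
  done
}

# Example (NEBULA_DB_URL is a placeholder):
#   psql "$NEBULA_DB_URL" -Atc 'SELECT name FROM pg_available_extensions' | missing_extensions
```

Empty output means all three are installable on that server; it does not confirm that pg_cron is preloaded.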
bootstrap exits with 'No valid client credentials found'
Select a config that matches your provider credentials: full_openai with OPENAI_API_KEY, full_anthropic with ANTHROPIC_API_KEY, or full_azure with AZURE_API_KEY and AZURE_API_BASE.
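
The pairing of config names and credentials can be written down as a small lookup. The sketch below encodes only the three configs named above; `required_vars` and `missing_vars` are hypothetical helpers that read from the current shell environment, not from bootstrap.sh itself.

```shell
# required_vars CONFIG: print the credential variables a config expects.
required_vars() {
  case "$1" in
    full_openai)    echo "OPENAI_API_KEY" ;;
    full_anthropic) echo "ANTHROPIC_API_KEY" ;;
    full_azure)     echo "AZURE_API_KEY AZURE_API_BASE" ;;
    *)              return 1 ;;
  esac
}

# missing_vars CONFIG: print each expected variable that is unset or empty
# in the current environment.
missing_vars() {
  for name in $(required_vars "$1"); do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}
```

After exporting the values from your env file, `missing_vars full_azure` lists whichever of the two Azure variables still needs a value.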
login succeeds but token issuance fails with 'no RS256 signing key configured'
The JWT signing key is missing from the deployment. Re-run ./enterprise/generate-secrets.sh ./env/.env.enterprise on a clean env directory, or restore the secrets/enterprise-jwt.pem file and NEBULA_JWT_KID value from your previous deployment. See Service authentication before rotating JWT keys.
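
A quick check that both pieces are in place might look like the following. It assumes the PEM is an unencrypted private key and that NEBULA_JWT_KID lives in the env file; both are assumptions, and `jwt_key_ok` is a hypothetical helper.

```shell
# jwt_key_ok PEM_FILE ENV_FILE: succeed when the PEM parses as a private key
# and ENV_FILE assigns a non-empty NEBULA_JWT_KID.
jwt_key_ok() {
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
  openssl pkey -in "$1" -noout 2>/dev/null ||
    { echo "not a readable private key: $1" >&2; return 1; }
  grep -Eq '^NEBULA_JWT_KID=.+' "$2" ||
    { echo "NEBULA_JWT_KID not set in $2" >&2; return 1; }
}
```

For example: `jwt_key_ok ./secrets/enterprise-jwt.pem ./env/.env.enterprise && echo "signing key present"`.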
graph-engine fails to start, logs show 'S3 access denied'
For in-stack MinIO: check that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in .env.enterprise match MINIO_ROOT_USER and MINIO_ROOT_PASSWORD respectively. generate-secrets.sh populates both pairs with matching values; if you edited any of them manually, restore the match.
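
The comparison can be scripted so that secret values never need to be printed. A minimal sketch, assuming unquoted `NAME=value` lines in the env file; `env_value` and `minio_creds_match` are hypothetical helpers.

```shell
# env_value ENV_FILE NAME: print the last value assigned to NAME.
env_value() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# minio_creds_match ENV_FILE: succeed when the S3 client credentials are
# non-empty and equal to the MinIO root credentials.
minio_creds_match() {
  key=$(env_value "$1" AWS_ACCESS_KEY_ID)
  secret=$(env_value "$1" AWS_SECRET_ACCESS_KEY)
  [ -n "$key" ] && [ -n "$secret" ] &&
    [ "$key" = "$(env_value "$1" MINIO_ROOT_USER)" ] &&
    [ "$secret" = "$(env_value "$1" MINIO_ROOT_PASSWORD)" ]
}
```

For example: `minio_creds_match ./env/.env.enterprise || echo "MinIO credentials differ"`.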
Hatchet dashboard loads but shows 'connection refused'
The Hatchet engine takes 30-60s to fully start after migrations. Wait for docker compose -f compose.enterprise.yaml logs -f hatchet-engine to show 'successfully booted', then refresh.
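
For scripted deployments, the wait can be automated by polling the logs instead of following them. This is a sketch: `wait_for_pattern` is a hypothetical helper, and the timeout counts polling attempts (roughly one per second), not exact wall-clock time.

```shell
# wait_for_pattern PATTERN ATTEMPTS CMD...: rerun CMD about once per second
# until its output contains PATTERN; fail after ATTEMPTS tries.
wait_for_pattern() {
  pattern=$1
  attempts=$2
  shift 2
  tried=0
  while [ "$tried" -lt "$attempts" ]; do
    if "$@" 2>&1 | grep -q -- "$pattern"; then
      return 0
    fi
    sleep 1
    tried=$((tried + 1))
  done
  return 1
}

# Example (no -f, so each poll returns):
#   wait_for_pattern "successfully booted" 120 \
#     docker compose -f compose.enterprise.yaml logs hatchet-engine
```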