Quick Start
Managed service
The fastest way to use GrubCrawler is the managed instance. No Docker required.
https://grub.nuts.services
Get an API token from nuts.services and pass it as a Bearer token.
Self-hosted
Pull and run the Docker image. No build step required.
docker run -p 8766:6792 \
-e DISABLE_AUTH=true \
deepbluedynamics/grubcrawler
Port 8766 maps to :6792 inside the container. Set DISABLE_AUTH=true for local use — tokens are not required.
Persist nothing by default — GrubCrawler is stateless. Pass a volume to enable local storage:
docker run -p 8766:6792 \
-e DISABLE_AUTH=true \
-e RUNNING_IN_CLOUD=false \
-e STORAGE_PATH=/data \
-v grub-data:/data \
deepbluedynamics/grubcrawler
Key environment variables
Server
| Variable | Default | Purpose |
|---|---|---|
| PORT | 6792 | Listening port inside container |
| DEBUG | false | Verbose logging |
Storage
| Variable | Default | Purpose |
|---|---|---|
| STORAGE_PATH | ./storage | Local storage directory |
| RUNNING_IN_CLOUD | false | Use GCS instead of local disk |
| GCS_BUCKET_NAME | — | GCS bucket (cloud mode only) |
Authentication
| Variable | Default | Purpose |
|---|---|---|
| DISABLE_AUTH | false | Skip all token validation ⚠️ |
| GNOSIS_AUTH_URL | https://auth.nuts.services | N.U.T.S. auth endpoint |
Crawling
| Variable | Default | Purpose |
|---|---|---|
| MAX_CONCURRENT_CRAWLS | 5 | Parallel crawl limit |
| CRAWL_TIMEOUT | 30 | Seconds before timeout |
| BROWSER_HEADLESS | true | Run browser without display |
| ENABLE_JAVASCRIPT | true | Use Playwright for JS pages |
Verify
curl http://localhost:8766/health
# {"status":"ok","version":"..."}