# GrubCrawler
Open source web crawler for AI agents.
## Two ways to run
Same image, two endpoints. Pick whichever fits your use case:
| Mode | Base URL | Auth |
|---|---|---|
| Managed | `https://grub.nuts.services` | Bearer `ahp_` token from auth.nuts.services |
| Self-hosted | `http://localhost:6792` | `DISABLE_AUTH=true` by default; pass `customer_id` per request |
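The auth difference between the two modes can be sketched as a request builder. This is illustrative only: the `/crawl` path and the JSON body shape are assumptions, not confirmed API details; check the actual API docs before use.

```python
import json
import urllib.request

def build_crawl_request(url, base="https://grub.nuts.services",
                        token=None, customer_id=None):
    """Build (but do not send) a crawl request for either mode.

    The /crawl path and JSON body shape are assumptions for illustration.
    Managed mode authenticates with a Bearer ahp_ token; a default
    self-hosted instance (DISABLE_AUTH=true) takes customer_id per request.
    """
    body = {"url": url}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif customer_id:
        body["customer_id"] = customer_id
    return urllib.request.Request(
        f"{base}/crawl",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )

# Managed:
req = build_crawl_request("https://example.com", token="ahp_...")
# Self-hosted:
req_local = build_crawl_request(
    "https://example.com", base="http://localhost:6792", customer_id="acme")
```

Send either request with `urllib.request.urlopen(req)`.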
Self-hosted run:

```shell
docker run -p 6792:6792 deepbluedynamics/grubcrawler
```
For a production self-hosted deployment with auth enabled, set `DISABLE_AUTH=false` and point `GNOSIS_AUTH_URL` at an auth.nuts.services instance.
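A Compose file makes that production setup repeatable. The image, port, and the `DISABLE_AUTH` / `GNOSIS_AUTH_URL` variables come from this README; the rest of the layout is an illustrative sketch, not an official deployment config.

```yaml
# docker-compose.yml: production self-host sketch.
# Image, port, and env var names are from this README;
# everything else is an assumption.
services:
  grubcrawler:
    image: deepbluedynamics/grubcrawler
    ports:
      - "6792:6792"
    environment:
      DISABLE_AUTH: "false"
      GNOSIS_AUTH_URL: "https://auth.nuts.services"
    restart: unless-stopped
```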
## What it is
GrubCrawler is a Playwright-backed web crawler built for AI agents. It extracts clean HTML, converts to markdown, batch-crawls URLs, solves Cloudflare and Incapsula challenges, rotates fingerprints, and simulates human behavior to get past bot detection. It runs as a single Docker container — self-hosted or at grub.nuts.services.
## How it works
```
Request
  |
HTTP pre-check (fast, no browser)
  |
needs browser? ──yes──→ Playwright page load
  | no                        |
  |             challenge? ──yes──→ challenge solver
  |                           |     (Cloudflare / Incapsula)
  └─────────→ extract HTML ←──┘
                  |
          convert to markdown
                  |
              Response
```
The HTTP pre-check avoids spinning up a browser for pages that don't need JavaScript. For simple HTML responses the pre-check returns directly.
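The pre-check decision can be sketched as a pure function. These heuristics (status codes, challenge markers, thin JS shells) are plausible assumptions for illustration, not GrubCrawler's actual rules.

```python
def needs_browser(status, content_type, body):
    """Heuristic sketch of the pre-check decision, not GrubCrawler's
    actual logic: fall through to Playwright only when a plain HTTP
    fetch clearly is not enough."""
    if status in (403, 503):
        return True  # status codes typical of challenge pages
    if "text/html" not in content_type:
        return False  # JSON, XML, plain text need no rendering
    lowered = body.lower()
    # strings commonly seen on Cloudflare / Incapsula interstitials
    challenge_markers = ("cf-challenge", "just a moment", "_incapsula_")
    if any(m in lowered for m in challenge_markers):
        return True
    # near-empty shells that hydrate content via JavaScript
    if "<script" in lowered and len(lowered) < 2048:
        return True
    return False
```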
When a page requires JavaScript rendering, GrubCrawler launches a Playwright browser with fingerprint spoofing, stealth patches, and human behavior simulation.
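Fingerprint rotation amounts to picking a self-consistent browser profile per context. The pools below are tiny illustrative assumptions; real spoofing covers far more surface (canvas, WebGL, fonts, timezone) and these are not GrubCrawler's actual profiles.

```python
import random

# Illustrative pools only; not GrubCrawler's real fingerprint data.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]
VIEWPORTS = [(1920, 1080), (1536, 864), (1366, 768)]
LOCALES = ["en-US", "en-GB"]

def random_fingerprint(rng=random):
    """Pick one profile; the keys match what you would pass to
    Playwright's browser.new_context(user_agent=..., viewport=...,
    locale=...)."""
    width, height = rng.choice(VIEWPORTS)
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
        "locale": rng.choice(LOCALES),
    }
```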
When Cloudflare or Incapsula challenge pages are detected, the challenge solver kicks in — either waiting for JS challenges to resolve or simulating mouse movement and scrolling to satisfy behavioral checks.
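The mouse-movement side of behavioral checks can be sketched as a path generator: a curved, jittered trajectory rather than a straight robotic line. This is a minimal sketch of the general technique, not GrubCrawler's actual behavior model.

```python
import random

def human_mouse_path(start, end, steps=25, jitter=3.0, rng=random):
    """Generate points along a slightly curved, jittered path between
    two screen coordinates; the kind of trajectory you would replay
    with Playwright's page.mouse.move() to look less robotic."""
    (x0, y0), (x1, y1) = start, end
    # a random control point bows the path off the straight line
    cx = (x0 + x1) / 2 + rng.uniform(-80, 80)
    cy = (y0 + y1) / 2 + rng.uniform(-80, 80)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # quadratic Bezier through the control point
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        # small per-step jitter, but land exactly on start and target
        if 0 < i < steps:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((x, y))
    return points
```

Each point would be fed to `page.mouse.move(x, y)` with a short randomized delay between steps.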
Full source on GitHub — open source, MIT licensed. The managed instance at grub.nuts.services is available now.