GrubCrawler

Open-source web crawler for AI agents.

Two ways to run

Same image, two endpoints. Pick whichever fits your use case:

Mode         Base URL                    Auth
Managed      https://grub.nuts.services  Bearer ahp_ token from auth.nuts.services
Self-hosted  http://localhost:6792       DISABLE_AUTH=true by default; pass customer_id per request

To run the self-hosted image:

docker run -p 6792:6792 deepbluedynamics/grubcrawler

For a production self-hosted deployment with auth, set DISABLE_AUTH=false and point GNOSIS_AUTH_URL at an auth.nuts.services instance.
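The two modes above differ mainly in how a request identifies itself. The sketch below shows one way a client might prepare that per-mode information. It is not GrubCrawler's client library: `build_auth` and `RequestAuth` are hypothetical names, the check that managed tokens start with `ahp_` is inferred from the table, and carrying `customer_id` as a request body field is an assumption (the source only says it is passed per request).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MANAGED_URL = "https://grub.nuts.services"
SELF_HOSTED_URL = "http://localhost:6792"


@dataclass
class RequestAuth:
    """Hypothetical container for what a client attaches to each request."""
    base_url: str
    headers: Dict[str, str] = field(default_factory=dict)
    body_fields: Dict[str, str] = field(default_factory=dict)


def build_auth(mode: str,
               token: Optional[str] = None,
               customer_id: Optional[str] = None) -> RequestAuth:
    """Return the base URL and auth data for one of the two run modes."""
    if mode == "managed":
        # Managed mode: bearer token issued by auth.nuts.services.
        if not token or not token.startswith("ahp_"):
            raise ValueError("managed mode expects a token starting with 'ahp_'")
        return RequestAuth(MANAGED_URL,
                           headers={"Authorization": f"Bearer {token}"})
    if mode == "self-hosted":
        # Self-hosted default (DISABLE_AUTH=true): no token, but each request
        # names its customer. Assumption: the field travels in the body.
        if not customer_id:
            raise ValueError("self-hosted mode expects a customer_id")
        return RequestAuth(SELF_HOSTED_URL,
                           body_fields={"customer_id": customer_id})
    raise ValueError(f"unknown mode: {mode!r}")
```

A caller would merge `headers` and `body_fields` into whatever HTTP request it sends; the actual endpoint paths and payload shapes are described in the API Reference, not here.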

What it is

GrubCrawler is a Playwright-backed web crawler built for AI agents. It extracts clean HTML, converts it to markdown, batch-crawls URLs, solves Cloudflare and Incapsula challenges, rotates fingerprints, and simulates human behavior to get past bot detection. It runs as a single Docker container, either self-hosted or as the managed instance at grub.nuts.services.

How it works

Request
   |
HTTP pre-check (fast, no browser)
   |
   └─ needs browser? ──yes──→ Playwright page load
                                    |
                               challenge? ──yes──→ challenge solver
                                    |                   (Cloudflare / Incapsula)
                               extract HTML
                                    |
                               convert to markdown
                                    |
                               Response

The HTTP pre-check avoids spinning up a browser for pages that don't need JavaScript: simple HTML responses are returned directly from the pre-check.
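To make the pre-check decision concrete, here is a minimal sketch of the kind of heuristic such a step could apply to a plain HTTP response. It is an illustration only, not GrubCrawler's actual logic: the function names, the status codes treated as suspicious, the empty-mount-point pattern, and the 200-character threshold are all assumptions.

```python
import re

# Assumption: an empty mount point like <div id="root"></div> suggests a
# client-rendered app whose content only appears after JavaScript runs.
SPA_ROOT = re.compile(
    r'<div[^>]+id=["\'](?:root|app|__next)["\'][^>]*>\s*</div>', re.I)
SCRIPT_OR_STYLE = re.compile(r"<(script|style|noscript)\b.*?</\1>", re.I | re.S)
TAG = re.compile(r"<[^>]+>")


def visible_text_length(html: str) -> int:
    """Rough count of text characters left after stripping scripts and tags."""
    without_code = SCRIPT_OR_STYLE.sub(" ", html)
    text = TAG.sub(" ", without_code)
    return len(" ".join(text.split()))


def needs_browser(status: int, content_type: str, html: str,
                  min_text_chars: int = 200) -> bool:
    """Decide whether a plain HTTP response should be retried in a browser."""
    # Statuses that bot protection commonly returns: escalate to the browser.
    if status in (403, 429, 503):
        return True
    # Non-HTML payloads (JSON, PDF, ...) gain nothing from rendering.
    if "text/html" not in content_type.lower():
        return False
    # An empty app shell means the real content is rendered client-side.
    if SPA_ROOT.search(html):
        return True
    # Very little visible text is treated as a sign of a JS-only page.
    return visible_text_length(html) < min_text_chars
```

A static article with a few paragraphs of text would pass this check and skip the browser, while a page consisting of an empty `<div id="root"></div>` plus a script bundle would be escalated. A real implementation would need tuning, since thresholds like these produce both false positives and false negatives.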

When a page requires JavaScript rendering, GrubCrawler launches a Playwright browser with fingerprint spoofing, stealth patches, and human behavior simulation.

When a Cloudflare or Incapsula challenge page is detected, the challenge solver takes over. It either waits for the JavaScript challenge to resolve or simulates mouse movement and scrolling to satisfy behavioral checks.
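The detect-then-wait part of that flow can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's solver: `detect_challenge` and `wait_for_clearance` are hypothetical names, the marker strings are ones commonly seen on Cloudflare and Incapsula interstitial pages (they can change at any time), and `get_html` stands in for whatever returns the current page content, for example a wrapper around a browser page.

```python
import time
from typing import Callable, Optional

# Substrings commonly present on challenge pages (matched in lowercase).
# Assumption: a real detector would use more signals than these.
CHALLENGE_MARKERS = {
    "cloudflare": ("just a moment...", "cf-chl", "/cdn-cgi/challenge-platform/"),
    "incapsula": ("_incapsula_resource", "incapsula incident id"),
}


def detect_challenge(html: str) -> Optional[str]:
    """Return the vendor name if the HTML looks like a challenge page."""
    lowered = html.lower()
    for vendor, markers in CHALLENGE_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return vendor
    return None


def wait_for_clearance(get_html: Callable[[], str],
                       timeout_s: float = 15.0,
                       poll_s: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the page until the challenge markers disappear or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if detect_challenge(get_html()) is None:
            return True   # challenge resolved, or never present
        if time.monotonic() >= deadline:
            return False  # still challenged: caller may try behavior simulation
        sleep(poll_s)
```

Returning `False` on timeout leaves the choice of the next step to the caller, which matches the two strategies described above: wait first, then fall back to simulated mouse movement and scrolling if the challenge does not clear on its own.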

The full source is on GitHub, open source under the MIT license. The managed instance at grub.nuts.services is available now.

Dive deeper

Quick Start: up and running fast. Docker, ports, environment variables.
API Reference: every endpoint, request/response examples.
Agent Mode: LLM-backed agent loop, Ghost Protocol, tool chains.
Authentication: tokens and sessions. N.U.T.S. tokens, DISABLE_AUTH, session scoping.