# GrubCrawler
Open source web crawler for AI agents.
## Two ways to run
Same image, two endpoints. Pick whichever fits your use case:
| Mode | Base URL | Auth |
|---|---|---|
| Managed | `https://grub.nuts.services` | Bearer `ahp_` token from auth.nuts.services |
| Self-hosted | `http://localhost:6792` | `DISABLE_AUTH=true` by default; pass `customer_id` per request |
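The auth difference between the two modes can be sketched as a request builder. This is illustrative only: the `/crawl` path and the JSON body shape are assumptions, not confirmed API details; check the actual API docs before use.

```python
import json
import urllib.request

def build_crawl_request(url, base="https://grub.nuts.services",
                        token=None, customer_id=None):
    """Build (but do not send) a crawl request for either mode.

    The /crawl path and JSON body shape are assumptions for illustration.
    Managed mode authenticates with a Bearer ahp_ token; a default
    self-hosted instance (DISABLE_AUTH=true) takes customer_id per request.
    """
    body = {"url": url}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif customer_id:
        body["customer_id"] = customer_id
    return urllib.request.Request(
        f"{base}/crawl",
        data=json.dumps(body).encode(),
        headers=headers,
        method="POST",
    )

# Managed:
req = build_crawl_request("https://example.com", token="ahp_...")
# Self-hosted:
req_local = build_crawl_request(
    "https://example.com", base="http://localhost:6792", customer_id="acme")
```

Send either request with `urllib.request.urlopen(req)`.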
Self-hosted run:

```shell
docker run -p 6792:6792 deepbluedynamics/grubcrawler
```
For a production self-hosted deployment with auth enabled, set `DISABLE_AUTH=false` and point `GNOSIS_AUTH_URL` at an auth.nuts.services instance.
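A Compose file makes that production setup repeatable. The image, port, and the `DISABLE_AUTH` / `GNOSIS_AUTH_URL` variables come from this README; the rest of the layout is an illustrative sketch, not an official deployment config.

```yaml
# docker-compose.yml: production self-host sketch.
# Image, port, and env var names are from this README;
# everything else is an assumption.
services:
  grubcrawler:
    image: deepbluedynamics/grubcrawler
    ports:
      - "6792:6792"
    environment:
      DISABLE_AUTH: "false"
      GNOSIS_AUTH_URL: "https://auth.nuts.services"
    restart: unless-stopped
```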
## What it is
GrubCrawler is a Playwright-backed web crawler built for AI agents. It extracts clean HTML, converts to markdown, batch-crawls URLs, solves Cloudflare and Incapsula challenges, rotates fingerprints, and simulates human behavior to get past bot detection. It runs as a single Docker container — self-hosted or at grub.nuts.services.
## How it works
```
Request
  |
HTTP pre-check (fast, no browser)
  |
needs browser? ──yes──→ Playwright page load
  | no                        |
  |             challenge? ──yes──→ challenge solver
  |                           |     (Cloudflare / Incapsula)
  └─────────→ extract HTML ←──┘
                  |
          convert to markdown
                  |
              Response
```
The HTTP pre-check avoids spinning up a browser for pages that don't need JavaScript. For simple HTML responses the pre-check returns directly.
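The pre-check decision can be sketched as a pure function. These heuristics (status codes, challenge markers, thin JS shells) are plausible assumptions for illustration, not GrubCrawler's actual rules.

```python
def needs_browser(status, content_type, body):
    """Heuristic sketch of the pre-check decision, not GrubCrawler's
    actual logic: fall through to Playwright only when a plain HTTP
    fetch clearly is not enough."""
    if status in (403, 503):
        return True  # status codes typical of challenge pages
    if "text/html" not in content_type:
        return False  # JSON, XML, plain text need no rendering
    lowered = body.lower()
    # strings commonly seen on Cloudflare / Incapsula interstitials
    challenge_markers = ("cf-challenge", "just a moment", "_incapsula_")
    if any(m in lowered for m in challenge_markers):
        return True
    # near-empty shells that hydrate content via JavaScript
    if "<script" in lowered and len(lowered) < 2048:
        return True
    return False
```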
When a page requires JavaScript rendering, GrubCrawler launches a Playwright browser with fingerprint spoofing, stealth patches, and human behavior simulation.
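Fingerprint rotation amounts to picking a self-consistent browser profile per context. The pools below are tiny illustrative assumptions; real spoofing covers far more surface (canvas, WebGL, fonts, timezone) and these are not GrubCrawler's actual profiles.

```python
import random

# Illustrative pools only; not GrubCrawler's real fingerprint data.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]
VIEWPORTS = [(1920, 1080), (1536, 864), (1366, 768)]
LOCALES = ["en-US", "en-GB"]

def random_fingerprint(rng=random):
    """Pick one profile; the keys match what you would pass to
    Playwright's browser.new_context(user_agent=..., viewport=...,
    locale=...)."""
    width, height = rng.choice(VIEWPORTS)
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
        "locale": rng.choice(LOCALES),
    }
```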
When Cloudflare or Incapsula challenge pages are detected, the challenge solver kicks in — either waiting for JS challenges to resolve or simulating mouse movement and scrolling to satisfy behavioral checks.
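The mouse-movement side of behavioral checks can be sketched as a path generator: a curved, jittered trajectory rather than a straight robotic line. This is a minimal sketch of the general technique, not GrubCrawler's actual behavior model.

```python
import random

def human_mouse_path(start, end, steps=25, jitter=3.0, rng=random):
    """Generate points along a slightly curved, jittered path between
    two screen coordinates; the kind of trajectory you would replay
    with Playwright's page.mouse.move() to look less robotic."""
    (x0, y0), (x1, y1) = start, end
    # a random control point bows the path off the straight line
    cx = (x0 + x1) / 2 + rng.uniform(-80, 80)
    cy = (y0 + y1) / 2 + rng.uniform(-80, 80)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # quadratic Bezier through the control point
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        # small per-step jitter, but land exactly on start and target
        if 0 < i < steps:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((x, y))
    return points
```

Each point would be fed to `page.mouse.move(x, y)` with a short randomized delay between steps.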
Full source on GitHub — open source, MIT licensed. The managed instance at grub.nuts.services is available now.