The fastest web crawler in the world. Cloaks its way past Cloudflare, bot detection, and every wall between your agents and the open web.
Open source, self-hostable, authenticated. If your agents need to see the web, they no longer need to ask.
docker run -p 8766:8766 kord/grubcrawler
The big AI labs built their moats partly by controlling what their agents can see. Their crawler infrastructure is proprietary, rate-limited, and closed. They decide what gets fetched, what gets indexed, and what gets surfaced to your agent.
Running your own crawler used to mean hitting Cloudflare walls, rotating proxies, paying for headless browser overhead, rendering JavaScript, and handling authentication flows. Bot detection stops most crawlers cold and keeps most developers permanently dependent on centralized services.
GrubCrawler is the piece of that infrastructure you can now run yourself. Clean text extraction, link graphs, metadata, and structured output — authenticated and scoped to your agents. Pull the image and run it. Your agents see what you point them at.
Built for throughput. Concurrent fetching, connection pooling, and zero-overhead extraction. Benchmarks faster than every commercial alternative — and you can verify it yourself.
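To make "concurrent fetching" concrete, here is a minimal sketch of the general technique using only the Python standard library. It is not GrubCrawler's implementation: the names `fetch_url` and `fetch_all` and the `max_workers` parameter are hypothetical, and the sketch does no connection pooling or cloaking. It only shows a bounded worker pool fetching several URLs in parallel and keeping one failure from sinking the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Union
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Fetch one URL with the standard library (hypothetical helper, no cloaking)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    max_workers: int = 8,
) -> Dict[str, Union[bytes, Exception]]:
    """Fetch URLs concurrently; map each URL to its body or to the exception raised."""
    unique_urls = list(dict.fromkeys(urls))  # de-duplicate while preserving order
    results: Dict[str, Union[bytes, Exception]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every fetch up front so the pool can overlap network waits.
        futures = {url: pool.submit(fetch, url) for url in unique_urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure instead of aborting the batch
                results[url] = exc
    return results
```

Passing the `fetch` callable in as a parameter keeps the sketch easy to benchmark or test against a stub, which is one way to check throughput claims on your own hardware.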
Fingerprint rotation, TLS cloaking, and behavioral mimicry get past Cloudflare, Akamai, and every major bot detection layer. The wall is not the limit.
Strips boilerplate, extracts main content, preserves link graphs and metadata. Agents receive structured, readable output — not raw HTML noise.
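As a rough illustration of what boilerplate stripping and link extraction involve, the sketch below uses Python's standard `html.parser`. It is an assumption-laden toy, not GrubCrawler's extractor: the `PageExtractor` class, the `extract` function, and the set of skipped tags are all hypothetical choices made for this example, and real main-content detection is considerably more involved.

```python
from html.parser import HTMLParser
from typing import Dict, List, Union
from urllib.parse import urljoin

# Hypothetical choice: treat these containers as boilerplate and skip their contents.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class PageExtractor(HTMLParser):
    """Collect visible text and absolute link targets outside boilerplate tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.skip_depth = 0  # > 0 while inside any skipped container
        self.text_parts: List[str] = []
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "a" and self.skip_depth == 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str, base_url: str) -> Dict[str, Union[str, List[str]]]:
    """Return the page's readable text and outgoing links as structured output."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_parts), "links": parser.links}
```

In this sketch, links inside navigation and footers are dropped along with their text. A crawler that wants the full link graph would likely record those links separately rather than discard them.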
Authenticated access via N.U.T.S. OAuth. Scope crawler access per agent, per domain, per session. No anonymous free-for-all — controlled access you manage.
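The idea of scoping access per agent, per domain, and per session can be sketched as a simple check. Everything here is hypothetical and for illustration only: the `CrawlScope` structure, its fields, and `is_allowed` are assumptions, not the N.U.T.S. OAuth data model or any real GrubCrawler API.

```python
from dataclasses import dataclass
from typing import FrozenSet
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CrawlScope:
    """Hypothetical grant: one agent, a set of domains, valid until a deadline."""

    agent_id: str
    allowed_domains: FrozenSet[str]
    expires_at: float  # Unix timestamp marking the end of the session


def is_allowed(scope: CrawlScope, agent_id: str, url: str, now: float) -> bool:
    """Return True only if this agent may fetch this URL at this moment."""
    if agent_id != scope.agent_id or now >= scope.expires_at:
        return False
    host = (urlsplit(url).hostname or "").lower()
    # Accept the domain itself or any subdomain, but not look-alike suffixes.
    return any(
        host == domain or host.endswith("." + domain)
        for domain in scope.allowed_domains
    )
```

Matching on `"." + domain` rather than a bare suffix is what keeps a grant for `example.com` from also covering `notexample.com`.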
Single Docker container. Runs locally, on Cloud Run, or on any VPS. The managed instance at grub.nuts.services is available now, or pull the image and host your own.
Full source on GitHub. Read what it does, fork it, extend it. No vendor lock-in on the thing that gives your agents eyes.