the arsenal

55 tools

the non-browser scraping and automation tooling, grouped by category: http clients, frameworks, ai scrapers, managed apis, parsers, captcha solvers, interception tools, distributed runners, browser infra, proxy tooling, patches and tls cyclers. read the presents line first; it is the forensic hook: what fingerprint each tool shows a detector, and what gives it away. the browser-engine subset lives in the browser reference catalog, joined here by name where a tool is also a driven browser. star counts are dated and drift weekly; vendor success claims stay flagged speculative.

http clients (no browser) · 8

curl_cffi

python

stars: stars not tracked
confidence: confirmed

python bindings over curl-impersonate: borrows a real chrome tls (ja4+) and http/2 fingerprint at the wire, the fastest way to look like a browser without running one.

presents

genuine chrome ja4+ tls and h2 frame order, but no js runtime, so any detector that runs one line of javascript gets no answer: the structural failure of every http-only client.

detectable by

joins

curl_cffi / curl-impersonate → also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

httpx

python

stars: 14,000★ · as-of 2026-06
confidence: confirmed

an async python http client with http/2 support; the modern requests successor, popular for high-throughput scraping.

presents

a default python tls ja4 (not a browser's) even with h2 enabled, so when the ua claims chrome the handshake still says python: a cross-band lie.
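a minimal sketch of the cross-band check described above, seen from the detector's side. it assumes an upstream ja4 lookup has already produced a client-family label; the function names and the family labels are hypothetical, not taken from any real detector.

```python
def claims_browser(user_agent: str) -> bool:
    """Return True when the user-agent string claims a mainstream browser."""
    ua = user_agent.lower()
    return any(token in ua for token in ("chrome/", "firefox/", "safari/"))


def is_cross_band_lie(user_agent: str, tls_family: str) -> bool:
    """Flag a request whose ua claims a browser but whose handshake does not.

    tls_family is a hypothetical label ("chrome", "firefox", "python", "go")
    assumed to come from a ja4 lookup performed elsewhere.
    """
    browser_families = {"chrome", "firefox", "safari"}
    return claims_browser(user_agent) and tls_family not in browser_families


chrome_ua = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36"
print(is_cross_band_lie(chrome_ua, "python"))  # ua says chrome, handshake says python
print(is_cross_band_lie(chrome_ua, "chrome"))  # both bands agree
```

an honest python ua over a python handshake is not a lie under this check; it is simply a non-browser, which other gates handle.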

detectable by

requests

python

stars: 52,000★ · as-of 2026-06
confidence: confirmed

the python baseline http client: synchronous, http/1.1 only, no browser surfaces. the thing every tls gate is tuned to catch first.

presents

default python/openssl tls ja4, no http/2, no js: a trivially flagged non-browser fingerprint, and the canonical control for what naive scraping looks like.

detectable by

tls-client

go · python

stars: stars not tracked
confidence: confirmed

a go tls library with python bindings that lets you select a target browser's ja3/ja4 profile per request, built for cycling tls fingerprints.

presents

a chosen browser's ja3/ja4 on the handshake, but still no js runtime, and a mismatched h2 frame order can betray the impersonation even when the cipher list matches.

detectable by

scrapling

python

stars: 5,000★ · as-of 2026-06
confidence: confirmed

a dual http/browser python library with adaptive selectors that survive layout changes; the http path is stealth-tuned and it can drive a browser for turnstile solving.

presents

on the http path, an impersonated browser tls with no js; on the browser path, a driven chromium posture, so what it presents depends on the mode you pick.

detectable by

webclaw

rust

stars: stars not tracked
confidence: speculative

a rust http client positioned for high-performance scraping with browser-impersonation tls.

presents

an impersonated browser tls from a rust stack with no js runtime; the rust http library's own tls quirks can leak a non-browser handshake under close inspection.

detectable by

got-scraping

node

stars: 500★ · as-of 2026-06
confidence: confirmed

a node http client (the got wrapper used by crawlee) that generates browser-like header sets and tls fingerprints for scraping.

presents

browser-shaped headers and a generated tls fingerprint, but a node runtime, no js execution of the page, and a header order that can still diverge from a real browser.
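a rough sketch of how a header-order comparison might work. the reference order below is a hypothetical placeholder, not a measured chrome order, and the function name is illustrative.

```python
def order_diverges(sent: list[str], reference: list[str]) -> bool:
    """True when headers shared with the reference appear in a different relative order."""
    ref_lower = [h.lower() for h in reference]
    sent_lower = [h.lower() for h in sent]
    # keep only the headers both sides know about, each in its own order
    shared_sent = [h for h in sent_lower if h in ref_lower]
    shared_ref = [h for h in ref_lower if h in sent_lower]
    return shared_sent != shared_ref


# hypothetical reference order, for illustration only
REFERENCE_ORDER = ["host", "user-agent", "accept", "accept-encoding", "accept-language"]

print(order_diverges(["Host", "Accept", "User-Agent"], REFERENCE_ORDER))
print(order_diverges(["Host", "User-Agent", "X-Custom", "Accept"], REFERENCE_ORDER))
```

extra headers the reference does not list are ignored here; only the relative order of shared headers is compared.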

detectable by

cycleTLS

node · go

stars: 1,000★ · as-of 2026-06
confidence: confirmed

a node library backed by a go process that cycles ja3 tls fingerprints per request; doubles as a tls-fingerprint tool (see tls fingerprint).

presents

a rotating ja3 per request to dodge tls-reputation gates, but no js runtime and a go-backed handshake whose http/2 details can still read as non-browser.

detectable by

scraping frameworks · 9

scrapy

python

stars: 52,000★ · as-of 2026-06
confidence: confirmed

the canonical python scraping framework: an async engine with middlewares, item pipelines and built-in throttling. the http core is non-browser by default.

presents

by default a twisted/python tls fingerprint with no js, so the framework itself is detectable on transport unless paired with an impersonation downloader or a browser engine bridge.

detectable by

crawlee

node · python

stars: 15,000★ · as-of 2026-06
confidence: confirmed

apify's dual node/python crawling framework: a unified api over plain-http and headless-browser crawlers with auto-scaling and session pools.

presents

the posture of whichever crawler you select, http (got-scraping fingerprint) or browser (playwright/puppeteer), so its tell is the underlying engine, not the framework.

detectable by

colly

go

stars: 24,000★ · as-of 2026-06
confidence: confirmed

the popular go scraping framework: fast, concurrent, callback-driven. pure http, no browser.

presents

a go net/http tls fingerprint with no js runtime, distinct from any browser handshake, so a chrome ua over colly is a cross-band lie.

detectable by

scrapy-poet

python

stars: 300★ · as-of 2026-06
confidence: confirmed

a scrapy extension bringing dependency-injection and page-object models, decoupling extraction logic from the spider; a zyte project.

presents

inherits scrapy's transport posture exactly, no browser, no js; it changes code structure, not the wire fingerprint, so it presents whatever the scrapy downloader does.

detectable by

no-js-runtime

scrapy-camoufox

python

stars: stars not tracked
confidence: speculative

a scrapy engine bridge that routes selected requests through camoufox, a genuine anti-detect firefox, giving per-request engine switching.

presents

for bridged requests, a genuine gecko transport and js surface (camoufox), the thing chromium tools cannot fake; un-bridged requests fall back to scrapy's bare http.

detectable by

bidi-attach-residue

joins

Camoufox (anti-detect Firefox) → also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

scrapy-nodriver

python

stars: stars not tracked
confidence: speculative

a scrapy engine bridge routing selected requests through nodriver's minimal raw-cdp driver for the requests that need a real chrome.

presents

for bridged requests, nodriver's minimal-cdp chrome posture (very little framework residue); bare scrapy http otherwise, so coverage depends on which requests you bridge.

detectable by

navigator.webdriver

joins

nodriver (minimal raw CDP) → also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

scrapy-stealth

python

stars: stars not tracked
confidence: speculative

a scrapy middleware layer that switches the underlying engine per request to apply stealth selectively rather than across the whole crawl.

presents

varies by the engine it selects per request; the stealth is only as good as the chosen downloader, so on un-bridged requests it still presents plain scrapy http.
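per-request engine selection can be sketched as a small routing function. the url patterns and the engine labels here are hypothetical and are not scrapy-stealth's actual configuration; the point is only that anything not matched falls through to plain http.

```python
from fnmatch import fnmatchcase

# hypothetical patterns for requests that should be bridged to a browser engine
BRIDGED_PATTERNS = ["*/checkout/*", "*/login*"]


def choose_engine(url: str) -> str:
    """Return "browser" for bridged urls, "http" for everything else."""
    if any(fnmatchcase(url, pattern) for pattern in BRIDGED_PATTERNS):
        return "browser"
    return "http"


print(choose_engine("https://shop.example/checkout/cart"))
print(choose_engine("https://shop.example/products/1"))
```

every url that returns "http" still presents the bare scrapy transport, which is why coverage depends on the pattern list.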

detectable by

no-js-runtime

katana

go

stars: 13,000★ · as-of 2026-06
confidence: confirmed

projectdiscovery's fast go crawler / spider, built for security recon: link discovery, js-aware parsing, headless option.

presents

a go http tls fingerprint in default mode (no browser), distinct from any browser handshake; the optional headless mode shifts it to a chromium posture with cdp tells.

detectable by

rod

go

stars: 5,500★ · as-of 2026-06
confidence: confirmed

a go devtools-protocol (cdp) driver for chrome, a high-level browser-automation library in the puppeteer/playwright family.

presents

a chromium-over-cdp posture from go, so it carries the raw-cdp tells (navigator.webdriver, cdp residue) unless paired with stealth patches.

detectable by

ai / llm scrapers · 5

firecrawl

api · node

stars: 111,000★ · as-of 2026-06
confidence: confirmed

a hosted (and self-hostable) api that crawls a site and returns clean llm-ready markdown; popular as an agent retrieval backend.

presents

from the target's view, the fingerprint of firecrawl's own fetcher fleet (browser or http per page), so the tell is the service's exit ips and engine, not your code.

detectable by

crawl4ai

python

stars: 60,000★ · as-of 2026-06
confidence: confirmed

an open-source python crawler that drives a local playwright chromium and emits llm-ready markdown / structured output; self-hosted, no api key.

presents

a local playwright chromium posture, so it inherits raw-cdp tells (navigator.webdriver, cdp residue) unless layered with stealth patches.

detectable by

joins

Playwright (CDP launch) → also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

scrapegraphai

python

stars: 18,000★ · as-of 2026-06
confidence: confirmed

a python library that builds extraction pipelines as graphs from a natural-language prompt, wiring an llm to a scraper backend.

presents

the posture of whichever backend fetcher it drives (http or playwright); the llm shapes extraction, not the wire fingerprint, so the tell is the underlying fetch.

detectable by

jina reader

api

stars: stars not tracked
confidence: confirmed

a hosted endpoint: prefix any url with r.jina.ai/ and get back clean markdown for llm consumption, no setup.
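the prefixing step can be sketched as plain string concatenation. the https scheme on the reader host is an assumption, and the helper name is hypothetical; no request is sent here.

```python
READER_PREFIX = "https://r.jina.ai/"  # scheme assumed; the host is from the description above


def reader_url(target: str) -> str:
    """Build the reader url by prefixing the target url."""
    return READER_PREFIX + target


print(reader_url("https://example.com/article"))
```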

presents

jina's own fetcher fingerprint and exit ips to the target, not yours; the target sees jina datacenter traffic, your own posture is invisible to it.

detectable by

datacenter-asn

steel

api · node

stars: 5,000★ · as-of 2026-06
confidence: confirmed

a self-hostable browser api for ai agents: managed sessions, an mcp server, and a hosted cloud option over real chromium.

presents

a managed chromium session posture; self-hosted you carry the cdp tells, on the cloud you present steel's fleet fingerprint and exit ips to the target.

detectable by

managed apis · 10

bright data

api

stars: stars not tracked
confidence: speculative

the broadest managed-scraping vendor: a web-unlocker / scraping-browser api on top of the largest proxy estate (datacenter / isp / residential / mobile).

presents

to the target, residential or mobile exit ips with real-browser fingerprints from its unblocker fleet; the tell is the provider's pool reputation, not a static fingerprint.

detectable by

joins

bright-data → runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

oxylabs

api

stars: stars not tracked
confidence: speculative

an enterprise scraping-api + proxy vendor with a large residential pool and an ai-assisted scraper (oxycopilot).

presents

residential / mobile exit ips and managed browser fingerprints from its scraper-api fleet; pool reputation and exit-ip class are the corroborating signals.

detectable by

joins

oxylabs → runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

zyte

api

stars: stars not tracked
confidence: speculative

the scrapy company's smart-proxy-manager and autoextract api: automatic unblocking plus llm-style structured extraction.

presents

managed exit ips with auto-rotated browser fingerprints; its published ~93% success figure is vendor marketing (speculative), corroborate before quoting.

detectable by

joins

zyte → runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

apify

api · node

stars: stars not tracked
confidence: confirmed

a scraping cloud / actor marketplace: 10k+ prebuilt actors, a proxy layer, and an mcp server for agent use.

presents

the fingerprint of the specific actor you run on apify's infra plus its proxy exit ips; posture varies per actor (http vs crawlee browser).

detectable by

scrapingbee

api

stars: stars not tracked
confidence: speculative

a simple rest scraping api with js rendering and proxy rotation built in, plus a free tier and low-friction onboarding.

presents

rendered-page output via its own headless fleet over rotating proxies; the target sees scrapingbee's exit ips and fleet fingerprint, not yours.

detectable by

joins

scrapingbee → runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

scraperapi

api

stars: stars not tracked
confidence: speculative

a proxy + js-render scraping api positioned for simplicity and scale, with a free tier and automatic retries.

presents

rotated proxy exit ips with optional js rendering; the corroborating tell is the exit-ip class and the fleet's request cadence, not a fixed fingerprint.

detectable by

datacenter-asn

joins

scraperapi → runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

serpapi

api

stars: stars not tracked
confidence: confirmed

a specialized search-results api covering 80+ engines (google, bing, maps, shopping, …) with structured json output.

presents

serpapi's own fetcher fleet and exit ips against the search engines; from your side it is a plain rest call, the scraping posture is entirely theirs.

detectable by

datacenter-asn

scrapebadger

api

stars: stars not tracked
confidence: speculative

a pay-per-success scraping api positioned on a billing model where you only pay for unblocked responses.

presents

managed exit ips with auto-unblocking; the pay-per-success framing is a billing claim, not a fingerprint property, treat capability as speculative.

detectable by

datadome

browserbase

api · node

stars: stars not tracked
confidence: confirmed

managed headless browsers in the cloud: connect playwright/puppeteer over cdp to a hosted, scalable browser fleet with stealth and proxy options.

presents

a remote chromium fleet posture; over plain config the cdp attach residue and datacenter exit ips are the tells, stealth/proxy options shift but do not erase them.

detectable by

decodo

api

stars: stars not tracked
confidence: speculative

the rebrand of smartproxy: an affordable proxy + scraping-api vendor with datacenter, residential and mobile pools.

presents

rotating residential / mobile exit ips with a scraping-api front; pool reputation and exit class are the corroborating signals, the same as the larger vendors.

detectable by

joins

decodo → runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

parsers · 6

selectolax

python

stars: 1,000★ · as-of 2026-06
confidence: confirmed

a python html parser binding the lexbor / modest c engines; positioned as a far faster beautifulsoup alternative.

presents

nothing on the wire: it is an offline parser. the network fetch upstream is what a detector sees; the parser is invisible to it.

lxml

python

stars: 2,800★ · as-of 2026-06
confidence: confirmed

the c-backed (libxml2 / libxslt) python parser: the fastest mature option for html/xml with full xpath support.

presents

no network presence, an in-process parser; it shapes how you extract data, not how the request looks to a detector.

parsel

python

stars: 1,100★ · as-of 2026-06
confidence: confirmed

scrapy's standalone selector library: xpath + css selectors over lxml, usable outside the scrapy engine.

presents

offline only; no fingerprint of its own, the upstream fetcher is what a detector evaluates.

beautifulsoup4

python

stars: 10,000★ · as-of 2026-06
confidence: confirmed

the beginner-friendly python html parser: forgiving, well-documented, slower than the c-backed options but ubiquitous.

presents

no wire presence, a pure parsing layer; it is paired with requests or httpx, and that client is the thing a tls gate fingerprints.

chompjs

python

stars: 300★ · as-of 2026-06
confidence: confirmed

a parser that turns javascript-object literals (the __NEXT_DATA__ / inline state blobs framework sites embed) into python dicts.

presents

offline, so no request fingerprint of its own; relevant because it extracts the embedded state a site ships in its html, which is how scrapers harvest hydration state without a browser.
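a stdlib-only sketch of harvesting hydration state, assuming the blob is strict json inside a script tag with id __NEXT_DATA__. it does not use chompjs; chompjs is the tool for the looser case of javascript literals (unquoted keys, trailing commas) that json.loads rejects. the sample page and helper name are hypothetical.

```python
import json
import re

# matches the script tag that carries the hydration blob
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str):
    """Return the parsed __NEXT_DATA__ blob, or None when the page has none."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# hypothetical page, for illustration only
page = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"price": 42}}}'
    "</script></body></html>"
)
state = extract_next_data(page)
print(state["props"]["pageProps"]["price"])
```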

w3lib

python

stars: 400★ · as-of 2026-06
confidence: confirmed

scrapy's low-level web utility library: url canonicalization, html entity decoding, encoding detection, the plumbing under the parsers.

presents

no network presence; a utility layer for url/html handling, invisible to any detector.
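to show the kind of plumbing meant by url canonicalization, here is a simplified stdlib sketch. it is not w3lib's algorithm and does not call w3lib; the rules chosen (lowercase scheme and host, sorted query pairs, dropped fragment) are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Lowercase scheme and host, sort query pairs, drop the fragment."""
    parts = urlsplit(url)
    # sort query pairs so equivalent urls compare equal for dedupe
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # simplification: lowercasing netloc would also lowercase any userinfo
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )


print(canonicalize("HTTP://Example.com/a?b=2&a=1#frag"))
```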

captcha solvers · 3

capsolver

api

stars: stars not tracked
confidence: speculative

an ai-driven captcha-solving service covering recaptcha, hcaptcha, turnstile and image challenges via api.

presents

returns a solve token (g-recaptcha-response, cf-turnstile-response, …); the token rides your session, but a token minted from a mismatched ip/fingerprint is itself the tell.

detectable by

2captcha

api

stars: stars not tracked
confidence: speculative

a long-running human + ai captcha-solving farm: you submit the challenge, workers (or models) solve it and return the token.

presents

a human-solved token for your session; the solve is genuine, so the residual tell is session coherence (does the token's origin match your exit ip and fingerprint).

detectable by

anti-captcha

api

stars: stars not tracked
confidence: speculative

a human + ai captcha-solving service in the same class as 2captcha, api-driven, broad challenge coverage.

presents

a returned solve token; like any solver farm, the weakness is binding, a token solved elsewhere must still match the session presenting it.
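the binding weakness can be sketched as a comparison between the context a token was minted in and the context presenting it. this assumes a verifier records both, which the source does not confirm; the field and function names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SessionContext:
    """Hypothetical context a verifier might record for a session."""
    exit_ip: str
    fingerprint_id: str


def binding_mismatches(minted: SessionContext, presented: SessionContext) -> list[str]:
    """Return the names of the fields that differ between mint and presentation."""
    return [
        f.name
        for f in fields(SessionContext)
        if getattr(minted, f.name) != getattr(presented, f.name)
    ]


solver = SessionContext(exit_ip="203.0.113.5", fingerprint_id="farm-worker")
scraper = SessionContext(exit_ip="198.51.100.9", fingerprint_id="scraper-session")
print(binding_mismatches(solver, scraper))
```

an empty list means the token and the presenting session are coherent on every recorded field.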

detectable by

reverse-engineering / interception · 4

mitmproxy

python

stars: 37,000★ · as-of 2026-06
confidence: confirmed

an interactive https intercepting proxy: inspect, modify and replay flows, scriptable in python. the standard tool for reading what a site's loader sends.

presents

not a client posture but a tap: it sits between client and server. its tell is that it terminates tls, so an app with certificate pinning refuses its injected ca.
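a simplified sketch of why a pinned app refuses an intercepting proxy. it pins the sha-256 of the whole certificate, whereas real apps often pin the public key instead; the byte strings are stand-ins for der-encoded certificates and the function name is hypothetical.

```python
import hashlib


def pin_matches(cert_der: bytes, pinned_sha256_hex: str) -> bool:
    """Compare the certificate's sha-256 digest against the pinned value."""
    return hashlib.sha256(cert_der).hexdigest() == pinned_sha256_hex.lower()


# stand-in bytes, not real certificates
genuine_cert = b"genuine-server-certificate"
proxy_cert = b"certificate-signed-by-injected-ca"

pin = hashlib.sha256(genuine_cert).hexdigest()
print(pin_matches(genuine_cert, pin))  # the server the app expects
print(pin_matches(proxy_cert, pin))    # what the app sees through the proxy
```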

detectable by

ssl-pinning

burp suite

multi

stars: stars not tracked
confidence: confirmed

the portswigger security-testing platform: an intercepting proxy plus scanner and repeater; the pro edition has an mcp integration.

presents

an interception proxy, not a fingerprint; like mitmproxy it breaks tls in the middle, which cert pinning is designed to detect.

detectable by

ssl-pinning

frida

multi

stars: 16,000★ · as-of 2026-06
confidence: confirmed

a dynamic instrumentation toolkit: hook and rewrite functions in a running process, the standard way to bypass ssl-pinning and trace native calls.

presents

it instruments a process from inside rather than presenting a network identity; the tell is on-device, where anti-tamper and frida-detection checks look for its agent.

detectable by

http toolkit

multi

stars: 7,000★ · as-of 2026-06
confidence: confirmed

an open-source intercepting proxy with one-click setup for capturing mobile-app and desktop api traffic.

presents

an interception layer, not a client; same structural tell as any mitm, it terminates tls and is defeated by certificate pinning.

detectable by

ssl-pinning

distributed · 3

scrapyd

python

stars: 3,000★ · as-of 2026-06
confidence: confirmed

a service to deploy and run scrapy spiders as managed jobs: schedule, queue and monitor crawls over an http api.

presents

orchestration only, no fingerprint of its own; each spider it runs presents scrapy's transport, so coverage and tells are inherited, not added.

detectable by

no-js-runtime

scrapy-redis

python

stars: 5,000★ · as-of 2026-06
confidence: confirmed

a scrapy extension that puts the request queue and dedupe filter in redis, so many workers share one distributed crawl frontier.

presents

coordination only; the worker fleet presents scrapy http, but the distribution can concentrate many requests on shared exit ips, a velocity tell.
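the velocity tell can be sketched as a sliding-window counter per exit ip. the class name, window and threshold are hypothetical; real gates weigh many more signals.

```python
from collections import defaultdict, deque


class VelocityGate:
    """Count requests per exit ip inside a sliding time window."""

    def __init__(self, window_seconds: float, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._seen = defaultdict(deque)  # ip -> timestamps still inside the window

    def observe(self, ip: str, now: float) -> bool:
        """Record one request; return True when the ip exceeds the limit."""
        stamps = self._seen[ip]
        stamps.append(now)
        # evict timestamps that have aged out of the window
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        return len(stamps) > self.max_requests


gate = VelocityGate(window_seconds=10.0, max_requests=3)
results = [gate.observe("198.51.100.7", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)
```

many workers sharing one exit ip all feed the same counter, which is how a distributed crawl trips a limit that no single worker would.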

detectable by

scrapy-cluster

python

stars: 1,100★ · as-of 2026-06
confidence: confirmed

a redis + kafka + zookeeper distributed crawling architecture for coordinating scrapy at large scale across many machines.

presents

infrastructure, not a wire identity; the same velocity concern as any distributed fleet, many workers can cluster on a narrow exit-ip range a detector correlates.

detectable by

browser infra · 3

splash

docker

stars: 4,000★ · as-of 2026-06
confidence: confirmed

a scriptable (lua) headless-browser rendering service in a docker container, a scrapinghub project; dated, webkit-based, largely superseded by playwright.

presents

an old webkit/qt engine fingerprint that no longer matches any shipping browser, so its rendered output reads as an outdated, non-standard client.

detectable by

gologin

multi

stars: stars not tracked
confidence: speculative

a cloud anti-detect browser-profile service: each profile carries a managed, coherent fingerprint and a bound proxy, with an automation api. (also listed in the browser reference catalog.)

presents

a per-profile coherent fingerprint over its orbita chromium fork; capability is presented empirically where measurable, and the marketing claims stay speculative.

detectable by

navigator.webdriver

joins

GoLogin → also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

multilogin

multi

stars: stars not tracked
confidence: speculative

a commercial multi-account anti-detect browser: managed profiles, each with its own coherent fingerprint and proxy, in chromium (mimic) and firefox (stealthfox) variants. (also listed in the browser reference catalog.)

presents

per-profile coherent fingerprints with a bound proxy; what holds under measurement is the open gap, so capability is speculative pending harness data.

detectable by

navigator.webdriver

joins

Multilogin → also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

proxy tooling · 2

swiftshadow

python

stars: 200★ · as-of 2026-06
confidence: confirmed

a python library that scrapes and validates free public proxy lists into a rotating pool, zero cost, zero guarantees.

presents

free-pool exit ips with poor reputation, often datacenter or already-blocklisted, so they fail ip-reputation gates before any fingerprint is even read.

detectable by

requests-ip-rotator

python

stars: 1,700★ · as-of 2026-06
confidence: confirmed

a python library that routes requests through aws api gateway to rotate the source ip across aws's address space for free.

presents

exit ips inside aws's published ranges, an obvious datacenter / hosting asn, so rotation does not help against any gate that weights hosting-asn reputation.
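a hosting-range check can be sketched with the stdlib ipaddress module. the cidr blocks below are documentation prefixes standing in for a provider's published list, not real aws ranges, and the function name is hypothetical.

```python
import ipaddress

# hypothetical hosting ranges (documentation prefixes), standing in for a
# provider's published list
HOSTING_RANGES = [
    ipaddress.ip_network(cidr) for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_hosting_ip(ip: str) -> bool:
    """True when the address falls inside any listed hosting range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in HOSTING_RANGES)


print(is_hosting_ip("203.0.113.9"))
print(is_hosting_ip("192.0.2.1"))
```

rotating among addresses that all sit inside the listed ranges changes the ip but not the answer this check returns.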

detectable by

patch injection · 1

rebrowser-patches

node

stars: stars not tracked
confidence: confirmed

a patch set over playwright / puppeteer that closes the loudest cdp tells, chiefly the Runtime.enable execution-context leak, shipped as swap-in packages. (also listed in the browser reference catalog.)

presents

a patched chromium that no longer leaks the Runtime.enable tell, but the transport is unchanged headless chromium and the behavioural kinematics stay synthetic.

detectable by

joins

rebrowser-patches → also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

tls fingerprint · 1

cycleTLS (tls cycling)

node · go

stars: 1,000★ · as-of 2026-06
confidence: confirmed

the tls-fingerprint face of cycleTLS: rotate the ja3 handshake per request to spread tls reputation across many apparent clients.

presents

a different ja3 each request to dodge per-fingerprint reputation, but the rotation pattern itself, and the absent js runtime, can read as non-browser to a stateful gate.
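one way a stateful gate might read the rotation pattern is to count distinct ja3 hashes per source ip. this is a sketch under that assumption; the class name, threshold and hash strings are hypothetical.

```python
from collections import defaultdict


class RotationGate:
    """Track how many distinct ja3 hashes each source ip has presented."""

    def __init__(self, max_distinct: int):
        self.max_distinct = max_distinct
        self._by_ip = defaultdict(set)  # ip -> set of ja3 hashes seen so far

    def observe(self, ip: str, ja3_hash: str) -> bool:
        """Record one handshake; return True when the ip has rotated too often."""
        self._by_ip[ip].add(ja3_hash)
        return len(self._by_ip[ip]) > self.max_distinct


gate = RotationGate(max_distinct=2)
flags = [gate.observe("198.51.100.7", h) for h in ("ja3-a", "ja3-b", "ja3-a", "ja3-c")]
print(flags)
```

a real browser keeps one handshake shape across a session, so a growing set for one ip is the anomaly this counter looks for.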

detectable by