the arsenal

55 tools

the non-browser scraping and automation tooling, grouped by category: http clients, frameworks, ai scrapers, managed apis, parsers, captcha solvers, interception tools, distributed runners, browser infra, proxy tooling, patches and tls cyclers. read the presents line first: it is the forensic hook, stating what fingerprint each tool shows a detector and what gives it away. the browser-engine subset lives in the browser reference catalog and is joined here by name where a tool is also a driven browser. star counts are dated and drift weekly; vendor success claims stay flagged speculative.

http clients (no browser) · 8

curl_cffi

python
source ↗
stars
stars not tracked
confidence
confirmed

python bindings over curl-impersonate: borrows a real chrome tls/ja4+ and http-2 fingerprint at the wire, the fastest way to look like a browser without one.

presents

genuine chrome ja4+ tls and h2 frame order, but no js runtime, so any detector that runs one line of javascript gets no answer back: the structural fail.

httpx

python
source ↗
stars
14,000★ · as-of 2026-06
confidence
confirmed

an async python http client with http-2 support; the modern requests successor, popular for high-throughput scraping.

presents

a default python tls ja4 (not a browser's) even with h2 enabled, so when the ua claims chrome the handshake still says python, a cross-band lie.
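a minimal sketch of the cross-band check described above, assuming a detector has already reduced the handshake to a coarse stack label. the labels, the mapping and the function names are hypothetical; real gates compare ja4 hashes, not strings like these.

```python
# Hypothetical mapping from a coarse TLS-stack label to the client families
# that stack could honestly belong to. Placeholder labels, not real JA4 values.
TLS_STACK_FAMILIES = {
    "chrome-boringssl": {"chrome", "edge"},
    "firefox-nss": {"firefox"},
    "python-openssl": {"python"},
    "go-crypto-tls": {"go"},
}


def ua_family(user_agent: str) -> str:
    """Reduce a User-Agent string to a coarse client family."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "edg/" in ua:  # Edge also carries "Chrome/", so test it first
        return "edge"
    if "chrome/" in ua:
        return "chrome"
    if "python" in ua:
        return "python"
    if "go-http-client" in ua:
        return "go"
    return "unknown"


def is_cross_band_lie(user_agent: str, tls_stack: str) -> bool:
    """True when the family the UA claims cannot own the observed TLS stack.

    An unrecognised tls_stack label has no honest families, so it is flagged.
    """
    honest = TLS_STACK_FAMILIES.get(tls_stack, set())
    return ua_family(user_agent) not in honest
```

a chrome ua over a python handshake is flagged, while an honest `python-httpx` ua over the same handshake is not: the lie is the mismatch between bands, not the python stack by itself.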

requests

python
source ↗
stars
52,000★ · as-of 2026-06
confidence
confirmed

the python baseline http client: synchronous, http-1.1 only, no browser surfaces. the thing every tls gate is tuned to catch first.

presents

default python/openssl tls ja4, no http-2, no js, a trivially flagged non-browser fingerprint, the canonical control for what naive scraping looks like.

tls-client

go · python
source ↗
stars
stars not tracked
confidence
confirmed

a go tls library with python bindings that lets you select a target browser's ja3/ja4 profile per request, built for cycling tls fingerprints.

presents

a chosen browser's ja3/ja4 on the handshake, but still no js runtime, and a mismatched h2 frame order can betray the impersonation even when the cipher list matches.

scrapling

python
source ↗
stars
5,000★ · as-of 2026-06
confidence
confirmed

a dual http/browser python library with adaptive selectors that survive layout changes; the http path is stealth-tuned and it can drive a browser for turnstile solving.

presents

on the http path, an impersonated browser tls with no js; on the browser path, a driven chromium posture, so what it presents depends on the mode you pick.

webclaw

rust
source ↗
stars
stars not tracked
confidence
speculative

a rust http client positioned for high-performance scraping with browser-impersonation tls.

presents

an impersonated browser tls from a rust stack with no js runtime; the rust http library's own tls quirks can leak a non-browser handshake under close inspection.

got-scraping

node
source ↗
stars
500★ · as-of 2026-06
confidence
confirmed

a node http client (the got wrapper used by crawlee) that generates browser-like header sets and tls fingerprints for scraping.

presents

browser-shaped headers and a generated tls fingerprint, but a node runtime, no js execution of the page, and header ORDER that can still diverge from a real browser.
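the header-order point can be sketched as a comparison against a reference ordering. this is a hedged illustration: `REFERENCE_ORDER` is an invented placeholder, not a measured chrome header order, and the function name is hypothetical.

```python
# Illustrative placeholder only: NOT a measured browser header order.
REFERENCE_ORDER = [
    "host",
    "connection",
    "user-agent",
    "accept",
    "accept-encoding",
    "accept-language",
]


def header_order_matches(observed, reference=REFERENCE_ORDER):
    """Compare the relative order of the headers both lists share.

    Header names are matched case-insensitively; headers that appear in
    only one of the two lists are ignored.
    """
    seen = [name.lower() for name in observed]
    shared_observed = [name for name in seen if name in reference]
    shared_reference = [name for name in reference if name in seen]
    return shared_observed == shared_reference
```

the values can all be browser-shaped and the check still fails, because only the sequence is compared.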

cycleTLS

node · go
source ↗
stars
1,000★ · as-of 2026-06
confidence
confirmed

a node library backed by a go process that cycles ja3 tls fingerprints per request; doubles as a tls-fingerprint tool (see the tls fingerprint category).

presents

a rotating ja3 per request to dodge tls-reputation gates, but no js runtime and a go-backed handshake whose http-2 details can still read as non-browser.

scraping frameworks · 9

scrapy

python
source ↗
stars
52,000★ · as-of 2026-06
confidence
confirmed

the canonical python scraping framework: an async engine with middlewares, item pipelines and built-in throttling. the http core is non-browser by default.

presents

by default a twisted/python tls fingerprint with no js, so the framework itself is detectable on transport unless paired with an impersonation downloader or a browser engine bridge.

crawlee

node · python
source ↗
stars
15,000★ · as-of 2026-06
confidence
confirmed

apify's dual node/python crawling framework: a unified api over plain-http and headless-browser crawlers with auto-scaling and session pools.

presents

the posture of whichever crawler you select, http (got-scraping fingerprint) or browser (playwright/puppeteer), so its tell is the underlying engine, not the framework.

colly

go
source ↗
stars
24,000★ · as-of 2026-06
confidence
confirmed

the popular go scraping framework: fast, concurrent, callback-driven. pure http, no browser.

presents

a go net/http tls fingerprint with no js runtime, distinct from any browser handshake, so a chrome ua over colly is a cross-band lie.

scrapy-poet

python
source ↗
stars
300★ · as-of 2026-06
confidence
confirmed

a scrapy extension bringing dependency-injection and page-object models, decoupling extraction logic from the spider; a zyte project.

presents

inherits scrapy's transport posture exactly, no browser, no js; it changes code structure, not the wire fingerprint, so it presents whatever the scrapy downloader does.

scrapy-camoufox

python
source ↗
stars
stars not tracked
confidence
speculative

a scrapy engine bridge that routes selected requests through camoufox, a genuine anti-detect firefox, per-request engine switching.

presents

for bridged requests, a genuine gecko transport and js surface (camoufox), the thing chromium tools cannot fake; un-bridged requests fall back to scrapy's bare http.

scrapy-nodriver

python
source ↗
stars
stars not tracked
confidence
speculative

a scrapy engine bridge routing selected requests through nodriver's minimal raw-cdp driver for the requests that need a real chrome.

presents

for bridged requests, nodriver's minimal-cdp chrome posture (very little framework residue); bare scrapy http otherwise, so coverage depends on which requests you bridge.

scrapy-stealth

python
source ↗
stars
stars not tracked
confidence
speculative

a scrapy middleware layer that switches the underlying engine per request to apply stealth selectively rather than across the whole crawl.

presents

varies by the engine it selects per request; the stealth is only as good as the chosen downloader, so on un-bridged requests it still presents plain scrapy http.

katana

go
source ↗
stars
13,000★ · as-of 2026-06
confidence
confirmed

projectdiscovery's fast go crawler / spider, built for security recon: link discovery, js-aware parsing, headless option.

presents

a go http tls fingerprint in default mode (no browser), distinct from any browser handshake; the optional headless mode shifts it to a chromium posture with cdp tells.

stars
5,500★ · as-of 2026-06
confidence
confirmed

a go devtools-protocol (cdp) driver for chrome, a high-level browser-automation library in the puppeteer/playwright family.

presents

a chromium-over-cdp posture from go, so it carries the raw-cdp tells (navigator.webdriver, cdp residue) unless paired with stealth patches.

ai / llm scrapers · 5

firecrawl

api · node
source ↗
stars
111,000★ · as-of 2026-06
confidence
confirmed

a hosted (and self-hostable) api that crawls a site and returns clean llm-ready markdown; popular as an agent retrieval backend.

presents

from the target's view, the fingerprint of firecrawl's own fetcher fleet (browser or http per page), so the tell is the service's exit ips and engine, not your code.

crawl4ai

python
source ↗
stars
60,000★ · as-of 2026-06
confidence
confirmed

an open-source python crawler that drives a local playwright chromium and emits llm-ready markdown / structured output; self-hosted, no api key.

presents

a local playwright chromium posture, so it inherits raw-cdp tells (navigator.webdriver, cdp residue) unless layered with stealth patches.

joins

  • Playwright (CDP launch) is also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

scrapegraphai

python
source ↗
stars
18,000★ · as-of 2026-06
confidence
confirmed

a python library that builds extraction pipelines as graphs from a natural-language prompt, wiring an llm to a scraper backend.

presents

the posture of whichever backend fetcher it drives (http or playwright); the llm shapes extraction, not the wire fingerprint, so the tell is the underlying fetch.

jina reader

api
source ↗
stars
stars not tracked
confidence
confirmed

a hosted endpoint: prefix any url with r.jina.ai/ and get back clean markdown for llm consumption, no setup.

presents

jina's own fetcher fingerprint and exit ips to the target, not yours; the target sees jina datacenter traffic, your own posture is invisible to it.
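the url-prefix convention is small enough to show directly. a minimal sketch, assuming the endpoint is reached over https; the helper name is hypothetical and no request is made here.

```python
def reader_url(target_url: str) -> str:
    """Build the reader URL by prefixing the target with r.jina.ai/.

    The https scheme on the prefix is an assumption of this sketch.
    """
    return "https://r.jina.ai/" + target_url
```

fetching the resulting url is an ordinary http get from your side; the request the target sees is issued by jina's fetcher.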

steel

api · node
source ↗
stars
5,000★ · as-of 2026-06
confidence
confirmed

a self-hostable browser api for ai agents: managed sessions, an mcp server, and a hosted cloud option over real chromium.

presents

a managed chromium session posture; self-hosted you carry the cdp tells, on the cloud you present steel's fleet fingerprint and exit ips to the target.

managed apis · 10

bright data

api
source ↗
stars
stars not tracked
confidence
speculative

the broadest managed-scraping vendor: a web-unlocker / scraping-browser api on top of the largest proxy estate (datacenter / isp / residential / mobile).

presents

to the target, residential or mobile exit ips with real-browser fingerprints from its unblocker fleet; the tell is the provider's pool reputation, not a static fingerprint.

joins

  • bright-data runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

oxylabs

api
source ↗
stars
stars not tracked
confidence
speculative

an enterprise scraping-api + proxy vendor with a large residential pool and an ai-assisted scraper (oxycopilot).

presents

residential / mobile exit ips and managed browser fingerprints from its scraper-api fleet; pool reputation and exit-ip class are the corroborating signals.

joins

  • oxylabs runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

zyte

api
source ↗
stars
stars not tracked
confidence
speculative

the scrapy company's smart-proxy-manager and autoextract api: automatic unblocking plus llm-style structured extraction.

presents

managed exit ips with auto-rotated browser fingerprints; its published ~93% success figure is vendor marketing (speculative), so corroborate before quoting.

joins

  • zyte runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

apify

api · node
source ↗
stars
stars not tracked
confidence
confirmed

a scraping cloud / actor marketplace: 10k+ prebuilt actors, a proxy layer, and an mcp server for agent use.

presents

the fingerprint of the specific actor you run on apify's infra plus its proxy exit ips; posture varies per actor (http vs crawlee browser).

scrapingbee

api
source ↗
stars
stars not tracked
confidence
speculative

a simple rest scraping api with js rendering and proxy rotation built in; a free tier and a low-friction onboarding.

presents

rendered-page output via its own headless fleet over rotating proxies; the target sees scrapingbee's exit ips and fleet fingerprint, not yours.

joins

  • scrapingbee runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

scraperapi

api
source ↗
stars
stars not tracked
confidence
speculative

a proxy + js-render scraping api positioned for simplicity and scale, with a free tier and automatic retries.

presents

rotated proxy exit ips with optional js rendering; the corroborating tell is the exit-ip class and the fleet's request cadence, not a fixed fingerprint.

joins

  • scraperapi runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

serpapi

api
source ↗
stars
stars not tracked
confidence
confirmed

a specialized search-results api covering 80+ engines (google, bing, maps, shopping, …) with structured json output.

presents

serpapi's own fetcher fleet and exit ips against the search engines; from your side it is a plain rest call, the scraping posture is entirely theirs.

scrapebadger

api
source ↗
stars
stars not tracked
confidence
speculative

a pay-per-success scraping api positioned on a billing model where you only pay for unblocked responses.

presents

managed exit ips with auto-unblocking; the pay-per-success framing is a billing claim, not a fingerprint property, so treat capability as speculative.

browserbase

api · node
source ↗
stars
stars not tracked
confidence
confirmed

managed headless browsers in the cloud: connect playwright/puppeteer over cdp to a hosted, scalable browser fleet with stealth and proxy options.

presents

a remote chromium fleet posture; over plain config the cdp attach residue and datacenter exit ips are the tells, stealth/proxy options shift but do not erase them.

decodo

api
source ↗
stars
stars not tracked
confidence
speculative

the rebrand of smartproxy: an affordable proxy + scraping-api vendor with datacenter, residential and mobile pools.

presents

rotating residential / mobile exit ips with a scraping-api front; pool reputation and exit class are the corroborating signals, the same as the larger vendors.

joins

  • decodo runs its own proxy pool. the class taxonomy and exit-ip tells live in proxy & network identity.

parsers · 6

selectolax

python
source ↗
stars
1,000★ · as-of 2026-06
confidence
confirmed

a python html parser binding the lexbor / modest c engines; positioned as a far faster beautifulsoup alternative.

presents

nothing on the wire, it is an offline parser; the network fetch upstream is what a detector sees, the parser is invisible to it.

lxml

python
source ↗
stars
2,800★ · as-of 2026-06
confidence
confirmed

the c-backed (libxml2 / libxslt) python parser: the fastest mature option for html/xml with full xpath support.

presents

no network presence, an in-process parser; it shapes how you extract data, not how the request looks to a detector.

parsel

python
source ↗
stars
1,100★ · as-of 2026-06
confidence
confirmed

scrapy's standalone selector library: xpath + css selectors over lxml, usable outside the scrapy engine.

presents

offline only; no fingerprint of its own, the upstream fetcher is what a detector evaluates.

beautifulsoup4

python
source ↗
stars
10,000★ · as-of 2026-06
confidence
confirmed

the beginner-friendly python html parser: forgiving, well-documented, slower than the c-backed options but ubiquitous.

presents

no wire presence, a pure parsing layer; it is paired with requests or httpx, and that client is the thing a tls gate fingerprints.

chompjs

python
source ↗
stars
300★ · as-of 2026-06
confidence
confirmed

a parser that turns javascript-object literals (the __NEXT_DATA__ / inline state blobs framework sites embed) into python dicts.

presents

offline, so no request fingerprint of its own; relevant because it extracts the state a site embeds in its pages, which is how scrapers harvest hydration state without a browser.
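a hedged sketch of hydration-state harvesting using only the standard library. it assumes the blob sits in a `<script id="__NEXT_DATA__">` tag and is strict json, so `json.loads` suffices; this is a stand-in, not chompjs itself, whose reason to exist is the looser javascript-object literals that `json.loads` rejects. the function name is hypothetical.

```python
import json
import re

# Matches the script tag carrying the embedded state and captures its body.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str):
    """Return the embedded state as a dict, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))
```

no browser and no js runtime are involved: the data was already in the html the http client fetched.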

w3lib

python
source ↗
stars
400★ · as-of 2026-06
confidence
confirmed

scrapy's low-level web utility library: url canonicalization, html entity decoding, encoding detection, the plumbing under the parsers.

presents

no network presence; a utility layer for url/html handling, invisible to any detector.
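url canonicalization is the piece of this plumbing that most affects dedupe. a simplified standard-library sketch of the idea follows; it is not w3lib's implementation, and its rules (sort query parameters, drop the fragment, lowercase the host) are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Normalise a URL so trivially different spellings compare equal.

    Simplifications: query parameters are sorted, the fragment is dropped,
    and the whole netloc is lowercased (so this assumes no userinfo part).
    """
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )
```

two links that differ only in parameter order or fragment collapse to one key, which is what a crawl frontier's dedupe filter needs.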

captcha solvers · 3

capsolver

api
source ↗
stars
stars not tracked
confidence
speculative

an ai-driven captcha-solving service covering recaptcha, hcaptcha, turnstile and image challenges via api.

presents

returns a solve token (g-recaptcha-response, cf-turnstile-response, …); the token rides your session, but a token minted from a mismatched ip/fingerprint is itself the tell.

2captcha

api
source ↗
stars
stars not tracked
confidence
speculative

a long-running human + ai captcha-solving farm: you submit the challenge, workers (or models) solve it and return the token.

presents

a human-solved token for your session; the solve is genuine, so the residual tell is session coherence (does the token's origin match your exit ip and fingerprint).

anti-captcha

api
source ↗
stars
stars not tracked
confidence
speculative

a human + ai captcha-solving service in the same class as 2captcha, api-driven, broad challenge coverage.

presents

a returned solve token; like any solver farm, the weakness is binding, a token solved elsewhere must still match the session presenting it.
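the binding weakness shared by all three solvers can be sketched as a comparison between the context a token was solved in and the context presenting it. everything here is hypothetical: the `Context` fields and the idea that a verifier compares exactly these three are assumptions for illustration, not a documented check of any captcha vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical identity bands a verifier might bind a token to."""
    exit_ip: str
    ja4: str
    user_agent: str


def coherence_gaps(solved_in: Context, presented_by: Context) -> list:
    """List the bands where the solving and presenting contexts disagree."""
    return [
        name
        for name in ("exit_ip", "ja4", "user_agent")
        if getattr(solved_in, name) != getattr(presented_by, name)
    ]
```

an empty list means the token and the session tell the same story; any entry is a band where a farm-solved token stands out even though the solve itself was valid.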

reverse-engineering / interception · 4

mitmproxy

python
source ↗
stars
37,000★ · as-of 2026-06
confidence
confirmed

an interactive https intercepting proxy: inspect, modify and replay flows, scriptable in python. the standard tool for reading what a site's loader sends.

presents

not a client posture but a tap: it sits between client and server. its tell is that it terminates tls, so a pinned app refuses its injected ca.

burp suite

multi
source ↗
stars
stars not tracked
confidence
confirmed

the portswigger security-testing platform: an intercepting proxy plus scanner and repeater; the pro edition has an mcp integration.

presents

an interception proxy, not a fingerprint; like mitmproxy it breaks tls in the middle, which cert pinning is designed to detect.

frida

multi
source ↗
stars
16,000★ · as-of 2026-06
confidence
confirmed

a dynamic instrumentation toolkit: hook and rewrite functions in a running process, the standard way to bypass ssl-pinning and trace native calls.

presents

it instruments a process from inside rather than presenting a network identity; the tell is on-device: anti-tamper and frida-detection checks look for its agent.

http toolkit

multi
source ↗
stars
7,000★ · as-of 2026-06
confidence
confirmed

an open-source intercepting proxy with one-click setup for capturing mobile-app and desktop api traffic.

presents

an interception layer, not a client; same structural tell as any mitm, it terminates tls and is defeated by certificate pinning.
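why pinning defeats these intercepting proxies can be shown with a fingerprint comparison. a minimal sketch, assuming the app pins a sha-256 digest of the whole der-encoded certificate; many real apps pin the public key instead, and the function names here are hypothetical.

```python
import hashlib
import hmac


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 hex digest of the DER-encoded certificate bytes."""
    return hashlib.sha256(cert_der).hexdigest()


def pin_matches(presented_der: bytes, pinned_hex: str) -> bool:
    """True only if the presented certificate hashes to the pinned value."""
    return hmac.compare_digest(fingerprint(presented_der), pinned_hex.lower())
```

a proxy that terminates tls must present a certificate minted by its own ca, whose bytes differ from the pinned one, so the comparison fails regardless of whether the device trusts that ca.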

distributed · 3

scrapyd

python
source ↗
stars
3,000★ · as-of 2026-06
confidence
confirmed

a service to deploy and run scrapy spiders as managed jobs: schedule, queue and monitor crawls over an http api.

presents

orchestration only, no fingerprint of its own; each spider it runs presents scrapy's transport, so coverage and tells are inherited, not added.

scrapy-redis

python
source ↗
stars
5,000★ · as-of 2026-06
confidence
confirmed

a scrapy extension that puts the request queue and dedupe filter in redis, so many workers share one distributed crawl frontier.

presents

coordination only; the worker fleet presents scrapy http, but the distribution can concentrate many requests on shared exit ips, a velocity tell.

scrapy-cluster

python
source ↗
stars
1,100★ · as-of 2026-06
confidence
confirmed

a redis + kafka + zookeeper distributed crawling architecture for coordinating scrapy at large scale across many machines.

presents

infrastructure, not a wire identity; the same velocity concern as any distributed fleet, many workers can cluster on a narrow exit-ip range a detector correlates.
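the velocity tell can be sketched as a sliding-window counter keyed by exit ip. a hedged illustration: the class name, window and threshold are invented, and a real detector would likely correlate across ip ranges rather than single addresses.

```python
from collections import defaultdict, deque


class VelocityGate:
    """Flag an exit IP that exceeds a request budget inside a time window."""

    def __init__(self, window_seconds: float = 10.0, max_requests: int = 20):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits = defaultdict(deque)  # exit_ip -> recent request timestamps

    def observe(self, exit_ip: str, timestamp: float) -> bool:
        """Record one request; return True if the IP is now over budget."""
        hits = self._hits[exit_ip]
        hits.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

many workers sharing one exit ip add up in the same counter, which is how a polite per-worker rate can still trip the gate.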

browser infra · 3

splash

docker
source ↗
stars
4,000★ · as-of 2026-06
confidence
confirmed

a scriptable (lua) headless-browser rendering service in a docker container, a scrapinghub project; dated, webkit-based, largely superseded by playwright.

presents

an old webkit/qt engine fingerprint that no longer matches any shipping browser, so its rendered output reads as an outdated, non-standard client.

gologin

multi
source ↗
stars
stars not tracked
confidence
speculative

a cloud anti-detect browser-profile service: each profile carries a managed, coherent fingerprint and a bound proxy, with an automation api. (also in browsers.ts.)

presents

a per-profile coherent fingerprint over its orbita chromium fork; capability is presented empirically where measurable, the marketing claims stay speculative.

joins

  • GoLogin is also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

multilogin

multi
source ↗
stars
stars not tracked
confidence
speculative

a commercial multi-account anti-detect browser: managed profiles, each with its own coherent fingerprint and proxy, chromium (mimic) and firefox (stealthfox) variants. (also in browsers.ts.)

presents

per-profile coherent fingerprints with a bound proxy; what holds under measurement is the open gap, so capability is speculative pending harness data.

joins

  • Multilogin is also a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

proxy tooling · 2

swiftshadow

python
source ↗
stars
200★ · as-of 2026-06
confidence
confirmed

a python library that scrapes and validates free public proxy lists into a rotating pool, zero cost, zero guarantees.

presents

free-pool exit ips with poor reputation, often datacenter or already-blocklisted, so they fail ip-reputation gates before any fingerprint is even read.

requests-ip-rotator

python
source ↗
stars
1,700★ · as-of 2026-06
confidence
confirmed

a python library that routes requests through aws api gateway to rotate the source ip across aws's address space for free.

presents

exit ips inside aws's published ranges, an obvious datacenter / hosting asn, so rotation does not help against any gate that weights hosting-asn reputation.
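a gate that weights hosting ranges reduces to a membership test. a minimal sketch using the standard `ipaddress` module; the cidr blocks below are documentation-only placeholders standing in for a provider's published ranges, not real aws addresses.

```python
import ipaddress

# Placeholder CIDRs (reserved documentation ranges), NOT real hosting ranges.
# A real gate would load the provider's published list instead.
HOSTING_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_hosting_ip(ip: str) -> bool:
    """True if the address falls inside any known hosting range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in HOSTING_RANGES)
```

rotating across addresses changes which ip is tested but not the answer, because every rotated address sits inside the same published ranges.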

patch injection · 1

rebrowser-patches

node
source ↗
stars
stars not tracked
confidence
confirmed

a patch set over playwright / puppeteer that closes the loudest cdp tells, chiefly the Runtime.enable execution-context leak, shipped as swap-in packages. (also in browsers.ts.)

presents

a patched chromium that no longer leaks the Runtime.enable tell, but the transport is unchanged headless chromium and the behavioural kinematics stay synthetic.

joins

  • rebrowser-patches is also listed as a driven browser. its full posture, what it fixes and still leaks, lives in the browser reference catalog.

tls fingerprint · 1

cycleTLS (tls cycling)

node · go
source ↗
stars
1,000★ · as-of 2026-06
confidence
confirmed

the tls-fingerprint face of cycleTLS: rotate the ja3 handshake per request to spread tls reputation across many apparent clients.

presents

a different ja3 each request to dodge per-fingerprint reputation, but the rotation pattern itself, and the absent js runtime, can read as non-browser to a stateful gate.
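how a stateful gate could read the rotation itself is sketched below. the class name, the choice to key on exit ip and the threshold are all assumptions; a real gate might key on a session cookie or other state.

```python
from collections import defaultdict


class RotationGate:
    """Flag a client identity that shows too many distinct JA3 values."""

    def __init__(self, max_distinct: int = 3):
        self.max_distinct = max_distinct
        self._seen = defaultdict(set)  # exit_ip -> JA3 hashes observed

    def observe(self, exit_ip: str, ja3: str) -> bool:
        """Record one handshake; return True once the IP looks like a cycler."""
        self._seen[exit_ip].add(ja3)
        return len(self._seen[exit_ip]) > self.max_distinct
```

a real browser keeps a near-constant handshake between updates, so a single address presenting a new ja3 on every request stands out even when each fingerprint is clean on its own.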