CASE STUDY

StackBadger: the tool that outlived its sprint

JUNE 11, 2026 · 5 MIN · STACKBADGER
TL;DR

Sometimes the most durable thing a product sprint produces isn't the product — it's the tooling you built along the way. Extract it deliberately.

StackBadger is the only thing on this register with a public repository, and it started as an internal weapon: a harness for attacking my own product before anyone else could.

Born inside a sprint

While building TariffRefunded, a tariff-refund product handling importers' customs data, I wanted security testing that was repeatable, not a one-time review. So I built a black-box harness: it probes a deployed app the way an outside attacker would, with no source access, and writes up what it finds. It has thirteen attack-category modules, among them authentication bypass (forging or tampering with the JSON Web Token that proves who you are), IDOR (insecure direct object reference: reaching another tenant's records by guessing their IDs), row-level-security bypass straight through the REST API, storage path-traversal, webhook signature spoofing, injection, and malicious file upload. An OWASP ZAP scan is orchestrated alongside, and its findings are deduplicated against the harness's own.
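Deduplicating two tools' findings comes down to picking a normalization key. What follows is a minimal sketch of one way to do it, not StackBadger's actual code: the Finding fields, the "harness" and "zap" source labels, and the choice of key (category, method, host, path) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Finding:
    source: str    # "harness" or "zap" (labels assumed for this sketch)
    category: str  # e.g. "idor", "injection"
    method: str    # HTTP method used by the probe
    url: str       # full URL the probe hit


def dedup_key(f: Finding) -> tuple:
    """Normalize a finding so the same issue from two tools shares one key."""
    parts = urlsplit(f.url)
    # Compare on host + path only; ignore query string and trailing slash.
    return (
        f.category.lower(),
        f.method.upper(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
    )


def merge_findings(harness: list[Finding], zap: list[Finding]) -> list[Finding]:
    """Keep every harness finding; add a ZAP finding only if its key is new."""
    seen = {dedup_key(f) for f in harness}
    merged = list(harness)
    for f in zap:
        key = dedup_key(f)
        if key not in seen:
            seen.add(key)
            merged.append(f)
    return merged
```

A coarser key merges more aggressively and risks hiding distinct issues on the same endpoint; a finer one lets near-duplicates through. Which trade-off suits a given report is a judgment call.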

Then the product sprint wound down. And the commit history showed something worth noticing: the harness was the only part of the codebase still getting work. The product had stopped; the tool hadn't. That's a signal worth acting on.

Generalizing it

The harness knew too much about my specific stack to help anyone else, so the extraction was mostly removal: everything product-specific moved into a YAML profile — which login provider, which database, which payment processor — and the tests became generic. I researched the authentication quirks of seven providers and shipped adapters for four of them: Clerk, Firebase Auth, Supabase's GoTrue, and NextAuth. A profile describes a stack; the harness adapts to whatever the profile says.
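The "profile describes a stack, harness adapts" idea can be sketched as a small registry lookup. This is an illustration under assumptions, not the project's real code: the Profile fields, the adapter class names, and the provider keys ("clerk", "firebase", "gotrue", "nextauth") are hypothetical, and the profile is built in Python here rather than parsed from YAML to keep the sketch dependency-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    auth: str      # which login provider
    database: str  # which database
    payments: str  # which payment processor


class AuthAdapter:
    """Base class; each subclass would hold one provider's quirks."""
    name = "base"

    def describe(self) -> str:
        return f"auth adapter for {self.name}"


class ClerkAdapter(AuthAdapter):
    name = "clerk"


class FirebaseAdapter(AuthAdapter):
    name = "firebase"


class GoTrueAdapter(AuthAdapter):
    name = "gotrue"


class NextAuthAdapter(AuthAdapter):
    name = "nextauth"


# Registry keyed by the (assumed) provider name used in a profile.
ADAPTERS = {
    cls.name: cls
    for cls in (ClerkAdapter, FirebaseAdapter, GoTrueAdapter, NextAuthAdapter)
}


def adapter_for(profile: Profile) -> AuthAdapter:
    """Pick the adapter the profile names; fail loudly on anything unknown."""
    try:
        return ADAPTERS[profile.auth]()
    except KeyError:
        raise ValueError(
            f"no adapter for auth provider {profile.auth!r}; known: {sorted(ADAPTERS)}"
        ) from None
```

With this shape, the generic tests only ever see an AuthAdapter, and supporting a new provider means adding one class and one registry entry.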

One result from that work deserves naming because it's negative, and negative results rarely get published. I tried to make the tool auto-detect a Supabase Auth target by scanning an app's shipped JavaScript bundle, and concluded it reliably can't: supabase-js statically bundles the same auth client whether or not you use it, and the request paths that would give it away are built at runtime, never written as literals a scanner could find — so a Supabase-Auth target is indistinguishable from "Clerk plus a Supabase database" from the outside. The tool requires you to declare the stack with a --profile flag instead, and the limitation is documented in the README and pinned by a test rather than papered over. A security tool honest about its blind spots is worth more than one that demos well.
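Pinning a limitation with a test might look roughly like the sketch below. Everything in it is hypothetical: the detector function, the literal it searches for, and the sample bundle text are invented to show the shape of such a test and are not taken from supabase-js or from StackBadger.

```python
from typing import Optional

# Hypothetical literal a naive bundle scanner might search for.
GOTRUE_LITERAL = "/auth/v1/token"


def detect_auth_from_bundle(bundle_js: str) -> Optional[str]:
    """Return a provider name only on unambiguous evidence, else None."""
    if GOTRUE_LITERAL in bundle_js:
        return "gotrue"
    return None


def test_runtime_built_paths_are_not_detected():
    # The path is assembled at runtime, so the full literal never
    # appears in the shipped bundle and the scanner has nothing to match.
    bundle = 'const p = ["auth", "v1", "token"].join("/"); fetch(base + "/" + p);'
    assert detect_auth_from_bundle(bundle) is None
```

The value of a test like this is that the blind spot stays visible: if someone later "fixes" detection with a heuristic that guesses, the test fails and the trade-off has to be argued explicitly.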

Publishing without leaking

Releasing security tooling extracted from a real product is itself a security exercise. The export followed a written playbook: take only the repository's tracked history via git archive from origin/main (so nothing untracked rides along); scrub every product and brand reference; decode anything shaped like a credential (every eyJ… string, the standard opening of a JSON Web Token) to confirm it's synthetic test data; and finally run a clean git init, with a human, not an agent, pushing the result public.
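The credential-decoding step can be approximated in a few lines. This is a sketch of the idea rather than the playbook's actual script: the function names are made up, and the regex only catches the common three-segment token shape, so a bare eyJ fragment would still need a manual look. Deciding whether a decoded payload is synthetic remains a human judgment.

```python
import base64
import json
import re

# Three dot-separated base64url segments, the first starting with "eyJ"
# (the base64 encoding of '{"', which is how a JWT header begins).
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def b64url_json(segment: str) -> dict:
    """Decode one base64url segment into a dict, restoring stripped padding."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def decode_candidates(text: str) -> list[dict]:
    """Find JWT-shaped strings and decode header and payload for review."""
    results = []
    for match in JWT_RE.finditer(text):
        header_seg, payload_seg, _signature = match.group(0).split(".")
        try:
            results.append(
                {"header": b64url_json(header_seg), "payload": b64url_json(payload_seg)}
            )
        except (ValueError, UnicodeDecodeError):
            # Looks like a token but does not decode; flag it for a human.
            results.append({"undecodable": match.group(0)[:16]})
    return results
```

Running this over the exported tree and reading each payload's issuer, subject, and expiry is usually enough to tell a fixture from a real credential.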

StackBadger · safe-by-default posture
DEFAULT: read-only probes, no writes to the target
WRITES: require --full + @pytest.mark.write_probe (opt-in, per probe)
PREFLIGHT: CONFIRM_TARGET · CONFIRM_AUTHORIZED (doctor.py)

Publishing a weapon. An attack tool you hand to strangers needs guardrails the original never did: writes are off unless explicitly armed, and it refuses to run until you've affirmed you own, or are authorized in writing to test, the target.
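The safe-by-default posture reduces to two small checks, sketched here in plain Python. This is an illustration under stated assumptions, not the contents of doctor.py: it assumes CONFIRM_TARGET and CONFIRM_AUTHORIZED are environment variables, that the target confirmation must match the target exactly, and that "yes" is the expected affirmation. The function names are hypothetical.

```python
import os
from typing import Mapping, Optional


class PreflightError(RuntimeError):
    """Raised when the operator has not affirmed target and authorization."""


def preflight(target: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Refuse to proceed unless both confirmations are present."""
    env = os.environ if env is None else env
    # Assumption: the operator must restate the exact target being probed.
    if env.get("CONFIRM_TARGET") != target:
        raise PreflightError("CONFIRM_TARGET must equal the target being probed")
    # Assumption: "yes" is the affirmation value.
    if env.get("CONFIRM_AUTHORIZED") != "yes":
        raise PreflightError("CONFIRM_AUTHORIZED=yes is required")


def may_run(probe_markers: set[str], full: bool) -> bool:
    """Read-only probes always run; write probes need the --full opt-in."""
    return full or "write_probe" not in probe_markers
```

Both checks fail closed: a missing variable or a missing flag means the probe does not run, which is the property that matters when the person at the keyboard is a stranger.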

Bet on your own recurring problems

Products are bets on a market; tooling is a bet on problems you know you'll have again — and the second bet pays off more reliably. When a sprint ends, audit what's left with one question: what here is still alive? In my case it was the thing built to attack everything else, which, as a career security person, feels about right.

"A written rule is a suggestion. A gate is a control."
The operating principle behind every project here. The same bug shipped three times past written rules — and zero times past a CI gate. Deterministic enforcement beats advisory documentation, in agent harnesses and security programs alike.