- MIT-licensed open source — the harness ships as a public GitHub repo, README and SECURITY policy in the tab bar.
- No server-side secrets required: point it at a URL, give it two test accounts, and run one command.
- Responsible-use authorization is its own README section, not a footnote — the tool calls itself an active offensive scanner.
- Read-only by default; --full mode is required for state-changing writes, and written authorization is a stated precondition.
Overview
A black-box security test harness for AI-built apps: point it at a deployed product and it probes authentication, data access, and payment webhooks the way an outside attacker would, then writes up findings. It ships thirteen test modules plus an automated scanner, all driven by a simple profile describing the target's stack.
Project Design
Born inside TariffRefunded as an internal weapon, then generalized: seven login providers researched, four auth adapters shipped — Clerk, Firebase, Supabase/GoTrue, NextAuth — behind one shared abstract base class. Published via a written export playbook: archive only tracked files, scrub every brand reference, decode anything credential-shaped to prove it's synthetic, fresh repository, human pushes.
Key modules
Discovery engine
Fingerprints a target's auth/DB/storage stack from live bundles or source — no provider named by hand.
Auth adapter factory
One abstract base with four pluggable provider sign-ins (Clerk, Firebase, Supabase GoTrue, NextAuth).
Profile loader
Merges live discovery with optional YAML overrides into a runtime profile carrying no pre-authored secrets.
Attack suite
Thirteen attack-category modules, including auth bypass, IDOR, RLS/storage misconfiguration, injection, webhook spoofing, and file upload abuse.
Secret scrubber
Two-layer redaction strips seeded credentials, then Bearer/JWT/cookie patterns, before any report is written.
Run orchestrator
A bash driver that runs preflight, signs in, gates write probes, scans, and aggregates the reports.
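The two-layer redaction described under Secret scrubber can be sketched roughly as follows. This is an illustration, not the project's code: the function name `scrub`, the `[REDACTED]` placeholder, and the three regular expressions are assumptions chosen to show the idea of exact-match removal followed by pattern-based removal.

```python
import re

REDACTED = "[REDACTED]"  # hypothetical placeholder string

# Layer 2 patterns: generic credential shapes (illustrative only).
TOKEN_PATTERNS = [
    re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/-]+=*"),                 # Bearer tokens
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),  # JWT-shaped strings
    re.compile(r"(?im)^(?:Set-Cookie|Cookie):[^\r\n]*"),                # cookie header lines
]


def scrub(text, seeded_secrets):
    """Redact known secrets first, then anything that merely looks like one."""
    # Layer 1: exact-match removal of credentials the operator seeded.
    # Longest first, so a secret containing a shorter one leaves no fragment.
    for secret in sorted((s for s in seeded_secrets if s), key=len, reverse=True):
        text = text.replace(secret, REDACTED)
    # Layer 2: pattern-based removal of token-shaped strings.
    for pattern in TOKEN_PATTERNS:
        text = pattern.sub(REDACTED, text)
    return text


evidence = (
    "POST /login password=hunter2-test\n"
    "Authorization: Bearer abc.def.ghi\n"
    "Cookie: session=xyz; theme=dark\n"
    "token=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.sig"
)
scrubbed = scrub(evidence, ["hunter2-test"])
```

Running the exact-match layer first means a seeded credential is removed even when it matches none of the generic patterns; the pattern layer then catches tokens the harness obtained at runtime and never saw as seeds.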
Key features
Profile-driven scanning — the profile is the contract
StackBadger never hardcodes a target. Every probe reads its endpoints, tables, and provider choices from a profile — so one test suite runs against four auth providers (Clerk, Firebase, Supabase GoTrue, NextAuth), two databases, four storage backends, and three payment processors. Tests declare what stack they need with pytest markers, and the harness skips anything the active profile doesn't have. Live discovery fingerprints most of the stack automatically from the running site, and an explicit YAML profile unlocks the deeper, endpoint-specific probes. Adding a new provider is one adapter class plus a registry entry — not a rewrite of the tests.
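A minimal sketch of the "profile is the contract" idea: given what a test says it needs and what the active profile contains, decide whether to run or skip. The `Profile` fields, the `skip_reason` helper, and the shape of the `requires` mapping are all hypothetical; in a pytest suite the requirements would presumably come from each test's markers and the decision would be applied in a collection hook, which is omitted here to keep the example standard-library only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    """Hypothetical runtime profile: one slot per stack component."""
    base_url: str
    auth: Optional[str] = None
    database: Optional[str] = None
    storage: Optional[str] = None
    payments: Optional[str] = None
    endpoints: dict = field(default_factory=dict)


def skip_reason(requires, profile):
    """Return why a test must be skipped, or None if the profile satisfies it.

    `requires` maps a component name to the set of acceptable providers;
    an empty set means "any provider, as long as the component exists".
    """
    for component, accepted in requires.items():
        actual = getattr(profile, component, None)
        if actual is None:
            return f"profile has no {component}"
        if accepted and actual not in accepted:
            return f"{component}={actual} not in {sorted(accepted)}"
    return None


profile = Profile(base_url="https://app.example.test", auth="clerk", database="supabase")
```

With this shape, a test that needs any payment processor is skipped against a profile without one, while the same test file still runs unchanged against a target that has it.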
Security & ops decisions
- Read-only probing is the default; write probes require both a command-line flag and an explicit per-test marker.
- Twin confirmation gates — CONFIRM_TARGET and CONFIRM_AUTHORIZED — plus a preflight doctor check stand between the operator and any probe.
- Written authorization is a stated precondition in the security policy, not a footnote.
- Excluded paths and tables are enforced by default across every probe seam, a behavior tightened through two rounds of code review.
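The gating decisions above might look something like this in code. Several details are assumptions made for illustration: that CONFIRM_TARGET and CONFIRM_AUTHORIZED are environment variables, that the first must equal the target hostname and the second must be "yes", and that the per-test marker is named `write_probe`. None of these are confirmed by the project's documentation quoted here.

```python
from urllib.parse import urlparse


class GateError(RuntimeError):
    """Raised when an operator confirmation is missing or mismatched."""


def check_gates(target_url, env):
    """Refuse to proceed unless both confirmations are present (assumed semantics)."""
    host = urlparse(target_url).hostname
    if not host:
        raise GateError("target URL has no hostname")
    # Assumption: CONFIRM_TARGET must repeat the hostname being scanned.
    if env.get("CONFIRM_TARGET") != host:
        raise GateError(f"CONFIRM_TARGET must equal {host!r}")
    # Assumption: CONFIRM_AUTHORIZED must be the literal string "yes".
    if env.get("CONFIRM_AUTHORIZED", "").lower() != "yes":
        raise GateError("CONFIRM_AUTHORIZED must be 'yes'")


def write_probe_allowed(full_mode, test_markers, excluded_paths, path):
    """A write probe runs only with the flag AND the marker, and never on an excluded path."""
    if any(path.startswith(prefix) for prefix in excluded_paths):
        return False
    return bool(full_mode) and "write_probe" in test_markers


env = {"CONFIRM_TARGET": "staging.example.test", "CONFIRM_AUTHORIZED": "yes"}
check_gates("https://staging.example.test/app", env)  # passes silently
```

Passing the environment in as a mapping (for example `os.environ`) rather than reading it inside the function keeps the gate easy to test without touching real process state.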
Builder notes
- 13 attack-category modules: auth bypass, IDOR, RLS bypass, storage, webhook spoofing, injection, file upload abuse, and more.
- Four auth adapters — Clerk, Firebase, Supabase/GoTrue, NextAuth — behind a shared abstract base and factory registry.
- Stack fingerprinting works black-box from a URL or white-box from source, and the README documents what it deliberately can't detect — pinned by a test.
- Dual report output: HTML for humans, JSON for agents, with an evidence-scrubbing layer.
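The shared abstract base and factory registry noted above could be structured along these lines. The class names, the `register` decorator, the `make_adapter` function, and the dict-shaped profile are hypothetical stand-ins, and the adapter body is a stub: a real one would call the provider's sign-in endpoint instead of returning a placeholder.

```python
from abc import ABC, abstractmethod


class AuthAdapter(ABC):
    """Hypothetical shared base: every provider exposes the same sign-in surface."""

    def __init__(self, profile):
        self.profile = profile

    @abstractmethod
    def sign_in(self, email, password):
        """Return a session description for the given test account."""


_REGISTRY = {}


def register(name):
    """Class decorator that adds an adapter to the registry under `name`."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("supabase")
class SupabaseAdapter(AuthAdapter):
    def sign_in(self, email, password):
        # Stub: a real adapter would perform the provider's network sign-in here.
        return {"provider": "supabase", "user": email, "token": "<placeholder>"}


def make_adapter(profile):
    """Pick the adapter class named by the profile's auth provider."""
    name = profile["auth"]
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"no auth adapter registered for {name!r}") from None
    return cls(profile)


adapter = make_adapter({"auth": "supabase"})
session = adapter.sign_in("tester@example.test", "not-a-real-password")
```

Under this layout, supporting another provider means writing one subclass and decorating it; the tests keep calling `make_adapter` and never name a provider themselves.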
Lessons learned
- The tool outlived the sprint that created it — it was the only code still earning commits after product work stopped. That's a signal worth acting on.
- Negative results belong in the README: one auto-detection approach was investigated, proven infeasible, and documented as a limitation instead of papered over.
- Publishing security tooling extracted from a real codebase is itself a security exercise — treat the release like an incident response, with a checklist.
What carried forward
The extract → scrub → review release playbook, now the standard path for anything leaving a private repo.