Fifteen years of security consulting taught me to distrust a control that only looks like a control. When I started running AI coding agents against real codebases, I turned that lens on my own tooling first — and the very first thing I audited, the safety guardrails I'd built to stop the agents from destroying work, turned out to be mostly theater.
So this isn't a list of clever tricks. It's the operating practice underneath eight products: name the real technique, explain why a security person reaches for it, and show where it broke before it earned its keep. The thread is the one consulting and building taught me twice — a written rule is a suggestion; a gate is a control.
-
Audit your own guardrails
Most controls are theater until you red-team them — so red-team your own.
Running AI coding agents against a real codebase all day means they occasionally do destructive things, so over ten weeks I built about sixteen PreToolUse hooks — checks that intercept a dangerous command before it runs.
Then I audited those hooks the way fifteen years of consulting taught me to audit a client's controls: not "do they exist?" but "is there any evidence one of them ever worked?" The keeper rubric is the transferable part — a guardrail earns its place only if it blocks instead of warns, gets attacked on day one, and survives without constant false-positive patches.
Where it brokeOne of the sixteen was provably effective. Three misfired on legitimate work and needed escape-hatch patches; roughly nine were warn-only theater nobody acted on. Almost every hook had been added right after the incident it would have prevented — I started calling them tombstone controls.
1 of 16: auditing my own guardrails -
Audit the design before any code
The cheapest defect to fix is the one you catch in the design doc.
Before writing code for SafeCircleOps — an investigation tool where a wrong answer could mislead a real case — I audited the design document against the actual tools it would orchestrate.
Design-time review is where a security reviewer earns the most: a flaw in a diagram costs a comment, the same flaw in production costs an incident. Every finding became an enforced rule rather than a recommendation — the distance between documented and enforced is the whole job.
Where it brokeThe audit caught five build-breaking errors and three architectural defects before any code existed. SafeCircleOps stays private and this stays at the pattern level — it holds a real case.
SafeCircleOps write-up -
Treat publishing as a security exercise
Releasing code from a real product is incident response, not a git push.
StackBadger began as an internal harness for attacking my own product. When it outlived the sprint that created it, I generalized it across stacks and open-sourced it — which meant a written export playbook, not a push.
Publishing security tooling extracted from a private codebase is itself a security exercise, so I ran the release like an incident: archive only tracked history (git archive from origin/main, so nothing untracked rides along), scrub every brand reference, decode anything credential-shaped — every eyJ… string, the standard opening of a JSON Web Token — to confirm it is synthetic, then a fresh git init with a human, not an agent, pushing it public.
Where it brokeHonest engineering includes the negative result: I tried to auto-detect a Supabase-Auth target from a site's shipped JavaScript and proved it reliably can't be done. The blind spot is documented in the README and pinned by a test rather than papered over.
StackBadger: the tool that outlived its sprint -
Coordinate agents by filesystem convention
The cheapest multi-agent architecture is a folder with rules.
The most useful AI system I have built has almost no code — a folder structure with rules that multiple agents work inside overnight without colliding. Coordination happens by filesystem convention: write-once research files, an append-only reference library, and a handover note between sessions, not an orchestration framework.
Separation of duties is a security control, and it transfers to agents: the agent that writes research is never the agent that grades it, and the auditing steps are forbidden from generating content. Discipline that lives in conventions instead of memory is discipline you can actually enforce.
Where it brokeWithout that separation, an agent grading its own research is just marking its own homework. The enforced rule — writer is never grader, auditors are non-generative — is what keeps nothing from quietly approving itself.
Interview prep as an agent workspace -
Isolate every parallel agent
One agent, one branch, one worktree, one PR — blast radius by design.
ReadySetBind was built with around thirty-five parallel Claude Code agents, each in its own git worktree — a separate working copy of the repo. A PreToolUse hook blocks an agent from switching branches, so the rule holds itself up: one branch, one worktree, one pull request per agent.
Parallel agents sharing a checkout step on each other, and a destructive command in a shared tree has a blast radius across everyone's work. Isolating each agent in its own worktree, enforced by a hook rather than a written convention, is least privilege applied to automation.
Where it brokeThe protocol came after collisions: a written "don't switch branches" rule didn't hold, so enforcement moved into a hook that blocks the switch outright. The same class of bug shipped three times past written rules and zero times past an automated check.
ReadySetBind write-up -
Make knowledge compound
If the fix isn't written where the next session looks, you didn't fix it — you scheduled it.
I analyzed fifty-two of my own agent coding sessions and found the same six friction patterns recurring more than twenty times. Every session solved them; every solution died when the session ended.
Agents start each session cold — and so does a team when the person who knew the workaround leaves. The fix is the same in both: the moment a workaround proves out, capture it as a short, findable record — symptom, root cause, fix — in a place the next session reads before it starts. Tribal knowledge is just a workaround wearing a process costume.
Where it brokeOne platform quirk was being re-discovered at roughly nineteen wasted attempts per session; one recurring permissions issue ate forty to sixty percent of a code-review agent's entire budget — all because the answer lived in a chat log, not a briefing.
Knowledge compounds, workarounds don't