1 of 16: auditing my own guardrails
Most controls are added after the incident they would have prevented — and most of them are theater. Audit your own guardrails as ruthlessly as you'd audit a client's.
- Over ten weeks I built ~16 PreToolUse hooks to stop my AI coding agents from running destructive commands.
- A formal self-audit found: 1 provably effective, 3 that misfired on legitimate work, and ~9 that were warnings nobody acted on — pure theater.
- Almost every hook was added right after the incident it would have prevented. I started calling these “tombstone controls.”
- The keeper rubric: a guardrail earns its place only if it blocks instead of warns, gets attacked on day one, and survives without constant false-positive patches.
Fifteen years of security consulting means I've audited a lot of other people's controls. This spring I turned the same lens on my own — the safety guardrails I'd built to stop AI coding agents from damaging my codebase. The results were humbling.
The setup
When you run AI agents against a real codebase all day, they occasionally do destructive things. The one that got my attention: a `git worktree remove --force` run on a worktree that a background agent was still using zeroed out a file and erased about 1,570 lines of work. So I started adding guardrails: PreToolUse hooks, checks that intercept a dangerous command before it runs. By the end I had roughly sixteen.
The audit
When I shelved the product, I audited the hooks the way I'd audit a client's control environment — not "do these exist?" but "is there any evidence one of them ever worked?"
[Figure: 1 of 16. Legend: hard block on `git stash` (zero recurrences after); blocked legitimate work (needed escape-hatch patches); warn-only (no commit evidence any caught a defect). Caption: Only the hook that blocked a specific destructive command earned its keep. The rest either warned and got ignored, or misfired and got disabled.]
- One was provably effective: a hard block on `git stash` (agents kept stashing uncommitted work, then losing it). Zero recurrences after it shipped.
- Three were brittle: they blocked legitimate work often enough that I patched in escape hatches, which is how a control starts to die.
- Roughly nine were theater: warnings that fired, scrolled past, and changed nothing. No commit evidence that any of them prevented a single defect.
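For readers who have not written one, here is a minimal sketch of what a hard block on `git stash` could look like. It is an illustration, not the hook from this project. It assumes the harness passes the pending tool call as JSON on stdin with `tool_name` and `tool_input.command` fields and treats exit code 2 as "block" (modeled on Claude Code's hook convention; check your own harness). The helper names `is_git_stash` and `decide` are hypothetical.

```python
#!/usr/bin/env python3
"""Illustrative hard block on `git stash` for a pre-tool hook (a sketch only)."""
import json
import shlex
import sys

# Git global options that take a separate value token, e.g. `git -C repo stash`.
GIT_OPTS_WITH_VALUE = {"-C", "-c", "--git-dir", "--work-tree", "--namespace"}


def is_git_stash(command: str) -> bool:
    """Heuristic: True if some `git` invocation in the command runs `stash`."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: fall back to a plain whitespace split.
        tokens = command.split()
    # Trim shell separators glued to a token, as in `git stash;` or `(git stash)`.
    tokens = [t.strip(";&|()") for t in tokens]
    for i, tok in enumerate(tokens):
        if tok != "git":
            continue
        j = i + 1
        # Skip global options (and their values) to reach the subcommand.
        while j < len(tokens) and tokens[j].startswith("-"):
            j += 2 if tokens[j] in GIT_OPTS_WITH_VALUE else 1
        if j < len(tokens) and tokens[j] == "stash":
            return True
    return False


def decide(payload: dict) -> int:
    """Return the hook's exit code: 2 blocks the call, 0 lets it run."""
    # Assumed payload shape: {"tool_name": "Bash", "tool_input": {"command": "..."}}
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    if is_git_stash(command):
        print("Blocked: git stash hides uncommitted work. Commit to a WIP branch.",
              file=sys.stderr)
        return 2
    return 0


def main() -> int:
    return decide(json.load(sys.stdin))

# When installed as a hook script, end the file with: sys.exit(main())
```

The point of the sketch is the return value: it refuses the command rather than printing a warning and letting it run. It is still a heuristic. `echo git stash` would be blocked too, which is exactly the kind of false positive an audit should look for.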
The settings file holding these hooks was the most-edited file in the entire project — about thirty changes, more than half of them fixes to the hooks themselves, including fixes for false alarms the hooks were raising. A control that needs constant maintenance to stop attacking your own work is not yet a control.
Tombstone controls
The pattern underneath was the uncomfortable part: nearly every hook was added immediately after the incident it would have prevented. Each one is a tombstone — a marker of where something already died. That's not defense in depth; it's grief with a config file.
Enterprises do exactly this. The post-breach security stack is full of tools bought the week after an incident, each aimed at yesterday's attack, most never tuned, many generating alerts nobody reads. My sixteen hooks were a one-person version of the same ritual.
The rubric that survived
The audit ended with a bar for any future guardrail — and by that bar I should have built three, not sixteen:
- It blocks; it doesn't warn. A warning is a suggestion, and suggestions don't stop incidents.
- It gets attacked on day one. If I can't watch it catch the bad thing before I rely on it, it isn't a control yet.
- It survives without false-positive patches. A guardrail that keeps blocking real work will get an exception carved into it — and exceptions are how controls quietly stop existing.
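One way to make the rubric mechanical is to attack a guard with two lists before relying on it: commands it must block and commands it must allow. The sketch below is illustrative, not tooling from this project; `attack`, `AuditResult`, and `naive_guard` are hypothetical names, and the guard is assumed to be a function that returns `True` when it would block a command.

```python
from typing import Callable, Iterable, NamedTuple


class AuditResult(NamedTuple):
    missed: list[str]           # dangerous commands the guard let through
    false_positives: list[str]  # legitimate commands the guard blocked

    @property
    def keeps_its_place(self) -> bool:
        # Passes only if it caught every attack and blocked no real work.
        return not self.missed and not self.false_positives


def attack(guard: Callable[[str], bool],
           must_block: Iterable[str],
           must_allow: Iterable[str]) -> AuditResult:
    """Run the guard against known-bad and known-good commands."""
    missed = [c for c in must_block if not guard(c)]
    false_positives = [c for c in must_allow if guard(c)]
    return AuditResult(missed, false_positives)


def naive_guard(command: str) -> bool:
    """Hypothetical guard under test: a plain substring match."""
    return "git stash" in command


result = attack(
    naive_guard,
    must_block=["git stash", "git -C repo stash"],
    must_allow=["git status", "git commit -m 'remove git stash alias'"],
)
print(result.missed)           # commands that slipped through
print(result.false_positives)  # legitimate work that was blocked
print(result.keeps_its_place)
```

Here the substring guard fails both ways: it misses `git -C repo stash` and blocks a harmless commit message. Finding that on day one is cheaper than finding it after an exception has already been carved into the control.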
The lesson isn't "build fewer guardrails." It's that a control without evidence is just a belief — and beliefs don't stop incidents, in an agent harness or an enterprise security program.