1 of 16: auditing my own guardrails

Fifteen years of security consulting means I've audited a lot of other people's controls. This spring I turned the same lens on my own — the safety guardrails I'd built to stop AI coding agents from damaging my codebase. The results were humbling.

The setup

When you run AI agents against a real codebase all day, they occasionally do destructive things. The one that got my attention: a git worktree remove --force, run against a worktree that a background agent was still using, zeroed out a file and erased about 1,570 lines of work. So I started adding guardrails: PreToolUse hooks, checks that intercept a dangerous command before it runs. By the end I had roughly sixteen.
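For readers who haven't written one, here is a minimal sketch of what a blocking pre-execution hook can look like, in Python. It is illustrative, not the hook from my project. It assumes a harness that hands the hook a JSON event on stdin with the shell command under `tool_input.command`, and that treats exit code 2 as "block this call"; those field names and the exit code are assumptions, so check your own harness. The function names are hypothetical.

```python
import json
import shlex
import sys

# Assumption: the harness treats this exit code as "block the tool call".
BLOCK_EXIT_CODE = 2
# Global git options that consume the following token as their value.
OPTIONS_WITH_VALUE = {"-C", "-c"}


def tokenize(command: str) -> list:
    """Split a shell command into tokens, keeping operators like && separate."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        return list(lexer)
    except ValueError:
        # Unbalanced quotes: fall back to a plain whitespace split.
        return command.split()


def invokes_git_stash(command: str) -> bool:
    """Return True if any `git` invocation in the command has subcommand `stash`."""
    tokens = tokenize(command)
    for i, token in enumerate(tokens):
        if token != "git":
            continue
        j = i + 1
        # Skip global options (and their values) to reach the subcommand.
        while j < len(tokens) and tokens[j].startswith("-"):
            j += 2 if tokens[j] in OPTIONS_WITH_VALUE else 1
        if j < len(tokens) and tokens[j] == "stash":
            return True
    return False


def decide(event: dict) -> tuple:
    """Map one tool-call event to (exit_code, message for the agent)."""
    command = (event.get("tool_input") or {}).get("command") or ""
    if invokes_git_stash(command):
        return BLOCK_EXIT_CODE, "Blocked: git stash is disabled (stashed work was being lost)."
    return 0, ""


def run_hook() -> int:
    """Read one JSON event from stdin, print any message to stderr, return the exit code."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

As a standalone hook script, the file would end with `sys.exit(run_hook())`. The decision logic is kept in pure functions so it can be tested without a harness. Note the deliberate crudeness: any `git` token followed by `stash` is blocked, so `echo git stash` would be blocked too.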

The audit

When I shelved the product, I audited the hooks the way I'd audit a client's control environment — not "do these exist?" but "is there any evidence one of them ever worked?"

~16 agent guardrails, audited for evidence of working

1 · KEPT: hard block on `git stash`, zero recurrences after

3 · BRITTLE: blocked legitimate work, needed escape-hatch patches

~9 · THEATER: warn-only, no commit evidence any caught a defect

1 of 16: Only the hook that blocked a specific destructive command earned its keep. The rest either warned and got ignored, or misfired and got disabled.

One was provably effective: a hard block on git stash (agents kept stashing uncommitted work, then losing it). Zero recurrences after it shipped.
Three were brittle — they blocked legitimate work often enough that I patched in escape hatches, which is how a control starts to die.
Roughly nine were theater: warnings that fired, scrolled past, and changed nothing. No commit evidence that any of them prevented a single defect.

The settings file holding these hooks was the most-edited file in the entire project — about thirty changes, more than half of them fixes to the hooks themselves, including fixes for false alarms the hooks were raising. A control that needs constant maintenance to stop attacking your own work is not yet a control.
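If you want to run the same check on your own repository, a rough sketch in Python follows. It counts how many commits touched each path, using the output of `git log --name-only --pretty=format:`. The function names and the example path `hooks/settings.json` are hypothetical, and this is not the exact method I used for the numbers above.

```python
import subprocess
from collections import Counter


def count_file_changes(log_output: str) -> Counter:
    """Count commits per path from `git log --name-only --pretty=format:` output."""
    # Each commit contributes one line per touched path; blank lines separate commits.
    return Counter(line.strip() for line in log_output.splitlines() if line.strip())


def most_edited(repo_path: str, top: int = 5) -> list:
    """Return the `top` most frequently changed paths in the repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_file_changes(result.stdout).most_common(top)
```

Calling `most_edited(".")` from a repository root returns a list of `(path, commit_count)` pairs. Renames are counted as separate paths in this sketch, so a file that moved will be undercounted.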

Tombstone controls

The pattern underneath was the uncomfortable part: nearly every hook was added immediately after the incident it would have prevented. Each one is a tombstone — a marker of where something already died. That's not defense in depth; it's grief with a config file.

Enterprises do exactly this. The post-breach security stack is full of tools bought the week after an incident, each aimed at yesterday's attack, most never tuned, many generating alerts nobody reads. My sixteen hooks were a one-person version of the same ritual.

The rubric that survived

The audit ended with a bar for any future guardrail — and by that bar I should have built three, not sixteen:

It blocks; it doesn't warn. A warning is a suggestion, and suggestions don't stop incidents.
It gets attacked on day one. If I can't watch it catch the bad thing before I rely on it, it isn't a control yet.
It survives without false-positive patches. A guardrail that keeps blocking real work will get an exception carved into it — and exceptions are how controls quietly stop existing.
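
The second and third criteria can be checked mechanically before a guardrail is trusted. The sketch below, in Python, feeds a guard both the commands it must stop and the commands it must leave alone, then reports misses and false positives. Everything in it is hypothetical: `audit_guardrail`, `AuditResult`, and the deliberately naive `naive_guard` are illustrations, not code from my project.

```python
from typing import Callable, Iterable, NamedTuple


class AuditResult(NamedTuple):
    missed: list           # attacks the guard allowed
    false_positives: list  # legitimate commands the guard blocked

    @property
    def passed(self) -> bool:
        """A guard passes only if it caught every attack and blocked no real work."""
        return not self.missed and not self.false_positives


def audit_guardrail(
    blocks: Callable[[str], bool],
    attacks: Iterable[str],
    legitimate: Iterable[str],
) -> AuditResult:
    """Run the guard against known-bad and known-good commands."""
    missed = [cmd for cmd in attacks if not blocks(cmd)]
    false_positives = [cmd for cmd in legitimate if blocks(cmd)]
    return AuditResult(missed, false_positives)


def naive_guard(command: str) -> bool:
    """Hypothetical guard: blocks any command containing the substring 'stash'."""
    return "stash" in command


result = audit_guardrail(
    naive_guard,
    attacks=["git stash", "cd repo && git stash push"],
    legitimate=["git status", "git commit -m 'stash cleanup notes'"],
)
```

Here `result.missed` is empty, but `result.false_positives` contains the commit command, so `result.passed` is `False`. The naive guard catches the bad thing on day one and still fails the third criterion, which is the profile of the brittle hooks above.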

The lesson isn't "build fewer guardrails." It's that a control without evidence is just a belief — and beliefs don't stop incidents, in an agent harness or an enterprise security program.