The email that shipped three times
Don't let AI rebuild infrastructure you can rent. And when a mistake repeats, automate the rule — don't re-document it.
- My app marked emails “sent” that were never sent — the same bug, three times in sixteen days.
- Writing the rule down (twice, in the files my agents read) didn't stop it from coming back.
- Root cause: a custom wrapper over Resend returned a 200 for non-deliveries, and `functions.invoke` returns `any`, so the compiler couldn't force callers to check the real status.
- What worked: a 79-line fail-closed classifier plus an ast-grep CI gate that fails the build when code ignores an email's delivered status. Documentation describes a rule; a gate enforces it.
I shipped the same bug three times in sixteen days. Not three similar bugs — the same one, twice after I'd fixed it and written a rule telling my AI coding agents never to do it again.
The setup
ReadySetBind automates insurance paperwork: a quote PDF goes in, and a bind request comes out the other end — an email to an underwriter that actually puts coverage in force. That email is the entire point of the pipeline. If the system is ever wrong about whether it went out, a deal sits dead while everyone believes it's moving.
The bug
That's exactly what it got wrong. AI coding tools make it feel free to build your own version of everything, so instead of keeping my email code paper-thin over Resend — a service whose whole job is sending mail reliably and reporting what happened — I let the agents build a custom wrapper with its own logic for suppressed addresses, send cutoffs, and duplicate sends.
The catch: Supabase's function-invoke call returns any (TypeScript's "I give up, this could be anything" type), so the compiler couldn't force callers to inspect the result. The wrapper returned a 200 even when nothing was delivered — suppressed, disabled, a replayed duplicate — and every caller read "no error" as "sent." The app marked records sent, advanced the deal, and wrote audit rows for emails that never existed.
Fix, document, repeat
I fixed it on day two. It came back the next day in a new piece of code. So I did what fifteen years of security consulting trains you to do: I documented it — once in the instructions every agent session reads at startup, again in a dedicated rules file.
It came back a third time anyway, in code written with both rules sitting right there.
That's not an AI problem. Human teams work the same way: the engineer who got burned remembers, the new hire doesn't. If written rules stopped recurring mistakes, the security industry's policy binders would have ended phishing a decade ago.
- Ship 1bug ships
- Fixedday 2
- Ship 2recurs next day
- Rule writtentwice, in agent files
- Ship 3recurs anyway
- ast-grep gate0 after
What actually worked
- Stop rebuilding what you can rent. The bug lived entirely in code I never needed to write. Resend already knows whether a message was delivered; my wrapper was a second opinion that was sometimes wrong. I cut it to a thin pass-through built on one rule: report Resend's real status, and treat anything unclear — unknown, null, empty — as a failure, never a success.
- Turn the written rule into a gate. A static-analysis check — an ast-grep rule, which matches the structure of the code rather than its text — now runs on every build: if any caller invokes the email function without branching on the delivered status, the build fails.
no invoke error ⇒ treat as deliveredunknown | null | "" | error ⇒ "failed", never deliveredthe gateA 79-line classifier makes "unclear" mean "not sent," and an ast-grep CI rule fails the build if a caller ignores it. The rule itself had to be hardened after a stray req.status === 200 fooled the first version into thinking the status was checked.
The bigger lesson
Security teams have language for this. A policy that says "laptops must be encrypted" and software that refuses to boot an unencrypted disk are different kinds of control: one documents intent, the other makes the bad state impossible. I spent fifteen years writing the first kind for clients and auditing the gap between the two. It took two weeks of my own product to re-teach me which one stops a repeat.
And the AI corollary: when agents make building custom infrastructure feel free, "should I build this at all?" matters more, not less. The fastest code to debug is the code you didn't write.