Compile-green, deploy-broken
“Passes every check” and “works where it actually runs” are different claims. Budget for the gap.
- A core feature — placing signature fields on a PDF — worked perfectly on my machine and returned a generic 502 in production.
- It took four separate fixes, all in how `pdfjs-dist` (Mozilla's pdf.js) behaves in a serverless runtime. Each failure was invisible until the previous one was fixed, and all four threw the identical error.
- The cost was real: while it was broken the operator threw out all seventeen predicted fields on one placement and redid them by hand.
- Lesson: production is a different machine with different rules. Test there early, and assume the first production failure is hiding more behind it.
One route in ReadySetBind reads a quote PDF and works out where each signature field belongs on the page — it turns Claude's text-anchor detections into pixel coordinates with pdfjs-dist, Mozilla's pdf.js (the same PDF engine that renders documents in your browser). On my Windows laptop it was flawless. Every automated check: green. In production it returned a 502 — a generic server error — and stayed dead.
Why this one hurt
While the route was broken, the app fell back to asking a human to place every field by hand. One real placement logged seventeen of the predicted fields deleted and eighteen re-added manually — the operator threw out every prediction the feature existed to produce. A broken feature is one thing; a feature actively wasting the time of the person it was built for is worse.
Four failures in a trench coat
Debugging took four rounds, and here's the part worth keeping: each failure was invisible until the one before it was fixed, and all four produced the identical 502. Fixing three of four looked exactly like fixing none.
workerSrcand thec:protocol. pdf.js v4 is ESM (the modern JavaScript module format) and rejects a bare filesystem path for its worker file — it threwReceived protocol 'c:'. Worse, my try/catch guardedrequire.resolve()(which succeeds) while the real throw happened later insidegetDocument, so the fallback branch was unreachable. Fix: the legacy build with noworkerSrcat all — plus extracting only the ~4 pages I needed (0.7s instead of 12s) to fit under Netlify's ~10-second function ceiling. (maxDuration, the knob that would raise that ceiling, is Vercel-only;@netlify/plugin-nextjsignores it.)- Fonts that only exist on my laptop.
useSystemFonts: trueloads fonts through@napi-rs/canvas, a native module present on Windows and absent from Netlify's runtime. Fix:useSystemFonts: false, disableFontFace: true. Field positions come from the PDF's content stream, not from rendered glyphs — I never needed the fonts. - The bundler broke the library's internals. Next.js packed
pdfjs-distinto a webpack chunk, which broke its own internal dynamicimport('pdf.worker.mjs'). Fix:serverExternalPackages: ["pdfjs-dist"]— tell the bundler to leave it alone. - The deploy shipped without the worker (the real fix). Next's file tracer —
@vercel/nft, which decides which files actually get uploaded — follows direct imports but not a dependency's internal dynamic import, so the 2.3 MB worker file was never deployed. Fix:outputFileTracingIncludes, to force it in.
Any one of the four, alone, reproduced the full outage. Net result once all four landed: twelve of eighteen fields auto-place, the rest fall to manual — which is fine.
"/api/placements/[id]/resolve-anchors": [ … pdf.worker.mjs … ]"/api/placements/*/resolve-anchors": [ … pdf.worker.mjs … ]the real fixThat config key is a picomatch glob, not a literal route path — so the [id] segment had been matching by luck until it didn't, and the worker was silently dropped from the deploy. One character class (*) fixed it — but only after the three failures above were peeled off first.
What changed in how I work
I stopped treating local success as evidence of production success for anything touching files, fonts, or the runtime. Serverless platforms make deployment effortless by running your code somewhere that is emphatically not your machine — that's the trade, and the trade has fine print: the Deno runtime rejecting ?? mixed with || without parentheses, a bundler dropping a non-literal dynamic import. Same class of failure, different week.
And when production breaks while dev stays green, I now assume multiple stacked causes until proven otherwise. The urge to declare victory after the first fix is exactly what stretched this from an afternoon into days. All four fixes are written into the project's knowledge base now, because none of them is guessable from the symptom — the next product on this stack inherits the answers on day one.
Test the running system
In security work we'd never accept "the policy passed review" as proof a control actually holds in production — we test the running system, under its real configuration. Code deserves the same skepticism. Compile-green is a claim about your machine. Deploy-broken is a fact about the real one. Budget for the gap between them.