Overview
Pre-MLS lead intelligence for San Diego County real-estate wholesaling: tax delinquency, pre-foreclosure, probate, absentee ownership, and code violations aggregated into one operator dashboard with explainable lead scoring. It is intelligence only, with no automated outreach. 130 commits, 26 database migrations, and five external data integrations in seven days; paused before launch.
Project Design
Planned through the same adversarial-review pipeline as the projects before it — the PRD reached v4 before any code, and the review reshaped the product: the scoring model was swapped for an explainable one, and an entire signal class was removed on fair-housing grounds. When the county-scale tax scraper kept dying as a local script, it moved to a serverless function with cursor checkpoints, circuit breakers, and sanity checks against silent data corruption.
Key modules
Scoring engine
Additive, explainable lead score (0–100) from weighted distress signals with confidence multipliers.
Entity resolution
Deduplicates the same parcel arriving as APN, address variant, or trust/LLC name on bulk ingest.
Ingest pipelines
Five edge workers populate and refresh properties from parcel, tax, and lead sources.
Tax-delinquency scraper
A county-scale scraper scheduled by pg_cron, with cursor checkpoints, a circuit breaker for repeated failures, and a sanity check against corrupt data.
Compliance & audit
Append-only audit trail, per-user RLS, and DNC screening before any contact export.
Property dashboard
Lead table, signal filters, map view, notes, and an activity timeline for operators.
Key features
County-scale scraping that fails safe
The strongest distress signal — prior-year property-tax delinquency — has no API; it lives behind a county lookup page spanning roughly 566,000 parcels. The first version was a local script that died every time the laptop slept. It moved to a scheduled serverless function that walks parcels by cursor (not offset, so a restart resumes exactly where it stopped), runs every few minutes, and writes results in batches. Two guards keep bad data out: a circuit breaker trips after 50 consecutive failures, and a sanity check halts the run if an implausibly high share of a batch comes back delinquent — the signature of a stale or corrupted source page.
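The resume-and-guard behavior described above can be sketched as one batch step. This is a minimal illustration, not the project's code: `Checkpoint`, `run_batch`, the `lookup` and `save` callables, the batch size, the 0.5 delinquent-share threshold, and the minimum sample size are all assumptions made for the example. Only the 50-failure limit comes from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class CircuitOpen(Exception):
    """Too many lookups failed in a row; stop hitting the source."""


class SanityCheckFailed(Exception):
    """A batch looked implausible; nothing from it was written."""


@dataclass
class Checkpoint:
    cursor: str = ""               # last APN of the last committed batch
    consecutive_failures: int = 0  # carried across runs


def run_batch(
    apns: List[str],                       # all parcel numbers, sorted ascending
    checkpoint: Checkpoint,
    lookup: Callable[[str], bool],         # hypothetical: True if delinquent
    save: Callable[[List[Tuple[str, bool]]], None],
    batch_size: int = 100,                 # assumed
    max_failures: int = 50,                # from the description above
    max_delinquent_share: float = 0.5,     # assumed threshold
    min_sample: int = 20,                  # assumed: skip the check on tiny batches
) -> int:
    """Process the next batch after the cursor; return how many APNs it covered."""
    # Cursor, not offset: select by "greater than the last committed APN".
    batch = [a for a in apns if a > checkpoint.cursor][:batch_size]
    results: List[Tuple[str, bool]] = []
    for apn in batch:
        try:
            results.append((apn, lookup(apn)))
            checkpoint.consecutive_failures = 0
        except Exception:
            # In this sketch a failed parcel is skipped; a real run might queue it.
            checkpoint.consecutive_failures += 1
            if checkpoint.consecutive_failures >= max_failures:
                raise CircuitOpen(f"{max_failures} consecutive failures at {apn}")
    if results and len(results) >= min_sample:
        share = sum(1 for _, delinquent in results if delinquent) / len(results)
        if share > max_delinquent_share:
            raise SanityCheckFailed(f"{share:.0%} of batch delinquent")
    if results:
        save(results)
    if batch:
        checkpoint.cursor = batch[-1]  # advance only after the batch is written
    return len(batch)
```

Because the cursor moves only after a batch passes both guards and is saved, a tripped breaker or failed sanity check leaves the checkpoint where it was, and the next scheduled run retries the same batch.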
Security & ops decisions
- Fair-housing compliance was a design-time constraint, not a retrofit: the owner-age signal was removed from scoring entirely, and divorce signals require per-deal human review.
- AI-inferred property-condition scores ship labeled experimental, weight-capped, dated, and feedback-instrumented.
- Scoring is explainable by construction: an additive model with confidence multipliers, weights locked in migration code, so "why did this lead score 78?" always has an answer.
- Each of 11 data sources carries an explicit freshness class so the product can't imply timeliness it doesn't have.
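To illustrate what "explainable by construction" can mean for the scoring decision above, here is a small sketch of an additive score with confidence multipliers. The weights, signal names, and the `score_lead` function are hypothetical; the project keeps its actual weights in migration code, and they are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical weights for illustration only; not the project's real values.
WEIGHTS = {
    "tax_delinquency": 40,
    "pre_foreclosure": 30,
    "probate": 20,
    "absentee_owner": 10,
    "code_violation": 10,
}


@dataclass(frozen=True)
class Signal:
    kind: str
    confidence: float  # 0.0 to 1.0: how much the source is trusted


def score_lead(signals: List[Signal]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return (score capped at 100, per-signal contributions)."""
    raw = [(s.kind, WEIGHTS[s.kind] * s.confidence) for s in signals]
    total = min(100.0, round(sum(points for _, points in raw), 1))
    # The contribution list is the explanation: every point is traceable.
    contributions = [(kind, round(points, 1)) for kind, points in raw]
    return total, contributions


score, why = score_lead([Signal("tax_delinquency", 1.0), Signal("probate", 0.5)])
print(score, why)
```

Since the score is a plain sum, the contribution list accounts for every point, which is the property an exponential or otherwise non-linear model gives up.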
Builder notes
- The 566K-parcel tax-delinquency scraper moved from a laptop CLI to an edge function on cron, with cursor-based pagination, a circuit breaker, and a data-corruption sanity check.
- Entity resolution is load-bearing: the same parcel arrives as an APN, two address spellings, an LLC, and a family trust — an APN-primary resolution layer keeps scoring sane.
- 26 idempotent migrations (~4.1k lines of SQL) with row-level security and an append-only audit layer made schema changes safe to re-run throughout the seven-day build, without risking a data wipe.
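The APN-primary resolution mentioned in the notes above might look roughly like the following. It is a simplified sketch under stated assumptions: the record shape, the suffix table, the `resolve` function, and the sample APN are invented for illustration, and real address matching needs far more than suffix normalization.

```python
import re
from typing import Dict, List, Optional

# Assumed, deliberately tiny suffix table; a real one would be much larger.
_SUFFIXES = {"street": "st", "avenue": "ave", "boulevard": "blvd",
             "drive": "dr", "road": "rd"}


def normalize_apn(raw: Optional[str]) -> Optional[str]:
    """Keep digits only, so '123-456-78-00' and '1234567800' match."""
    digits = re.sub(r"\D", "", raw or "")
    return digits or None


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common street suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(_SUFFIXES.get(t, t) for t in tokens)


def resolve(records: List[dict]) -> Dict[str, List[dict]]:
    """Group records by parcel: APN first, normalized address as fallback."""
    # Pass 1: learn which normalized addresses belong to a known APN.
    apn_by_address: Dict[str, str] = {}
    for r in records:
        apn = normalize_apn(r.get("apn"))
        if apn and r.get("address"):
            apn_by_address[normalize_address(r["address"])] = apn
    # Pass 2: assign every record to a parcel key.
    parcels: Dict[str, List[dict]] = {}
    for r in records:
        apn = normalize_apn(r.get("apn"))
        addr = normalize_address(r["address"]) if r.get("address") else None
        if apn:
            key = apn
        elif addr and addr in apn_by_address:
            key = apn_by_address[addr]   # address variant of a known parcel
        elif addr:
            key = f"addr:{addr}"         # no APN seen yet for this address
        else:
            key = "unresolved"
        parcels.setdefault(key, []).append(r)
    return parcels
```

Owner names (an LLC, a family trust) play no part in the key in this sketch. They ride along on the grouped records, so the score is computed once per parcel rather than once per owner spelling.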
Lessons learned
- Compliance can shape the MVP at design time instead of arriving as a retrofit: an owner-age signal was deleted and divorce demoted to manual-only review before a line of code existed.
- Explainable beats clever: exponential scoring was replaced with an additive model and confidence multipliers so an operator can answer "why did this lead score 78?"
- AI-inferred signals ship labeled and capped — the experimental property-condition scan shows its image age and limitations, and its score weight is deliberately small. The safeguard is transparency, not absence.
- Fragile scrapers need built-in fault detection, not retry loops: resumable checkpoints, a circuit breaker, and a sanity check that distrusts its own output.
What carried forward
Two things carried forward: compliance shaping scope at design time (a fair-housing review removed an entire signal class before any code existed), and the checkpoint-and-circuit-breaker pattern for fragile, county-scale scrapers.
Posts from this project
Case study in progress.