Week of March 8, 2026
Scoreboard of Shame + Pipeline Migration
Built prediction accountability system with Brier scores per detector. Ran antifragile gauntlet, fixed 3 critical + 4 high bugs. Migrated daily pipeline from GitHub Actions to self-hosted Windmill. Via negativa: subtracted vendor CI dependency entirely.
Scoreboard of Shame: outcome tracking (lib/outcomes.ts) snapshots YES prices for each detector's predictions, then resolves hit/miss after 7 days or market close. Brier scoring (lib/scoring.ts) computes mean((p-o)^2) per detector with 90-day rolling window. Track Record section on /scanner shows hit rates, Brier scores, and skull icons for culled detectors.
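A minimal sketch of the Brier computation described above, written in Python for illustration rather than copied from lib/scoring.ts. The `Outcome` record and its field names are hypothetical; `p` is taken to be the predicted probability and `o` is 1 for a hit, 0 for a miss.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, NamedTuple, Optional


class Outcome(NamedTuple):
    """One resolved prediction (hypothetical shape, not the real schema)."""
    detector: str
    predicted_probability: float
    hit: bool
    resolved_at: datetime


def brier_score(
    outcomes: Iterable[Outcome],
    detector: str,
    now: datetime,
    window_days: int = 90,
) -> Optional[float]:
    """Mean of (p - o)^2 over one detector's outcomes inside the rolling window."""
    cutoff = now - timedelta(days=window_days)
    errors = [
        (o.predicted_probability - (1.0 if o.hit else 0.0)) ** 2
        for o in outcomes
        if o.detector == detector and o.resolved_at >= cutoff
    ]
    # No resolved predictions in the window means there is no score to report.
    return sum(errors) / len(errors) if errors else None
```

Lower is better: 0.0 is a perfect record, and a detector that always says 0.5 lands at exactly 0.25.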
Auto-cull API (POST /api/signals/cull): resolves outcomes, recomputes scores, and zeros any detector that has scored worse than random (Brier > 0.25, the score of always predicting 0.5) for 4+ weeks with 10+ predictions. Pipeline integration runs this as Step 4b after the Taleb scan.
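The cull rule can be sketched roughly as follows. This is one reading of "4+ weeks with 10+ predictions", not the endpoint's actual code: it assumes a list of weekly Brier scores (oldest first) and requires the most recent four to all exceed the baseline. The function and parameter names are hypothetical.

```python
from typing import Sequence

RANDOM_BASELINE = 0.25  # Brier score of always predicting 0.5


def should_cull(
    weekly_brier: Sequence[float],
    prediction_count: int,
    min_weeks: int = 4,
    min_predictions: int = 10,
) -> bool:
    """True when a detector has been worse than random for long enough to zero it."""
    # Too few predictions: the score is noise, so leave the detector alone.
    if prediction_count < min_predictions:
        return False
    recent = weekly_brier[-min_weeks:]
    # Require a full run of bad weeks; a single good week resets the clock.
    return len(recent) == min_weeks and all(s > RANDOM_BASELINE for s in recent)
```

The prediction-count guard comes first so that a detector with a handful of unlucky calls is never culled on sample noise alone.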
Antifragile gauntlet caught 3 critical bugs: cullUnderperformers WHERE clause was inverted (culled dormant detectors, not active bad ones), predictedProbability defaulted to 1.0 instead of 0.5 (every miss scored max penalty), and fetchMarketStatus double-fetched the same Polymarket slug.
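Why the default mattered, as a small worked example. The helper name is hypothetical; it computes a single prediction's Brier term, (p - o)^2.

```python
def brier_term(predicted_probability: float, hit: bool) -> float:
    """Squared error of one prediction: (p - o)^2 with o = 1 for hit, 0 for miss."""
    outcome = 1.0 if hit else 0.0
    return (predicted_probability - outcome) ** 2


# Old default of 1.0: a miss with no recorded probability takes the maximum penalty.
old_default_miss = brier_term(1.0, hit=False)   # 1.0

# New default of 0.5: the prediction scores the same whether it hits or misses,
# so a missing probability neither rewards nor punishes the detector.
new_default_miss = brier_term(0.5, hit=False)   # 0.25
new_default_hit = brier_term(0.5, hit=True)     # 0.25
```

With the 1.0 default, a few unrecorded probabilities were enough to push an otherwise decent detector past the 0.25 cull threshold.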
Hardened scoreboard: auth checked before rate-limit in cull endpoint, stuck-row abandonment after 30 days, 90-day rolling window on Brier scores, batched N+1 queries with concurrency limits in outcome tracking.
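The batching pattern, sketched in Python rather than the project's TypeScript: deduplicate slugs so each market is fetched once, then cap in-flight requests with a semaphore. `resolve_all` and the `fetch_status` callback are hypothetical names, and the default limit is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def resolve_all(
    slugs: Iterable[str],
    fetch_status: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> Dict[str, str]:
    """Fetch each distinct slug once, with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    unique_slugs = list(dict.fromkeys(slugs))  # dedupe, preserving order

    async def fetch_one(slug: str):
        async with semaphore:
            return slug, await fetch_status(slug)

    pairs = await asyncio.gather(*(fetch_one(s) for s in unique_slugs))
    return dict(pairs)
```

Deduplicating before the fetch also guards against the double-fetch class of bug, since a repeated slug never reaches the network twice.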
Migrated daily pipeline from GitHub Actions to self-hosted Windmill (wm.mk23.cc). New "omen" workspace with isolated secrets. Pipeline script clones repo via SSH deploy key, runs fetch_polymarket, run_pipeline, and resolve_omens. Telegram notifications on success/failure/warning.
Windmill hardening: rotated default admin credentials, dynamic Python runtime discovery (survives Windmill updates), per-run temp directories (no races), secret cleanup in finally block, timeouts on all pipeline steps. GitHub Actions secrets revoked and workflow files deleted.
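A rough sketch of three of those hardening points together: a per-run temp directory, secret cleanup in a finally block, and a timeout on every step. This is not the actual pipeline script; `run_pipeline`, the `deploy_key` file name, and the 600-second default are all assumptions.

```python
import os
import shutil
import subprocess
import tempfile
from typing import List, Sequence


def run_pipeline(
    steps: Sequence[Sequence[str]],
    deploy_key: str,
    timeout_s: float = 600,
) -> List[str]:
    """Run each step in a fresh working directory and return each step's stdout."""
    # A unique directory per run, so concurrent runs cannot race on shared paths.
    workdir = tempfile.mkdtemp(prefix="omen-run-")
    key_path = os.path.join(workdir, "deploy_key")
    try:
        # Create the key file owner-readable only, and fail if it already exists.
        fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "w") as key_file:
            key_file.write(deploy_key)
        outputs = []
        for cmd in steps:
            # timeout raises subprocess.TimeoutExpired; check=True raises on failure.
            done = subprocess.run(
                list(cmd), cwd=workdir, timeout=timeout_s,
                check=True, capture_output=True, text=True,
            )
            outputs.append(done.stdout)
        return outputs
    finally:
        # Runs on success, failure, and timeout alike: the secret never outlives the run.
        if os.path.exists(key_path):
            os.remove(key_path)
        shutil.rmtree(workdir, ignore_errors=True)
```

Because the cleanup sits in `finally`, a step that hangs and hits its timeout still leaves no key material or scratch files behind.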
The ralph loop (headless claude --print) shipped 928 lines across 9 files in one shot, then needed 5 fix commits. Fat Tony lesson: build one piece, test it, move on. The loop needs test gates between steps, not hope.
Windmill's sys.executable is a wrapper python, not the runtime python. Packages installed via runtime pip are invisible to the wrapper. Dynamic glob discovery of /tmp/windmill/cache/py_runtime/cpython-*/bin/python3 is the robust pattern.
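The discovery pattern might look something like this. The glob pattern is the one quoted above; the function name is hypothetical, and picking the most recently modified match when several runtimes are cached is an assumption, not documented Windmill behaviour.

```python
import glob
import os

RUNTIME_GLOB = "/tmp/windmill/cache/py_runtime/cpython-*/bin/python3"


def find_runtime_python(pattern: str = RUNTIME_GLOB) -> str:
    """Return the path of an executable runtime python matching the glob."""
    candidates = [p for p in glob.glob(pattern) if os.access(p, os.X_OK)]
    if not candidates:
        raise FileNotFoundError(f"no runtime python matches {pattern}")
    # Assumption: if several versions are cached, the newest one is the live runtime.
    return max(candidates, key=os.path.getmtime)
```

Installing packages with `[find_runtime_python(), "-m", "pip", "install", ...]` rather than `sys.executable` keeps them visible to the interpreter that actually runs the steps, and nothing needs editing when a Windmill update changes the version directory.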