LaunchVerdict · live demo (seeded)

Three releases. Three verdicts.

You shipped three times this week. Instead of a 40-widget dashboard, here is one card per release: did it help the flow that matters, or hurt it — and what to do now. One clearly regressed (roll back), one dropped on too thin a sample to trust yet (hold and watch), one improved (keep shipping). All computed live from seeded telemetry by the same engine that runs on real repos.

This week's verdicts

↩ ROLL IT BACK · acme/checkout-web · r-1042

Onboarding completion fell 71%→53% after r-1042

onboarding · before 71% → after 53%

after-rate 95% CI: [48%, 58%]

Cause: changed app/onboarding/Step2.tsx; Novus flagged: Step 2 'Continue' button moved below the fold on 375px viewports

Do this: Revert r-1042, or fix the Step 2 'Continue' button, which moved below the fold on 375px viewports.

high confidence · Correlation under a known cause (the diff), not proven causation. Short windows and confounders apply.

⏸ HOLD · acme/checkout-web · r-1044

Signup completion fell 80%→45% after r-1044 — hold and watch (thin sample)

signup · before 80% → after 45%

after-rate 95% CI: [32%, 60%]

Cause: changed app/signup/Plan.tsx; Novus flagged: Plan toggle hidden behind a tooltip on first paint

Do this: Don't roll back yet. The drop is large, but the sample is too thin to trust. Hold r-1044 and recheck once more users hit the flow, or fix the Plan toggle, which is hidden behind a tooltip on first paint.

low confidence · Correlation under a known cause (the diff), not proven causation. Short windows and confounders apply.

✓ KEEP SHIPPING · acme/checkout-web · r-1039

Search completion rose 60%→70% after r-1039 — keep it

search · before 60% → after 70%

after-rate 95% CI: [67%, 73%]

Cause: changed app/search/ranking.ts

Do this: Keep shipping. No action needed.

high confidence · Correlation under a known cause (the diff), not proven causation. Short windows and confounders apply.
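Each card reports a 95% confidence interval on the after-rate, and the HOLD card's interval is far wider because fewer users reached that flow. The page doesn't say which interval method it uses. The sketch below assumes a Wilson score interval, a common choice that stays well behaved on small samples and on rates near 0% or 100%; `wilson_interval` is a hypothetical helper, not part of the LaunchVerdict engine.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a completion rate (z=1.96 gives ~95%).

    Assumed method: the demo page does not name the interval it reports.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # The Wilson center is pulled slightly toward 50% compared with the raw rate.
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)
    )
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

With hypothetical counts (the page does not publish its raw numbers), 22 completions out of 49 users gives an interval roughly 27 points wide, while 630 out of 900 gives one about 6 points wide. That gap in width is what separates a "thin sample" from one worth acting on.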

How the call is made

We treat a release as the cut point on the event timeline. For each flow, we compare the completion rate in the 7 days before the cut with the 7 days after it using a two-proportion z-test, then take the flow that moved the most. We say roll back only when that move is a significant (p<0.05), meaningful drop on enough traffic to trust; a large drop on a thin sample gets a hold instead. The number is a correlation under a known cause (the diff), not a proof. We label the confidence and never fabricate a call when the data is thin.
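The call described above can be sketched in a few lines of Python. This is a minimal sketch, not LaunchVerdict's engine: the page names a two-proportion z-test and the p<0.05 bar, and the sketch assumes the common pooled-variance form of that test. `FlowWindow`, `two_proportion_z`, `most_moved`, `verdict`, and the `MIN_DROP` and `MIN_AFTER_USERS` thresholds are all hypothetical, because the page doesn't publish how it defines a "meaningful" drop or a "thin" sample.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds: the page does not publish its actual values.
ALPHA = 0.05           # significance bar stated on the page (p < 0.05)
MIN_DROP = 0.05        # assumed "meaningful drop": 5 percentage points
MIN_AFTER_USERS = 200  # assumed floor below which the sample counts as thin


@dataclass
class FlowWindow:
    """Completions and attempts for one flow, 7 days before and after the cut."""
    name: str
    before_done: int
    before_total: int
    after_done: int
    after_total: int

    @property
    def delta(self) -> float:
        """Change in completion rate, after minus before."""
        return self.after_done / self.after_total - self.before_done / self.before_total


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # both rates are 0% or both 100%: no evidence of change
    z = (p2 - p1) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


def most_moved(flows: list[FlowWindow]) -> FlowWindow:
    """Pick the flow whose completion rate moved the most, in either direction."""
    return max(flows, key=lambda f: abs(f.delta))


def verdict(flow: FlowWindow) -> str:
    """Map one flow's before/after counts to a card label."""
    if flow.after_total < MIN_AFTER_USERS:
        return "HOLD"  # too few users after the cut to trust any call yet
    _, p = two_proportion_z(flow.before_done, flow.before_total,
                            flow.after_done, flow.after_total)
    if p < ALPHA and -flow.delta >= MIN_DROP:
        return "ROLL IT BACK"  # significant and meaningful drop
    return "KEEP SHIPPING"


if __name__ == "__main__":
    # Hypothetical counts shaped like the onboarding and search cards.
    flows = [
        FlowWindow("onboarding", 710, 1000, 212, 400),  # 71% -> 53%
        FlowWindow("search", 600, 1000, 630, 900),      # 60% -> 70%
    ]
    top = most_moved(flows)
    print(top.name, verdict(top))  # prints: onboarding ROLL IT BACK
```

Checking the sample-size floor before significance is a design choice in this sketch: a thin window gets a hold even if the z-test happens to clear p<0.05, which matches the page's rule of never making a call on thin data.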