Evidence docket

Case library

Work the full 25-case suite in order, or return through focused routes when you want concentrated practice with measurement, causal design, operational models, or AI risk.

Case 01 (Unopened)

The Dashboard Spike

A launch-week chart jumps, a deck is due, and several teams have reasons to claim the movement.

Product analytics, 8 min
chart, audio, log

Case 02 (Unopened)

The Checkout Readout

A checkout readout lands just before planning closes, and different artifacts point toward different launch stories.

Experimentation, 10 min
chart, audio, table

Case 03 (Unopened)

The Churn Model Pitch

A polished retention model arrives with a renewal deadline, a crowded outreach queue, and a promise that the save team can act sooner.

ML evaluation, 12 min
model-output, table, memo

Case 04 (Unopened)

The Inspection Queue

A city inspection team has a new routing screen, a long backlog, and one week to decide how much authority the score should have.

Public policy analytics, 12 min
map, policy, table

Case 05 (Unopened)

The Spring Tutoring Brief

A district impact brief is headed to a funding vote after students who used a tutoring platform show stronger spring gains.

Education analytics, 11 min
press-release, chart, table

Case 06 (Unopened)

The Winter Shelter Forecast

A city housing office must set winter overflow capacity from a forecast that fits ordinary nights better than pressure weeks.

Public service forecasting, 12 min
chart, timeline, table

Case 07 (Unopened)

The Benefits Queue Score

A state benefits agency wants to use a verification score to cut backlog, but the burden may land unevenly on applicants with messier administrative records.

Government benefits analytics, 12 min
model-output, policy, table

Case 08 (Unopened)

The Claimant Chatbot

A benefits agency chatbot handles routine questions well, but evaluation logs show confident wrong answers on high-stakes claim situations.

Public sector AI evaluation, 12 min
transcript, rubric, memo

Case 09 (Unopened)

The Payment Hold Dial

The same benefits agency must choose a payment-hold threshold that catches fraud without turning suspicion into broad payment delay.

Government risk operations, 12 min
chart, simulator, table

Case 10 (Unopened)

The Clearance Rate Metric

The benefits modernization program changes its executive metric, and the new dashboard may reward faster closure while hiding reopened cases and payment delay.

Public administration analytics, 11 min
table, memo, chart

Case 11 (Unopened)

The Survey Sample Mirage

A customer research survey appears decisive until response patterns reveal who never had a real chance to answer.

Survey analytics, 12 min
table, chart, memo

Case 12 (Unopened)

The Bed-Ready Field

A familiar hospital operations field powers a clean improvement story while source systems leave conflicting traces.

Healthcare operations, 12 min
table, memo, timeline

Case 13 (Unopened)

The Missingness Report

A clinical risk report looks stable after dropping incomplete records, but missingness follows staffing, language access, and acuity.

Clinical analytics, 12 min
heatmap, table, memo

Case 14 (Unopened)

The Privacy-Safe Export

A de-identified public health export clears a checklist, but linkage, consent scope, and lifecycle controls make the release less simple.

Data governance, 12 min
policy, memo, table

Case 15 (Unopened)

The Board Slide

A board packet turns an early operational shift into a dramatic story, and the chart frame is doing more work than it first appears.

Executive reporting, 9 min
chart, memo, table

Case 16 (Unopened)

The Geo Test Winner

A regional media test appears to win, but market matching, spillover, seasonality, and operational changes keep the counterfactual unsettled.

Retail media, 12 min
chart, table, timeline

Case 17 (Unopened)

The Parallel Trends Slide

A policy brief claims a workforce pilot raised employment, but the comparison group was already drifting away before launch.

Labor policy, 12 min
chart, table, timeline

Case 18 (Unopened)

The Cutoff Policy Claim

An eligibility cutoff seems to prove a rental-assistance navigator prevented evictions, until sorting around the threshold weakens the design.

Benefits eligibility, 13 min
chart, table, timeline

Case 19 (Unopened)

The QuickStart Readout

A product experiment gets a fast no-go recommendation, but the exposure record and interval width leave more than one interpretation alive.

Product experimentation, 12 min
chart, table, timeline

Case 20 (Unopened)

The Short-Term Lift

A subscription checkout test lifts paid starts, but refunds, retention, and support burden make the growth claim less settled.

Subscription growth, 12 min
chart, table, timeline

Case 21 (Unopened)

The Discharge Score

A hospital readmission score looks unusually strong in validation, and the launch team wants it in the discharge workflow next month.

Hospital readmission, 13 min
chart, table, timeline

Case 22 (Unopened)

The Labeling Vendor Benchmark

A moderation model beats the old rules engine on a vendor benchmark, but the benchmark labels may be measuring vendor behavior more than policy truth.

Trust and safety, 13 min
chart, table, memo

Case 23 (Unopened)

The Drift Alarm Nobody Owned

An ETA model drift alarm is real, but the deeper failure is that monitoring is not connected to owned operational response.

Logistics ETA, 13 min
chart, table, timeline

Case 24 (Unopened)

The Holiday Override

A holiday replenishment model looks accurate enough to override planners, but store operations leave clues that demand may not be fully visible.

Retail supply chain, 13 min
chart, table, timeline

Case 25 (Unopened)

The DealDesk Pilot

An enterprise assistant performs well on clean sales workflows, and revenue operations wants tool-enabled expansion before renewal season.

Enterprise AI, 13 min
chart, table, memo

Focused routes

Case sequences for return visits

Focused route, 5 cases

Data Provenance and Measurement Integrity

Practice judging whether the data, sample, field definitions, privacy posture, and visual frame are trustworthy enough to reason from.

This route strengthens the foundation under later causal, model, and decision cases.

  1. Case 11: The Survey Sample Mirage. Survey analytics evidence file with competing signals.
  2. Case 12: The Bed-Ready Field. Healthcare operations evidence file with competing signals.
  3. Case 13: The Missingness Report. Clinical analytics evidence file with competing signals.
  4. Case 14: The Privacy-Safe Export. Data governance evidence file with competing signals.
  5. Case 15: The Board Slide. Executive reporting evidence file with competing signals.

Focused route, 5 cases

Causal Designs Beyond the A/B Test

Move past simple experiment validity into spillovers, quasi-experiments, timing, power, and claim wording.

  1. Case 16: The Geo Test Winner. Retail media evidence file with competing signals.
  2. Case 17: The Parallel Trends Slide. Labor policy evidence file with competing signals.
  3. Case 18: The Cutoff Policy Claim. Benefits eligibility evidence file with competing signals.
  4. Case 19: The QuickStart Readout. Product experimentation evidence file with competing signals.
  5. Case 20: The Short-Term Lift. Subscription growth evidence file with competing signals.

Focused route, 5 cases

Operational Models and AI Risk

Practice deciding whether a model or AI system is ready for operational use after performance claims meet real workflows.

  1. Case 21: The Discharge Score. Hospital readmission evidence file with competing signals.
  2. Case 22: The Labeling Vendor Benchmark. Trust and safety evidence file with competing signals.
  3. Case 23: The Drift Alarm Nobody Owned. Logistics ETA evidence file with competing signals.
  4. Case 24: The Holiday Override. Retail supply chain evidence file with competing signals.
  5. Case 25: The DealDesk Pilot. Enterprise AI evidence file with competing signals.