Case 22

The Labeling Vendor Benchmark

Open the file, inspect the artifacts, and decide what the evidence can support before the replay appears.

Case intake

A marketplace trust-and-safety team trained ShieldRank, a model that flags prohibited listings before peak season. Vendor A labeled 40,000 historical listings and reports a 94 percent QA pass rate. On that benchmark, ShieldRank reaches F1 0.91, beating the old rules engine by a wide margin. Leadership wants to auto-remove high-confidence violations next month.

You are reviewing the benchmark and launch plan. Decide whether the label evidence is strong enough for automated enforcement, what risks remain, and what validation work should come before launch.
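One point worth checking before reading too much into the headline number: F1 is the harmonic mean of precision and recall, so a single F1 value does not say how often a flagged listing is actually a violation. The sketch below is illustrative only. The two operating points are hypothetical and not taken from the case file, and the helper names are made up for this example. It shows two precision/recall pairs that both round to F1 0.91 but imply very different numbers of wrongful removals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_removals_per_1000(precision: float) -> float:
    """Expected wrongly removed listings per 1,000 auto-removals at this precision."""
    return 1000 * (1 - precision)


# Hypothetical operating points (assumptions for illustration, not case data).
scenarios = {
    "precision-heavy": (0.98, 0.85),
    "recall-heavy": (0.85, 0.98),
}

for name, (p, r) in scenarios.items():
    print(
        f"{name}: F1={f1_score(p, r):.2f}, "
        f"false removals per 1,000={false_removals_per_1000(p):.0f}"
    )
```

Under these assumed numbers, both scenarios report the same F1, yet one would wrongly remove roughly 20 listings per 1,000 removals and the other roughly 150. For an auto-removal decision, precision at the proposed high-confidence threshold, measured against labels you trust, is likely the more relevant figure than benchmark-wide F1.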

Benchmark launch voicemail, Trust and Safety Product Lead (transcript):

The model finally gives us a clean benchmark story. If we can say it beats the rules engine by this much, I want high-confidence auto-removal live before the listing surge.

Evidence board

Work the scene

Inspect artifacts in any order. Sort what each one does, cite the few you would rely on, then assemble a final judgment with confidence.
