TL;DR
- The Bullshit Index extracts every claim made by an agent and verifies it in real time against your evidence pack, the public web, and the agent's own earlier turns.
- Hallucinated citations, drifted positions, false precision, and contradicted statements all push the meter higher.
- It runs inside the debate loop — not after the fact — so the arbiter sees fragility before locking the verdict.
- False-positive rate currently 0.04% on internal benchmarks; average latency 1.2s per claim.
What it actually measures
Most fact-checking layers score final outputs. The Bullshit Index scores the reasoning. It tracks four signals simultaneously:
- Hallucinated citations — references to papers, studies, dates, or statistics that do not exist in the evidence pack or on the public web.
- Position drift — silent contradictions between an agent's current turn and its earlier turns, without explicit acknowledgment of the reversal.
- False precision — confidently-stated numbers, percentages, or named entities with no traceable source.
- Unsupported assertions — claims marked as factual that cannot be grounded in the supplied evidence or external sources.
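One way to picture the four signals is as a small taxonomy attached to each flagged claim. The sketch below is illustrative only: `Signal`, `Flag`, and `count_by_signal` are hypothetical names, not part of any documented MAD Studio API.

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    """The four signals listed above; identifiers are hypothetical."""
    HALLUCINATED_CITATION = "hallucinated_citation"
    POSITION_DRIFT = "position_drift"
    FALSE_PRECISION = "false_precision"
    UNSUPPORTED_ASSERTION = "unsupported_assertion"


@dataclass(frozen=True)
class Flag:
    """A single flagged claim (hypothetical record shape)."""
    agent: str
    turn: int
    claim: str
    signal: Signal


def count_by_signal(flags):
    """Tally flags per signal, including signals with zero hits."""
    counts = {signal: 0 for signal in Signal}
    for flag in flags:
        counts[flag.signal] += 1
    return counts
```

Keeping the signals separate, rather than collapsing them into one number early, lets a reviewer see whether an agent's score comes mostly from drift or mostly from untraceable numbers.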
Why “Bullshit” instead of “Hallucination”?
Hallucination is the term of art in LLM research, and we use it in the technical write-ups. The Bullshit Index name picks up something the academic term misses: indifference to truth. An LLM does not lie — it generates plausible continuations. Bullshit, in the precise sense Harry Frankfurt defined in On Bullshit, is speech produced without regard for whether it is true. That is exactly what the meter detects.
How it works in a debate run
- Every agent turn is parsed for atomic claims — extracted into a structured ledger keyed to the turn that introduced them.
- Each claim is checked against the shared evidence pack first (highest weight), then against the public web (medium weight), then against the agent's prior turns for consistency (drift detection).
- Verified claims pass through. Hedged claims are flagged but not penalized. Contradicted or fabricated claims push the meter and dock the agent's evidence and calibration dimension scores.
- The arbiter sees the per-agent Bullshit Index alongside the dimension scores when assembling the final verdict.
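The steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the product's implementation: every name is hypothetical, the numeric weights are placeholders (only the ordering "evidence pack above web" comes from the text), and the per-agent meter is assumed to be the mean penalty per claim.

```python
from dataclasses import dataclass
from typing import Callable, Optional

VERIFIED, HEDGED, CONTRADICTED, FABRICATED = (
    "verified", "hedged", "contradicted", "fabricated")

# Placeholder weights (assumption): the text only gives the ordering
# evidence pack > web; the prior-turn weight is invented for illustration.
SOURCE_WEIGHT = {"evidence_pack": 1.0, "web": 0.6, "prior_turns": 0.4}
UNGROUNDED_PENALTY = 1.0  # assumed penalty when no source knows the claim

# A checker returns True (supports), False (contradicts), or None (no data).
Checker = Callable[[str], Optional[bool]]


@dataclass
class Claim:
    agent: str
    turn: int          # the turn that introduced the claim
    text: str
    hedged: bool = False


@dataclass
class Verdict:
    claim: Claim
    status: str
    source: Optional[str]
    penalty: float


def verify(claim, checkers):
    """Run the checkers in priority order and stop at the first answer."""
    for source, check in checkers:
        result = check(claim.text)
        if result is True:
            return Verdict(claim, VERIFIED, source, 0.0)
        if result is False:
            if claim.hedged:  # hedged claims are flagged, not penalized
                return Verdict(claim, HEDGED, source, 0.0)
            return Verdict(claim, CONTRADICTED, source, SOURCE_WEIGHT[source])
    if claim.hedged:
        return Verdict(claim, HEDGED, None, 0.0)
    return Verdict(claim, FABRICATED, None, UNGROUNDED_PENALTY)


def bullshit_index(verdicts):
    """Per-agent meter: mean penalty over that agent's claims (assumed)."""
    totals, counts = {}, {}
    for v in verdicts:
        agent = v.claim.agent
        totals[agent] = totals.get(agent, 0.0) + v.penalty
        counts[agent] = counts.get(agent, 0) + 1
    return {agent: totals[agent] / counts[agent] for agent in totals}
```

A caller would pass the checkers as an ordered list such as `[("evidence_pack", check_pack), ("web", check_web), ("prior_turns", check_prior)]`, where each `check_*` function is supplied by the caller. Stopping at the first source that has an answer is what gives the evidence pack priority over the web.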
How it integrates with M-MAD scoring
The Bullshit Index is not a replacement for the M-MAD arbiter — it is a feeder. M-MAD scores debates across independent dimensions (correctness, evidence use, responsiveness, calibration, citation quality). The Bullshit Index produces evidence for two of those dimensions: evidence use and citation quality. A high Bullshit Index does not mean the verdict is wrong; it means the reasoning chain is fragile and should not be cited without manual review.
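The feeder relationship can be sketched as follows. The dimension names come from the paragraph above; everything else is an assumption made for illustration: scores on a 0 to 1 scale, an index normalized to 0 to 1, and the hypothetical `apply_feeder` function with its `max_dock` parameter.

```python
DIMENSIONS = (
    "correctness", "evidence_use", "responsiveness",
    "calibration", "citation_quality",
)
# The two dimensions this section says the Bullshit Index feeds.
FED_DIMENSIONS = ("evidence_use", "citation_quality")


def apply_feeder(scores, bullshit_index, max_dock=0.5):
    """Return a copy of `scores` with only the fed dimensions scaled down.

    `max_dock` (assumed) is the largest fraction a dimension can lose
    when the index is at its maximum of 1.0.
    """
    if not 0.0 <= bullshit_index <= 1.0:
        raise ValueError("bullshit_index must be in [0, 1]")
    adjusted = dict(scores)
    for dim in FED_DIMENSIONS:
        adjusted[dim] = scores[dim] * (1.0 - max_dock * bullshit_index)
    return adjusted
```

The other dimensions pass through unchanged, which mirrors the point that a high index signals a fragile reasoning chain rather than a wrong verdict.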
Frequently asked
What is the Bullshit Index?
The Bullshit Index is MAD Studio's real-time hallucination meter. It extracts every claim made by an agent and cross-references it against your evidence pack, the public web, and the agent's earlier turns. Hallucinated citations, drifted positions, false precision, and contradicted statements all push the meter higher.
What does a high Bullshit Index score mean?
A high score means a session contains many unsupported, fabricated, or contradicted claims. It does not necessarily mean the verdict is wrong — but it means the reasoning chain is fragile and should not be cited without manual review. The transcript shows exactly which claims triggered the score.
Can the Bullshit Index be wrong?
Yes. The current false-positive rate on internal benchmarks is 0.04%, and the false-negative rate (hallucinations the layer misses) is higher. Treat the score as a strong prior, not a verdict. Every flag links to the specific turn, claim span, and verification source so reviewers can audit any disagreement.
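For readers unsure how the two error rates differ, here is a small sketch using the standard confusion-matrix definitions. It assumes the figures above follow those definitions, which the benchmark description does not confirm, and the function name `error_rates` is hypothetical.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false_positive_rate, false_negative_rate).

    tp: bad claims correctly flagged      fp: sound claims wrongly flagged
    tn: sound claims correctly passed     fn: bad claims the layer missed
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical counts for arithmetic only, not benchmark data:
# 4 wrongly flagged out of 10,000 sound claims is a 0.04% rate.
fpr, fnr = error_rates(tp=90, fp=4, tn=9996, fn=10)
```

The two rates have different denominators: the false-positive rate is measured over sound claims, the false-negative rate over claims that really are unsupported, so a very low value for one says nothing about the other.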