When a Big Four Firm Got Caught Hallucinating: Lessons from a $2 Million AI Disaster
In October 2025, law professor Chris Rudge was reading a government report on Australia’s welfare compliance system when something caught his eye. The report cited a book called The Rule of Law and Administrative Justice in the Welfare State by Professor Lisa Burton Crawford.1
There was just one problem: the book doesn’t exist.
“I instantaneously knew it was either hallucinated by AI or the world’s best kept secret,” Rudge told the Associated Press.1 What he uncovered next would spark an international scandal and raise serious questions about how even the world’s most trusted consulting firms are using artificial intelligence.
The $290,000 Report Full of Fabrications
A Big Four consulting firm’s Australian office had been hired by the Department of Employment and Workplace Relations for approximately $290,000 USD to conduct an independent review of the Targeted Compliance Framework—an automated system that penalizes jobseekers who miss welfare obligations.2 The report was published in July 2025.
Professor Rudge catalogued roughly 20 errors, including ten references to that nonexistent book and—perhaps most troublingly—a fabricated quote attributed to Federal Court Judge Jennifer Davies (whose name was also misspelled as “Davis” in the report).1
“They’ve totally misquoted a court case then made up a quotation from a judge,” Rudge said. “That’s about misstating the law to the Australian government in a report that they rely on.”1
Senator Barbara Pocock put it bluntly: “The kinds of things that a first-year university student would be in deep trouble for.”2
Then It Happened Again
Just six weeks later, history repeated itself—this time in Canada.
A $1.6 million report from the same firm on healthcare workforce planning for Newfoundland and Labrador was found to contain multiple fabricated citations.3 The 526-page document cited researchers on papers they never wrote, referenced academic journals that couldn’t be found in any database, and included fictional collaborations between researchers who had never worked together.4
Gail Tomblin Murphy, an adjunct professor at Dalhousie University, discovered she had been cited on a paper that “does not exist.”4 Her reaction was telling: “It sounds like if you’re coming up with things like this, they may be pretty heavily using AI to generate work.”4
Where They Went Wrong
After both incidents, the firm acknowledged using OpenAI’s GPT-4o to assist with portions of the reports.2 Their defense? “AI was not used to write the report; it was selectively used to support a small number of research citations.”3
But that defense reveals the core problem: citations are precisely where hallucinations matter most.
When you ask off-the-shelf AI like ChatGPT to generate or verify citations, you’re asking it to do something it’s fundamentally unreliable at—retrieve specific, verifiable facts about the existence of academic papers, their authors, and their contents. These tools don’t search databases; they predict plausible-sounding text. And a plausible-sounding citation to a nonexistent paper is worthless at best, dangerous at worst.
Here’s what likely went wrong in their process:
1. Using off-the-shelf AI for the wrong task. Citation work requires retrieval from verified sources—databases, library systems, actual documents. Consumer AI tools are prediction engines, not retrieval engines. They excel at synthesis and drafting, not at confirming whether a specific paper exists in a specific journal.
2. No source verification workflow. There's no indication that anyone clicked through to confirm the cited sources actually existed before publication. A single Google Scholar search would have revealed that Lisa Burton Crawford never wrote a book on welfare state administrative justice, and even a basic automated lookup, like the one sketched after this list, would have flagged it.
3. The “looks right” problem. AI-generated citations are dangerous precisely because they look legitimate. They follow proper formatting, include realistic author names, and reference plausible publication venues. Without systematic verification, they pass casual review.
4. No traceability to source documents. When a human researcher cites a source, they typically have the actual document open, highlighting the specific passage. When consumer AI generates a citation, there’s no underlying document—just a statistically likely string of text.
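None of this requires exotic tooling. As a rough illustration only (not the firm's actual workflow, and not a complete solution), here is the kind of automated sanity check a publication pipeline could run over every citation before sign-off. It queries the public Crossref API; the function name, the similarity threshold, and the decision to route misses to a human reviewer are our own illustrative choices, and a real workflow would also consult library catalogues, legal databases, and Google Scholar.

```python
# Minimal citation sanity check against the public Crossref API.
# Illustrative sketch only: Crossref covers journal articles and many
# books, but a production workflow would also check library catalogues,
# legal databases, and publisher sites before trusting a citation.
import requests
from difflib import SequenceMatcher

def citation_resolves(title: str, author_surname: str, threshold: float = 0.85) -> bool:
    """Return True if Crossref lists a work whose title closely matches
    `title` and whose author list includes `author_surname`.
    False means: do not publish until a human has verified the source."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"].get("items", []):
        candidate = (item.get("title") or [""])[0]
        similarity = SequenceMatcher(None, title.lower(), candidate.lower()).ratio()
        surnames = {a.get("family", "").lower() for a in item.get("author", [])}
        if similarity >= threshold and author_surname.lower() in surnames:
            return True
    return False

# The citation that unravelled the Australian report:
suspect = "The Rule of Law and Administrative Justice in the Welfare State"
print(citation_resolves(suspect, "Crawford"))  # expected: False -> flag for human review
```

The specific API matters less than the habit it enforces: every citation either resolves to a verifiable record or gets routed to a human before anything is published.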
The Deeper Issue: Detection Took Months
Perhaps the most alarming aspect of these incidents is the timeline. The Australian report was published in July 2025. The errors weren’t discovered until October—three months later—and only because a subject matter expert happened to recognize that a cited book in his field didn’t exist.1
How many AI hallucinations in professional reports go undetected because no expert happens to scrutinize them? How many fabricated citations are sitting in government policy documents, legal briefs, and corporate analyses right now?
This is the real risk. Hallucinations from off-the-shelf AI look exactly like legitimate work, and without a rigorous source verification process they simply go undetected.
How Document Intelligence Should Work
At Statvis, we built our document processing pipeline around a simple principle: every insight must trace back to a specific source, on a specific page, in a specific document.
Our hybrid search architecture combines the pattern-matching power of AI with structured retrieval from verified document sets. When our system surfaces information, it doesn’t generate plausible-sounding citations—it shows you exactly where on the source document that information came from.
This matters for three reasons:
Verification is instant. When you can click through to see the exact page and passage, confirming accuracy takes seconds, not hours of database searches.
Hallucinations become obvious. If the system claims something came from a document, but you can see the actual document doesn’t say that, the error is immediately apparent. There’s no possibility of a “plausible-sounding but nonexistent” citation.
Trust is auditable. Every output maintains a clear chain of custody back to source materials. When stakeholders ask “where did this come from?”—and they will—you have a concrete answer.
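To make that concrete, here is a deliberately simplified sketch of the pattern. It is not our production code; the class names, the naive scoring, and the two-passage sample corpus are invented for illustration. What it demonstrates is structural: retrieval returns passages tagged with their document and page, so anything reported downstream can be traced back and checked.

```python
# Illustrative sketch of "hybrid search with provenance": every result
# carries the document and page it came from, so a claim can be checked
# against the source in one click. This is a simplified stand-in, not
# the actual Statvis pipeline; the scoring here is deliberately naive.
from dataclasses import dataclass

@dataclass
class Passage:
    doc: str    # source document name
    page: int   # page number inside that document
    text: str   # the passage itself

def keyword_score(query: str, passage: Passage) -> float:
    """Fraction of query terms that appear verbatim in the passage."""
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in passage.text.lower())
    return hits / len(terms) if terms else 0.0

def semantic_score(query: str, passage: Passage) -> float:
    """Placeholder for an embedding-based similarity score. A real system
    would compare dense vectors; here we reuse keyword overlap so the
    example stays self-contained and runnable."""
    return keyword_score(query, passage)

def hybrid_search(query: str, corpus: list[Passage], k: int = 3) -> list[tuple[float, Passage]]:
    """Blend keyword and semantic scores; return top-k passages WITH provenance."""
    scored = [
        (0.5 * keyword_score(query, p) + 0.5 * semantic_score(query, p), p)
        for p in corpus
    ]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]

corpus = [
    Passage("TCF_review_2025.pdf", 14, "Penalties are applied automatically after a missed appointment."),
    Passage("TCF_review_2025.pdf", 88, "Stakeholders raised concerns about automated decision-making."),
]
for score, p in hybrid_search("automated penalties for missed appointments", corpus):
    # Every answer is printed alongside its source document and page.
    print(f"{p.doc} p.{p.page} (score {score:.2f}): {p.text}")
```

The design choice that matters is the return type. Because every result carries a document name and a page number, a "plausible-sounding but nonexistent" citation has nowhere to come from: the system can only cite what retrieval actually returned.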
The Big Four firm’s AI mishaps came from building workflows that severed the connection between outputs and sources. When your process allows AI to generate citations without any mechanism to verify they correspond to real documents, you’ve created the conditions for exactly these kinds of failures.
The Bottom Line
The Big Four consulting firms aren’t going to stop using AI—nor should they. These tools genuinely accelerate research and analysis when used appropriately. But “appropriately” means maintaining rigorous traceability between AI-assisted insights and verified source documents.
The question every organization should ask before publishing AI-assisted analysis: Can we show exactly where every claim came from?
If the answer is no, you’re one curious law professor away from your own $2 million headline.
Yes, we used AI to help write this blog post. No, we didn’t let it make up the citations. You can click every single one.
Footnotes
1. Above the Law. "Law Professor Catches Deloitte Using Made-Up AI Hallucinations In Government Report." October 2025.
2. Fortune. "Deloitte was caught using AI in $290,000 report to help the Australian government." October 7, 2025.
3. Fortune. "Deloitte allegedly cited AI-generated research in a million-dollar report for a Canadian provincial government." November 25, 2025.
4. The Independent (Newfoundland). "Major N.L. healthcare report contains errors likely generated by A.I." November 2025.