AI Hallucinations
AI basics | OpenAI’s latest AI models report high ‘hallucination’ rate: What does it mean, why is this significant?
Context: A recent technical report by OpenAI has sparked concern in the artificial intelligence (AI) community. The findings reveal that OpenAI’s latest models — o3 and o4-mini — are hallucinating more frequently than older versions, raising fundamental questions about the future of large language models (LLMs).
The findings underline the dual nature of the hallucination problem: it is not only about improving algorithms but also about managing user expectations and understanding the limitations of machine-generated knowledge.
What Are AI Hallucinations?
- Originally, AI hallucinations referred specifically to fabricated information generated by AI models.
- A well-known case: In June 2023, a U.S. lawyer used ChatGPT to draft a court filing — the chatbot included fake citations and nonexistent cases.
- Today, hallucinations include:
  - Fabricated facts
  - Irrelevant but factually correct answers
  - Outputs not grounded in the question asked
Why Do LLMs Hallucinate?
- LLMs (Large Language Models): Systems like ChatGPT, o3, o4-mini, Gemini, etc., generate outputs by identifying patterns in massive internet text datasets.
- Prediction-based Output: These models predict the next word from learned probabilities; they do not fact-check or verify truth the way humans do (see the illustrative sketch after this list).
- Gary Marcus’ View: “LLMs know word patterns, not facts. They don’t operate like you and me.”
- Training on Flawed Data: If trained on inaccurate or biased text, the model may reproduce or even generate new inaccuracies.
- Black-box Nature: Due to their complexity, experts can’t trace exactly why a model gives a specific output.
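To make the prediction-based nature of these systems concrete, the sketch below shows next-word prediction in miniature. It is a minimal, illustrative example: the vocabulary, the scores, and the prompt are invented, and a real LLM computes its scores with billions of learned parameters. The mechanism is nonetheless the same in spirit: the model samples a plausible next token from a probability distribution; it never consults a source of verified facts.

```python
import math
import random

# Toy "model": a score (logit) for each candidate next word after the prompt
# "The capital of France is ...". In a real LLM these scores come from a
# trained neural network; here they are invented purely for illustration.
candidate_logits = {
    "Paris": 4.2,      # plausible continuation, gets most of the probability
    "London": 2.1,
    "Atlantis": 0.5,   # implausible, but still assigned some probability
}

def softmax(logits):
    """Turn raw scores into a probability distribution over words."""
    exps = {word: math.exp(score) for word, score in logits.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

def sample_next_word(logits):
    """Pick the next word in proportion to its probability."""
    probs = softmax(logits)
    words = list(probs.keys())
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# Most samples will be "Paris", but nothing in the mechanism rules out an
# occasional "Atlantis": fluency and truth are decoupled, which is exactly
# how a hallucination arises.
print(sample_next_word(candidate_logits))
```

Because the output is drawn from a distribution over word patterns rather than checked against a knowledge base, a completion can sound confident and still be wrong.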
OpenAI’s New Report: Key Findings
- Model o3 (OpenAI’s most powerful system): Hallucinated in 33% of responses during the PersonQA benchmark test (focused on public figures).
- Model o4-mini: Hallucinated in 48% of PersonQA test cases (a sketch of how such a rate is tallied appears after this list).
- Significance: These rates are higher than previous models, reversing the earlier trend of improvement.
- OpenAI’s Challenge: The company does not know why hallucinations have increased in newer models.
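As a rough illustration of what a benchmark hallucination rate means, the sketch below tallies the share of graded answers flagged as containing fabricated content. The questions and labels are hypothetical and the grading step is assumed; PersonQA is OpenAI's internal evaluation, and its exact grading procedure is not reproduced here.

```python
# Hypothetical graded benchmark results: "hallucinated" is True when a grader
# judged the model's answer to contain fabricated content.
graded_answers = [
    {"question": "Where was Person A born?", "hallucinated": False},
    {"question": "What prize did Person B win in 2010?", "hallucinated": True},
    {"question": "Which university did Person C attend?", "hallucinated": False},
]

def hallucination_rate(results):
    """Fraction of graded answers flagged as hallucinated."""
    if not results:
        return 0.0
    flagged = sum(1 for answer in results if answer["hallucinated"])
    return flagged / len(results)

# With the made-up data above, 1 of 3 answers is flagged, i.e. roughly the
# 33% figure reported for o3 on PersonQA (the data here is illustrative only).
print(f"Hallucination rate: {hallucination_rate(graded_answers):.0%}")
```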
Why Is the Report Significant?
- Hallucination has always been an issue in AI, but there was optimism that it would decline over time.
- The latest findings show that hallucination is not going away—and might even be getting worse.
- This trend is not unique to OpenAI: Chinese startup DeepSeek also saw double-digit increases in hallucination rates with its new R1 model.
- Implication: All LLMs currently face similar limitations, regardless of origin.
Limitations in Practical Use
- Due to hallucination risks, the applicability of AI systems is currently limited in several fields:
  - They cannot yet be trusted as research assistants, since they may generate fake citations in academic papers.
  - They are unreliable as paralegal bots, as they can fabricate legal cases and misinterpret laws.
- In high-stakes domains like medicine, law, and science, even small errors can have serious consequences, making hallucination a critical barrier.