Anthropic CEO: Humans May Hallucinate More Than Advanced AI Systems

Dario Amodei, CEO of Anthropic, recently made bold claims at two major tech events: VivaTech 2025 in Paris and Anthropic's inaugural "Code with Claude" developer day. He asserted that modern AI models, particularly the newly released Claude 4 series, can exceed human factual accuracy in structured scenarios and may hallucinate less often than people when answering factual, well-defined questions.
In artificial intelligence, "hallucination" describes instances where tools like ChatGPT, Gemini, Copilot, or Claude fill gaps in their knowledge with plausible-sounding but fabricated information, presenting assumptions as fact. Amodei suggests that recent advances may have reversed the usual comparison, at least under controlled conditions: in such settings, humans may now hallucinate more than AI.
To support his claims, Amodei cited Anthropic's internal testing during his VivaTech keynote. In these tests, Claude 3.5 competed against human participants in structured factual quizzes, and the results reportedly showed the model answering straightforward question-and-answer tasks with greater precision than the humans.
Elaborating at the developer-focused "Code with Claude" event, where the Claude Opus 4 and Claude Sonnet 4 models were unveiled, Amodei emphasized that the factual accuracy of AI models depends heavily on prompt design, the context provided, and the application domain, a caveat that is particularly critical in high-stakes settings such as legal filings or healthcare. He made this point while acknowledging a recent legal dispute involving hallucinated content produced by Claude.
Despite these advances, Amodei was quick to admit that AI hallucinations have not been eliminated. The models remain vulnerable to errors, he acknowledged, but they can achieve high accuracy when given the right information and used correctly; careful, proper use is the key to minimizing mistakes.
While modern AI models like the new Claude 4 series are making steady progress toward factual precision, especially in structured tasks, their overall reliability still hinges on careful, appropriate application. As Amodei suggested, effective prompt design and attention to domain context are critical. In the ongoing interplay between human and artificial intelligence, the pursuit of truth and accuracy remains a shared endeavor between humans and machines.