Why superintelligent AI isn't taking over anytime soon
A primary requirement for being a leader in AI these days is to be a herald of the impending arrival of our digital messiah: superintelligent AI.
For Dario Amodei of Anthropic, Demis Hassabis of Google and Sam Altman of OpenAI, it isn’t enough to claim that their AI is the best. All three have recently insisted that it’s going to be so good, it will change the very fabric of society.
Even Meta—whose chief AI scientist has been famously dismissive of this talk—wants in on the action. The company confirmed it is spending $14 billion to bring in a new leader for its AI efforts who can realize Mark Zuckerberg’s dream of AI superintelligence—that is, an AI smarter than we are.
“Humanity is close to building digital superintelligence,” Altman declared in an essay this week, and this will lead to “whole classes of jobs going away” as well as “a new social contract.” Both will be consequences of AI-powered chatbots taking over all our white-collar jobs, while AI-powered robots assume the physical ones.
Before you get nervous about all the times you were rude to Alexa, know this: A growing cohort of researchers who build, study and use modern AI aren’t buying all that talk.
The title of a fresh paper from Apple says it all: “The Illusion of Thinking.” In it, a half-dozen top researchers probed reasoning models (large language models that “think” about problems longer, across many steps) from the leading AI labs, including OpenAI, DeepSeek and Anthropic. They found little evidence that these models are capable of reasoning anywhere close to the level their makers claim.
Generative AI can be quite useful in specific applications, and a boon to worker productivity. OpenAI claims 500 million monthly active ChatGPT users, an astonishingly wide reach and fast growth for a service released just 2½ years ago. But these critics argue there is a significant hazard in overestimating what it can do, and in making business plans, policy decisions and investments based on pronouncements that seem increasingly disconnected from the products themselves.
Apple’s paper builds on previous work from many of the same engineers, as well as notable research from both academia and other big tech companies, including Salesforce. These experiments show that today’s “reasoning” AIs (hailed as the next step toward autonomous AI agents and, ultimately, superhuman intelligence) are in some cases worse at solving problems than the plain-vanilla AI chatbots that preceded them. This work also shows that whether you’re using an AI chatbot or a reasoning model, all systems fail utterly at more complex tasks.
Apple’s researchers found “fundamental limitations” in the models. When taking on tasks beyond a certain level of complexity, these AIs suffered “complete accuracy collapse.” Similarly, engineers at Salesforce AI Research concluded that their results “underscore a significant gap between current LLM capabilities and real-world enterprise demands.”
Importantly, the problems these state-of-the-art AIs couldn’t handle are logic puzzles that even a precocious child could solve, with a little instruction. What’s more, when you give these AIs that same kind of instruction, they can’t follow it.
Apple’s paper has set off a debate in tech’s halls of power—Signal chats, Substack posts and X threads—pitting AI maximalists against skeptics.
“People could say it’s sour grapes, that Apple is just complaining because they don’t have a cutting-edge model,” says Josh Wolfe, co-founder of venture firm Lux Capital. “But I don’t think it’s a criticism so much as an empirical observation.”
The reasoning methods in OpenAI’s models are “already laying the foundation for agents that can use tools, make decisions, and solve harder problems,” says an OpenAI spokesman. “We’re continuing to push those capabilities forward.”
The debate over this research begins with the implication that today’s AIs aren’t thinking, but instead are creating a kind of spaghetti of simple rules to follow in every situation covered by their training data.
Gary Marcus, a cognitive scientist who sold an AI startup to Uber in 2016, argued in an essay that Apple’s paper, along with related work, exposes flaws in today’s reasoning models, suggesting they’re not the dawn of human-level ability but rather a dead end. “Part of the reason the Apple study landed so strongly is that Apple did it,” he says. “And I think they did it at a moment in time when people have finally started to understand this for themselves.”
In areas other than coding and mathematics, the latest models aren’t getting better at the rate that they once did. And the newest reasoning models actually hallucinate more than their predecessors.
“The broad idea that reasoning and intelligence come with greater scale of models is probably false,” says Jorge Ortiz, an associate professor of engineering at Rutgers, whose lab uses reasoning models and other cutting-edge AI to sense real-world environments. Today’s models have inherent limitations that make them bad at following explicit instructions, which is the opposite of what you’d expect from a computer, he adds.
It’s as if the industry is creating engines of free association. They’re skilled at confabulation, but we’re asking them to take on the roles of consistent, rule-following engineers or accountants.
That said, even those who are critical of today’s AIs hasten to add that the march toward more-capable AI continues.
Exposing current limitations could point the way to overcoming them, says Ortiz. For example, new training methods—giving step-by-step feedback on models’ performance, adding more resources when they encounter harder problems—could help AI work through bigger problems, and make better use of conventional software.
From a business perspective, whether or not current systems can reason, they’re going to generate value for users, says Wolfe.
“Models keep getting better, and new approaches to AI are being developed all the time, so I wouldn’t be surprised if these limitations are overcome in practice in the near future,” says Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania, who has studied the practical uses of AI.
Meanwhile, the true believers are undeterred.
Just a decade from now, Altman wrote in his essay, “maybe we will go from solving high-energy physics one year to beginning space colonization the next year.” Those willing to “plug in” to AI with direct, brain-computer interfaces will see their lives profoundly altered, he adds.
This kind of rhetoric accelerates AI adoption in every corner of our society. AI is now being used by DOGE to restructure our government, leveraged by militaries to become more lethal, and entrusted with the education of our children, often with unknown consequences.
Which means that one of the biggest dangers of AI is that we overestimate its abilities, trust it more than we should (even as it has shown antisocial tendencies such as “opportunistic blackmail”) and rely on it more than is wise. In so doing, we make ourselves vulnerable to its propensity to fail when it matters most.
“Although you can use AI to generate a lot of ideas, they still require quite a bit of auditing,” says Ortiz. “So for example, if you want to do your taxes, you’d want to stick with something more like TurboTax than ChatGPT.”
Write to Christopher Mims at [email protected]