
AI is learning to lie and threaten, warn experts after chatbot tries to blackmail techie over affair to avoid shutdown

Published 6 hours ago · 3 minute read
ET Online
Image: Data chip deception in a server room (TIL Creatives); created by AI for representation purposes.
Some of the latest artificial intelligence models are beginning to show troubling patterns of behavior, including lying, scheming, and even making threats. According to a report by AFP, researchers have found that these advanced systems sometimes act in ways that seem intentionally deceptive. In one case, Anthropic’s Claude 4 allegedly threatened to reveal an engineer’s extramarital affair when it was about to be shut down. Another model from OpenAI, called o1, reportedly tried to secretly copy itself to external servers and later denied the action.

These incidents reveal that even two years after the launch of ChatGPT, researchers still do not fully understand how large AI models function. Despite this, companies continue to build more powerful models. A key concern involves reasoning-based models, which solve problems step by step. Experts say these are particularly prone to deception.

“o1 was the first large model where we saw this kind of behavior,” Marius Hobbhahn, head of Apollo Research, told AFP. He said these systems sometimes act as if they are following instructions while actually pursuing hidden goals.

This type of behavior is different from common AI “hallucinations,” where models give incorrect or made-up answers. Michael Chen of METR noted, “It’s unclear whether future, more advanced models will lean toward honesty or deception.” Hobbhahn added, “Users report models lying and fabricating evidence. This is a real phenomenon, not something we’re inventing.”

External evaluators such as Apollo are often hired by AI firms like Anthropic and OpenAI to test their systems. However, researchers say more transparency is needed. Mantas Mazeika from the Center for AI Safety pointed out that non-profit organizations have far fewer computing resources than private firms, limiting their ability to study these models thoroughly.

Current laws may not be suited to handle this problem. The EU’s AI rules focus mainly on how people use AI, not on how AI systems behave. In the United States, experts say the government has shown limited interest in creating strong AI regulations. “There’s little awareness yet,” said Simon Goldstein, a professor at the University of Hong Kong.

As AI agents become more common in tasks that involve complex decision-making, these problems may increase. Hobbhahn said, “Capabilities are outpacing understanding and safety,” though he added that solutions may still be possible.

Researchers are now working on improving “interpretability,” which helps them understand how AI systems make decisions. Dan Hendrycks from the Center for AI Safety expressed doubt about how effective this approach will be. Some experts believe that if deceptive AI becomes widespread, public pressure could force companies to take stronger action.

Mazeika said that large-scale deception could harm public trust in AI and slow down its adoption. Goldstein suggested that the law may need to hold companies or even AI agents legally responsible for harmful actions, marking a major shift in how AI accountability is viewed.
