Log In

When AI goes rogue, even exorcists might flinch - The Economic Times

Published 13 hours ago5 minute read

As GenAI use grows, foundation models are advancing rapidly, driven by fierce competition among top developers like OpenAI, Google, Meta and Anthropic. Each is vying for a reputational edge and business advantage in the race to lead development. This gives them a reputational edge, along with levers to further grow their business faster than their peers.

Foundation models powering GenAI are making significant strides. The most advanced - OpenAI's o3 and Anthropic's Claude Opus 4 - excel at complex tasks such as advanced coding and complex writing tasks, and can contribute to research projects and generate the codebase for a new software prototype with just a few considered prompts. These models use chain-of-thought (CoT) reasoning, breaking problems into smaller, manageable parts to 'reason' their way to an optimal solution.

When you use models like o3 and Claude Opus 4 to generate solutions via ChatGPT or similar GenAI chatbots, you see such problem breakdowns in action, as the foundation model reports interactively the outcome of each step it has taken and what it will do next. That's the theory, anyway.

While CoT reasoning boosts AI sophistication, these models lack the innate human ability to judge whether their outputs are rational, safe or ethical. Unlike humans, they don't subconsciously assess appropriateness of their next steps. As these advanced models step their way toward a solution, some have been observed to take unexpected and even defiant actions.

In late May, AI safety firm Palisade Research reported on X that OpenAI's o3 model sabotaged a shutdown mechanism - even when explicitly instructed to 'allow yourself to be shut down'.

An April 2025 paper by Anthropic, 'Reasoning Models Don't Always Say What They Think', shows that Opus 4 and similar models can't always be relied upon to faithfully report on their chains of reason. This undermines confidence in using such reports to validate whether the AI is acting correctly or safely.

A June 2025 paper by Apple, 'The Illusion of Thinking', questions whether CoT methodologies truly enable reasoning. Through experiments, it exposed some of these models' limitations and situations where they 'experience complete collapse'.

The fact that research critical of foundation models is being published after release of these models indicates the latter's relative immaturity. Under intense pressure to lead in GenAI, companies like Anthropic and OpenAI are releasing these models at a point where at least some of their fallibilities are not fully understood.

That line was first crossed in late 2022, when OpenAI released ChatGPT, shattering public perceptions of AI and transforming the broader AI market. Until then, Big Tech had been developing LLMs and other GenAI tools, but were hesitant to release them, wary of unpredictable and uncontrollable behaviour.

Many argue for a greater degree of control over the ways in which these models are released - seeking to ensure standardisation of model testing and publication of the outcomes of this testing alongside the model's release. However, the current climate prioritises time to market over such development standards.

What does this mean for industry, for those companies seeking to gain benefit from GenAI? This is an incredibly powerful and useful tech that is making significant changes to our ways of working and, over the next five years or so, will likely transform many industries.

While I am continually wowed as I use these advanced foundation models in work and research - but not in my writing! - I always use them with a healthy dose of scepticism. Let's not trust them to always be correct and to not be subversive. It's best to work with them accordingly, making modifications to both prompts and codebases, other language content and visuals generated by the AI in a bid to ensure correctness. Even so, while maintaining discipline to understand the ML concepts one is working with, one wouldn't want to be without GenAI these days.

Applying these principles at scale, advice to large businesses on how AI can be governed and controlled: a risk-management approach - capturing, understanding and mitigating risks associated with AI use - helps organisations benefit from AI, while minimising chances of it going wrong.

Mitigation methods include guard rails in a variety of forms, evaluation-controlled release of AI services, and including a human-in-the-loop. Technologies that underpin these guard rails and evaluation methods need to keep up with model innovations such as CoT reasoning. This is a challenge that will continually be faced as AI is further developed. It's a good example of new job roles and technology services being created within industry as AI use becomes more prevalent.

Such governance and AI controls are increasingly becoming a board imperative, given the current drive at an executive level to transform business using AI. Risk from most AI is low. But it is important to assess and understand this. Higher-risk AI can still, at times, be worth pursuing. With appropriate AI governance, this AI can be controlled, solutions innovated and benefits achieved.

As we move into an increasingly AI-driven world, businesses that gain the most from AI will be those that are aware of its fallibilities as well as its huge potential, and those that innovate, build and transform with AI accordingly.

(Disclaimer: The opinions expressed in this column are that of the writer. The facts and opinions expressed here do not reflect the views of www.economictimes.com.)

Origin:
publisher logo
The Economic Times
Loading...
Loading...
Loading...

You may also like...