OpenAI Unveils Next-Gen GPT-5.4 with Pro and Thinking Capabilities

OpenAI has unveiled GPT-5.4, its latest frontier model, touted for its professional capabilities, efficiency, and advanced features. The new model boasts record benchmark scores, significantly reduced factual errors, and innovations like a 1-million-token context window and the new Tool Search system. It also features enhanced safety evaluations for its chain-of-thought, reinforcing its reliability for complex professional tasks.

Uche Emeka • AI • 4 months ago • 2 minute read


On Thursday, OpenAI officially released GPT-5.4, presenting it as its most capable and efficient frontier model for professional work. The new foundation model is available in a standard version and two specialized variants: GPT-5.4 Thinking, optimized for reasoning, and GPT-5.4 Pro, geared for high performance.

The API version of GPT-5.4 supports context windows of up to 1 million tokens, the largest context window OpenAI has offered. OpenAI has also highlighted improved token efficiency, stating that GPT-5.4 can solve the same problems with considerably fewer tokens than its predecessor, GPT-5.2.

The benchmark results are significantly improved as well. GPT-5.4 achieved record scores on the computer use benchmarks OSWorld-Verified and WebArena Verified. It also scored 83% on OpenAI’s GDPval test, which evaluates knowledge work tasks. On professional skills, including law and finance, GPT-5.4 took the lead on Mercor’s APEX-Agents benchmark. Brendan Foody, CEO of Mercor, emphasized GPT-5.4's excellence in creating "long-horizon deliverables such as slide decks, financial models, and legal analysis," noting that it delivers top performance at a faster speed and lower cost than competing models.

OpenAI has continued to focus on reducing hallucinations and factual errors. Compared with GPT-5.2, the new model is 33% less likely to make errors in individual claims and shows an 18% overall reduction in response errors, a considerable gain in reliability and factual accuracy.

The launch of GPT-5.4 also introduces a revamped approach to tool calling within its API version, featuring a new system called Tool Search. Previously, system prompts would define all available tools, a process that could consume a large number of tokens as the toolset grew. The new Tool Search system enables models to look up tool definitions only when required, leading to faster and more cost-effective requests, particularly in complex systems with numerous available tools.
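
The idea of looking up tool definitions on demand can be illustrated with a small sketch. This is not OpenAI's Tool Search API: the catalog, the `ToolDef` type, and the `search_tools` and `rough_size` helpers are all hypothetical names invented for illustration, the word-overlap matching is deliberately naive, and character count is only a crude stand-in for token count.

```python
from dataclasses import dataclass


@dataclass
class ToolDef:
    """Hypothetical tool definition; not an OpenAI schema."""
    name: str
    description: str
    parameters: dict


# Hypothetical catalog. A real deployment might hold hundreds of tools.
CATALOG = [
    ToolDef("get_weather", "Look up the current weather for a city",
            {"city": "string"}),
    ToolDef("send_email", "Send an email message to a recipient",
            {"to": "string", "body": "string"}),
    ToolDef("query_invoices", "Search invoices by customer and date range",
            {"customer": "string", "start": "date", "end": "date"}),
]


def rough_size(tools):
    """Crude proxy for prompt cost: total characters across definitions."""
    return sum(
        len(t.name) + len(t.description) + len(str(t.parameters))
        for t in tools
    )


def search_tools(query, catalog=CATALOG):
    """Return only definitions sharing at least one word with the query."""
    words = set(query.lower().split())
    hits = []
    for tool in catalog:
        text = tool.name.replace("_", " ") + " " + tool.description
        if words & set(text.lower().split()):
            hits.append(tool)
    return hits


if __name__ == "__main__":
    needed = search_tools("weather in Lagos")
    print([t.name for t in needed])
    print("all tools up front:", rough_size(CATALOG))
    print("looked up on demand:", rough_size(needed))
```

Under these assumptions, only the matching definition is placed in the request, so the cost of describing tools scales with what a task needs rather than with the size of the whole catalog.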

On AI safety, OpenAI has added a new evaluation of its models’ chain-of-thought, the internal commentary that reveals their reasoning through multi-step tasks. AI safety researchers have long expressed concerns that reasoning models could misrepresent their chain-of-thought. OpenAI’s new evaluation indicates that deception is less likely to occur in the GPT-5.4 Thinking version, suggesting that "the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool."