Evaluating the Performance of Large Language Models on Multispecialty FRCS Section 1 Questions
Large language models (LLMs) have increasingly demonstrated utility in medical education and professional examinations. However, their reliability, accuracy, and consistency in answering complex surgical questions remain unclear. This study aims to assess the accuracy, consistency, and intermodel reliability of four widely used LLMs, ChatGPT 4o, Google Gemini, Perplexity AI, and Microsoft Copilot, in answering Fellowship of the Royal Colleges of Surgeons Section 1 single best answer questions.
A total of 50 single best answer-type questions from the official Joint Committee on Intercollegiate Examinations sample set, covering ten surgical specialties, were presented to each LLM three times in independent sessions to prevent memory effects. Accuracy (correct versus incorrect responses), response consistency across repeated trials, and intermodel reliability were evaluated.
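The two per-model metrics described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the toy answer key, and the toy responses are hypothetical. The definition of consistency used here (a question counts as consistent only when all repeated trials return the same option) is an assumption, since the abstract does not state the exact definition.

```python
from typing import Sequence


def accuracy(responses: Sequence[Sequence[str]], answer_key: Sequence[str]) -> float:
    """Fraction of all (question, trial) responses that match the answer key."""
    total = sum(len(trials) for trials in responses)
    correct = sum(
        answer == key
        for trials, key in zip(responses, answer_key)
        for answer in trials
    )
    return correct / total


def consistency(responses: Sequence[Sequence[str]]) -> float:
    """Fraction of questions answered identically in every trial (assumed definition)."""
    identical = sum(len(set(trials)) == 1 for trials in responses)
    return identical / len(responses)


# Hypothetical toy data: 4 questions, 3 independent trials each.
answer_key = ["A", "C", "B", "D"]
responses = [
    ["A", "A", "A"],  # correct in all trials, consistent
    ["C", "B", "C"],  # correct in 2 of 3 trials, inconsistent
    ["B", "B", "B"],  # correct in all trials, consistent
    ["A", "A", "A"],  # wrong in all trials, yet consistent
]

print(f"accuracy = {accuracy(responses, answer_key):.4f}")
print(f"consistency = {consistency(responses):.2f}")
```

With 50 questions and three trials, the accuracy denominator is 150 responses per model. The last toy question shows why the two metrics are reported separately: a model can repeat the same wrong answer every time.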
ChatGPT had the highest accuracy (81.33%, 122/150, P < 0.0001), followed by Gemini (69.33%), Perplexity (64%), and Copilot (59.33%). ChatGPT achieved 100% accuracy in cardiothoracic surgery and neurosurgery, whereas Gemini performed poorly in neurosurgery (40%) and urology (20%). Otolaryngology and plastic surgery had lower accuracy across all models. Gemini and Perplexity showed the highest consistency (90%). Intermodel reliability was low (Fleiss' Kappa = 0.127, P < 0.0001), with cardiothoracic surgery having the highest agreement (0.401) and oral and maxillofacial surgery the lowest (-0.0992).
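For readers unfamiliar with the agreement statistic reported above, Fleiss' kappa can be computed from a table of rating counts as in the sketch below. It uses the standard formula, kappa = (P_bar - P_e) / (1 - P_e). How the study arranged its data is not stated in the abstract, so treating the four LLMs as raters and scoring each question as correct or incorrect is an assumption made only for illustration, and the toy counts are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    that assigned item i to category j. Every item needs the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    if n_raters < 2:
        raise ValueError("at least two raters are required")

    n_categories = len(counts[0])
    # Overall share of ratings falling into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy table: 4 questions, 4 models, categories = [correct, incorrect].
toy_counts = [
    [4, 0],  # all four models correct
    [0, 4],  # all four models incorrect
    [2, 2],  # models split evenly
    [4, 0],  # all four models correct
]
print(f"kappa = {fleiss_kappa(toy_counts):.4f}")
```

Values near 1 indicate agreement well beyond chance, values near 0 indicate agreement close to chance, and negative values, such as the one reported for oral and maxillofacial surgery, indicate agreement below the chance level.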
ChatGPT performed best overall, whereas other models showed variable accuracies and lower agreement. Although Gemini and Perplexity demonstrated high internal consistency, intermodel reliability was limited. The study findings suggest that, although promising, these tools should be used with care in Fellowship of the Royal Colleges of Surgeons surgical assessments.
Artificial intelligence; Education; Examination; Fellowship; Royal College of Surgeons.
Copyright © 2025 Elsevier Inc. All rights reserved.