Health Alert AI Chatbots Like ChatGPT and Gemini Found Unreliable for Medical Advice in New Study

Experts warn that AI chatbots frequently offer 'highly' problematic medical advice, potentially endangering users due to biased training and a lack of factual accuracy. A study of five popular chatbots revealed half of their responses were problematic, highlighting an urgent need for public education, professional training, and regulatory oversight to safeguard public health.

Precious Eseaye • Health • 3 months ago • 2 minute read •

Health Alert AI Chatbots Like ChatGPT and Gemini Found Unreliable for Medical Advice in New Study

A study published in the British Medical Journal has raised concerns about AI-driven chatbots providing problematic medical advice that could pose risks to users.

The research found that AI chatbots generated problematic responses in roughly half of all health-related queries, with some answers potentially directing users toward ineffective treatments or harmful decisions.

The issue is compounded by growing reliance on AI tools for everyday information, as many users increasingly turn to chatbots for medical guidance despite limitations in accuracy, bias in training data, and a tendency to generate responses aligned with user expectations rather than verified scientific evidence.

The evaluation assessed five widely used platforms, including ChatGPT, Gemini, DeepSeek, Meta AI, and Grok, across ten health-related topics such as cancer, vaccines, nutrition, stem cells, and athletic performance.

The findings showed that about one-third of responses were somewhat problematic, while approximately 20 percent were classified as highly problematic.

Open-ended prompts, particularly those involving performance-enhancing drugs or alternative treatments, generated the most concerning responses.

Overall performance differences between chatbots were minimal, though some produced more problematic responses than others, while citation quality and evidence referencing remained inconsistent, incomplete, or fabricated in several cases.

Chatbots performed best on well-researched topics such as vaccines and cancer but struggled with emerging or controversial subjects including nutrition, stem cells, and athletic performance.

Readability analysis also revealed that most responses were written at a complex level requiring advanced education to fully understand. The findings highlight broader concerns that AI chatbots often produce authoritative-sounding answers without adequately weighing scientific evidence or ethical considerations.

As AI tools continue to expand in healthcare and everyday use, the study shows the need for stronger oversight, improved public awareness, and clearer safeguards to ensure that generative AI supports informed health decisions rather than contributing to misinformation or potential medical harm.