Caution Advised: AI Chatbots Fall Short in Accurate Medical Advice, Study Warns
February 9, 2026
In a head-to-head comparison, people using chatbots were no better at identifying health problems and choosing appropriate actions than those using traditional internet search, with results not clearly favoring AI-assisted methods.
Oxford researchers tested GPT-4o, Llama 3, and Command R+ in realistic medical scenarios, revealing the gap between AI test performance and practical usefulness for health decision-making.
AI chatbots should not replace physicians and medical professionals; exercise caution when seeking health information from AI tools, as they do not outperform traditional methods in diagnosing or advising on when to seek care.
A Nature Medicine study warns that relying on AI chatbots for healthcare can be dangerous due to incorrect diagnoses and guidance, highlighting the gap between test performance and real-world usefulness.
When real users interact with LLMs, they correctly identify relevant conditions in under a third of cases, far lower than the models' accuracy when given the full case text directly, underscoring the challenge of information gathering and the models' limited ability to ask the right questions.
A separate urology study found ChatGPT answered only about 60% of guideline-based questions accurately, leaving a substantial portion incorrect or incomplete.
In a UK randomized trial with nearly 1,300 participants, those assigned to one of three chatbots correctly identified health problems about one third of the time and chose the correct course of action about 45% of the time, no better than a control group relying on internet search.
Experts acknowledge that newer models with better reasoning may improve on these results, but stress that systems must be tested with real human users before being deployed in patient care.
Users often do not know what to ask, and the answers they receive are shaped by how questions are phrased, making it hard to separate useful information from unhelpful advice.
MedQA-style benchmarks show models excel on multiple-choice questions but struggle with real interactive tasks, highlighting a gap between academic testing and practical utility.
Common failure modes include users supplying incomplete information and misinterpreting the AI's responses, even when the AI's suggested diagnoses are plausible.
The study, published in Nature Medicine as "Reliability of LLMs as medical assistants for the general public", emphasizes the need for rigorous testing and safeguards for AI medical assistants.
Summary based on 8 sources
Sources

BBC News • Feb 9, 2026
AI chatbots give inaccurate medical advice says Oxford Uni study
Yahoo News UK • Feb 9, 2026
AI chatbots give bad health advice, research finds
Digital Journal • Feb 9, 2026
AI chatbots give bad health advice, research finds