GPT-4 Achieves High Accuracy in Classifying Brain MRI Reports, Closely Matching an Expert Neuroradiologist

A recent study highlights GPT-4's potential as a tool for automatically classifying brain MRI reports according to the Fazekas scale, which grades the severity of white matter hyperintensities.
To test the system, researchers generated 50 synthetic MRI reports with varied Fazekas scores using two GPT-4-based models, SinteticRMFazekasGPT and FazekasGPT.
An expert neuroradiologist reviewed the same reports, and GPT-4's classifications were compared with the expert's assessments to evaluate accuracy.
The research shows that GPT-4 can classify such reports with high agreement with the neuroradiologist.
Statistical analysis using Cohen's kappa yielded a value of 0.94, conventionally interpreted as almost perfect agreement, underscoring GPT-4's reliability in this classification task.
GPT-4 matched the neuroradiologist's assessments perfectly for Fazekas scores 0, 2, and 3, and achieved 86.7% accuracy for score 1 (13 of 15 reports), so only two reports were misclassified overall.
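Cohen's kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of matching labels and p_e is the chance agreement implied by each rater's label frequencies. The summary does not say whether the study used unweighted or weighted kappa, so the Python sketch below assumes the unweighted form and also computes per-score accuracy against the expert's labels. The function names and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of the two raters' marginal label
    # frequencies, summed over labels (missing labels count as 0).
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def per_class_accuracy(reference, predicted):
    """For each reference label, the fraction of its items labeled identically."""
    totals = Counter(reference)
    correct = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {label: correct[label] / totals[label] for label in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical toy labels (Fazekas scores 0-3), NOT data from the study.
    expert = [0, 1, 1, 2, 3, 1]
    model = [0, 1, 2, 2, 3, 1]
    print(f"kappa = {cohens_kappa(expert, model):.3f}")
    print(per_class_accuracy(expert, model))
```

On these toy labels the sketch prints kappa = 0.769 (exactly 10/13). The per-score figure reported for score 1 follows the same definition as per_class_accuracy: 13 correct out of 15 reports gives 13/15, or about 86.7%. For real analyses, scikit-learn's sklearn.metrics.cohen_kappa_score computes the same statistic and also accepts weights="linear" or weights="quadratic", which are often preferred for ordinal scales such as Fazekas.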
The study concludes that GPT-4 could assist radiologists and improve consistency in MRI reporting, although further validation is needed.
Limitations of the study include the small sample of 50 synthetic rather than real clinical reports, the lack of comparison with other AI models, and reliance on a single expert for ground truth, all of which point to the need for broader validation.

Summary based on 1 source

SpringerOpen • Aug 25, 2025