nach oben

06.05.2024 | Continuing Education

Can large language models pass official high-grade exams of the European Society of Neuroradiology courses? A direct comparison between OpenAI chatGPT 3.5, OpenAI GPT4 and Google Bard

verfasst von: Gennaro D’Anna, Sofie Van Cauter, Majda Thurnher, Johan Van Goethem, Sven Haller

Einloggen, um Zugang zu erhalten

Abstract

We compared different LLMs, notably chatGPT, GPT4, and Google Bard and we tested whether their performance differs in subspeciality domains, in executing examinations from four different courses of the European Society of Neuroradiology (ESNR) notably anatomy/embryology, neuro-oncology, head and neck and pediatrics. Written exams of ESNR were used as input data, related to anatomy/embryology (30 questions), neuro-oncology (50 questions), head and neck (50 questions), and pediatrics (50 questions). All exams together, and each exam separately were introduced to the three LLMs: chatGPT 3.5, GPT4, and Google Bard. Statistical analyses included a group-wise Friedman test followed by a pair-wise Wilcoxon test with multiple comparison corrections. Overall, there was a significant difference between the 3 LLMs (p < 0.0001), with GPT4 having the highest accuracy (70%), followed by chatGPT 3.5 (54%) and Google Bard (36%). The pair-wise comparison showed significant differences between chatGPT vs GPT 4 (p < 0.0001), chatGPT vs Bard (p < 0. 0023), and GPT4 vs Bard (p < 0.0001). Analyses per subspecialty showed the highest difference between the best LLM (GPT4, 70%) versus the worst LLM (Google Bard, 24%) in the head and neck exam, while the difference was least pronounced in neuro-oncology (GPT4, 62% vs Google Bard, 48%). We observed significant differences in the performance of the three different LLMs in the running of official exams organized by ESNR. Overall GPT 4 performed best, and Google Bard performed worst. This difference varied depending on subspeciality and was most pronounced in head and neck subspeciality.

Nur mit Berechtigung zugänglich

Haver HL, Ambinder EB, Bahl M, Oluyemi ET, Jeudy J, Yi PH (2023) Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT. Radiology 307(4):e230424. https://doi.org/10.1148/radiol.230424CrossRefPubMed

Shen Y, Heacock L, Elias J et al (2023) ChatGPT and other large language models are double-edged swords. Radiology 307(2):e230163. https://doi.org/10.1148/radiol.230163CrossRefPubMed

Alkaissi H, McFarlane SI (2023) Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. Published online February 2023. https://doi.org/10.7759/cureus.35179

Ismail A, Ghorashi NS, Javan R (2023) New horizons: the potential role of OpenAI’s ChatGPT in clinical radiology. J Am Coll Radiol 20(7):696–698. https://doi.org/10.1016/j.jacr.2023.02.025CrossRefPubMed

Kitamura FC (2023) ChatGPT is shaping the future of medical writing but still requires human judgment. Radiology 307(2):e230171. https://doi.org/10.1148/radiol.230171CrossRefPubMed

Kung TH, Cheatham M, Medenilla A et al (2023) Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health 2(2):e0000198. https://doi.org/10.1371/journal.pdig.0000198CrossRefPubMedPubMedCentral

Bhayana R, Krishna S, Bleakney RR (2023) Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology 307(5):e230582. https://doi.org/10.1148/radiol.230582CrossRefPubMed

Ueda D, Mitsuyama Y, Takita H et al (2023) ChatGPT’s diagnostic performance from patient history and imaging findings on the diagnosis please quizzes. Radiology 308(1). https://doi.org/10.1148/radiol.231040

Biswas S (2023) ChatGPT and the future of medical writing. Radiology 307(2):e223312. https://doi.org/10.1148/radiol.223312CrossRefPubMed

10.

Stokel-Walker C (2023) ChatGPT listed as author on research papers: many scientists disapprove. Nature 613(7945):620–621. https://doi.org/10.1038/d41586-023-00107-zCrossRefPubMed

11.

Lourenco AP, Slanetz PJ, Baird GL (2023) Rise of ChatGPT: it may be time to reassess how we teach and test radiology residents. Radiology 307(5):e231053. https://doi.org/10.1148/radiol.231053CrossRefPubMed

12.

Blüthgen C (2023) Does GPT4 dream of counting electric nodules? Eur Radiol. Published online April 2023. https://doi.org/10.1007/s00330-023-09671-4

13.

OpenAI. Better language models and their implications. Httpsopenaicomblogbetter- Lang-Models. https://openai.com/blog/better-language-models/

14.

Patil NS, Huang RS, Van Der Pol CB, Larocque N (2023) Comparative performance of ChatGPT and Bard in a text-based radiology knowledge assessment. Can Assoc Radiol J. Published online August 14, 2023:08465371231193716. https://doi.org/10.1177/08465371231193716

15.

OpenAI (2023) GPT-4 technical report. Published online March 27, 2023. http://arxiv.org/abs/2303.08774. Accessed 23 Aug 2023

16.

Health TLD (2023) ChatGPT: friend or foe? Lancet Digit Health 5(3):e102. https://doi.org/10.1016/S2589-7500(23)00023-7CrossRef

17.

Ayers JW, Poliak A, Dredze M et al (2023) Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 183(6):589. https://doi.org/10.1001/jamainternmed.2023.1838CrossRefPubMedPubMedCentral

18.

Doo FX, Cook TS, Siegel EL et al (2023) Exploring the clinical translation of generative models like ChatGPT: promise and pitfalls in radiology, from patients to population health. J Am Coll Radiol. Published online July 2023:S1546144023005161. https://doi.org/10.1016/j.jacr.2023.07.007

Titel: Can large language models pass official high-grade exams of the European Society of Neuroradiology courses? A direct comparison between OpenAI chatGPT 3.5, OpenAI GPT4 and Google Bard
verfasst von: Gennaro D’Anna
Sofie Van Cauter
Majda Thurnher
Johan Van Goethem
Sven Haller
Publikationsdatum: 06.05.2024
Verlag: Springer Berlin Heidelberg
Erschienen in: Neuroradiology
Print ISSN: 0028-3940
Elektronische ISSN: 1432-1920
DOI: https://doi.org/10.1007/s00234-024-03371-6

Leitlinien kompakt für die Neurologie

Mit medbee Pocketcards sicher entscheiden.

^{Seit 2022 gehört die medbee GmbH zum Springer Medizin Verlag}

Kostenlos registrieren

Neu im Fachgebiet Neurologie

Demenzkranke durch Antipsychotika vielfach gefährdet

Demenz Nachrichten

Der Einsatz von Antipsychotika gegen psychische und Verhaltenssymptome in Zusammenhang mit Demenzerkrankungen erfordert eine sorgfältige Nutzen-Risiken-Abwägung. Neuen Erkenntnissen zufolge sind auf der Risikoseite weitere schwerwiegende Ereignisse zu berücksichtigen.

Nicht Creutzfeldt Jakob, sondern Abführtee-Vergiftung

29.05.2024 Hyponatriämie Nachrichten

Eine ältere Frau trinkt regelmäßig Sennesblättertee gegen ihre Verstopfung. Der scheint plötzlich gut zu wirken. Auf Durchfall und Erbrechen folgt allerdings eine Hyponatriämie. Nach deren Korrektur kommt es plötzlich zu progredienten Kognitions- und Verhaltensstörungen.

Schutz der Synapsen bei Alzheimer

29.05.2024 Morbus Alzheimer Nachrichten

Mit einem Neurotrophin-Rezeptor-Modulator lässt sich möglicherweise eine bestehende Alzheimerdemenz etwas abschwächen: Erste Phase-2-Daten deuten auf einen verbesserten Synapsenschutz.

Sozialer Aufstieg verringert Demenzgefahr

24.05.2024 Demenz Nachrichten

Ein hohes soziales Niveau ist mit die beste Versicherung gegen eine Demenz. Noch geringer ist das Demenzrisiko für Menschen, die sozial aufsteigen: Sie gewinnen fast zwei demenzfreie Lebensjahre. Umgekehrt steigt die Demenzgefahr beim sozialen Abstieg.

Update Neurologie

Bestellen Sie unseren Fach-Newsletter und bleiben Sie gut informiert.

Newsletter bestellen