• Jun 17, 2025 News! JAAI Volume 3, Number 2 is available now
  • Mar 12, 2025 News! JAAI Volume 3, Number 1 is available now
  • Jan 02, 2025 News! JAAI will adopt a quarterly publication frequency from 2025!
General Information
    • Abbreviated Title: J. Adv. Artif. Intell.
    • E-ISSN: 2972-4503
    • Frequency: Quarterly
    • DOI: 10.18178/JAAI
    • Editor-in-Chief: Prof. Dr.-Ing. Hao Luo
    • Managing Editor: Ms. Jennifer X. Zeng
    • E-mail: editor@jaai.net
Editor-in-chief
Prof. Dr.-Ing. Hao Luo
Harbin Institute of Technology, Harbin, China
 
It is my honor to serve as Editor-in-Chief of JAAI. The journal publishes high-quality papers in the field of artificial intelligence, and I hope that JAAI will become a recognized venue among readers in the field.


 
JAAI 2025 Vol.3(3):180-186
DOI: 10.18178/JAAI.2025.3.3.180-186

GPT-3.5, Gemini, and GPT-4 Performance on the Advanced Trauma Life Support Exam

Hilary Y. Liu1,*, Mario Alessandri Bonetti1, Alain C. Corcos2, Jenny A. Ziembicki2, Francesco M. Egro1,2
1. Department of Plastic Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA 15219, United States.
2. Department of Surgery, University of Pittsburgh Medical Center, 1350 Locust Street, G103, Pittsburgh, PA 15219, United States.
email: liuh23@upmc.edu (H.Y.L.); m.alessandribonetti@gmail.com (M.A.B.); corcosac@upmc.edu (A.C.C.); ziembickija@upmc.edu (J.A.Z.); francescoegro@gmail.com (F.M.E.)
*Corresponding author

Manuscript submitted May 6, 2025; accepted May 20, 2025; published July 17, 2025


Abstract—The Advanced Trauma Life Support (ATLS) certification evaluates the ability of medical professionals to manage trauma patients effectively in emergency settings. With the rapid evolution of Large Language Models (LLMs), there is growing interest in exploring how these tools might integrate into clinical practice. This study assessed the performance of three LLMs—GPT-3.5, Gemini, and GPT-4—on the ATLS written examinations. Each model answered three different ATLS 10th edition exams. Their responses were compared to official answer keys, and average scores were calculated. Differences in performance among the LLMs were analyzed using chi-square testing. In addition, performance was examined based on question type: direct knowledge questions versus clinical scenario questions. GPT-3.5 achieved an average score of 65%, Gemini 61.7%, and GPT-4 83.3%. Among the three models, only GPT-4 surpassed the passing threshold of 75%. There was no statistically significant difference between the scores of GPT-3.5 and Gemini (p = 0.59). However, GPT-4 significantly outperformed both GPT-3.5 (p = 0.0012) and Gemini (p = 0.0002). No significant differences in performance were noted between direct and clinical scenario questions within each model. GPT-4 demonstrated the ability to successfully pass the ATLS examination, highlighting its advanced technical knowledge. Nonetheless, occasional inaccuracies or "hallucinations" were observed, particularly with more complex questions. With continued development and rigorous validation, LLMs like GPT-4 have the potential to serve as valuable adjuncts in clinical decision-making and trauma education.
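The pairwise chi-square comparisons described in the abstract can be sketched as below. The question counts are an assumption (three 40-question exams, 120 questions per model) chosen because they reproduce the reported averages exactly (78/120 = 65%, 74/120 = 61.7%, 100/120 = 83.3%); the paper does not state the totals here, so treat this as an illustrative sketch rather than the authors' exact analysis.

```python
import math

def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test (df = 1, no continuity correction)
    comparing two proportions of correct answers."""
    a, b = correct_a, total_a - correct_a  # model A: correct, incorrect
    c, d = correct_b, total_b - correct_b  # model B: correct, incorrect
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with df = 1
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Assumed counts of correct answers out of 120 questions per model
TOTAL = 120
correct = {"GPT-3.5": 78, "Gemini": 74, "GPT-4": 100}

for a, b in [("GPT-4", "GPT-3.5"), ("GPT-4", "Gemini"), ("GPT-3.5", "Gemini")]:
    chi2, p = chi_square_2x2(correct[a], TOTAL, correct[b], TOTAL)
    print(f"{a} vs {b}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumed counts the sketch recovers p-values close to those reported in the abstract (GPT-4 significantly outperforming both other models, and no significant difference between GPT-3.5 and Gemini).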

Keywords—artificial intelligence, Advanced Trauma Life Support (ATLS), ChatGPT, Google, trauma

Cite: Hilary Y. Liu, Mario Alessandri Bonetti, Alain C. Corcos, Jenny A. Ziembicki, Francesco M. Egro, "GPT-3.5, Gemini, and GPT-4 Performance on the Advanced Trauma Life Support Exam," Journal of Advances in Artificial Intelligence, vol. 3, no. 3, pp. 180-186, 2025. doi: 10.18178/JAAI.2025.3.3.180-186

Copyright © 2025 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
