Large Language Models Take on Cardiothoracic Surgery: A Comparative Analysis of the Performance of Four Models on American Board of Thoracic Surgery Exam Questions in 2023
Comparative analysis of LLM performance on ABTS exam questions.
June 15, 2024
AI Governance | Status: Published
Authors: Zain Khalpey, Ujjawal Kumar, Nicholas King, Alyssa Abraham, Amina H. Khalpey
Abstract
This study presents a comparative analysis of four large language models (LLMs) evaluated on American Board of Thoracic Surgery (ABTS) examination questions from 2023. As artificial intelligence is adopted more widely in medical education and clinical decision support, it becomes increasingly important to understand the capabilities and limitations of these models in specialized surgical domains.
The analysis examines model performance across key knowledge domains of cardiothoracic surgery, including cardiac surgery, thoracic surgery, congenital heart disease, and critical care. The results show varying levels of competence across the models, with notable strengths in evidence-based reasoning and the synthesis of clinical knowledge.
These findings add to the growing body of evidence on AI capabilities in medical education and highlight both the potential and the current limitations of LLMs as tools for surgical training and knowledge assessment.