Behavior Labs

Large Language Models Take on Cardiothoracic Surgery: A Comparative Analysis of the Performance of Four Models on American Board of Thoracic Surgery Exam Questions in 2023

Comparative analysis of LLM performance on ABTS exam questions.

Nicholas King

June 15, 2024

AI Governance

Status: Published

Authors: Zain Khalpey, Ujjawal Kumar, Nicholas King, Alyssa Abraham, Amina H. Khalpey

Abstract

This study presents a comparative analysis of four large language models (LLMs) evaluated on American Board of Thoracic Surgery (ABTS) examination questions from 2023. As artificial intelligence takes on a growing role in medical education and clinical decision support, understanding the capabilities and limitations of these models in specialized surgical domains becomes increasingly important.

The analysis examines model performance across key cardiothoracic surgery knowledge domains, including cardiac surgery, thoracic surgery, congenital heart disease, and critical care. Results show that competency varies across models, with notable strengths in evidence-based reasoning and clinical knowledge synthesis.

These findings contribute to the growing body of evidence on AI capabilities in medical education and highlight both the potential and current limitations of LLMs as tools for surgical training and knowledge assessment.