1. In this clinical decision support study, the performance of the Chat Generative Pre-trained Transformer (ChatGPT) in recommending appropriate imaging for breast pain and breast cancer screening patient scenarios was assessed against the American College of Radiology (ACR) Appropriateness Criteria.
2. ChatGPT versions 3.5 and 4 answered ‘select all that apply’ (SATA) questions correctly 88.9% and 98.4% of the time, respectively, for breast cancer screening cases, compared with 58.3% and 77.7% for breast pain cases.
Evidence Rating Level: 1 (Excellent)
Study Rundown: ChatGPT is a transformer-based language model trained with supervised learning and human feedback to generate text responses to prompts. Emerging literature suggests that ChatGPT may aid in clinical decision support, with the capacity to assess patient data in the context of previous medical literature. The present study presented ChatGPT versions 3.5 and 4 with prompts containing information on patients with breast pain or patients being screened for breast cancer. Prompts were written either as open-ended (OE) questions or in a SATA format asking about imaging modality selection for each case. Answers were assessed according to the ACR Appropriateness Criteria for breast pain and breast cancer screening. ChatGPT performed with moderate accuracy for both breast pain and breast cancer screening, with a higher percentage of correct answers for breast cancer screening questions. Additionally, both versions performed better on SATA than on OE prompts, with ChatGPT 4 achieving higher scores overall than ChatGPT 3.5. Because the data used to develop ChatGPT are not public, a major limitation of this study is that it is unknown whether the model had previously been exposed to the ACR Appropriateness Criteria, which complicates assessment of its independent decision-making. Overall, the study demonstrates that ChatGPT is able to aid in radiologic decision-making with moderate accuracy in the context of breast pain and breast cancer screening.
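For readers curious how this kind of prompting could be reproduced, below is a minimal sketch using the OpenAI Python client. The study team entered prompts manually through the ChatGPT interface, so the patient vignette, prompt wording, and API usage shown here are illustrative assumptions rather than the study's materials.

```python
# Illustrative sketch only: the study entered prompts manually into the
# ChatGPT interface. This reproduces the OE vs. SATA prompt formats with
# the OpenAI Python client; the vignette and wording are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CASE = (
    "A 45-year-old woman presents with focal left breast pain for 3 weeks. "
    "There is no palpable mass and no family history of breast cancer."
)

# Open-ended (OE) format: the model must name the imaging itself.
oe_prompt = CASE + " What imaging, if any, is most appropriate?"

# Select-all-that-apply (SATA) format: the model chooses from fixed options.
sata_prompt = (
    CASE + " Select all appropriate imaging studies: "
    "(a) diagnostic mammography, (b) breast ultrasound, "
    "(c) breast MRI, (d) no imaging indicated."
)

for prompt in (oe_prompt, sata_prompt):
    response = client.chat.completions.create(
        model="gpt-4",  # the study compared versions 3.5 and 4
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```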
Click to read the study in the Journal of the American College of Radiology
Relevant Reading: Large language model (ChatGPT) as a support tool for breast tumor board
In-Depth [clinical decision support study]: This clinical decision support study assessed ChatGPT’s accuracy in selecting appropriate imaging in response to breast cancer screening and breast pain patient prompts. ChatGPT versions 3.5 and 4 were used, with answers scored against the ACR Appropriateness Criteria. For each prompt, three separate team members entered the information into ChatGPT, yielding three outputs per prompt per version of ChatGPT. The answers were then scored independently by two team members, between whom there were no disagreements on any answers. OE outputs were scored out of 2 points, while SATA outputs were scored as a percentage. On SATA prompts for breast cancer screening, ChatGPT scored 88.9% (version 3.5) and 98.4% (version 4), and both versions averaged 1.83 out of 2 on the corresponding OE prompts. For breast pain imaging selection, versions 3.5 and 4 had lower SATA scores (58.3% and 77.7%, respectively) and lower OE scores (1.125 and 1.666 out of 2, respectively). Notably, ChatGPT was more likely to provide paragraph-form reasoning in its outputs on OE prompts. Though the accuracy of ChatGPT version 4 did not vary with the severity of the cases presented, version 3.5 performed better on breast cancer screening cases of lower severity and on breast pain cases of higher severity. ChatGPT version 4 also differed in that it was able to identify a scenario in which no imaging was needed. In summary, the present study provides critical information on the performance of ChatGPT versions 3.5 and 4 in recommending imaging modalities for patients being screened for breast cancer or assessed for breast pain.
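To make the scoring scheme concrete, the sketch below shows one plausible way to compute a SATA percentage score and average it over the three replicate outputs per prompt. The paper's exact rubric is not reproduced here, so the option-matching logic and the example option sets are assumptions for illustration only.

```python
# Hedged sketch of the SATA scoring arithmetic described above; the exact
# rubric belongs to the study authors, and this matching rule is an assumption.
def sata_score(selected: set[str], acr_appropriate: set[str], options: set[str]) -> float:
    """Percent of options classified correctly (chosen iff ACR-appropriate)."""
    correct = sum((opt in selected) == (opt in acr_appropriate) for opt in options)
    return 100.0 * correct / len(options)

options = {"mammography", "ultrasound", "MRI", "no imaging"}
acr = {"mammography", "ultrasound"}  # hypothetical ACR-appropriate set

# Three replicate outputs per prompt per model version, averaged together:
replicates = [
    {"mammography", "ultrasound"},         # all 4 options correct -> 100%
    {"mammography"},                       # misses ultrasound     -> 75%
    {"mammography", "ultrasound", "MRI"},  # adds MRI              -> 75%
]
mean = sum(sata_score(r, acr, options) for r in replicates) / len(replicates)
print(f"Mean SATA score: {mean:.1f}%")  # 83.3%
```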
Image: PD
©2023 2 Minute Medicine, Inc. All rights reserved. No works may be reproduced without expressed written consent from 2 Minute Medicine, Inc. Inquire about licensing here. No article should be construed as medical advice and is not intended as such by the authors or by 2 Minute Medicine, Inc.