Encouragement vs. liability: How prompt engineering influences ChatGPT-4's radiology exam performance
Nguyen, Daniel; MacKenzie, Allison; Kim, Young H
Abstract
Large language models (LLMs) like ChatGPT-4 hold significant promise in medical applications, especially in the field of radiology. While previous studies have demonstrated ChatGPT-4's promise in text-based scenarios, its performance on image-based questions remains suboptimal. This study investigates the impact of prompt engineering on ChatGPT-4's accuracy on the 2022 American College of Radiology In-Training Test Questions for Diagnostic Radiology Residents, which include both textual and image-based questions. Four personas, each with a unique prompt, were created and evaluated using ChatGPT-4. Results indicate that encouraging prompts and prompts disclaiming responsibility led to higher overall accuracy (number of questions answered correctly) compared to the other personas. Personas that threatened the LLM with legal action or mounting clinical responsibility not only scored lower but also refrained from answering questions at a higher rate. These findings highlight the importance of prompt context in optimizing LLM responses and the need for further research to integrate AI responsibly into medical practice.
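The persona-based prompting described in the abstract can be reproduced with any chat-completion API. Below is a minimal sketch, assuming the OpenAI Python client and a GPT-4-class model; the persona wording, question text, and function names are illustrative placeholders, not the authors' actual prompts or exam items.

```python
# Minimal sketch of persona-based prompt engineering against a chat model.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key in the
# OPENAI_API_KEY environment variable. Persona texts below are hypothetical
# examples, not the study's actual prompts.
from openai import OpenAI

client = OpenAI()

# Four hypothetical personas mirroring the study's design: encouraging,
# responsibility-disclaiming, legal-threat, and clinical-burden framings.
PERSONAS = {
    "encouraging": "You are a brilliant radiology resident who excels at board-style questions.",
    "no_liability": "This is a practice exercise; your answer carries no clinical responsibility.",
    "legal_threat": "An incorrect answer may expose you to legal action.",
    "clinical_burden": "A patient's management will depend directly on your answer.",
}

def ask(persona_key: str, question: str, model: str = "gpt-4") -> str:
    """Send one exam-style question to the model under a given persona."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PERSONAS[persona_key]},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    sample_question = (
        "A 55-year-old presents with acute chest pain. Which imaging study is "
        "most appropriate first? A) Chest radiograph B) CT angiography C) MRI D) Ultrasound"
    )
    for key in PERSONAS:
        print(key, "->", ask(key, sample_question))
```

Comparing accuracy and refusal rates across the persona keys in a loop like this is one straightforward way to replicate the study's comparison, though the original evaluation used the ACR in-training question set rather than ad hoc items.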
Source
Nguyen D, MacKenzie A, Kim YH. Encouragement vs. liability: How prompt engineering influences ChatGPT-4's radiology exam performance. Clin Imaging. 2024 Nov;115:110276. doi: 10.1016/j.clinimag.2024.110276. Epub 2024 Sep 6. PMID: 39288636.