Evaluation of ChatGPT and Google Bard Using Prompt Engineering in Cancer Screening Algorithms
Student Authors: Daniel Swanson
Document Type: Journal Article
Abstract: Large language models (LLMs) such as ChatGPT and Bard have emerged as powerful tools in medicine, showcasing strong results in tasks such as radiology report translation and research paper drafting. While their implementation in clinical practice holds promise, their response accuracy remains variable. This study aimed to evaluate the accuracy of ChatGPT and Bard in clinical decision-making based on the American College of Radiology Appropriateness Criteria for various cancers. Both LLMs were evaluated in terms of their responses to open-ended (OE) and select-all-that-apply (SATA) prompts. Furthermore, the study incorporated prompt engineering (PE) techniques to enhance the accuracy of LLM outputs. The results revealed similar performances between ChatGPT and Bard on OE prompts, with ChatGPT exhibiting marginally higher accuracy in SATA scenarios. The introduction of PE also marginally improved LLM outputs in OE prompts but did not enhance SATA responses. The results highlight the potential of LLMs in aiding clinical decision-making processes, especially when guided by optimally engineered prompts. Future studies in diverse clinical situations are imperative to better understand the impact of LLMs in radiology.
Source: Nguyen D, Swanson D, Newbury A, Kim YH. Evaluation of ChatGPT and Google Bard Using Prompt Engineering in Cancer Screening Algorithms. Acad Radiol. 2023 Dec 15:S1076-6332(23)00618-9. doi: 10.1016/j.acra.2023.11.002. Epub ahead of print. PMID: 38103973.
Permanent Link to this Item: http://hdl.handle.net/20.500.14038/52976
Rights: Copyright © 2023 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.