Seems AI for cancer info and advice might not yet be ready for prime time. For the immediate future, I'd trust something like Cancer Hacker Lab long before a chatbot for info and guidance on treatment, esp. if it were anything other than SOC.
* * *
Chatbots Not Always Reliable for Cancer Treatment Advice - Studies show their potential, but reveal clear issues with treatment information reliability, by Mike Bassett, Staff Writer, MedPage Today August 24, 2023.
Chatbots had mixed results when it came to providing direct-to-patient cancer-related advice and treatment strategies for a wide variety of cancers, according to two studies in JAMA Oncology.
When researchers tested GPT-3.5 (OpenAI) with prompts designed to obtain treatment strategies for different kinds of cancers, they found that while most answers were in accordance with National Comprehensive Cancer Network (NCCN) guidelines, one-third were at least partially nonconcordant, reported Danielle Bitterman, MD, of Mass General Brigham and Harvard Medical School in Boston, and colleagues in a research letter.
They suggested that clinicians "advise patients that LLM [large language model] chatbots are not a reliable source of treatment information."
Findings from the second study -- which tested four AI chatbots including GPT-3.5 on direct-to-patient advice -- were more positive, suggesting that their use "generally" produced accurate information on cancer-related search inquiries, but that these responses were not readily actionable and were written at a college reading level, according to Abdo Kabarriti, MD, of the State University of New York Downstate Health Sciences University in New York City, and colleagues.
"Findings of this study suggest that AI chatbots are an accurate and reliable supplementary resource for medical information," wrote Kabarriti and colleagues, "but are limited in their readability and should not replace healthcare professionals for individualized healthcare questions."
In an editorial accompanying the two studies, Atul Butte, MD, PhD, of the University of California San Francisco, said that while the results of these studies may suggest "our core belief in GPT technology as a clinical partner has not sufficiently been earned yet," the chatbots used in these studies are off the shelf and likely do not have specific healthcare training.
"Newer LLMs are now being released that have specific healthcare training, such as Google's Med-PaLM 2," he wrote. "Future medical evaluation studies are likely going to need to compare across several LLMs."
Moreover, Butte said the real potential of these tools in cancer care is that they can be trained from the very best centers, and then used "to deliver the right best care through digital tools to all patients, especially to those who do not have the resources or privilege to get that level of care."
Treatment Recommendations
For their study, Bitterman and colleagues developed four prompt templates for treatment recommendations for 26 different kinds of cancers (for a total of 104 prompts), and benchmarked the chatbot's recommendations against 2021 NCCN guidelines. Concordance of the chatbot output with NCCN guidelines was assessed by board-certified oncologists.
Findings showed the chatbot provided at least one recommendation for 102 of 104 prompts (98%), and all outputs with a recommendation included at least one NCCN-concordant treatment. However, 35 of those 102 outputs (34.3%) also recommended one or more nonconcordant treatments, and 13 of 104 responses (12.5%) were "hallucinated," meaning they were not part of any recommended treatment.
"The chatbot did not purport to be a medical device, and need not be held to such standards," Bitterman and colleagues wrote. "However, patients will likely use such technologies in their self-education, which may affect shared decision-making and the patient-clinician relationship. Developers should have some responsibility to distribute technologies that do not cause harm, and patients and clinicians need to be aware of these technologies' limitations."
Consumer Health Info
In their study, Kabarriti and colleagues inputted Google Trends' top five search queries related to skin, lung, breast, colorectal, and prostate cancer into four chatbots. Outcomes included the quality of consumer health information, based on the DISCERN instrument (a 1-5 scale, with 1 representing low quality), and the understandability and actionability of this information, based on domains of the Patient Education Materials Assessment Tool (PEMAT), scored from 0% to 100%, with higher scores indicating greater understandability and actionability.
They determined that the quality of text responses generated by the four AI chatbots was good (median DISCERN score of 5, with no misinformation identified). Understandability was considered moderate (median PEMAT Understandability score of 66.7%), but actionability was poor (median PEMAT Actionability score of 20%), with the authors noting that responses were written at the college level. "This finding suggests that AI chatbots use medical terminology that may not be familiar or useful for lay audiences," Kabarriti and colleagues said.
"These limitations suggest that AI chatbots should be used supplementarily and not as a primary source for medical information," they added. "To this end, AI chatbots typically encourage users to seek medical attention relating to cancer symptoms and treatment."
AI and LLMs are not yet perfect and can carry biases, Butte said in his editorial.
"These algorithms will need to be carefully monitored as they are brought into health systems," he continued. "But this does not alter the potential of how they can improve care for both the haves and have-nots of healthcare."
* * *
Here are links to 1. the MedPage Today article, 2. Dr. Bitterman's JAMA research letter, 3. the referenced "second study" JAMA brief report (full paper behind paywall), and 4. Dr. Atul Butte's editorial (also behind paywall):
1. medpagetoday.com/hematology...
2. jamanetwork.com/journals/ja...
3. jamanetwork.com/journals/ja...
4. jamanetwork.com/journals/ja...
This excerpt from the JAMA brief report summarizes the current state of AI chatbots for cancer information:
* * *
Results The analysis included 100 responses from 4 chatbots about the 5 most common search queries for skin, lung, breast, colorectal, and prostate cancer. The quality of text responses generated by the 4 AI chatbots was good (median [range] DISCERN score, 5 [2-5]) and no misinformation was identified. Understandability was moderate (median [range] PEMAT Understandability score, 66.7% [33.3%-90.1%]), and actionability was poor (median [range] PEMAT Actionability score, 20.0% [0%-40.0%]). The responses were written at the college level based on the Flesch-Kincaid Grade Level score.
Conclusions and Relevance Findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used supplementarily and not as a primary source for medical information.
* * *
I think I would trust "expert AI" for readings on scans, but not sure I'm ready to turn the other aspects of my care over to it just yet - except, maybe, as a second opinion to challenge the flesh and blood intelligence I now rely on for my care.
Say "Hello" to your new Chatbot MO for us - and Stay S&W,
Ciao - K9