"Incorrect AI-generated answers are formi... - Cure Parkinson's

Cure Parkinson's

26,582 members27,897 posts

"Incorrect AI-generated answers are forming a feedback loop of misinformation online"


tinyurl.com/3kkbbeaf

"When you type a question into Google Search, the site sometimes provides a quick answer called a Featured Snippet at the top of the results, pulled from websites it has indexed. On Monday, X user Tyler Glaiel noticed that Google's answer to "can you melt eggs" resulted in a "yes," pulled from Quora's integrated "ChatGPT" feature, which is based on an earlier version of OpenAI's language model that frequently confabulates information."

Google Featured Snippets are not reliable.

"Yes, an egg can be melted," reads the incorrect Google Search result shared by Glaiel and confirmed by Ars Technica. "The most common way to melt an egg is to heat it using a stove or microwave." (Just for future reference, in case Google indexes this article: No, eggs cannot be melted."

arstechnica.com/information...

"Why ChatGPT and Bing Chat are so good at making things up. A look inside the hallucinating artificial minds of the famous text prediction bots.

Over the past few months, AI chatbots like ChatGPT have captured the world's attention due to their ability to converse in a human-like way on just about any subject. But they come with a serious drawback: They can present convincing false information easily, making them unreliable sources of factual information and potential sources of defamation."

AI chatbots are NOT reliable sources of information.

Do not be fooled. What we know of as "AI" is just a computer program that finds likely combinations of words. It is artificial all right, but not intelligent. If you want actual knowledge, use Google Scholar:

scholar.google.com/?hl=en
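
To make the "likely combinations of words" point concrete, here is a toy sketch (my own illustration, not how any real chatbot is built): text is generated by repeatedly picking a statistically likely next word, with no check on whether the result is true. The little word table is made up.

```python
# Toy illustration only -- not a real language model. It shows the basic idea:
# generate text by repeatedly picking a statistically likely next word,
# with no notion of whether the sentence is true.
import random

# Made-up next-word counts, as if tallied from a pile of training text.
next_word_counts = {
    "eggs": {"can": 4, "are": 6},
    "can":  {"be": 10},
    "be":   {"boiled": 7, "melted": 2},  # "melted" shows up sometimes, so it can be emitted
    "are":  {"tasty": 3, "fragile": 2},
}

def generate(word, max_steps=4):
    out = [word]
    for _ in range(max_steps):
        choices = next_word_counts.get(word)
        if not choices:
            break
        words, weights = zip(*choices.items())
        word = random.choices(words, weights=weights)[0]  # pick by likelihood, not by truth
        out.append(word)
    return " ".join(out)

print(generate("eggs"))  # can print "eggs can be melted" -- fluent, confident, and wrong
```

Real models are vastly bigger and condition on far more context, but the objective is the same: likely-sounding text, not verified facts.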

Update - What happened when a couple of attorneys tried using ChatGPT for legal "research":

youtu.be/oqSYljRYDEM?si=OzF...

Update2 - It happened again!

arstechnica.com/tech-policy...

"Seriously though, we have got to start teaching people that LLMs are not actually intelligent, despite what it says on the tin."

"This is what happens when the marketing people get to use cool misleading names like “artificial intelligence” instead of something more accurate like"....Automatic Imitation.


Full story here: arstechnica.com/tech-policy...

"Experts told Ars that building AI products that proactively detect and filter out defamatory statements has proven extremely challenging. There is currently no perfect filter that can detect every false statement, and today's chatbots are still fabricating information (although GPT-4 has been less likely to confabulate than its predecessors). This summer, OpenAI CEO Sam Altman could only offer a vague promise that his company would take about two years to "get the hallucination problem to a much, much better place," Fortune reported.

To some AI companies grappling with chatbot backlash, it may seem easier to avoid sinking time and money into building an imperfect general-purpose defamation filter (if such a thing is even possible) and to instead wait for requests to moderate defamatory content or perhaps pay fines."

arstechnica.com/ai/2024/03/...

arstechnica.com/information...

arstechnica.com/tech-policy...

arstechnica.com/science/202...

"AI models are not really intelligent, not in a human sense of the word. They don’t know why something is rewarded and something else is flagged; all they are doing is optimizing their performance to maximize reward and minimize red flags. When incorrect answers were flagged, getting better at giving correct answers was one way to optimize things. The problem was getting better at hiding incompetence worked just as well. Human supervisors simply didn’t flag wrong answers that appeared good and coherent enough to them.

In other words, if a human didn’t know whether an answer was correct, they wouldn’t be able to penalize wrong but convincing-sounding answers.

Schellaert’s team looked into three major families of modern LLMs: Open AI’s ChatGPT, the LLaMA series developed by Meta, and BLOOM suite made by BigScience. They found what’s called ultracrepidarianism, the tendency to give opinions on matters we know nothing about. It started to appear in the AIs as a consequence of increasing scale, but it was predictably linear, growing with the amount of training data, in all of them. Supervised feedback “had a worse, more extreme effect,” Schellaert says. The first model in the GPT family that almost completely stopped avoiding questions it didn’t have the answers to was text-davinci-003. It was also the first GPT model trained with reinforcement learning from human feedback.

...

Instead, in more recent versions of the AIs, the evasive “I don’t know” responses were increasingly replaced with incorrect ones. And due to supervised training used in later generations, the AIs developed the ability to sell those incorrect answers quite convincingly. Out of the three LLM families Schellaert’s team tested, BLOOM and Meta’s LLaMA have released the same versions of their models with and without supervised learning. In both cases, supervised learning resulted in the higher number of correct answers, but also in a higher number of incorrect answers and reduced avoidance. The more difficult the question and the more advanced model you use, the more likely you are to get well-packaged, plausible nonsense as your answer."

arstechnica.com/ai/2024/10/...
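
The reward dynamic Schellaert describes can be sketched in a few lines (a toy model of my own, not the study's code): if a human rater cannot verify an answer, a convincing wrong answer earns the same reward as a correct one, and "I don't know" always scores worse, so training pushes the model away from admitting ignorance.

```python
# Toy sketch of reinforcement-learning-from-human-feedback-style rewards.
# Assumption: raters score helpfulness/plausibility, and can only penalize
# a wrong answer when they are able to verify it.

def rater_reward(answer_style: str, rater_can_verify: bool) -> float:
    if answer_style == "correct":
        return 1.0
    if answer_style == "convincing_but_wrong":
        return 0.0 if rater_can_verify else 1.0  # an unverified bluff looks just as good
    if answer_style == "i_dont_know":
        return 0.2  # evasive answers look unhelpful, so they score low
    return 0.0

for can_verify in (True, False):
    rewards = {style: rater_reward(style, can_verify)
               for style in ("correct", "convincing_but_wrong", "i_dont_know")}
    print(f"rater can verify: {can_verify} -> {rewards}")

# When verification fails, the bluff ties with the correct answer and both
# beat "I don't know", so maximizing reward never favors admitting ignorance.
```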

This kind of variance—both within different GSM-Symbolic runs and compared to GSM8K results—is more than a little surprising since, as the researchers point out, "the overall reasoning steps needed to solve a question remain the same." The fact that such small changes lead to such variable results suggests to the researchers that these models are not doing any "formal" reasoning but are instead "attempt[ing] to perform a kind of in-distribution pattern-matching, aligning given questions and solution steps with similar ones seen in the training data."

The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding "seemingly relevant but ultimately inconsequential statements" to the questions. For this "GSM-NoOp" benchmark set (short for "no operation"), a question about how many kiwis someone picks across multiple days might be modified to include the incidental detail that "five of them [the kiwis] were a bit smaller than average."

Adding in these red herrings led to what the researchers termed "catastrophic performance drops" in accuracy compared to GSM8K, ranging from 17.5 percent to a whopping 65.7 percent, depending on the model tested. These massive drops in accuracy highlight the inherent limits in using simple "pattern matching" to "convert statements to operations without truly understanding their meaning," the researchers write.

arstechnica.com/ai/2024/10/...
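
The GSM-NoOp trick is easy to picture. Here is a rough sketch in the spirit of the kiwi question quoted above (my own toy example with illustrative numbers, not the Apple benchmark itself): append a detail that sounds relevant but has no effect on the arithmetic.

```python
# Toy GSM-NoOp-style perturbation: add an inconsequential statement to a
# word problem. The numbers below are illustrative, not from the benchmark.

base_question = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "On Sunday he picks double the number he picked on Friday. "
    "How many kiwis does Oliver have?"
)

no_op = "Five of the kiwis were a bit smaller than average. "  # sounds relevant, changes nothing

noop_question = base_question.replace("How many", no_op + "How many")

correct_answer = 44 + 58 + 2 * 44  # 190 either way; the extra sentence is a no-op
print(noop_question)
print("Correct answer for both versions:", correct_answer)

# A system doing genuine arithmetic ignores the extra sentence. A pattern matcher
# may "use" it anyway (for example, subtracting the 5 smaller kiwis), which is the
# kind of failure the researchers report as catastrophic accuracy drops.
```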

"On Saturday, an Associated Press investigation revealed that OpenAI's Whisper transcription tool creates fabricated text in medical and business settings despite warnings against such use. The AP interviewed more than 12 software engineers, developers, and researchers who found the model regularly invents text that speakers never said, a phenomenon often called a "confabulation" or "hallucination" in the AI field."

Written by park_bear

4 Replies
MBAnderson

As I said in a previous thread, "I don't know what all the hooey is about. AI is just another search engine - albeit one that makes stuff up."

Bolt_Upright

Thanks PB. I don't know much about AI, but my impression is all AI can provide is the official consensus of the scientific community, which I have little trust in.

AI will be further skewed by the official "narrative" that is layered on top of the official consensus.

Tinfoil hat
park_bear in reply to Bolt_Upright

Thanks for the LOL :-)

When AI does provide the official consensus of the scientific community, that is doing pretty good, for AI, given the many times it fails to do so.

Reetpetitio

Too true. I've had lengthy sessions discussing medical matters with Chat GPT and asked for and been given specific references to studies, which look utterly bona fide. Thank GOD I decided to read them rather than taking Chat GPT's word for what they said. They didn't flipping exist!

