I’ve recently found myself discussing artificial intelligence (AI) with people—including some doctors—who had a very limited understanding of AI in medicine. Their knowledge was mostly confined to large language models (LLMs) like ChatGPT or Perplexity, along with a list of their known limitations (many of which have already been largely addressed, indicating how outdated some professionals' views can be). So, I decided to write this short post (rant!) to clarify the current situation.
In recent years, there's been much talk about Artificial General Intelligence (AGI), a highly advanced form of AI capable of understanding, learning, and applying knowledge across various domains, much like a human mind. However, for us patients who urgently await new therapies, AGI remains a distant and abstract goal, with limited direct impact on our daily lives.
What really provides immediate and tangible results is specialized AI, meaning systems designed to tackle specific medical and pharmaceutical challenges (at least in our case). Platforms like DeepMind’s AlphaFold, which accurately predicts protein structures, are already significantly transforming pharmaceutical research by accelerating the discovery and development of new treatments. Other key examples include AI tools for early cancer diagnosis developed by PathAI or Tempus, which use machine learning to swiftly and accurately analyze medical imaging.
We must also consider AI's "jagged capabilities curve": machines today excel greatly in certain specialized tasks, often surpassing human capabilities in complex analyses (which is exactly why we should leverage them in pharmaceuticals). Yet they can fail dramatically at tasks that humans find simple, like interpreting emotional nuances in language or handling new and unforeseen contexts.
This is precisely why the tech industry continuously develops new benchmarks to test and improve AI. Benchmarks such as MMLU, SuperGLUE, and ARC help identify exactly what current AI lacks to move closer to true AGI, highlighting both strengths and weaknesses.
In conclusion, from my perspective as a patient and as a professional, although the idea of genuine AGI fascinates me, the real value today lies in specialized AI, capable of concretely enhancing the quality of life and treatment possibilities for us patients.
I've added a picture to illustrate the concept of the jagged frontier of AI capabilities, but remember that it applies to general-purpose AI. Most specialized AI implementations go beyond the frontier, acting like "superhumans," but only within a narrow field.
Written by
Maxone73
Thank you.....I think this is what most here would think. What we don't know is what types of medical questions AI can answer with equal or better accuracy than our physicians?? How do we determine which questions go beyond current AI capability? Lots of folks here are quoting AI responses to post topics!!
Well, let me give you a more complete answer (which means: take some time off and get yourself comfortable!). When we talk about models that answer our questions, we are mostly referring to LLMs (large language models), which are a subclass of AI (a huge, huge umbrella term).
NOTE: I realized, after writing the whole answer, that I have introduced some ambiguity, so let's define two things to avoid confusion.
- ChatGPT, DeepSeek, LLAMA, Gemini, etc. are AI language platforms or systems, i.e. large language models. They're like separate AI engines, each trained with its own architecture and capabilities.
- Within each platform there are different versions or variations, which are still called models! These internal models vary by size, training data, speed, precision, and functionality. You select an internal model depending on the accuracy, speed, and depth you need. For example, ChatGPT (at least in the paid versions) offers GPT-4.5, o3, o3-mini-high, 4o, and so on, all with different abilities.
So when I talk about models, I might be referring either to the overall platform or to an internal model within it.
OK, now...
When you read that this or that model is more capable than a doctor at diagnosing a disease, or at designing a new protein (for example), the claim usually falls into one of two categories:
1) It's not an LLM in the strict sense (it does not interact or produce output through a commonly used human language). AlphaFold, AlphaProteo, EVO 2, and similar systems speak the language of math and biology: they reply and interact through formulas, produce strings of genetic sequence as output, and so on. They are built to be used by human specialists. Their impact is great, but they are, in a way, much simpler than general LLMs. The main reason is that the languages of math, science, and programming are not ambiguous. This allows questions to be asked precisely and answers to be equally precise (if not always correct, at least precise, which makes it easier to identify potential mistakes). Furthermore, it's very easy to give the system immediate feedback about any errors it makes, because detecting a logical error in a mathematical formula is simpler than determining whether the correct meaning has been assigned to an ambiguous sentence. This, in turn, makes it easier to "reward" the system when it performs well and "punish" it when it performs poorly (reinforcement learning), which is precisely how these neural networks are trained.
Practical example: Insilico is a biotechnology company that specializes in using artificial intelligence and deep learning methods for drug discovery and development. They developed a generative AI targeted at exactly that, and they license their systems to a number of big pharma companies.
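To make the "reward/punish" idea concrete, here's a toy sketch of that feedback loop. Everything in it is invented for illustration (real systems use gradient-based training on neural networks, not a preference table), but it shows why an unambiguous, machine-checkable "language" like math makes automatic feedback so easy:

```python
import random

random.seed(0)

# Candidate outputs the "model" can produce. Because they are math,
# a simple checker can score them with no ambiguity at all.
candidates = ["2+2=4", "2+2=5", "2+2=22"]
weights = {c: 1.0 for c in candidates}  # preference for each candidate

def checker(answer: str) -> int:
    """Unambiguous feedback: +1 reward if the formula is true, -1 otherwise."""
    left, right = answer.split("=")
    a, b = (int(x) for x in left.split("+"))
    return 1 if a + b == int(right) else -1

# Toy reinforcement loop: pick an answer, score it, nudge future choices.
for step in range(200):
    pick = random.choices(candidates, weights=[weights[c] for c in candidates])[0]
    reward = checker(pick)
    weights[pick] = max(0.01, weights[pick] + 0.1 * reward)

best = max(weights, key=weights.get)
print(best)  # the true formula ends up with by far the highest preference
```

After a couple hundred iterations the wrong formulas have been "punished" down to a tiny weight and the correct one dominates; with a natural-language answer, writing such a `checker` would be the hard part.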
2) It is an LLM, but one further trained on a specific dataset. For example, LLAMA, DeepSeek, and Google's Gemma are open-weight models. If I have the financial resources, I can download these pre-trained models and train them further on a specific dataset (for instance, tens of thousands of PET scans from deceased prostate cancer patients, along with the initial diagnosis provided by the doctor who first reviewed them, and the final diagnosis at the time of the patient's death to verify the accuracy of the initial assessment). At that point, the system starts identifying patterns that, given the extremely high number of variables involved, would be impossible for a human to detect. Some startups have already developed predictive models that outperform humans in diagnoses, partly because humans are subject to fatigue, might have argued with their spouse the previous night, may be pressed for time, or perhaps had a few too many drinks with friends the evening before, etc. 😀
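As a tiny illustration of what "learning patterns from many labeled examples" means, here is a minimal sketch using made-up synthetic data and the simplest possible pattern-finder (logistic regression trained from scratch). Real diagnostic systems are fine-tuned deep networks trained on real scans; the point is only that the hidden pattern is recovered from examples, never programmed in:

```python
import math
import random

random.seed(42)

# Synthetic "cases": 5 numeric features each. The label follows a hidden
# weighted pattern that we pretend no human could eyeball directly.
HIDDEN = [0.8, -0.5, 0.3, 0.0, 0.6]

def make_case():
    x = [random.uniform(-1, 1) for _ in range(5)]
    label = 1 if sum(h * v for h, v in zip(HIDDEN, x)) > 0 else 0
    return x, label

train = [make_case() for _ in range(2000)]
test = [make_case() for _ in range(500)]

# Logistic regression trained by plain gradient descent: the model only
# ever sees (features, label) pairs, exactly like a fine-tuning dataset.
w = [0.0] * 5

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))  # probability that label == 1

lr = 0.1
for epoch in range(20):
    for x, y in train:
        p = predict(x)
        for i in range(5):
            w[i] += lr * (y - p) * x[i]  # nudge weights toward the pattern

accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in test) / len(test)
print(round(accuracy, 2))  # well above the 0.5 you'd get by guessing
```

The learned weights `w` end up pointing in the same direction as the hidden pattern; scale that idea up by many orders of magnitude and you have a rough intuition for fine-tuning.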
OK, enough intro. So, what can we do with the tools we're given to get answers that make sense?
1) Avoid free models for the kind of research that involves our disease. Free models usually work well for very general tasks: broad questions, brainstorming, generating texts, developing marketing ideas, translating, creating small programs, etc. Most paid models incorporate a feature called "deep research," meaning the model takes significantly more time to respond (even 10-20 minutes) but spends this extra time carefully evaluating different hypotheses and running real-time searches to find accurate sources for its information. Some free models (Grok and Perplexity come to mind, and I think Gemini now does as well) offer a certain number of free "deep research" responses per day. Whenever it is available, use deep research.
2) Try to ask precise, unambiguous questions. Some models struggle to admit they don't know an answer (a problem that's actively being addressed) and therefore attempt to respond regardless. This issue is particularly common in free models; paid models with deep research capabilities have drastically reduced hallucinations and the tendency to respond without sufficient data. However, you should still be careful not to introduce your own bias into questions. For example, asking "What is the current status of research on xyz?" versus "Is it true that xyz can do this and that?" can produce significantly different outcomes.
3) When using models that offer deep research, always ask for a list of sources and explicitly request that only authoritative sources be considered. This forces the model to undertake additional work gathering data before processing it, but it also helps exclude sources that, while perhaps popular among users, may be regarded as unreliable by experts.
4) Always verify at least the main sources referenced in the answer. If you have difficulty understanding the statistics or language in general, copy and paste the text (or share the URL) into another AI—or even the same AI in a separate window—and ask it for a clear summary. AI models are particularly effective at simplifying statistical jargon and explaining its meaning clearly. Personally, I find Google's free tool NotebookLM particularly helpful for tasks like this.
5) Use multiple models, especially when they're free! Ask the same question to different models, or cross-check their responses to confirm consistency and accuracy (and sources).
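Tip 5 can even be semi-automated once you've collected answers. A minimal sketch, assuming you've pasted in the responses yourself (the model names and answers below are invented for illustration):

```python
from collections import Counter

# Invented example: the same question asked to three different assistants.
answers = {
    "model_a": "Drug X is in phase 2 trials",
    "model_b": "Drug X is in phase 2 trials",
    "model_c": "Drug X was approved in 2021",
}

def consensus(answers: dict) -> tuple[str, float]:
    """Return the most common answer and the fraction of models that agree."""
    counts = Counter(answers.values())
    top, n = counts.most_common(1)[0]
    return top, n / len(answers)

best, agreement = consensus(answers)
print(best, round(agreement, 2))
# Low agreement is a signal to go check the sources manually
# before trusting any of the answers.
```

In practice the answers are rarely word-for-word identical, so you'd compare the key claims by hand (or ask yet another model to compare them), but the principle is the same: disagreement between models is exactly where to focus your source-checking.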
This is what I would suggest to someone who is starting to explore this world.
There are also deep reasoning models, for example, which are incredibly good at math and especially at coding.
I’ve literally seen people doing "vibe coding"—asking an LLM to generate Python code for running a flight simulator or a popular game, with numerous customizations from the original version—and having fully working code ready in 30 seconds. As a software engineer, I find the speed at which I can develop things using these systems absolutely incredible. At the same time, I can usually tell within seconds if the model has started off on the wrong foot—especially with deep reasoning models, since they make their thought process visible.
I see these systems as a powerful extension of my own skills. But okay, that’s a whole other story.