this post was submitted on 09 Feb 2026
550 points (98.9% liked)

Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn't ready to take on the role of the physician.”

“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”

43 comments
[–] Treczoks@lemmy.world 9 points 1 day ago

One needs a study for that?

[–] theunknownmuncher@lemmy.world 18 points 1 day ago (1 children)

A statistical model of language isn't the same as medical training???

[–] scarabic@lemmy.world 5 points 1 day ago* (last edited 1 day ago) (2 children)

It’s actually interesting. They found the LLMs gave the correct diagnosis in the high nineties percent of the time if they had access to the notes doctors wrote about the symptoms. But when thrust into the room cold with patients, the LLMs couldn’t gather that symptom information themselves.

[–] Hacksaw@lemmy.ca 5 points 1 day ago (2 children)

LLM gives the correct answer when the doctor writes it down first... Wowoweewow, very nice!

[–] tyler@programming.dev 1 points 17 hours ago (5 children)

You have misunderstood what they said.

[–] SuspciousCarrot78@lemmy.world -1 points 13 hours ago* (last edited 1 hour ago)

Funny how people overlook that bit en route to dunking on LLMs.

If anything, that 90% result supports the idea that Garbage In = Garbage Out. I imagine a properly used domain-tuned medical model with structured inputs could exceed those results in some diagnostic settings (task-dependent).
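To make "structured inputs" concrete, a minimal sketch (the `SymptomIntake` schema and `build_prompt` helper are hypothetical illustrations, not any real medical API):

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class SymptomIntake:
    """Hypothetical structured intake, instead of free-form chat."""
    chief_complaint: str
    onset: str                      # e.g. "sudden, seconds to peak"
    severity_0_to_10: int
    associated_symptoms: list[str]
    red_flags: list[str]            # e.g. "worst headache of my life"

def build_prompt(intake: SymptomIntake) -> str:
    # A fixed schema forces the same questions to be answered every time,
    # so two patients with the same emergency can't describe it in ways
    # the model weighs completely differently.
    return (
        "You are assisting a clinician. Given this structured intake, "
        "list differential diagnoses and flag emergencies:\n"
        + json.dumps(asdict(intake), indent=2)
    )

intake = SymptomIntake(
    chief_complaint="sudden severe headache",
    onset="sudden, seconds to peak",
    severity_0_to_10=10,
    associated_symptoms=["neck stiffness", "vomiting"],
    red_flags=["worst headache of my life"],
)
print(build_prompt(intake))  # this string would go to the domain-tuned model
```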

IIRC, the 2024 Nobel Prize in Chemistry was won on the basis of using an ML expert system to investigate protein folding. ML != LLM, but at the same time, let's not throw the baby out with the bathwater.

EDIT: for the lulz, I posted my above comment to my locally hosted bespoke LLM. It politely called out my bullshit (AlphaFold is technically not an expert system, and I didn't cite a source for the Med-PaLM 2 claims). As shown below, not all LLMs are tuned to be sycophantic yes-men; there might be a sliver of hope yet lol.


The statement contains a mix of plausible claims and minor logical inconsistencies. The core idea—that expert systems using ML can outperform simple LLMs in specific tasks—is reasonable.

However, the claim that "a properly used expert system LLM (Med-PALM-2) is even better than 90% accurate in differentials" is unsupported by the provided context and overreaches from the general "Garbage In = Garbage Out" principle.

Additionally, the assertion that the 2024 Nobel Prize in Chemistry was won "on the basis of using ML expert system to investigate protein folding" is factually incorrect; the prize was awarded for AI-assisted protein folding prediction, not an ML expert system per se.

Confidence: medium | Source: Mixed

[–] GnuLinuxDude@lemmy.ml 14 points 1 day ago (1 children)

If you want to read an article that’s optimistic about AI and healthcare, but which falls apart once you start asking too many questions, try this one:

https://text.npr.org/2026/01/30/nx-s1-5693219/

Because it’s clear that people are starting to use it, and many times the “successful” outcome is that it just tells you to see a doctor. And doctors are beginning to use it, but they at least should have the professional expertise to understand and evaluate the output. And we already know that LLMs can spout bullshit.

For the purposes of using and relying on it, I don’t see how it is very different from gambling. You keep pulling the lever, oh excuse me I mean prompting, until you get the outcome you want.

[–] HeyThisIsntTheYMCA@lemmy.world 2 points 1 day ago (1 children)

The one time my doctor used it and I didn't get mad at them (they did the Google and said "the AI says," and I started making angry Nottingham noises, even though all the AI did was confirm exactly what we had just been discussing)... uh, well, that's pretty much it. I'm not sure where my parens are supposed to open and close on that story.

[–] Tollana1234567@lemmy.today 1 points 19 hours ago* (last edited 19 hours ago) (1 children)

It's basically a convoluted version of WebMD. Even the MD mods in medical subs are more accurate.

[–] JoMiran@lemmy.ml 9 points 1 day ago
[–] homes@piefed.world 8 points 1 day ago* (last edited 1 day ago)

This is a major problem with studies like this: they start from a position of assuming that AI doctors would be competent, rather than asking why AI should ever be involved in something so critical and demanding a mountain of evidence that it is worthwhile before investing a penny or a second in it.

“ChatGPT doesn’t require a wage,” and, before you know it, billions of people are out of work and everything costs 10000x your annual wage (when you were lucky enough to still have one).

How long until the workers revolt? How long have you gone without food?

[–] thesohoriots@lemmy.world 6 points 1 day ago (1 children)

This says you’re full of owls. So we doing a radical owlectomy or what?

[–] supersquirrel@sopuli.xyz 4 points 1 day ago

pikachufacegravestone.jpeg

[–] HubertManne@piefed.social 4 points 1 day ago

It's not ready to take on any role. It should not be doing anything but assisting. So yeah, you could talk to a chatbot instead of filling out that checklist, and the output might be useful to the doc when he then talks with you.

[–] Rhoeri@piefed.world 4 points 1 day ago* (last edited 1 day ago)

So the same tech that lonely incels use to make themselves feel important doesn’t make good doctors? Ya don’t say?

[–] Lembot_0006@programming.dev 4 points 1 day ago

You know what else is a bad doctor? My axe!

[–] FelixCress@lemmy.world 3 points 1 day ago

... You don't say.

[–] cecilkorik@piefed.ca 3 points 1 day ago

It's great at software development though /s

Remember that when software written by AI soon ends up running all the devices doctors use daily.

[–] NuXCOM_90Percent@lemmy.zip 2 points 1 day ago (1 children)

How much of that is the chatbot itself versus humans just being horrible at self-reporting symptoms?

That is why "bedside manner" is so important: connect the dots and ask follow-up questions for clarification, or just look at a person and assume they are wrong. Obviously there are some BIG problems with that (ask any black woman, for example), but... humans are horrible at reporting symptoms.

Which gets back to how "AI" is actually an incredible tool (especially in this case, when it is mostly a human-language interface to a search engine), but you still need domain experts in the loop to understand what questions to ask and whether the resulting answer makes any sense at all.
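A minimal sketch of that shape, with a toy `search_guidelines` lookup standing in for the search engine and a human approval gate standing in for the domain expert (all names here are assumptions for illustration):

```python
def search_guidelines(query: str) -> list[str]:
    """Toy stand-in for a real retrieval backend."""
    corpus = {
        "sudden severe headache": [
            "Thunderclap headache: rule out subarachnoid hemorrhage; "
            "emergency referral recommended."
        ],
    }
    return corpus.get(query, ["No guideline found."])

def draft_answer(question: str) -> str:
    # The "LLM" step here is just retrieval plus templating; the point
    # is that the draft is never shown to the patient directly.
    hits = search_guidelines(question)
    return f"Draft (needs clinician review): {hits[0]}"

def expert_review(draft: str) -> str:
    # Domain expert in the loop: approve the draft or escalate it.
    verdict = input(f"{draft}\nApprove? [y/n] ")
    return draft if verdict.lower() == "y" else "Escalated to clinician."

print(expert_review(draft_answer("sudden severe headache")))
```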

Yet, instead, people do the equivalent of just raw dogging whatever the first response on stack overflow is.

[–] snooggums@piefed.world -1 points 1 day ago* (last edited 1 day ago) (1 children)

Rawdogging the first response from stack overflow to try and fix a coding issue isn't going to kill someone.

[–] NuXCOM_90Percent@lemmy.zip 3 points 1 day ago (1 children)

It is if your software goes anywhere near infrastructure or safety.

Which is literally what Musk and the oligarchs were arguing as a way to "fix" Air Traffic Control. And that is far from the first time tech charlatans have wanted to "disrupt" an industry.

[–] snooggums@piefed.world -1 points 1 day ago (1 children)

Someone who uses stack overflow to solve a problem will be doing testing to confirm it worked as part of an overall development workflow.

Using an LLM as a doctor is like vibe coding, where there is no testing or quality control.

[–] NuXCOM_90Percent@lemmy.zip 2 points 1 day ago* (last edited 1 day ago)

So... they wouldn't be raw dogging stack overflow? Because raw dogging the code you get from a rando off stack overflow is a bad idea?

Because you can just as easily use generative AI as a component in test-driven development. But the people pushing to "make coders more efficient" are looking at firing people. And they continue to not want to add the guard rails that would mean they fire 1 engineer instead of 5.
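A minimal sketch of what that guard rail could look like, assuming pytest is available (`llm_generate` is a hard-coded stub, not a real model call): the human writes the test first, and the generated code only ships if it passes.

```python
import pathlib
import subprocess
import sys
import textwrap

# Human writes the test FIRST; this is the spec the model must satisfy.
TEST = textwrap.dedent("""
    from solution import slugify

    def test_slugify():
        assert slugify("Hello, World!") == "hello-world"
""")

def llm_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption for illustration)."""
    return textwrap.dedent("""
        import re

        def slugify(text: str) -> str:
            text = re.sub(r"[^a-zA-Z0-9]+", "-", text.lower())
            return text.strip("-")
    """)

pathlib.Path("test_solution.py").write_text(TEST)
pathlib.Path("solution.py").write_text(llm_generate("write slugify"))

# Gate: the generated code is only accepted if the human-written tests pass.
result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
print("accept" if result.returncode == 0 else "reject", "generated code")
```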

[–] MutantTailThing@lemmy.world 1 points 1 day ago

No shit. Got a weird pain in your elbow? You've got brain cancer, buddy!

[–] sbv@sh.itjust.works -3 points 1 day ago (1 children)

It looks like the LLMs weren't trained for medical tasks. The study would be more interesting if it had been run on something built for the task.

[–] Imgonnatrythis@sh.itjust.works -4 points 1 day ago (2 children)

This makes sense. However, doctors aren't perfect either, and one thing properly trained AI should excel at is helping doctors make rare diagnoses or determine additional testing for some diagnoses. I don't think it's quite there yet, but it's probably close to being a tool a well-trained doc could use as an adjunct to traditional inquiry. Certainly not something end users should be fiddling with with any sort of trust, though. Much of doctor decision-making happens based on experience; experience biases towards common diagnoses, which usually works out because, well, statistics, but it does lead to misdiagnosis of rare disorders. An AI should be more objective about these.

[–] XLE@piefed.social 1 points 1 day ago* (last edited 1 day ago)

Even if AI works correctly, I don't see responsible use of it happening, though. I already saw nightmarish vertical-video footage of doctors checking ChatGPT for answers...

Edit: great talk, good to know the AI true believers can handle discussion of reality

An AI should be more objective about these.

They tend more toward the middle than the outliers. Law of the instrument and all.
