this post was submitted on 09 Feb 2026
548 points (99.1% liked)

Technology

Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn't ready to take on the role of the physician.”

“In an extreme case, two users sent very similar messages describing symptoms of a subarachnoid hemorrhage but were given opposite advice,” the study’s authors wrote. “One user was told to lie down in a dark room, and the other user was given the correct recommendation to seek emergency care.”

(page 2) 50 comments
[–] rumba@lemmy.zip 22 points 15 hours ago (21 children)

Chatbots make terrible everything.

But an LLM properly trained on sufficient patient data, metrics, and outcomes, in the hands of a decent doctor, can cut through bias, catch things that might fall through the cracks, and pack thousands of doctors' worth of updated CME into a thing that can look at a case and go, you know, you might want to check for X. The right model can be fucking clutch at pointing out nearly invisible abnormalities on an X-ray.

You can't ask an LLM trained on general bullshit to help you diagnose anything. You'll end up with 32,000 Reddit posts worth of incompetence.

[–] core@leminal.space 2 points 11 hours ago (1 children)

They have to be built for a specialized type of treatment or procedure, such as looking at patient X-rays or other scans. Just slopping PHI into an LLM and expecting it to diagnose random patient issues is what produces the false diagnoses.

[–] rumba@lemmy.zip 1 points 9 hours ago

I don't expect it to diagnose random patient issues.

I expect it to take the medication labels, vitals, and patient testimony of 50,000 post-cardiac-event patients, and bucket a given post-cardiac patient in with the patients who have similar metadata.

And then a non-LLM model for cancer patients and X-rays.

And then MRIs and CTs.

And I expect all of this to supplement the doctors' and techs' decisions. I want an X-ray tech to look at it and get markers that something is off, which has already been happening since the '80s with Computer-Aided Detection/Diagnosis (CAD/CADe/CADx).
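The bucketing piece doesn't need anything exotic. As a toy sketch of the idea (made-up features and outcomes, scikit-learn purely as an example library):

```python
# Toy sketch: bucket a new post-cardiac patient with the most similar
# historical patients by vitals/medication features. Illustrative only.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors

# Placeholder feature matrix: 50,000 patients x (vitals, med flags, etc.)
rng = np.random.default_rng(0)
historical = rng.normal(size=(50_000, 12))
outcomes = rng.integers(0, 2, size=50_000)  # e.g. readmitted within 90 days

scaler = StandardScaler().fit(historical)
index = NearestNeighbors(n_neighbors=100).fit(scaler.transform(historical))

def bucket(new_patient: np.ndarray) -> float:
    """Return the outcome rate among the most similar historical patients."""
    _, idx = index.kneighbors(scaler.transform(new_patient.reshape(1, -1)))
    return float(outcomes[idx[0]].mean())

print(f"Outcome rate in this patient's bucket: {bucket(rng.normal(size=12)):.2%}")
```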

This shit has been happening the hard way in software for decades. The new tech can do better.

[–] Ricaz@lemmy.dbzer0.com 1 points 11 hours ago (2 children)

Just sharing my personal experience with this:

I used Gemini multiple times and it worked great. I have some weird symptoms that I described to Gemini, and it came up with a few possibilities, the most likely being "Superior Canal Dehiscence Syndrome".

My doctor had never heard of it, and only after I showed them the articles Gemini linked as sources would they even consider ordering a CT scan.

Turns out Gemini was right.

[–] Shanmugha@lemmy.world 7 points 13 hours ago

No shit, Sherlock :)

[–] alzjim@lemmy.world 19 points 18 hours ago (3 children)

Calling chatbots “terrible doctors” misses what actually makes a good GP — accessibility, consistency, pattern recognition, and prevention — not just physical exams. AI shines here — it’s available 24/7 🕒, never rushed or dismissive, asks structured follow-up questions, and reliably applies up-to-date guidelines without fatigue. It’s excellent at triage — spotting red flags early 🚩, monitoring symptoms over time, and knowing when to escalate to a human clinician — which is exactly where many real-world failures happen. AI shouldn’t replace hands-on care — and no serious advocate claims it should — but as a first-line GP focused on education, reassurance, and early detection, it can already reduce errors, widen access, and ease overloaded systems — which is a win for patients 💙 and doctors alike.

/s

[–] plyth@feddit.org 6 points 14 hours ago

The /s was needed for me. There are already more old people than the available doctors can handle. Instead of having nothing, what's wrong with an AI baseline?

[–] BaroqueW@lemmy.world 4 points 15 hours ago

ngl you got me in the first half there

[–] Etterra@discuss.online 10 points 18 hours ago

I didn't need a study to tell me not to listen to a hallucinating parrot-bot.

[–] SuspciousCarrot78@lemmy.world 6 points 17 hours ago* (last edited 7 hours ago) (1 children)

So, I can speak to this a little bit, as it touches two domains I'm involved in. TL;DR - LLMs bullshit and are unreliable, but there's a way to use them in this domain as a force multiplier of sorts.

In one, I've created a Python router that takes my (deidentified) clinical notes, extracts and compacts the input (per user-defined rules), creates a summary, then -

  1. benchmarks the summary against my (user-defined) gold standard and provides a management plan (again, based on a user-defined database).

  2. this is then dropped into my on-device LLM for light editing and polishing to condense, which I then eyeball, correct, and escalate to my supervisor for review.

Additionally, the LLM-generated note can be approved/denied by the Python router in the first instance, based on certain policy criteria I've defined.

It can also suggest probable DDx based on my databases (which are CSV-based).

Finally, if the LLM output fails the policy check, the router tells me why it failed and just says "go look at the prior summary and edit it yourself".
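In outline, that policy check is plain, deterministic code. The section names and thresholds below are made up for illustration rather than my actual criteria, but the shape is the same:

```python
# Simplified sketch of the deterministic policy check (illustrative rules only).
from dataclasses import dataclass

@dataclass
class PolicyResult:
    passed: bool
    failures: list[str]

REQUIRED_SECTIONS = ("History", "Examination", "Impression", "Plan")  # illustrative
MAX_WORDS = 350                                                       # illustrative

def check_note(note: str) -> PolicyResult:
    failures = []
    for section in REQUIRED_SECTIONS:
        if section.lower() not in note.lower():
            failures.append(f"missing section: {section}")
    if len(note.split()) > MAX_WORDS:
        failures.append(f"note exceeds {MAX_WORDS} words")
    if "as an ai" in note.lower():
        failures.append("contains LLM boilerplate")
    return PolicyResult(passed=not failures, failures=failures)

llm_note = "History: ...  Examination: ...  Impression: ...  Plan: ..."  # placeholder
result = check_note(llm_note)
if result.passed:
    print("Note accepted.")
else:
    # Mirrors the router's behaviour: report why, then punt back to the human.
    print("Policy check failed:", "; ".join(result.failures))
    print("Go look at the prior summary and edit it yourself.")
```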

This three-step process takes the tedium of the paperwork from 15-20 minutes down to roughly 1 minute of generation plus 2 minutes of manual editing, which is approximately a 5-7x speed-up.

The reason why this is interesting:

All of this runs within the LLM (or more accurately, it's invoked from within the LLM: it calls the Python tooling via >> commands, which live outside the LLM's purview) but is 100% deterministic; no LLM jazz until the final step, which the router can outright reject and which is user-auditable anyway.

I've found that using a fairly "dumb" LLM (Qwen2.5-1.5B), with settings dialed down, produces consistently solid final notes (5 out of 6 are graded as passing on the first run by the router invoking the policy document and checking the output). It's too dumb to jazz, which is useful in this instance.

Would I trust the LLM end to end? Well, I'd trust my system approx 80% of the time. I wouldn't trust ChatGPT ... even though it's been more right than wrong in similar tests.

[–] realitista@lemmus.org 1 points 14 hours ago (1 children)

Interesting. What technology are you using for this pipeline?

[–] SuspciousCarrot78@lemmy.world 3 points 13 hours ago* (last edited 13 hours ago) (1 children)

Depends which bit you mean specifically.

The "router" side is a offshoot of a personal project. It's python scripting and a few other tricks, such as JSON files etc. Full project details for that here

https://github.com/BobbyLLM/llama-conductor

The tech stack itself:

  • llama.cpp
  • Qwen 2.5-1.5B GGUF base (from memory, a 5-bit quant from the HF Alibaba repository)
  • The Python router (a more sophisticated version of the above)
  • Policy documents
  • Front end (OWUI; may migrate to something simpler/more robust. Occasional streaming disconnect issues at the moment. Annoying but not terminal)
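For a flavour of how the pieces talk to each other: with llama-server hosting the Qwen GGUF locally (default port 8080), the polishing call is just an OpenAI-style HTTP request with the temperature dialed down. A rough sketch, not lifted from llama-conductor:

```python
# Sketch: polishing call to a local llama.cpp server (llama-server, default port 8080).
# Model name and settings are illustrative; temperature kept low to avoid "LLM jazz".
import requests

def polish(summary: str) -> str:
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "model": "qwen2.5-1.5b-instruct",  # whatever the server is actually serving
            "temperature": 0.2,
            "messages": [
                {"role": "system", "content": "Lightly edit and condense the note. Do not add content."},
                {"role": "user", "content": summary},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(polish("Deterministic summary produced by the router goes here."))
```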
[–] realitista@lemmus.org 2 points 13 hours ago (1 children)

Thanks, it's really interesting to see some real-world applications and implementations of AI for practical workloads.

[–] SuspciousCarrot78@lemmy.world 2 points 13 hours ago

Very welcome :)

As it usually goes with these things, I built it for myself then realised it might have actual broader utility. We shall see!

[–] pleksi@sopuli.xyz 6 points 18 hours ago

As a physician, I've used AI to check if I have missed anything in my train of thought. It never really changed my decision, though. It has been useful for gathering up relevant citations for my presentations as well. But that's about it. It's truly shite at interpreting scientific research data on its own, for example. Most of the time it will parrot the conclusions of the authors.

[–] irate944@piefed.social 89 points 1 day ago (2 children)

I could've told you that for free, no need for a study

[–] rudyharrelson@lemmy.radio 126 points 1 day ago* (last edited 1 day ago) (7 children)

People always say this on stories about "obvious" findings, but it's important to have verifiable studies to cite in arguments for policy, law, etc. It's kinda sad that it's needed, but formal investigations are a big step up from just saying, "I'm pretty sure this technology is bullshit."

I don't need a formal study to tell me that drinking 12 cans of soda a day is bad for my health. But a study that's been replicated by multiple independent groups makes it way easier to argue to a committee.

[–] irate944@piefed.social 39 points 1 day ago (2 children)

Yeah you're right, I was just making a joke.

But it does create some silly situations like you said

[–] Knot@lemmy.zip 23 points 1 day ago

I get that this thread started from a joke, but I think it's also important to note that no matter how obvious some things may seem to some people, the exact opposite will seem obvious to many others. Without evidence, like the study, both groups are really just stating their opinions

It's also why the formal investigations are required. And whenever policies and laws are made based on verifiable studies rather than people's hunches, it's not sad, it's a good thing!

[–] BeigeAgenda@lemmy.ca 59 points 1 day ago (5 children)

Anyone who has knowledge about a specific subject says the same: LLMs are constantly incorrect and hallucinate.

Everyone else thinks it looks right.

[–] tyler@programming.dev 9 points 15 hours ago (1 children)

That's not what the study showed, though. The LLMs were right over 98% of the time… when given the full situation by a "doctor". It was normal people trying to self-diagnose, without knowing what information was important, who were the problem.

Hence why studies are incredibly important. Even with the text of the study right in front of you, you assumed a conclusion that the study didn't actually reach.

[–] Elting@piefed.social 3 points 10 hours ago* (last edited 10 hours ago) (1 children)

So in order to get decent medical advice from an LLM, you just need to be a doctor and tell it what's wrong with you.

[–] tyler@programming.dev 1 points 8 hours ago

Yes, that was the conclusion.

[–] IratePirate@feddit.org 33 points 1 day ago* (last edited 4 hours ago) (2 children)

A talk on LLMs I was listening to recently put it this way:

If we hear the words of a five-year-old, we assume the knowledge of a five-year-old behind those words, and treat the content with due caution.

We're not adapted to something with the "mind" of a five-year-old speaking to us in the words of a fifty-year-old, and thus are more likely to assume competence just based on language.

[–] leftzero@lemmy.dbzer0.com 16 points 22 hours ago (2 children)

LLMs don't have the mind of a five year old, though.

They don't have a mind at all.

They simply string words together according to statistical likelihood, without having any notion of what the words mean, or what words or meaning are; they don't have any mechanism with which to have a notion.

They aren't any more intelligent than old Markov chains (or than your average rock), they're simply better at producing random text that looks like it could have been written by a human.
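To make the Markov chain comparison concrete, the whole "string words together by statistical likelihood" trick fits in a few lines (toy sketch):

```python
# Minimal word-level Markov chain: picks each next word by observed frequency.
import random
from collections import defaultdict

corpus = "the patient was told to rest and the patient was told to seek care".split()

transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

word = "the"
output = [word]
for _ in range(10):
    word = random.choice(transitions.get(word, corpus))  # statistical next-word pick
    output.append(word)
print(" ".join(output))
```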

[–] plyth@feddit.org 3 points 14 hours ago (2 children)

They simply string words together according to statistical likelihood, without having any notion of what the words mean

What gives you the confidence that you don't do the same?

[–] Digit@lemmy.wtf 2 points 11 hours ago

human: je pense ("I think")

llm: je ponce ("I sand")

[–] IratePirate@feddit.org 3 points 18 hours ago

I am aware of that, hence the scare quotes. But you're correct, that's where the analogy breaks. Personally, I prefer to liken them to parrots, mindlessly reciting patterns they've found in somebody else's speech.

[–] pageflight@piefed.social 23 points 1 day ago

Chatbots are terrible at anything but casual chatter, humanity finds.

[–] spaghettiwestern@sh.itjust.works 16 points 1 day ago* (last edited 1 day ago) (2 children)

Most doctors make terrible doctors.

[–] Sektor@lemmy.world 4 points 16 hours ago

But the good ones are worth a monument in the place they worked.

[–] Sterile_Technique@lemmy.world 20 points 1 day ago* (last edited 1 day ago) (1 children)

Chipmunks, 5 year olds, salt/pepper shakers, and paint thinner, also all make terrible doctors.

Follow me for more studies on 'shit you already know because it's self-evident immediately upon observation'.
