this post was submitted on 23 Feb 2026
375 points (97.5% liked)

Technology


Screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.

Also includes outtakes on the 'reasoning' models.

[–] Slashme@lemmy.world 38 points 6 hours ago (3 children)

The most common pushback on the car wash test: "Humans would fail this too."

Fair point. We didn't have data either way. So we partnered with Rapidata to find out. They ran the exact same question with the same forced choice between "drive" and "walk," no additional context, past 10,000 real people through their human feedback platform.

71.5% said drive.

So people do better than most AI models. Yay. But seriously, almost 3 in 10 people get this wrong‽‽
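For scale, here is a rough sketch of the arithmetic behind "almost 3 in 10". It assumes exactly 10,000 responses (the article only says "10,000 real people") and treats them as a simple random sample for the error estimate, which an online panel may not be:

```python
import math

n = 10_000        # assumed exact number of responses
p_drive = 0.715   # share that answered "drive" (from the article)

p_walk = 1 - p_drive
walk_count = round(n * p_walk)

# Standard error of a proportion, assuming a simple random sample
std_err = math.sqrt(p_walk * (1 - p_walk) / n)
margin_95 = 1.96 * std_err

print(f"walk: {p_walk:.1%} (~{walk_count} people), +/- {margin_95:.1%} at 95%")
```

Under those assumptions the sampling error is under a percentage point, so the "walk" share is not just noise from a small sample.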

[–] bluesheep@sh.itjust.works 3 points 1 hour ago

I saw that and hoped it's because of the dead Internet theory. At least I hope so, because I'll be losing the last bit of faith in humanity if it isn't.

[–] T156@lemmy.world 18 points 5 hours ago (1 children)

It is an online poll. You also have to consider that some people don't care or want to be funny, and so either choose randomly or choose the most nonsensical answer.
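To put a number on that, here is a back-of-the-envelope sketch of how many unserious respondents it would take to explain the 71.5% result. The two-group model and both function names are made up for illustration. It assumes every serious respondent answers "drive" with probability `serious_accuracy`, which defaults to 1.0 (nobody serious gets it wrong):

```python
def random_share_needed(observed_drive: float, serious_accuracy: float = 1.0) -> float:
    """Share r of coin-flip respondents that solves
    observed_drive = (1 - r) * serious_accuracy + r * 0.5."""
    if serious_accuracy == 0.5:
        raise ValueError("serious respondents are indistinguishable from coin flips")
    return (serious_accuracy - observed_drive) / (serious_accuracy - 0.5)


def troll_share_needed(observed_drive: float, serious_accuracy: float = 1.0) -> float:
    """Share t of respondents who always pick the wrong answer, solving
    observed_drive = (1 - t) * serious_accuracy."""
    return 1 - observed_drive / serious_accuracy


print(f"coin-flippers needed: {random_share_needed(0.715):.1%}")
print(f"deliberate trolls needed: {troll_share_needed(0.715):.1%}")
```

In this toy model, random answers alone would need more than half of all respondents to be coin-flipping (0.285 / 0.5 = 57%), while deliberately wrong answers would need 28.5%. It says nothing about which mix actually happened.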

[–] yakko@feddit.uk 0 points 3 hours ago

I wonder... If humans were all super serious, direct, and not funny, would LLMs trained on their stolen data actually function as intended? Maybe. But such people do not use LLMs.

[–] masterofn001@lemmy.ca 7 points 5 hours ago* (last edited 5 hours ago) (3 children)

Without reading the article, the title just says wash the car.

I could go for a walk and wash my car in my driveway.

Reading the article... That is exactly the question asked. It is a very ambiguous question.

[–] Geth@lemmy.dbzer0.com 1 points 47 minutes ago

Mentioning the car wash and washing the car plus the possibility of driving the car in the same context pretty much eliminates any ambiguity. All of the puzzle pieces are there already.

I guess this is an unintended autism test as well, if this is not enough context for someone to understand the question.

[–] bluesheep@sh.itjust.works 3 points 1 hour ago

Without reading the article, the title just says wash the car.

No it doesn't? It says:

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

In which world is that an ambiguous question?

[–] elucubra@sopuli.xyz 3 points 3 hours ago

It is not. It says what I want to do, and where.