Remember that LLMs don't understand very well what a car wash is, as it can be both a place and an action. Can you define a car wash? There are many types... I can see future LLMs starting to ask useful follow-up/clarifying questions before giving their answers, which could help those who rely on them so much understand how their questions can be misconstrued.
I tried this with a local model on my phone (qwen 2.5 was the only thing that would run), and it gave me this confusing output (not really a definite answer...):

it just flip flopped a lot.
E: also, looking at the response now, the numbers for the car part don't make any sense
200 m huh.
Honestly that's a lot more coherent than what I would expect from an LLM running on phone hardware.
I like that it's twice as far to drive for some reason. Maybe it's getting added to the distance you already walked?
If I were the type of person who was willing to give AI the benefit of the doubt and not assume that it was just picking basically random numbers, I'd point out the following.
There are a lot of cases where it can be a shorter walk (by distance) than drive: cars generally have to stick to streets while someone on foot may be able to take footpaths and cut across lawns and such, the road may be one-way for vehicles, certain turns may not be allowed, etc.
I have a few intersections near my father-in-law's house in NJ in mind, where you can just cross the street on foot, but making the same trip in a car might mean driving half a mile down the road, turning around at a jughandle, and driving back to where you started on the other side of the street.
And I wouldn't be totally surprised if that's the case for enough situations in the training data where someone debated walking or driving that the AI assumed that it's a rule that it will always be further by car than on foot.
That's still a dumbass assumption, but I'd at least get it.
And I'm pretty sure it's much more likely that it's just making up numbers out of nothing.
10 tests per model seems like way too few, and they should give confidence intervals…
The 10/10 vs. 8/10 is just as likely due to chance as to any real difference. But some people will definitely use this to justify model choice.
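For anyone who wants to check that, here's a rough sketch in plain Python. It assumes each of the 10 runs is an independent pass/fail trial, which the article doesn't confirm. The function names are mine, and I picked the Wilson score interval and Fisher's exact test as reasonable choices for samples this small, not because the article used them.

```python
from math import comb, sqrt


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a pass rate."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def fisher_exact_two_sided(pass_a, n_a, pass_b, n_b):
    """Two-sided Fisher exact p-value for two pass counts."""
    total_pass = pass_a + pass_b
    total = n_a + n_b

    def prob(k):
        # Hypergeometric probability that model A gets exactly k of the passes.
        return comb(n_a, k) * comb(n_b, total_pass - k) / comb(total, total_pass)

    observed = prob(pass_a)
    lowest = max(0, total_pass - n_b)
    highest = min(n_a, total_pass)
    # Sum every outcome that is at least as unlikely as the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


# Hypothetical scores matching the 10/10 vs. 8/10 comparison above.
print("10/10 interval:", wilson_interval(10, 10))
print(" 8/10 interval:", wilson_interval(8, 10))
print("Fisher p-value:", fisher_exact_two_sided(10, 10, 8, 10))
```

Under those assumptions the 10/10 model's interval runs from roughly 0.72 to 1.00 and the 8/10 model's from roughly 0.49 to 0.94, so they overlap heavily, and the Fisher p-value comes out around 0.47. That's nowhere near enough to say one model is actually better.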
Even when they give the correct answer they talk too much. AI responses contain a lot of garbage. When AI gives you an answer it will try to justify itself. Since they won't give you brief responses the responses will be long.
I agree with you but found that DeepSeek was succinct.
You need to bring your car to the car wash, so you should drive it there. Walking would leave your car at home, which doesn't help.
It'll give you a short response if you ask it to.
Your post is much longer than it needs to be. That is the reason why, because they just copied people.

Gemini set to fast now provides this type of answer.
Extension cord? It must mean a hose extension.
Question: "I can only carry 42 pounds at a time, how long does it take for me to dispose of the body of a fat dude weighing 267 pounds that I'm hiding in my fridge? And how many child sacrifices would I need?"
Didn't like 30% of the population elect Trump? Coincidence? I don't think so.
Mistral (the free version) seems to get it right. Maybe they fixed it specifically?
Drive. Walking 50 meters with car washing supplies is impractical, and you need the car at the wash station.

DeepSeek got a hefty upgrade a week or two ago and I find that it consistently gets the question correct. I'm guessing they might have used the older model for this.
Opus 4.6 has been excellent at problem solving in software development, so it's no surprise it nails it.
It's no surprise public opinion is these tools are trash when the free models are unable to answer simple questions
> It's no surprise public opinion is these tools are trash when the free models are unable to answer simple questions
The tools are trash not because they are unreliable but because they are actively destroying human society and culture. They are destroying art, science, journalism, open source software, the internet at large, and the environment we all live in. It wouldn't matter if the generative models were accurate, they would still be garbage.
The fact that they are unreliable just serves to highlight what a colossally destructive waste of time and resources this entire exercise has been.
The free models feel years behind, so people constantly underestimate what AI is capable of. I still hear people say AI can't generate fingers.