this post was submitted on 11 Jun 2026

Spell checker hallucinates Anaconda bug report (github.com)

submitted 2 days ago* (last edited 1 day ago) by rounding_error@lemmy.today to c/fuck_ai@lemmy.world

[–] pixxelkick@lemmy.world 4 points 2 days ago (2 children)

Yeah, LLMs are gonna spin their wheels hard when it comes to testing anything at the kernel/OS level. If you don't have automated testing with a virtual machine setup that can actually replicate a bug, you 100% just cannot test anything they produce or say.

As soon as you have the ability to go "Okay, we have a failing test, make it pass", the LLMs get a lot less stupid, because instead of just randomly fumbling around and guessing, they have actual feedback to iterate on and can chew on it til they fix the issue or give up.

[–] jj4211@lemmy.world 1 points 16 hours ago (1 children)

Not just automated testing. For CodeGen to really work in an 'agentic' way:

You need that automated test case to trigger the misbehavior 100% of the time. (Often, the act of figuring out how to trigger the misbehavior means you already know the fix, but not always.)
That automated test case needs to be succinct and, as much as possible, feed only the problematic output back to the CodeGen. CodeGen can easily get distracted by irrelevant input.
That automated test needs to be very quick from code change to test case completion. Even with everything just right, expect the CodeGen to basically thrash around, guessing things that sound right but to no avail. Most attempts can be summed up as: "Ok, the problem is absolutely caused by , and here is the definite fix and it is complete but just double checking... Ok, that didn't quite fully fix it... see next attempt." So a long test case can make it take an eternity, as the CodeGen has to wait and run it over and over and over again, while a human might actually reason through it.
You need to let the token hose go. It's guessing, and it can take quite a few guesses to get it right.
Be prepared for pointless code changes along the way. It makes guesses and often leaves the wrong guesses in, doing nothing at all to help the problem but potentially having side effects. It decides that while a change didn't work, it must have been part of the solution and must be left in.
Consequently, you'd better have an amazing test suite to capture the likely side effects of those spurious changes, or be prepared to unwind the progress and extricate the result manually.
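To make the "feed only the problematic output back" point concrete, here is a minimal Python sketch of a filter that sits between the test runner and the CodeGen. The marker strings, the `extract_failures` name, and the sample log are all made up for illustration; a real runner will print something different.

```python
import re

# Hypothetical failure markers; adjust to whatever your test runner prints.
FAILURE_PATTERN = re.compile(r"FAIL|ERROR|Traceback|AssertionError")


def extract_failures(output: str, max_lines: int = 20) -> str:
    """Keep only lines that look like failures, capped at max_lines."""
    hits = [line for line in output.splitlines() if FAILURE_PATTERN.search(line)]
    return "\n".join(hits[:max_lines])


# Made-up sample log, only here to show the filtering.
noisy_log = "\n".join([
    "collecting tests...",
    "test_boot ... ok",
    "test_partition ... FAIL",
    "AssertionError: expected 2 partitions, got 0",
    "test_network ... ok",
])

print(extract_failures(noisy_log))
```

Only the two failure lines survive, so the passing tests never reach the model's context.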

[–] pixxelkick@lemmy.world 1 points 7 hours ago

Absolutely 100% all of this, though with a lot of other tricks, like caveman mode, careful skill files, and helper scripts that help the agent quickly and surgically extract just the useful output, you can substantially reduce token burn and improve its memory.

It also helps to have it carefully roll back changes every time a fix doesn't work, and to have it keep a markdown file log of each fix it tried and the results, so it can review everything it tried previously.
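The markdown attempt log could look something like this rough Python sketch. The `log_attempt` helper, the file name, and the entry layout are assumptions for illustration, not any particular agent's format, and the rollback itself (for example through version control) is not shown.

```python
import tempfile
from pathlib import Path


def log_attempt(log_path: Path, number: int, change: str, result: str) -> None:
    """Append one attempt to a markdown log the agent can re-read later."""
    entry = f"## Attempt {number}\n\n- Change: {change}\n- Result: {result}\n\n"
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


# Demo in a throwaway directory; the attempt descriptions are invented.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "attempts.md"
    log_attempt(log_file, 1, "guarded the null check", "test still fails, rolled back")
    log_attempt(log_file, 2, "fixed the off-by-one", "test passes")
    print(log_file.read_text(encoding="utf-8"))
```

Appending rather than rewriting means a failed or rolled-back attempt still leaves a record for later review.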

[–] schipelblorp@sh.itjust.works 2 points 1 day ago (1 children)

I'm not a programmer, but isn't reproducing a reported bug step 1?

[–] pixxelkick@lemmy.world 4 points 1 day ago

Reproducing the bug with an automated test is harder: it's code you can run that tests your other code.

But it lets you run it with one click and get a yes/no "is this still broken" answer, without having to reproduce the bug by hand each time.
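For a non-programmer, a tiny Python sketch of what that looks like. Everything here is hypothetical: `parse_disk_size` stands in for "your other code", and the bug report (a lowercase unit crashing the parser) is invented. The version shown already contains the fix, so the check reports "fixed".

```python
import unittest


def parse_disk_size(text: str) -> int:
    """Hypothetical code under test: convert '512M' or '2G' to mebibytes."""
    units = {"M": 1, "G": 1024}
    return int(text[:-1]) * units[text[-1].upper()]


class ReproduceReportedBug(unittest.TestCase):
    def test_lowercase_unit_is_accepted(self):
        # Invented bug report: '2g' crashed instead of parsing.
        self.assertEqual(parse_disk_size("2g"), 2048)


def still_broken() -> bool:
    """Run the regression test and reduce it to a yes/no answer."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReproduceReportedBug)
    result = unittest.TestResult()
    suite.run(result)
    return not result.wasSuccessful()


print("still broken" if still_broken() else "fixed")
```

Remove the `.upper()` call and the same check flips to "still broken", with no manual steps involved.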

What's important is that this is in the domain of what LLMs can actually work with: the output of the test is something they can parse and iterate on until it works.

They execute the command to run the test, check the output, and keep working til the test passes.
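That loop can be sketched in Python roughly as follows. This is an assumption-heavy toy: `iterate_until_pass` and `fake_agent` are invented names, the "test" is just a check for a marker file, and the "agent" is a stand-in that happens to get it right on the third guess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, List


def run_test(command: List[str]) -> subprocess.CompletedProcess:
    """Run the test command and capture its output and exit code."""
    return subprocess.run(command, capture_output=True, text=True)


def iterate_until_pass(
    command: List[str],
    attempt_fix: Callable[[str], None],
    max_attempts: int = 5,
) -> bool:
    """Re-run the test after each fix attempt; give up after max_attempts."""
    result = run_test(command)
    for _ in range(max_attempts):
        if result.returncode == 0:
            return True
        attempt_fix(result.stdout + result.stderr)  # feedback for the next guess
        result = run_test(command)
    return result.returncode == 0


def demo() -> bool:
    """Stand-in scenario: the 'test' passes once a marker file exists."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "fixed"
        check = f"import os, sys; sys.exit(0 if os.path.exists({str(marker)!r}) else 1)"
        command = [sys.executable, "-c", check]
        guesses = []

        def fake_agent(feedback: str) -> None:
            guesses.append(feedback)
            if len(guesses) == 3:  # the third guess happens to be right
                marker.touch()

        return iterate_until_pass(command, fake_agent)


print("passed" if demo() else "gave up")
```

The attempt budget is what makes "or give up" possible: without it, a test that can never pass would burn tokens forever.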

They can add additional tests to help isolate the problem, or strip down the existing test until it's doing the absolute bare minimum steps to reproduce, in order to narrow the scope of what's causing it.
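Stripping a test down to the bare minimum can itself be automated. Below is a simple greedy reduction in Python, a simplified relative of delta debugging: drop one step at a time and keep the shorter list whenever the failure still reproduces. The step names and the `reproduces` condition are invented for the example.

```python
from typing import Callable, List, Sequence


def minimize_steps(
    steps: Sequence[str],
    still_fails: Callable[[List[str]], bool],
) -> List[str]:
    """Greedily remove steps while the failure still reproduces."""
    current = list(steps)
    changed = True
    while changed:
        changed = False
        for index in range(len(current)):
            candidate = current[:index] + current[index + 1:]
            if still_fails(candidate):
                current = candidate  # this step was not needed to reproduce
                changed = True
                break
    return current


def reproduces(steps: List[str]) -> bool:
    """Invented bug: it only triggers when both partition steps run."""
    return "create_partition" in steps and "format_partition" in steps


all_steps = ["boot", "set_locale", "create_partition", "set_timezone", "format_partition"]
print(minimize_steps(all_steps, reproduces))
# prints ['create_partition', 'format_partition']
```

Each call to `still_fails` is one full test run, which is another reason the test needs to be fast.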

But when your test involves stuff running in the kernel of an OS, your automated tests need to effectively be code you write that bootstraps a virtual machine and then manipulates and observes that second machine's kernel...

You can do it, but it's one of the most complicated forms of automated tests to design and run!
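To give a feel for the moving parts, here is a very rough Python sketch, assuming QEMU with direct Linux kernel boot and a serial console. It only builds the command line and scans a captured serial log. It does not launch anything, and the file names and the sample log line are placeholders, not taken from the bug report in question.

```python
from typing import List

# Strings the Linux kernel commonly prints when things go wrong.
FAILURE_MARKERS = ("Kernel panic - not syncing", "BUG:", "Oops:")


def build_qemu_command(kernel: str, initrd: str, memory_mb: int = 512) -> List[str]:
    """Build a QEMU command that boots a kernel with a serial console."""
    return [
        "qemu-system-x86_64",
        "-m", str(memory_mb),
        "-kernel", kernel,
        "-initrd", initrd,
        # panic=-1 asks the guest to reboot immediately on panic;
        # -no-reboot makes QEMU exit instead of rebooting.
        "-append", "console=ttyS0 panic=-1",
        "-nographic",
        "-no-reboot",
    ]


def find_kernel_failures(serial_log: str) -> List[str]:
    """Return the serial log lines that contain a known failure marker."""
    return [
        line for line in serial_log.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


# Placeholder paths and an illustrative log; a real harness would capture
# the guest's serial output, for example via subprocess.
command = build_qemu_command("bzImage", "initramfs.img")
sample_log = "\n".join([
    "[    0.500000] illustrative boot message",
    "[    1.234567] Kernel panic - not syncing: VFS: Unable to mount root fs",
])
print(" ".join(command))
print(find_kernel_failures(sample_log))
```

A real harness also needs timeouts, a way to drive the guest, and disk images to install onto, which is where most of the complexity lives.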