Yeah, LLMs are gonna spin their wheels hard when it comes to testing anything at the kernel/OS level. If you don't have automated testing with a virtual machine setup that can actually replicate a bug, you 100% just cannot verify anything they produce or say.
As soon as you have the ability to go "Okay, we have a failing test, make it pass", the LLMs get a lot less stupid, because instead of randomly fumbling around and guessing, they have actual feedback to iterate on and can chew on it until they fix the issue or give up.
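
Rough sketch of the loop I mean, in Python. Everything here is made up for illustration: `iterate_until_green` and `propose_fix` are hypothetical names, and `propose_fix` stands in for however you hand the failure output to the model and apply its patch. The only real requirement is that the test command exits non-zero when the bug reproduces.

```python
import subprocess
from typing import Callable, Sequence


def iterate_until_green(
    test_cmd: Sequence[str],
    propose_fix: Callable[[str], None],  # hypothetical: feeds output to the LLM, applies its patch
    max_attempts: int = 5,
) -> bool:
    """Run the test; on failure, pass its output to propose_fix and retry.

    Returns True once the test passes, False if it still fails after
    max_attempts fixes (the "give up" case).
    """
    for _ in range(max_attempts):
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        # The failing output is the feedback the model iterates on.
        propose_fix(result.stdout + result.stderr)
    # Check whether the last proposed fix actually worked.
    final = subprocess.run(test_cmd, capture_output=True, text=True)
    return final.returncode == 0
```

For kernel/OS work, `test_cmd` would be whatever script boots your VM and runs the repro, so the whole thing only works if that script exists and fails reliably.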