Fundamentally, models built on an LLM architecture are never going to be remotely performant with current technology.
Part of the problem is that as you increase the parameter count, the amount of digital processing time in some parts of the model goes up by close to, or more likely well beyond, an order of magnitude. The absolute best case is somewhere around N^2 multiplications per pair of fully connected layers, where N is the number of neurons in each layer.
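To put rough numbers on the N^2 figure, here is a back-of-envelope sketch that counts multiply-accumulates for a stack of fully connected layers. It assumes every layer has the same width N and ignores biases, attention, and everything else a real model does; `dense_macs` is a made-up helper name, not something from a library.

```python
def dense_macs(width: int, num_layers: int = 2) -> int:
    """Multiply-accumulates for one input passing through a stack of
    fully connected layers, each `width` neurons wide (an assumption)."""
    # Each pair of adjacent layers is a width x width weight matrix,
    # so it costs width * width multiply-accumulates.
    return (num_layers - 1) * width * width


for n in (1_000, 2_000, 4_000):
    print(n, dense_macs(n))
# 1000 1000000
# 2000 4000000
# 4000 16000000
```

Under these assumptions, doubling the width quadruples the work, so making the layers roughly 3.2x wider is already enough to cost an order of magnitude more.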
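Additionally, each of those steps involves floating-point multiplication, which is like turning two numbers into their own little matrix multiplication: in a plain schoolbook multiplier, every bit of one significand has to be combined with every bit of the other.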
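A rough sketch of that "matrix inside a multiply" picture: count the partial-product bits a plain schoolbook multiplier would generate for the significands of common formats. The significand widths are the standard ones (including the implicit leading 1); the schoolbook model is a simplifying assumption, since real hardware uses tricks like Booth encoding to cut the work down, and `partial_product_bits` is a hypothetical helper.

```python
# Significand widths in bits, counting the implicit leading 1.
SIGNIFICAND_BITS = {
    "bfloat16": 8,
    "float16": 11,
    "float32": 24,
    "float64": 53,
}


def partial_product_bits(fmt: str) -> int:
    """Bits in the partial-product grid of a schoolbook multiplier
    (a simplified model, not a real gate count)."""
    bits = SIGNIFICAND_BITS[fmt]
    # Every bit of one significand is ANDed with every bit of the other.
    return bits * bits


for fmt in SIGNIFICAND_BITS:
    print(fmt, partial_product_bits(fmt))
# bfloat16 64
# float16 121
# float32 576
# float64 2809
```

This is also one way to see why lower-precision formats are so attractive for this kind of workload: in this simplified model, going from float32 to bfloat16 shrinks the grid by 9x.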
Agentic operation is really more like taking a crappy slot machine and letting it do multiple rolls in one go to improve the odds of a more desirable outcome.
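The slot machine comparison can be made concrete with a minimal sketch. It assumes each roll is independent, has the same chance of success, and that something can actually tell when a roll succeeded, none of which is guaranteed in practice; `p_at_least_one` is a made-up name for illustration.

```python
def p_at_least_one(p_single: float, rolls: int) -> float:
    """Chance that at least one of `rolls` independent attempts succeeds,
    given each succeeds with probability `p_single` (an assumption)."""
    # The only way to fail overall is for every single roll to fail.
    return 1.0 - (1.0 - p_single) ** rolls


for k in (1, 5, 10):
    print(k, f"{p_at_least_one(0.2, k):.3f}")
# 1 0.200
# 5 0.672
# 10 0.893
```

The odds do improve, but the machine itself is no better: ten rolls cost ten times the compute of one.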
So to simplify the problem, this is like a bunch of stupid monkeys rubbing two rocks together hard enough to create enough friction to make it seem like they have discovered fire.
Taking that analogy a step further, so much effort is being put into rubbing rocks together faster that other important issues like food and shelter are being ignored.