this post was submitted on 28 May 2026
Fuck AI
And we're still in the "let's be cheap and try and undercut each other" phase, before the snake eats its own tail. Things only get more expensive from here.
Meanwhile whole careers are in shambles because of these greedy asshats.
What a fucking joke.
We might jump into a new phase of this as tech advances. Other companies are working on different ways of running the models that could be substantially cheaper.
For example, one is exploring etching the models directly into the silicon and has built a rapid workflow to go from model to silicon, while another is trying to etch the transformer architecture itself into the silicon.
If any of these new ideas work, it could really upend things and start another phase of everyone trying to undercut everyone, and also be really bad for the likes of Nvidia.
Edit: Just as an example, the etched-model one gets just shy of 17,000 tokens/s on a Llama 3.1 8B model, where an Nvidia H200 gets 230. But how they're going to scale this up to a more meaningfully sized model, I dunno.
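To put those two figures side by side, here is a quick back-of-the-envelope sketch. It only uses the numbers quoted above; the 1,000-token reply length is an illustrative assumption, and it assumes both rates were measured the same way, which the comment does not confirm.

```python
# Throughput figures quoted above (tokens per second, Llama 3.1 8B).
ETCHED_TOKENS_PER_S = 17_000  # quoted as "just shy of" this, so treat it as an upper bound
H200_TOKENS_PER_S = 230


def speedup(fast: float, slow: float) -> float:
    """How many times faster `fast` is than `slow`."""
    return fast / slow


def seconds_for(tokens: int, tokens_per_s: float) -> float:
    """Time to generate `tokens` at a constant rate."""
    return tokens / tokens_per_s


ratio = speedup(ETCHED_TOKENS_PER_S, H200_TOKENS_PER_S)
print(f"speedup: ~{ratio:.0f}x")  # speedup: ~74x

# Hypothetical reply length, chosen only for illustration.
REPLY_TOKENS = 1_000
print(f"etched chip: {seconds_for(REPLY_TOKENS, ETCHED_TOKENS_PER_S):.2f}s")  # 0.06s
print(f"H200:        {seconds_for(REPLY_TOKENS, H200_TOKENS_PER_S):.2f}s")    # 4.35s
```

So roughly a 74x gap on this one small model, if the numbers are comparable. It says nothing about how the approach scales to larger models.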
Trying to etch models into a chip is a dead end until we reach “peak” quality.
And unless they include some kind of LoRA (low-rank adaptation) adapter on the silicon, etching severely limits the utility of whatever model or architecture they choose. Being able to modify the weights is way more useful.
Honestly, diffusion decoders are probably where we’ll end up some day. We won’t stop there, but that’s probably the next logical step in the throughput chain.
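For anyone unfamiliar with why a LoRA adapter would help here, a minimal plain-Python sketch of the general idea: the big weight matrix stays frozen (which is what etching would force anyway), and only two small low-rank matrices change. The names `frozen_w`, `down`, and `up`, and the tiny sizes, are made up for illustration. This is not how any particular chip vendor does it.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, frozen_w, down, up, alpha=1.0):
    """Compute x @ frozen_w + (alpha / rank) * x @ down @ up.

    frozen_w never changes (think: etched into silicon);
    only the small `down` and `up` matrices would be trained or swapped.
    """
    rank = len(up)  # `up` has shape (rank, d_out)
    base = matmul(x, frozen_w)
    delta = matmul(matmul(x, down), up)
    s = alpha / rank
    return [[b + s * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]


random.seed(0)
D_IN, D_OUT, RANK = 16, 16, 2  # toy sizes, chosen only for illustration

frozen_w = [[random.gauss(0, 1) for _ in range(D_OUT)] for _ in range(D_IN)]
down = [[random.gauss(0, 1) for _ in range(RANK)] for _ in range(D_IN)]
up = [[0.0] * D_OUT for _ in range(RANK)]  # zero init: adapter starts as a no-op
x = [[random.gauss(0, 1) for _ in range(D_IN)]]

frozen_params = D_IN * D_OUT
adapter_params = RANK * (D_IN + D_OUT)
print(frozen_params, adapter_params)  # 256 64
```

The adapter is a fraction of the size of the frozen matrix, so in principle it could live in a small rewritable memory next to the fixed weights. Whether that is enough flexibility in practice is exactly the open question.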
General purpose compute is infinitely more valuable during times of great software improvements than highly specialized compute.
Things like Tensor Processing Units (TPUs) still aren’t ubiquitous, even though they’ve been around for 10+ years. They’re too specialized to allow for reasonable flexibility in testing.
They claim they can apply LoRAs to it, and that at data centre scale it will pay for itself in a year vs existing GPU methods... but who knows if any of that is true.
They'd need a pretty good recycle process set up to get rid of cards that are no longer useful after a couple years as well.
But ya, maybe this is the far future, once we have these amazing models that don't need to change often.
Edit: and some models would be better suited for it than others. A creative writing model is less likely to suffer from infrequent updates than a programming model, for example.