You had me at horded.
You. Had. Me. At. Horded.
Do we dare ask why you need 48TB to store media, or do we slowly back out of the room, avoiding eye contact?
It warms the cockles of my heart that I renamed my self-hosted LLM's deep-thinking mode to Mentats. For shits and giggles, I made it append [ZARDOZ HAS SPOKEN!] to every "deep thinking" conclusion it reaches.
It's the simple things, really.
I like to secretly imagine it stands for SIG SAUER. Bang = process ded
NPUs yes, TPUs no (or not yet). Rumour has it that Hailo is meant to be releasing a plug-in NPU "soon" that accelerates LLM inference.
I'm still sanguine that 1.58-bit BitNet models will take off. Those could plausibly run at a good clip on existing CPUs, no GPU needed.
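For anyone curious why CPUs suffice: BitNet b1.58 constrains every weight to {-1, 0, +1} (about 1.58 bits each, i.e. log2(3)), so the matrix multiplies that dominate inference collapse into additions and subtractions. A toy numpy sketch of that idea (illustrative only, not the real BitNet kernel):

```python
import numpy as np

# Toy illustration (not BitNet itself): with ternary weights in {-1, 0, +1},
# a matrix-vector product needs no multiplications at all -- each output
# element is just a sum of some inputs minus a sum of others.

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # pretend "1.58-bit" weight matrix
x = rng.standard_normal(8)            # activations

# Multiply-free version: add inputs where the weight is +1, subtract where -1.
y_ternary = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

# Reference dense matmul; the two agree exactly.
assert np.allclose(y_ternary, W @ x)
```

On real hardware the win comes from packed ternary kernels (Microsoft's bitnet.cpp does this), but the arithmetic shortcut is the same.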
Super basic Medium article for those not in the know
~~Necessity~~ spite is usually a good driver... though given BitNet is Microsoft IP... ehh... I won't hold my breath for too long. Still waiting for their 70B model to drop... maybe this year...
Additionally, on Windows (Linux too) one can use Sunshine (host) and Moonlight (client) to render on the GPU and stream to a secondary device (either directly, say to a Chromecast, or via the iGPU to your own monitor). Latency is quite small in most circumstances, and it allows for some interesting tricks (e.g. server GPUs can be split into multiple virtual "mini-GPUs"; with the right card, you could host two or more entirely separate, concurrent instances of GTA V on one machine, all through one physical GPU).
A bit hacky, but it works.
Source: I bought a Tesla P4 for $100 and stuck it in a 1L case.
GPU goes brrr