this post was submitted on 24 Jun 2026
73 points (79.7% liked)

Selfhosted


Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

(page 3) 35 comments
[–] hendrik@palaver.p3x.de 2 points 13 hours ago* (last edited 13 hours ago)

Well, I don't exactly host AI. But some of my software uses AI and/or machine learning. My photo gallery does face detection, I've installed text to speech and speech to text. My Home Assistant has a voice satellite (which is a poor-man's Alexa because I lack the hardware to do voice recognition in realtime). And I also regularly try some large language models and chatbots. But I don't have any real application (yet). And it's slow without a proper GPU. So I'm more or less just messing around. Currently that's with Ministral 3.

[–] realitaetsverlust@piefed.zip 1 points 12 hours ago (3 children)

Yup. Ollama and Open WebUI are a great stack for tinkering with LLMs. They're kinda useful for aggregating large datasets, translations, frontend development and gathering relevant sources for me to read into. Also, Qwen has been amazing at understanding frameworks without documentation and writing one for me. I had to use some self-developed PHP framework for a task once, and without Qwen it probably would've taken me two more weeks to get the task done.

MiniCPM has also been REALLY good at image detection: it describes the image as accurately as possible and feeds that description into Qwen, which then searches for what the object could be and returns the result. I always liked Google Lens, and that stack gave me a Temu version of Google Lens that isn't quite as reliable, but definitely very useful.

[–] fizzle@quokk.au 1 points 13 hours ago (1 children)

The short answer is no.

I have played around with Ollama and Whisper. It's just too slow to be practical. The cost of the hardware is prohibitive.

That said, I do self-host Open WebUI and use inference endpoints from Hugging Face and OVH.

I've never used ChatGPT or Claude and I have to wonder whether those alternatives are really as terrible as the models available on Hugging Face. The output is always super plausible but usually just plain wrong.

[–] SuspiciousCarrot78@aussie.zone -2 points 12 hours ago* (last edited 12 hours ago)

They're not. Call them via API on OpenRouter and see for yourself.

There's a reason OAI and Anthropic are considered best in class and it's not just hype.

[–] brucethemoose@lemmy.world 0 points 10 hours ago* (last edited 10 hours ago) (1 children)

Yep.

I have an RTX 3090 + 128GB CPU RAM.

Currently I run my own custom IQ3_KT quantization of MiMo 2.5 300B, and it’s crazy good. It’s better than API models from not that long ago, and it’s served at about reading speed.
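As a rough sanity check on why a 300B model can fit on that machine at all, here is a minimal back-of-the-envelope sketch. The bits-per-weight figure is an assumption for illustration (the comment does not state it), `quantized_size_gb` is a hypothetical helper, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores KV cache, activations, and any per-tensor overhead.
    """
    return params_billions * bits_per_weight / 8


# ASSUMPTION: roughly 3.125 bits per weight for a ~3-bit quant.
# The real figure depends on the quant recipe and is not given in the thread.
ASSUMED_BITS_PER_WEIGHT = 3.125

weights_gb = quantized_size_gb(300, ASSUMED_BITS_PER_WEIGHT)

# 24 GB of VRAM on an RTX 3090 plus the 128 GB of system RAM mentioned above.
memory_budget_gb = 24 + 128

print(f"weights: ~{weights_gb:.0f} GB, budget: {memory_budget_gb} GB")
print("fits" if weights_gb < memory_budget_gb else "does not fit")
```

Under that assumed bit width the weights land comfortably below the combined VRAM and RAM, which is consistent with the setup described, though the remaining headroom still has to cover context and the OS.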

Never thought I’d ever run such a thing on my lowly desktop.

For quick scripts or as a code assistant, sometimes I use Qwen 27B (another custom quant, currently experimenting with exllama). Or Gemini 12B for messing with image/audio input. But TBH MiMo 2.5 with thinking disabled is smarter than 27B with it enabled.


…And honestly, I use GLM 5.2 API a good bit.

I was lucky enough to get a yearly subscription for like $30, 6 months ago. I do self host the UIs or whatever takes the prompts, though.

[–] irmadlad@lemmy.world 1 points 14 hours ago (8 children)

I've tried most of the small models. Tried NanoClaw. I just don't have the equipment necessary to pull that off and make it a worthwhile in-house tool rather than an in-house oddity. I really, really want to tho. So much so that I have been looking at what it would take to accomplish that, which seems to be in the $4k to $5k USD range. The sweet spot for GPUs seems to be at the 32 GB level. It is pricey, but hell, at my age, I figure wtf... I should treat myself. What's wrong with that? If I do pull the trigger, I want it to be an LTS type computer like the one I built 15 years ago, which is still running like a champ today, tho it's probably worth less than a quarter of what I invested. So, I'd probably overstock it to the max.

[–] Faceman2K23@discuss.tchncs.de 0 points 11 hours ago (1 children)

I've played with it for Home Assistant integration, but I just don't have much interest in it. The whole thing is too inefficient at the moment, and the tiny models that can run in a few gigs of system RAM on an iGPU or NPU aren't good enough in quality or speed to rely on.

Hopefully some future generation of micro-models will be more useful for the way I want to use it (aka ultra light, no dedicated hardware etc.), but for now it's a lot of compute resources, plus heat and energy, for a gimmick.

[–] SuspiciousCarrot78@aussie.zone 0 points 11 hours ago* (last edited 11 hours ago)

Agreed. It will be ironic if 1.58-bit models (Microsoft) turn out to be the great white hope.

I looked at the recent Steam stats (which is a GPU sample of convenience); the most common GPU size was 6GB. Meanwhile you probably need what...64GB unified memory or a 5090 to drive a decent model at a decent speed/context?

There's a real gap between the haves and the have nots and it's widening.

[–] superglue@lemmy.dbzer0.com -1 points 12 hours ago

I use my gaming rig to serve up qwen3.6-coder to Open WebUI and that's been very successful in helping me refactor my home lab to be more efficient and easier to support. Over the years of building my server I got everything working, but let's just say it's a bit of a mess and a lot of shortcuts were taken.

I plan to look into ComfyUI soon, but I don't have much of a use case for it at the moment.

[–] SuspiciousCarrot78@aussie.zone 0 points 12 hours ago* (last edited 12 hours ago)

Myself - I've self-hosted LLMs before, but with only 4-8GB VRAM (depending on which card is in place), I can't run the good stuff at acceptable speeds.

(Don't @ me - I know all the tricks with turbo quants, spec decoding, MoE etc. 192GB/s is 192GB/s)
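The bandwidth point can be put into numbers with a common rule of thumb: during token-by-token decoding, the weights used for each token have to be read from memory once, so memory bandwidth divided by bytes read per token gives a rough ceiling on tokens per second. The sketch below assumes that simplification; `max_decode_tokens_per_s` is a hypothetical helper and the model sizes are made-up examples, not measurements.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Rough upper bound on decode speed for a memory-bandwidth-bound model.

    gb_read_per_token is the weight data touched per generated token:
    about the full quantized size for a dense model, or only the active
    experts for an MoE model. Compute, KV cache reads, and overhead are
    ignored, so real speeds will be lower.
    """
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 192  # the figure quoted in the comment above

# Hypothetical weight footprints in GB, chosen only to show the scaling.
for label, size_gb in [("4 GB of weights", 4), ("20 GB of weights", 20)]:
    ceiling = max_decode_tokens_per_s(BANDWIDTH_GB_S, size_gb)
    print(f"{label}: at most ~{ceiling:.1f} tokens/s")
```

Quantization and MoE shrink the bytes read per token, and speculative decoding amortizes reads across several tokens, but none of them raise the bandwidth itself, which is the point being made.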

I do use Handy (STT) which is amazing (my fingers are arthritic and typing hurts after a while).

My personal use case for LLMs is quite simple - a trumped-up super Google and/or a sounding board for self-reflection and journalling. Despite being glib about it, that's actually very useful to me.

Work wise, I use the big winking orange asshole (Claude) when I have to. I have moral tension with it, so am seriously looking at other options. I hear good things about GLM 5.2, but if I can't run Qwen 35B at any kind of decent speed, well... self-hosted GLM is a pipe dream.

[–] artyom@piefed.social 0 points 13 hours ago

I installed LM Studio just for fun on a 6800XT. But it was even less useful than the web-based ones.

[–] ccunning@lemmy.world 0 points 13 hours ago

I’ve got Ollama set up with Whisper and Piper and a HA Voice PE, but I honestly haven’t gotten around to configuring much yet. Most notable thing was being able to use the wake word to start a timer, but it was pickier than old Siri about the precise wording.

[–] Egonallanon@feddit.uk 0 points 13 hours ago

I've fiddled around with a few models on Ollama and opencode, but more for the sake of seeing what I can run, as I've yet to really find a use for it at home.

My server is way too weak for that, unfortunately. I run some LLMs on my laptop with Ollama but it's not particularly effective. I use it to run Dolphin series models when I need an uncensored LLM. I have tried running some of the coding models, but they just aren't smart enough on my level of compute for any useful work, so I've ended up just paying API prices on OpenRouter.
