I've tried a few times but with only 8gig of vram it's simply not worth it.
Have you tried qwen3.5-9b? It’s pretty solid for its size.
Yeah, it's "good for its size" but it's just too flaky for me to use for any significant coding.
Yeah, I wouldn’t use it for coding. It’s a bit dumb unfortunately.
Yep.
Ollama + about 8 different models at the moment, hosted on a mac mini with open webui as a front end.
Predominantly for transcription, translation, an extra round of security checks on code, a more context friendly home assistant interface, and a daily run of context evaluation on property I'm looking for with a lot of specific needs (acreage, min elevation change, soil type, area, etc).
I have to recommend switching to llama.cpp. It's SO much faster than ollama.
I do, I use ollama. I mostly just tinker, but I use it with Home Assistant for a quasi-Alexa-like experience with the voice assistant, I use it for summarizing some YouTube transcripts I'm too lazy to read/watch, and I've tried to see how capable it is with coding.
Can you elaborate on what exactly you are using with Home Assistant? And is English your primary language in that context?
Trying to do something similar. English isn't my primary language, and it's a bit... harder than it seems. Can't figure out if it's because I'm not using English or something else. (3060 12GB BTW)
I started running LLMs a couple months ago on my own hardware. I have a Framework Desktop that I ordered last year and also recently picked up a refurbished 24GB AMD RX 7900 XTX which I'm doing some performance testing against. The dGPU is much better for dense models, and slightly faster for MoE if I'm willing to run them at a lower quant -- but uses more power and has annoying coil whine. The Framework Desktop uses ~100W under load, is quieter, and for the MoE models already runs them fast enough for most of my needs -- so most of my LLM use happens on that system still.
For software: I'm using ollama on the Framework currently, but I want to replace it with just using llama.cpp directly eventually. I've been using llama-cli for testing the dGPU. I wrote my own chat client to interact with ollama as well as a few other programs for specific tasks.
I've been using the LLMs for a mix of research (both personal and professional), entertainment, practical coding tasks (mostly debugging and brainstorming, plus a bit of UI prototyping, automatic generation of sequence diagrams for documentation, and light scripting), as well as automation of tedious tasks.
As an example of the latter, people often send me requests to prepare data sets by email but don't specify the sources they want precisely so I have to go match the name against the real name in our archives; LLMs are great for mapping the imperfect name -- with typos, missing prefixes, incorrect addition of spaces, addition/removal of hyphens, etc. -- to the exact name I actually need to pull the data off disk when given a lookup table to compare against.
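A rough sketch of how that kind of name matching could be wired up, not the poster's actual setup: it normalizes away spaces and hyphens, uses Python's standard `difflib` to shortlist candidates from the lookup table, and builds a prompt asking a model to pick the exact name. The archive names, the `shortlist` and `build_prompt` helpers, and the prompt wording are all made up for illustration, and the `difflib` pre-filter is an added assumption (the post only says the LLM is given a lookup table).

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase and drop spaces, hyphens, and underscores."""
    return re.sub(r"[\s\-_]+", "", name.lower())


def shortlist(requested: str, archive_names: list[str],
              n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Return up to n archive names that look closest to the requested name."""
    by_key = {normalize(a): a for a in archive_names}
    keys = difflib.get_close_matches(
        normalize(requested), list(by_key), n=n, cutoff=cutoff
    )
    return [by_key[k] for k in keys]


def build_prompt(requested: str, candidates: list[str]) -> str:
    """Build a prompt asking a model to choose one exact name (hypothetical wording)."""
    options = "\n".join(f"- {c}" for c in candidates)
    return (
        f"A user asked for the data set '{requested}'.\n"
        f"Pick the exact matching name from this list, or answer NONE:\n{options}"
    )


# Hypothetical lookup table standing in for the real archive index.
ARCHIVE = ["north-ridge-2019", "north-ridge-2020", "south-basin-2019"]

if __name__ == "__main__":
    request = "South Basn-2019"  # typo plus a stray hyphen
    print(build_prompt(request, shortlist(request, ARCHIVE)))
```

The shortlist keeps the prompt small when the lookup table is large; the final pick would still go to whatever local model you run.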
As far as models go, I'm mostly using various Qwen 3.6 and Gemma4 variants. I have multiple versions of each for different purposes. llmfan46's uncensored Qwen 3.6 35B-A3B @ Q6_K (from Hugging Face) is my default model currently.
Yup, ollama, various models. I initially downloaded it because I, along with thousands of other people, wanted to see what would happen if I made models debate with each other after RAGging them with various books (The Prince, The Art of War, The complete works of Shakespeare, etc.).
The results were uninteresting and I abandoned the project pretty quickly. I'll sometimes use them for code analysis but they're too slow on my rig to be really useful.
Did you use OWUI's native "call simultaneous models to answer" feature for that, or one of the AI debate harnesses?
Technically, TTS/STT are mostly ML models; I'm pretty sure many people run these. I have a setup, but I'm better with buttons than with spoken words, and I listen to ambient sounds or music. I think some day I'll make a voice assistant to talk to while driving, but that's not a trivial task hardware-wise, even if I used a cloud LLM layer, which I won't. Putting AI on bare metal sounds like an interesting project.
I have a homemade "local agent" that can actually "code" somewhat; I use it just to figure out how this thing works on the inside in practice. Mostly useless otherwise (also, I have a GPU that's older than AI, so it's kind of a fun technical task to run this stuff on pure RAM+swap). Feels like the whole hype is greatly overrated, but I appreciate a chance to learn something new anyway.
Ollama with gemma 4 for LLM stuff, coding brainstorming, etc.
ComfyUI with z-image or Stable Diffusion for images.
Yes. Currently using Gemma4:12b behind OpenWebUI and Hermes Agent plus a few lighter models for OCR and tagging in Paperless.
I don’t host it exactly, I just use it when I'm not using my graphics card for gaming. I run Qwen3.6-35b on my 16 GB VRAM RX 9700 xt at 34 t/s. I use it as an IT advisor, admin, and Linux teacher for my CachyOS gaming PC.
The other day I made a machine learning model that classifies images as either 'a certain type of undesirable image' (no, not porn) or 'any other image'. It is 96.4% accurate and takes 14 ms to classify one image (using CPU only - with a GPU it could be 5x - 10x faster).
I plan to offer this as an API service that social media networks can use to filter posts.
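For a sense of scale, here is the back-of-the-envelope arithmetic behind those figures, assuming one image is classified at a time on a single worker and that the 96.4% accuracy carries over to real traffic (both are assumptions, not something stated in the post):

```python
def images_per_second(latency_ms: float) -> float:
    """Throughput of a single sequential worker at the given per-image latency."""
    return 1000.0 / latency_ms


CPU_LATENCY_MS = 14.0   # from the post
ACCURACY = 0.964        # from the post

cpu_rate = images_per_second(CPU_LATENCY_MS)
# The post estimates a GPU would be 5x to 10x faster.
gpu_rate_low, gpu_rate_high = cpu_rate * 5, cpu_rate * 10

# Expected misclassifications per million images at that accuracy.
errors_per_million = round((1 - ACCURACY) * 1_000_000)

if __name__ == "__main__":
    print(f"CPU: ~{cpu_rate:.0f} images/s")
    print(f"GPU estimate: ~{gpu_rate_low:.0f} to ~{gpu_rate_high:.0f} images/s")
    print(f"Misclassified per million images: {errors_per_million}")
```

So roughly 71 images per second on CPU, and about 36,000 wrong calls per million images, which matters when sizing an API for a large network.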