this post was submitted on 24 Jun 2026
63 points (77.9% liked)

Selfhosted

60093 readers
876 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam.

  3. Posts here are to be centered around self-hosting. Please ensure it is clear in your post how it relates to self-hosting.

  4. Don't duplicate the full text of your blog or git here. Just post the link for folks to click.

  5. Submission headline should match the article title.

  6. No trolling.

  7. Promotion posts require your active participation in selfhosting or related communities, or the post will be removed. No more than 10% of your posts or comments may be self-promotional, or your post will be removed. F/LOSS Exception: If your post is about a project that is completely open source & can be self-hosted in full without payment, your post is exempt from this rule as long as you continue to engage in comments.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 3 years ago
MODERATORS
 

Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

(page 2) 50 comments
sorted by: hot top controversial new old
[–] atzanteol@sh.itjust.works 9 points 11 hours ago (6 children)

I've tried a few times but with only 8gig of vram it's simply not worth it.

[–] Franconian_Nomad@feddit.org 4 points 11 hours ago (1 children)

Have you tried qwen3.5-9b? It’s pretty solid for its size.

[–] atzanteol@sh.itjust.works 1 points 9 hours ago (1 children)

Yeah, it's "good for its size" but it's just too flaky for me to use for any significant coding.

[–] Franconian_Nomad@feddit.org 0 points 6 hours ago

Yeah, I wouldn’t use it for coding. It’s a bit dumb unfortunately.

load more comments (5 replies)
[–] curbstickle@anarchist.nexus 3 points 9 hours ago (3 children)

Yep.

Ollama + about 8 different models at the moment, hosted on a mac mini with open webui as a front end.

Predominantly for transcription, translation, an extra round of security checks on code, a more context friendly home assistant interface, and a daily run of context evaluation on property I'm looking for with a lot of specific needs (acreage, min elevation change, soil type, area, etc).

[–] surewhynotlem@lemmy.world 2 points 8 hours ago (1 children)

I have to recommend switching to llamacpp. It's SO much faster than ollama.

load more comments (1 replies)
[–] irmadlad@lemmy.world 1 points 8 hours ago (1 children)

mac mini

How? What is your average response time?

load more comments (1 replies)
[–] async_amuro@lemmy.zip 1 points 9 hours ago (3 children)

What spec Mini do you use?

load more comments (3 replies)
[–] Nednarb44@lemmy.world 7 points 12 hours ago (1 children)

I do, I use ollama. I mostly just tinker, but I use with with home assistant for a quasi Alexa like experience with the voice assistant, I use it for summarizing some YouTube transcripts in too lazy to read/watch, and I've tried to see how capable it is with coding.

[–] diminou@lemmy.zip 2 points 11 hours ago (1 children)

Can you elaborate on what you are using exactly with home assistant ? And is English your primary language in that context ?

Trying to do something similar, English not primary and its a bit... Harder than it seems. Can't figure out if it is because I'm not using English or something else. (3060 12GB BTW)

load more comments (1 replies)
[–] e0qdk@reddthat.com 2 points 9 hours ago

I started running LLMs a couple months ago on my own hardware. I have a Framework Desktop that I ordered last year and also recently picked up a refurbished 24GB AMD RX 7900 XTX which I'm doing some performance testing against. The dGPU is much better for dense models, and slightly faster for MoE if I'm willing to run them at a lower quant -- but uses more power and has annoying coil whine. The Framework Desktop uses ~100W under load, is quieter, and for the MoE models already runs them fast enough for most of my needs -- so most of my LLM use happens on that system still.

For software: I'm using ollama on the Framework currently, but I want to replace it with just using llama.cpp directly eventually. I've been using llama-cli for testing the dGPU. I wrote my own chat client to interact with ollama as well as a few other programs for specific tasks.

I've been using the LLMs for a mix of research (both personal and professional), entertainment, practical coding tasks (mostly debugging and brainstorming, plus a bit of UI prototyping, automatic generation of sequence diagrams for documentation, and light scripting), as well as automation of tedious tasks.

As an example of the latter, people often send me requests to prepare data sets by email but don't specify the sources they want precisely so I have to go match the name against the real name in our archives; LLMs are great for mapping the imperfect name -- with typos, missing prefixes, incorrect addition of spaces, addition/removal of hyphens, etc. -- to the exact name I actually need to pull the data off disk when given a lookup table to compare against.

As far as models go, I'm mostly using various Qwen 3.6 and Gemma4 variants. I have multiple versions of each for different purposes. llmfan46's uncensored Qwen 3.6 35B-A3B @ Q6_K (from Hugging Face) is my default model currently.

[–] queerlilhayseed@piefed.blahaj.zone 5 points 11 hours ago (9 children)

Yup, ollama, various models. I initially downloaded it because I, along with thousands of other people, wanted to see what would happen if I made models debate with each other after RAGging them with various books (The Prince, The Art of War, The complete works of Shakespeare, etc.).

The results were uninteresting and I abandoned the project pretty quickly. I'll sometimes use them for code analysis but they're too slow on my rig to be really useful.

[–] SuspiciousCarrot78@aussie.zone 2 points 11 hours ago (3 children)

Did you use OWUIs native "call simultaneous models to answer" feature for that or one of the AI debate harnesses?

load more comments (3 replies)
load more comments (8 replies)
[–] alzymologist@sopuli.xyz 5 points 11 hours ago

Technically, TTS/STT are mostly MLs; I'm pretty sure many people run these. I have a setup but I'm better with buttons that with spoken words, and I listen to ambient sounds or music. I think some day I'll make voice assistant for talking to while driving, but that's not a trivial task hardware-wise, even if I used cloud LLM layer, which I won't. Putting AI on baremetal sounds like an interesting project.

I have a homemade "local agent" that can actually "code" somewhat, I use it just to figure out how this thing works on the inside practically. Mostly useless otherwise (also I have GPU that's older than AI, so it's kind of fun technical task to run this stuff on pure RAM+swap). Feels like the whole hype is greatly overrated, but I appreciate a chance to learn something new anyway.

[–] iceberg314@slrpnk.net 4 points 11 hours ago (1 children)

Ollama with gemma 4 for LLM stuff, coding brainstorming, etc.

Comfy ui with z-image or stable diffusion for images.

load more comments (1 replies)

Yes. Currently using Gemma4:12b behind OpenWebUI and Hermes Agent plus a few lighter models for OCR and tagging in Paperless.

[–] Franconian_Nomad@feddit.org 3 points 11 hours ago (8 children)

I don’t host it exactly, just use it when I don’t use my graphics card for gaming. I run Qwen3.6-35b on my 16gb vram RX 9700 xt with 34t/s. I use it as an IT advisor, admin and Linux teacher for my cachyOS gaming PC.

load more comments (8 replies)
[–] rimu@piefed.social 3 points 11 hours ago* (last edited 11 hours ago)

The other day I made a machine learning model that classifies images as either 'a certain type of undesirable image' (no, not porn) or 'any other image'. It is 96.4% accurate and takes 14 ms to classify one image (using CPU only - with a GPU it could be 5x - 10x faster).

I plan to offer this as an API service that social media networks can use to filter posts.

load more comments
view more: ‹ prev next ›