• 1 Post
  • 26 Comments
Joined 1 year ago
Cake day: March 22nd, 2024



  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · 1U mini PC for AI? · edited, 14 hours ago

    It’s PCIe 4.0 :(

    but these laptop chips are pretty constrained lanes wise

    Indeed. I read that Strix Halo only has 16 PCIe 4.0 lanes in addition to its USB4, which is reasonable given it isn’t supposed to be paired with discrete graphics. But I’d happily trade an NVMe slot (still leaving one) for an x8 slot.

    One of the links to a CCD could theoretically be wired to a GPU, right? Kinda like how EPYC can switch its IO between infinity fabric for 2P servers, and extra PCIe in 1P configurations. But I doubt we’ll ever see such a product.



  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · 1U mini PC for AI? · edited, 18 hours ago

    If you can swing $2K, get one of the new mini PCs with an AMD 395 and 64GB+ RAM (ideally 128GB).

    They’re tiny, low power, and the absolute best way to run the new MoEs like Qwen3 or GLM Air for coding. TBH they would blow a 5060 TI out of the water, as having a ~100GB VRAM pool is a total game changer.

    I would kill for one on an ITX mobo with an x8 slot.




  • I am on mobile and can be more detailed later if you want, but the gist is to sign up (with a payment method) for some API service. There are many. Some neat ones include:

    • Openrouter (a gateway to many, many models from many providers, I’d recommend this first)
    • Cerebras API (which is faster than anything and has a generous free tier)
    • Google Gemini, which is free to try with no credit card.

    Some great models to look out for, that you may not know of:

    • GLM 4.5 (my all-around favorite)
    • Deepseek (and its uncensored finetunes)
    • Kimi
    • Jamba Large
    • Minimax
    • InternLM for image input
    • Qwen Coder for coding
    • Hermes 405B (which is particularly ‘uncensored’)
    • Gemini Pro/Flash, which is less private but free to try.

    Most (in exchange for charging pennies per request) do not log your prompts. If you are really, really concerned, you can even rent your own GPU instance on demand.

    Anyway, they will give you a key, which is basically a password.

    Paste that key into the LLM frontend of your choice, like Open Web UI or LM Studio, or even the Openrouter web interface.
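    If you're curious what the frontend actually does with that key, it's just an OpenAI-compatible HTTP request. A minimal sketch with only the standard library, assuming the Openrouter endpoint and a placeholder key and model ID (check your provider's docs for the real ones):

    ```python
    import json
    import urllib.request

    API_KEY = "sk-or-..."  # placeholder; use the key your provider gives you

    # The request body: which model to ask, and the chat messages so far.
    payload = {
        "model": "z-ai/glm-4.5",  # any model ID your provider lists
        "messages": [{"role": "user", "content": "Hello!"}],
    }

    # The key travels as a standard Bearer token in the Authorization header.
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

    # urllib.request.urlopen(req) would actually send it; frontends like
    # Open Web UI build and send this same request for you on every turn.
    ```

    That's all "pasting your key into a frontend" amounts to: the app stores the string and attaches it to requests like this.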




  • Oh, that stinks. I’d love to be able to publish it somewhere I can get genuine feedback. I’m honing my writing skills again in preparation to pick up an old novel I wrote years ago.

    Even better! Writing, and reading others’ writing, is the way to do it; I feel like my writing skills improved massively after my first fic (hence, I’m trying for better prose in the second).

    I publish on Ao3 too, but after a while I also upload a few chapters at a time to Fanfiction.net. As janky as it is, it still has a lot of readers.

    I dunno where else you should consider publishing… Tumblr? Maybe anime forums? A Fediverse animation group? There might be places with a large One Piece following, including oldschool forums where posting fics is a thing. But I can tell you (from my experience with fandoms) that Reddit subs are not it. I dunno what it is about the format, but deeper lore speculation and fanfics just don’t get much traction there.


  • The power usage is massively overstated, and a meme perpetuated by Altman so he’ll get more money for ‘scaling’. And he’s lying through his teeth: there literally isn’t enough silicon capacity in the world for that stupid idea.

    GPT-5 is already proof that scaling without innovation doesn’t work. So are the open-source models nipping at its heels while being trained and run on peanuts.

    And tech in the pipeline, like bitnet, is coming to disrupt that even more; the future is small, specialized, augmented models, mostly running locally on your phone/PC because it’s so cheap and low power.

    There’s tons of stuff to worry about with LLMs and other generative ML, but future power usage isn’t one of them.


  • All my favorite niches have gone to Discord.

    Freaking Discord. A black hole where anything useful is buried under mountains of folks shooting the breeze. Oh, and with a useless search function.

    Every. Single. Niche. Even self-hosting ones like localllama.

    Reddit may be dead and enshittified, but at least it was accessible to search engines :(

    Hardly matters though, as they just randomly shadowban me anyway…