• 2 Posts
  • 6 Comments
Joined 10 months ago
Cake day: March 22nd, 2024


  • “Don’t feed the trolls” and defaulting to skepticism were part of the old internet. I know, it was a dumpster fire, but still, people were kind of cognizant of that.

    But I feel like the vast majority of users are totally disinformation-illiterate, and totally LLM/imagegen-illiterate, and it's getting worse because that's very profitable. Reddit has no problem with all these bots as long as advertisers keep paying and Spez sells stock at the right moments, since the bots make Reddit money through engagement.


  • Unfortunately, Nvidia is by far the best choice for local LLM coder hosting, and there are basically two tiers:

    • Buy a used 3090, limit the clocks to around 1400 MHz (see the sketch after this list), and then host Qwen 2.5 Coder 32B.

    • Buy a used 3060, host Arcee Medius 14B.
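    If you want to do that clock cap programmatically, here's a minimal sketch that just wraps nvidia-smi's --lock-gpu-clocks flag from Python; the 1400 MHz figure is the one from the bullet above, and it assumes nvidia-smi is on your PATH and you can run it as root:

    # Minimal sketch: cap the GPU clock at ~1400 MHz by locking the allowed
    # clock range with nvidia-smi. Requires root; undo with `nvidia-smi -rgc`.
    import subprocess

    def cap_gpu_clock(max_mhz: int = 1400, gpu_index: int = 0) -> None:
        # Lock the permitted GPU clock range to 0..max_mhz on the given GPU.
        subprocess.run(
            ["nvidia-smi", "-i", str(gpu_index), "-lgc", f"0,{max_mhz}"],
            check=True,
        )

    if __name__ == "__main__":
        cap_gpu_clock()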

    Both these will expose an OpenAI endpoint.
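    For example, you can hit either one with the standard OpenAI Python client. A minimal sketch, assuming TabbyAPI's default port of 5000 and a placeholder API key (swap in whatever host/port/key and model name your server actually uses):

    # Minimal sketch: talk to the local OpenAI-compatible endpoint.
    # The base_url and api_key below are assumptions about your local setup.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://127.0.0.1:5000/v1",  # local TabbyAPI endpoint
        api_key="your-local-api-key",          # placeholder
    )

    response = client.chat.completions.create(
        model="Qwen2.5-Coder-32B",  # whichever model the server is serving
        messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    )
    print(response.choices[0].message.content)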

    Run TabbyAPI instead of ollama, as it's far faster and more VRAM-efficient.

    You can use AMD, but the setup is more involved: your kernel has to be compatible with the ROCm packages, you need a 7000-series card, and TabbyAPI compatibility takes a few extra hoops.

    Aside from that, an Arc B570 is not a terrible option for 14B coder models.