Total noob to this space, correct me if I’m wrong. I’m looking at getting new hardware for inference and I’m open to AMD, NVIDIA or even Apple Silicon.
It feels like consumer hardware gives you comparatively more value for generating images than for running chatbots. Like, the models you can run at home are just dumb to talk to. But they can generate images of comparable quality to online services if you’re willing to wait a bit longer.
Like, GPT-OSS 120B, assuming you can spare 80GB of memory, is still not GPT-5. But Flux Schnell is still Flux Schnell, right? So if diffusion is the thing, NVIDIA wins right now.
Other options might even be better for other uses, but local chatbots are comparatively hard to justify. Maybe for more specific cases like zero-latency code completion or building a voice assistant, I guess.
Am I way off the mark?
No, you can run SD and Flux-based models inside koboldcpp. You can try it out using the original koboldcpp in Google Colab. It loads GGUF models. Related discussion on Reddit: https://www.reddit.com/r/StableDiffusion/comments/1gsdygl/koboldcpp_now_supports_generating_images_locally/
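A minimal launch sketch, assuming a recent koboldcpp build: from memory, the image model goes in via a `--sdmodel` flag alongside the usual `--model` flag, but check `--help` on your build before relying on that, and both filenames here are placeholders:

```python
# Sketch only: flag names recalled from koboldcpp's CLI (--model for the
# text-model GGUF, --sdmodel for the image-model GGUF); verify against
# your build's --help. Both filenames are placeholders.
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "some-llm.gguf",            # placeholder text model
    "--sdmodel", "flux1-schnell-q4.gguf",  # placeholder Flux Schnell quant
    "--port", "5001",
])
```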
Edit: Sorry, I kinda missed the point, maybe I was sleepy when writing that comment. Yeah, I agree that LLMs need a lot of memory to run, which is one of their downsides. I remember someone doing a comparison showing that an API with token-based pricing is cheaper than running it locally. But running image generation locally is cheaper than an API with step+megapixel pricing.
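To make that cost claim concrete, here’s a back-of-envelope break-even sketch. Every number in it is a made-up placeholder, so plug in real API pricing, your electricity rate, and your actual hardware cost:

```python
# All figures are hypothetical placeholders for illustration only.
API_PRICE_PER_IMAGE = 0.04   # $/image under step+megapixel pricing
GPU_WATTS = 350              # GPU draw under load
KWH_PRICE = 0.30             # electricity, $/kWh
SECONDS_PER_IMAGE = 30       # local generation time per image
HARDWARE_COST = 2000         # upfront GPU price, $

# Energy cost of one local image: kW * hours * $/kWh
local_energy_cost = GPU_WATTS / 1000 * SECONDS_PER_IMAGE / 3600 * KWH_PRICE
saving_per_image = API_PRICE_PER_IMAGE - local_energy_cost
breakeven_images = HARDWARE_COST / saving_per_image

print(f"local energy cost/image: ${local_energy_cost:.4f}")
print(f"break-even after ~{breakeven_images:,.0f} images")
```

With these placeholder numbers the energy cost per local image is well under a cent, so the hardware pays for itself somewhere around fifty thousand images; the same arithmetic with token prices tends to favor the API for chat.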
Hmm. That does look like it. But I have a build from git within the last two weeks or so, and the only backend image generation options it lists are AI Horde, KCPP/Forge/A1111, OpenAI/DALL-E, ComfyUI, and Pollinations.ai.
Maybe there’s some compile-time option that needs to be set to build it in.
*investigates*
Hmm. I guess it just always has the embedded support active, and that’s what “KCPP” is for; you configure it at http://localhost:5001/sdui and then presumably choose “KCPP/Forge/A1111” as the endpoint. Still not clear where one plonks in the model, but it clearly is there. Sorry about that!
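For what it’s worth, once an image model is loaded, the embedded backend should also be reachable programmatically. A sketch, assuming koboldcpp emulates the A1111-style `/sdapi/v1/txt2img` endpoint on the same port (the prompt, parameters, and output filename are placeholders):

```python
# Assumes koboldcpp exposes an A1111-compatible /sdapi/v1/txt2img endpoint
# once an image model is loaded; payload fields follow the A1111 convention.
import base64
import requests

resp = requests.post(
    "http://localhost:5001/sdapi/v1/txt2img",
    json={
        "prompt": "a watercolor fox in the snow",  # placeholder prompt
        "steps": 20,
        "width": 512,
        "height": 512,
    },
    timeout=600,
)
resp.raise_for_status()

# A1111-style responses return base64-encoded images in an "images" list.
with open("out.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```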