Qwen3-32b: Windows95 starfield screensaver web app with warp drive on click

xodoh74984@lemmy.world · edit-2 2 months ago

𞋴𝛂𝛋𝛆@lemmy.world · 2 months ago

Is 13k your max context at Q4_K_M?

xodoh74984@lemmy.world · edit-2 2 months ago

I’m close to the limit at 23886MiB / 24564MiB of VRAM used when the server is running. I like to have a bit of headroom for other tasks.

But I’m by no means a llama.cpp expert. If you have any tips for better performance I’d love to hear them!

SmokeyDope@lemmy.world · 2 months ago

Enable flash attention if you haven’t already.

xodoh74984@lemmy.world · 2 months ago

22466MiB / 24564MiB, awesome, thank you!

SmokeyDope@lemmy.world · 2 months ago

You’re welcome. Also, what’s your GPU, and are you using cuBLAS (NVIDIA), Vulkan (universal, AMD + NVIDIA), or something else for GPU acceleration?

xodoh74984@lemmy.world · 2 months ago

It’s a 4090 using cuBLAS. I just run the stock llama.cpp server with CUDA support. Do you know if there’d be any advantage to building it from source or using something else?

SmokeyDope@lemmy.world · 2 months ago

If you were running an AMD GPU, there are some versions of the llama.cpp engine you can compile with ROCm support. If you’re ever tempted to run a huge model with partially offloaded CPU/RAM inferencing, you can set the program to run at the highest scheduling priority (the lowest niceness value), which believe it or not pushes up the token speed slightly.
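A minimal sketch of the niceness tip, assuming a Linux system where niceness runs from -20 (highest priority) to 19 (lowest). The helper name `build_niced_command` and the model path are hypothetical; negative values normally need root or an equivalent privilege, and whether this actually helps token speed will depend on the machine.

```python
import shlex


def build_niced_command(server_cmd, niceness=-5):
    """Prefix a command with `nice` so it starts at the given niceness.

    Lower niceness means higher scheduling priority (Linux range: -20..19).
    """
    if not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    return ["nice", "-n", str(niceness), *server_cmd]


# Hypothetical invocation: the model path and layer count are placeholders.
cmd = build_niced_command(["llama-server", "-m", "model.gguf", "-ngl", "40"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` or pasted into a shell; an already running process can be adjusted with `renice` instead.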

ffhein@lemmy.world · 2 months ago

Exllamav3 is still in development so it’s not fully optimized and could have bugs, but I get 16k context with 4bpw (which has very similar perplexity to Q4_K_M, according to the developer’s own measurements) using only 22GB VRAM, since I also run my desktop env on the same computer.

gencha@lemm.ee · 2 months ago

People implement this on their calculator during class. This is the kind of thing you would write to learn programming, the definition of entry-level. You’re using a device that can execute billions of trigonometric calculations per millisecond to produce code that calculates X and Y coordinates for a few dozen points on a radial trajectory.

What the fuck…
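For reference, the core of a starfield effect can be sketched as a perspective projection, roughly as follows. This is not the code the model generated; the names (`Star`, `step`, `project`), the focal length, and treating "warp drive" as simply a larger `speed` are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

MIN_Z = 0.1  # assumed near plane; stars closer than this are respawned


@dataclass
class Star:
    x: float  # horizontal position in the range -1..1
    y: float  # vertical position in the range -1..1
    z: float  # distance from the viewer


def make_stars(count, depth, seed=None):
    """Scatter stars randomly between the near plane and `depth`."""
    rng = random.Random(seed)
    return [
        Star(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(MIN_Z, depth))
        for _ in range(count)
    ]


def step(star, speed, depth, rng):
    """Move a star toward the viewer; respawn it at the far plane once it passes."""
    star.z -= speed
    if star.z <= MIN_Z:
        star.x = rng.uniform(-1, 1)
        star.y = rng.uniform(-1, 1)
        star.z = depth


def project(star, width, height, focal=200.0):
    """Perspective-project a star to screen coordinates around the centre."""
    scale = focal / star.z
    return (width / 2 + star.x * scale, height / 2 + star.y * scale)
```

Calling `step` on every star each frame and drawing a dot at `project(star, width, height)` makes the points drift outward from the centre of the screen; a click handler could raise `speed` for a while to approximate the warp effect.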

jaemo@sh.itjust.works · 2 months ago

There is, seriously, no pleasing some people. You appear to be a vocal member of this highly undignified and odious demographic.

xodoh74984@lemmy.world · edit-2 2 months ago

Fair point. My original prompt asked for more, but the model wasn’t capable enough. Not sure if the “warp drive” part would be part of any standard algo.

Any ideas on challenges that are new and more fun than the “balls rolling in a hexa-, hepta-, octagon” or “simulate a solar system” prompts everyone’s using these days?