It’s amazing how far open source LLMs have come.
Qwen3-32b recreated the Windows95 Starfield screensaver as a web app with the bonus feature to enable “warp drive” on click. This was generated with reasoning disabled (/no_think) using a 4-bit quant running locally on a 4090.
Here’s the result: https://codepen.io/mekelef486/pen/xbbWGpX
Model: Qwen3-32B-Q4_K_M.gguf (Unsloth quant)
Llama.cpp Server Docker Config:
docker run \
-p 8080:8080 \
-v /path/to/models:/models \
--name llama-cpp-qwen3-32b \
--gpus all \
ghcr.io/ggerganov/llama.cpp:server-cuda \
-m /models/qwen3-32b-q4_k_m.gguf \
--host 0.0.0.0 --port 8080 \
--n-gpu-layers 65 \
--ctx-size 13000 \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--min-p 0
System Prompt:
You are a helpful expert and aid. Communicate clearly and succinctly. Avoid emojis.
User Prompt:
Create a simple web app that uses javascript to visualize a simple starfield, where the user is racing forward through the stars from a first person point of view like in the old Microsoft screensaver. Stars must be uniformly distributed. Clicking inside the window enables “warp speed” mode, where the visualization speeds up and star trails are added. The app must be fully contained in a single HTML file. /no_think
If you were running amd GPU theres some versions of llama.cpp engine you can compile with rocm compat. If your ever tempted to run a huge model with partial offloaded CPU/ram inferencing you can set the program to run with highest program niceness priority which believe it or not pushes up the token speed slightly