Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to agree with the consensus on most things, but here are the thoughts I'd consider going against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks.
  • Deepseek is still open-weight SOTA. I've really tried Kimi, GLM, and Qwen3's larger variants, but asking Deepseek still feels like asking the adult in the room. Caveat: GLM codes better.
  • (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.
Baŝto@discuss.tchncs.de · 4 days ago

    I still vastly prefer Qwen3-30B thinking because it answers pretty fast. The speed was really the most interesting thing compared to R1 32B. Now that Ollama supports Vulkan it runs even faster (~2/3 CPU & 1/3 GPU).

    I use it with Page Assist to search the web via DDG, though it also supports SearXNG.

    I have Qwen3-Coder 30B for code generation.

    I actually mostly use it with Page Assist as well. I have the Continue plugin installed in VSCodium.

    The rest I don't use as much. I have installed:

    • II-Search 4B (intended for quick web searches)
    • pydevmini1 4B (website and code mockups, coding questions in the style of “how do I implement XY”)
    • Qwen3 4B abliterated (mostly story generation where R1 refused to generate back then; abliteration didn’t seem to impact creative writing that much)

    I only have 32GB RAM, so I ran those 4B models especially when Firefox and/or other things were already using too much RAM. Dunno how much that will change with Vulkan support. It will probably only shift a bit, since they can run 100% on my 6GB VRAM GPU now. At least now I can run a 4B without checking RAM usage first.
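    Rough napkin math for why a 4B model fits on a 6GB card (my own back-of-the-envelope sketch; the bits-per-weight figures are approximations, and real GGUF files also carry KV cache and overhead):

```python
# Back-of-the-envelope model memory estimate: parameters x bits per weight.
# The quantization bit-widths below are approximate, not exact GGUF numbers.
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return the approximate weight memory in decimal gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in [("Q4_K_M", 4.5), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(f"4B @ {name}: ~{model_size_gb(4, bits):.2f} GB")
```

    Even at roughly 8-bit, a 4B model stays around 4GB of weights, which leaves some headroom on a 6GB GPU.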

    After all, it's nice to run all this stuff 100% open source, even when the models aren't. I especially use them for questions that involve personal information.

    I've just started to play around with Qwen3-VL 4B since Ollama support was added just yesterday. It certainly can read my handwriting.

    The only other AI tools I've used recently are:

    • Translation model integrated into Firefox
    • Tesseract's OCR models, when I wanted to convert scanned documents into PDFs where I can select and search the text

    My hottest take is probably that I hate the use of T for trillion parameters, even though a short-scale trillion is the same as tera. I could somewhat live with the B for billion, though it's already not great. But the larger the numbers become, the more ridiculous it gets. I dunno what they'll use after trillion, but it'll get ugly fast, since quadrillion (10¹⁵) and quintillion (10¹⁸) both start with Q. SI prefixes currently have an unambiguous single character for everything up to quetta (Q; 10³⁰). (Though the SI definitely has some old prefixes that break its system of every prefix >1 having an uppercase single letter: deca, hecto, kilo.) Or maybe it's because it's an English notation, not an international one.
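    To make the collision concrete, here's a toy sketch (my own illustration, not anyone's real convention) of a short-scale suffix formatter: it works fine through T, but there is no unambiguous single letter left to extend it to 10¹⁵.

```python
# Short-scale letters for parameter counts. Beyond "T" the scheme breaks:
# quadrillion (1e15) and quintillion (1e18) would both claim "Q".
SUFFIXES = [(1e12, "T"), (1e9, "B"), (1e6, "M")]

def param_label(n: float) -> str:
    """Format a parameter count with a short-scale letter suffix."""
    for scale, letter in SUFFIXES:
        if n >= scale:
            return f"{n / scale:g}{letter}"
    return f"{n:g}"

print(param_label(7e9))     # 7B
print(param_label(1.8e12))  # 1.8T
```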