Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to agree with the consensus on most things, but here are a few opinions I’d consider against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I can’t get anything to reliably beat it, despite the shiny benchmarks.
  • DeepSeek is still the open-weight SOTA. I’ve really tried Kimi, GLM, and Qwen3’s larger variants, but asking DeepSeek still feels like asking the adult in the room. The caveat is that GLM codes better.
  • (Proprietary bonus) Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.
panda_abyss@lemmy.ca · 14 days ago

Thinking is an awful paradigm.

Models would do better to backtrack and explore other token branches, but top-p/top-k sampling blocks that. Thinking tokens are a waste.
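To make the top-p/top-k point concrete: both are truncation rules applied to the next-token distribution at every decoding step, and anything outside the kept set gets probability zero. The sketch below is a toy, assuming a four-token vocabulary with made-up logits. `top_k_top_p_filter` is a hypothetical helper, not any inference engine's API, and it assumes one common ordering (top-k first, then top-p on the renormalized survivors); real samplers differ in ordering and edge-case handling.

```python
import math


def top_k_top_p_filter(logits, k=None, p=None):
    """Toy top-k / top-p (nucleus) truncation of one next-token distribution.

    Hypothetical helper for illustration only. Returns renormalized
    probabilities; any token outside the kept set gets exactly 0.0, so a
    sampler drawing from the result can never pick it. Assumes top-k is
    applied first, then top-p on the renormalized survivors.
    """
    if k is not None and k < 1:
        raise ValueError("k must be at least 1")
    if p is not None and not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")

    # Softmax over the raw logits (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens.
    keep = order if k is None else order[:k]

    # Top-p: keep the smallest prefix whose renormalized mass reaches p.
    if p is not None:
        mass = sum(probs[i] for i in keep)
        cumulative = 0.0
        nucleus = []
        for i in keep:
            nucleus.append(i)
            cumulative += probs[i] / mass
            if cumulative >= p:
                break
        keep = nucleus

    # Renormalize over the survivors; every pruned token is exactly 0.0.
    kept = set(keep)
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


# Toy vocabulary of four tokens; with k=2, tokens 2 and 3 are pruned outright.
print(top_k_top_p_filter([2.0, 1.0, 0.5, -1.0], k=2))
```

Plain autoregressive decoding then commits to whichever token is drawn and moves on; nothing in this per-step loop ever returns to a pruned branch, which is the limitation the comment is pointing at.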

One of the reasons thinking makes models good is just reinforcement learning, but that training tends to be very narrow.

With math, for example, you can reinforcement-learn a model up to grad level. That’s fine, but it doesn’t actually improve problem solving.