• 14 Posts
  • 76 Comments
Joined 2 years ago
Cake day: July 1st, 2023


  • I've tried the official DeepSeek R1 distill of Qwen 2.5 14B and a few unofficial Mistral models trained on R1 CoT. They are indeed pretty amazing, and before this released I found myself switching between a general-purpose model and a thinking model regularly.

    DeepHermes is a thinking-model family with R1-distilled CoT that lets you toggle between standard short output and spending a few thousand tokens thinking through a solution.

    I found that pure thinking models are fantastic for certain kinds of problem-solving questions, but awful at following system prompt changes for roleplay scenarios or adopting complex personality archetypes.

    This lets you have your cake and eat it too by making CoT optional while keeping regular system prompt capabilities. A rough sketch of how the toggle works is below.
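    The reasoning mode is switched on through the system prompt; the wording below is paraphrased from the model card, and the endpoint and token budgets are just my setup, so treat the specifics as assumptions. A minimal sketch in Python against kobold.cpp's OpenAI-compatible API:

    ```python
    import requests

    # kobold.cpp exposes an OpenAI-compatible endpoint; port 5001 is its default.
    API_URL = "http://localhost:5001/v1/chat/completions"

    # DeepHermes toggles deep reasoning via the system prompt. This wording is
    # paraphrased from the model card -- check it for the exact prompt.
    THINKING_SYSTEM = (
        "You are a deep thinking AI. Use long chains of thought inside "
        "<think></think> tags to reason through the problem before answering."
    )
    STANDARD_SYSTEM = "You are a helpful assistant."

    def ask(question: str, thinking: bool = False) -> str:
        """Send one chat turn, optionally enabling the CoT system prompt."""
        payload = {
            "messages": [
                {"role": "system",
                 "content": THINKING_SYSTEM if thinking else STANDARD_SYSTEM},
                {"role": "user", "content": question},
            ],
            # leave room for the thought trace when thinking is on
            "max_tokens": 4096 if thinking else 512,
        }
        resp = requests.post(API_URL, json=payload, timeout=600)
        return resp.json()["choices"][0]["message"]["content"]

    print(ask("What is the capital of France?"))                 # short, direct
    print(ask("How many primes are below 100?", thinking=True))  # full CoT
    ```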

    The thousands of tokens spent thinking get time-consuming when you're only getting 3 t/s on the larger 24B models, so it's important to choose between a direct answer and letting it spend five minutes really thinking. Its abilities are impressive even if it takes 300 seconds to fully think out a problem at 2.5 t/s.

    That's why I'm so happy the 8B model is pretty intelligent with CoT enabled: I can fit a thinking model entirely in VRAM, and its knowledge base isn't dumb as rocks either. I'm getting 15-20 t/s with the 8B instead of 2.5-3 t/s partially offloading a larger model. A roughly 6.4x speed increase (16 t/s vs. 2.5 t/s) on the CoT is a huge W for the real human time I spend waiting for a complete output.







  • I just spent a good few hours optimizing my LLM rig: disabling the graphical interface to squeeze 150 MB of VRAM back from Xorg, setting the program's CPU niceness to the highest priority, and tweaking settings to find memory limits.

    I was able to increase the token speed by about half a token per second while doubling the context size. I don't have the budget for any big VRAM upgrade, so I'm trying to make the most of what I've got. A minimal sketch of the two checks I keep rerunning while tuning is below.
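    This assumes an NVIDIA card (for nvidia-smi) and a Linux host, and the function names are my own; run the priority bump as root:

    ```python
    import os
    import subprocess

    def free_vram_mib() -> int:
        """Query free VRAM, to see what killing Xorg actually buys you."""
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip().splitlines()[0])

    def raise_priority() -> None:
        """Bump this process to higher CPU priority (lower nice = higher)."""
        try:
            os.nice(-10)  # negative niceness requires root privileges
        except PermissionError:
            print("run as root (or via sudo) to set a negative niceness")

    print(f"free VRAM: {free_vram_mib()} MiB")
    raise_priority()
    ```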

    I have two desktop computers. One has better RAM, CPU, and overclocking but a worse GPU; the other has the better GPU but worse RAM and CPU and no overclocking. I'm contemplating whether it's worth swapping the GPUs to really make the most of the available hardware. It's been years since I took apart a PC, and I'm scared of doing something wrong and damaging everything. I dunno if it's worth the time, effort, and risk for the squeeze.

    Otherwise I'm loving my self-hosted LLM hobby. I've been very into learning computers and ML for the past year. Crazy advancements, exciting stuff.


  • Cool, Page Assist looks neat, I'll have to check it out sometime. My LLM engine is kobold.cpp, and I just use Open WebUI in the browser to connect.

    Sorry, I don't really have good suggestions for you beyond trying some of the more popular 1-4B models at a very high quant, if not full 8-bit, and seeing which works best for your use case.

    Llama 4B, Mistral 4B, Phi-3-mini, TinyLlama 1.5B, Qwen2 1.5B, etc. I assume you want a model with a large context size and good comprehension skills to summarize YouTube transcripts and webpage articles? At least I think that's the purpose the add-on you mentioned suggested.

    So look for models with those strengths over ones that try to specialize in a little bit of domain knowledge. For a rough sense of what sizes will fit, see the sketch below.
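    A back-of-the-envelope rule: a GGUF file is roughly parameter count × bits per weight / 8, plus KV-cache overhead that grows with context. The bits-per-weight figures below are common community approximations for llama.cpp quants, not exact numbers:

    ```python
    # Rough GGUF size estimate: params * bits-per-weight / 8.
    # The bpw values are approximate community figures, not exact.
    BPW = {"Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0}

    def gguf_size_gb(params_billions: float, quant: str) -> float:
        """Approximate file size (and VRAM floor) for a quantized model."""
        return params_billions * BPW[quant] / 8

    for quant in BPW:
        print(f"4B model at {quant}: ~{gguf_size_gb(4, quant):.1f} GB")
    # Leave headroom on top of this for the KV cache at large context sizes.
    ```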



  • The average across all the different benchmarks can be thought of as a kind of 'average intelligence', though in reality it's more of a gradient and a vibe thing.

    Many models are "benchmaxxed": trained to answer the exact kinds of questions the tests ask, which often makes benchmark results unrelated to real-world use. Treat them as general indicators, not something to take too seriously.

    All model families differ in ways you only really understand by spending time with them. Don't forget to set the right chat template and the correct sampler values as needed per model; the Open LLM Leaderboard is a good place to start. A sketch of setting samplers per model is below.
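    A minimal sketch against kobold.cpp's native /api/v1/generate endpoint; the field names follow the KoboldAI API, but the values are placeholders to tune per model card, and the [INST] wrapper is the Mistral chat template, so swap in whatever your model expects:

    ```python
    import requests

    API_URL = "http://localhost:5001/api/v1/generate"  # kobold.cpp default port

    payload = {
        # Mistral-style chat template; use the template your model expects.
        "prompt": "[INST] Explain what min-p sampling does. [/INST]",
        "max_length": 300,
        "temperature": 0.7,
        "top_p": 0.95,
        "min_p": 0.05,   # drop tokens below 5% of the top token's probability
        "rep_pen": 1.1,  # mild repetition penalty
    }

    resp = requests.post(API_URL, json=payload, timeout=120)
    print(resp.json()["results"][0]["text"])
    ```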



  • DeepHermes 24B's CoT thought patterns feel about on par with the official R1 distill I've tried. It's important to note, though, that my experience is limited to the DeepSeek R1 NeMo 12B distill, as that's what fits nice and fast on my card.

    All the R1-distill internal-monologue humanisms are there: "let me write that down", "if I remember correctly", "oh, but wait, that doesn't sound right, let's try again". The multiple "but wait, what if"s before ending the thought to examine multiple sides are there too. It spends about 2-5k tokens thinking, tends to stay on track, and catches minor mistakes or hallucinations. A sketch for splitting the trace from the answer is below.
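    R1-style models wrap the monologue in <think>...</think> before the final answer, so it's easy to log or hide. A minimal sketch, assuming the standard tag format:

    ```python
    import re

    def split_cot(output: str) -> tuple[str, str]:
        """Return (thought_trace, final_answer) from a raw completion."""
        match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
        if not match:
            return "", output.strip()  # model answered directly, no trace
        return match.group(1).strip(), output[match.end():].strip()

    raw = "<think>Wait, that doesn't sound right, try again...</think>It's 42."
    thought, answer = split_cot(raw)
    print(f"spent {len(thought.split())} words thinking; answer: {answer}")
    ```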

    Compared to the unofficial Mistral 24B distills, this is top tier for sure. I think it goes toe to toe with the ComputationDolphins 24B R1 distill, and it's just a preview.








  • I run kobold.cpp, a cutting-edge local model engine, on my local gaming rig turned server. I like to play around with the latest models to see how they improve and change over time. The current chain-of-thought thinking models, like the DeepSeek R1 distills and Qwen QwQ, are fun to poke at with advanced open-ended STEM questions.

    STEM questions like "What does Gödel's incompleteness theorem imply about scientific theories of everything?" or "Could the speed of light be more accurately referred to as 'the speed of causality'?"

    As for actual daily use, I prefer Mistral Small 24B, treating it like a local search engine with roughly the legitimacy of Wikipedia. It's a starting point for questions about general things I don't know about or want advice on, before doing further research through more legitimate sources.

    It's important not to take the LLM too seriously, as there's always a small statistical chance it hallucinates some bullshit, but most of the time it's fairly accurate and makes a pretty good jumping-off point for further research.

    Let's say I want an overview of how to repair small holes forming in concrete, general ideas on how to invest financially, how to change the fluids in a car, how much fat and protein is in an egg, etc.

    If the LLM uses a word or related concept I don't recognize, I grill it for clarifying info and follow it through the infinite branching garden of related information.

    I've used an LLM to help me go through old declassified documents and speculate on internal government terminology I was unfamiliar with.

    I've hooked up a text-to-speech model to get it to speak, just for fun, and used a multimodal model to get it to see and scan documents for info.

    I've used websearch to get the model to retrieve information it didn't know from a DDG search, again mostly for fun. A sketch of the DIY version is below.
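    A minimal sketch of DIY search grounding: pull a few DuckDuckGo results and stuff them into the prompt. It uses the third-party duckduckgo_search package (pip install duckduckgo-search), and the prompt format is my own invention:

    ```python
    from duckduckgo_search import DDGS

    def search_context(query: str, n: int = 3) -> str:
        """Fetch the top-n DDG snippets to prepend as grounding context."""
        with DDGS() as ddgs:
            hits = list(ddgs.text(query, max_results=n))
        return "\n".join(f"- {h['title']}: {h['body']}" for h in hits)

    question = "What did the latest llama.cpp release change?"
    prompt = (
        f"Use these search results to answer.\n{search_context(question)}\n\n"
        f"Question: {question}\nAnswer:"
    )
    print(prompt)  # feed this to your local model's completion endpoint
    ```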

    Feel free to ask me anything, I’m glad to help get newbies started.





  • Not really. Lemmy.world is alright as long as you have two functioning braincells to read the community guidelines and understand that you can't just say whatever you want in a large, fairly moderated public space.

    You can get away with being an anarchist edgelord on a small instance, but on Lemmy.world calls to violence are a no-no. When you get as large and stable as .world, you need stricter rules to cover your ass legally, which is understandable.

    The people who complain about being instance-banned usually leave out key information that makes them look bad, or just plain lie about why they were banned, which you can check if you go into the modlogs.

    Besides that, there's always going to be power-mod fuckery on any internet forum you go to. At least you can post it to c/yepowertrippingbastards to vent.