• 39 Posts
  • 224 Comments
Joined 2 years ago
Cake day: July 1st, 2023



  • I’m a big fan of NousResearch; their DeepHermes release was awesome and now I’m trying out Hermes 4. I have an 8GB 1070 Ti GPU and was able to fully offload a medium quant of Hermes 4 14B with an okay amount of context.

    I’m a big fan of the hybrid reasoning models; I like being able to turn thinking on or off depending on the scenario.

    I had a vision-model document scanner + TTS going with a finetune of Qwen 2.5 VL and OuteTTS.

    If you care more about character emulation for writing and creativity, then Mistral 2407 and Mistral NeMo are other models to check out.





  • It’s an ambiguous slang statement that can mean several different things at once. It requires extra work to ground in a well-defined meaning.

    Much in the same way ‘going to the bank’ could mean a financial institution or a riverbank, ‘dropping’ something can mean releasing a new thing, stopping support, or physically fumbling an object onto the ground. OP could have done a better job disambiguating with either different words or more context.







  • As the other commenter said, your workflow requires more than what LLMs are currently capable of.

    Summarization capability in LLMs is an equation of the LLM’s capacity for coherence over long conversational scaling, operated on by the LLM’s ability to navigate and distill internal structural mappings of conceptual & contextual archetype patterns as discrete objects across a continuous ambiguity sheaf.

    That’s technical jargon that boils down to the idea that an LLM’s summarization capability depends on its parameter size and having enough VRAM for long context lengths. Higher-parameter and less-quantized models maintain more coherence over long conversations/datasets.

    While enterprise LLMs are able to get up to 128k tokens while maintaining some level of coherence, local models at medium quantization can handle 16-32k reliably. Theoretically a 70B could maybe handle around 64k tokens, but even that’s stretching it.

    Then comes the problem of transformer attention. You can’t just put a whole book’s worth of text into an LLM’s input and expect it to inspect any part in real detail. For best results you have to chunk it section by section, chapter by chapter.
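
    A minimal sketch of that chunking idea, assuming a plain-text book with “Chapter” headings to split on and a rough 4-characters-per-token estimate (the token budget and helper names are just illustrative):

    import re

    def chunk_book(text, max_tokens=8000):
        """Split on chapter headings, then pack sections into chunks that fit the context budget."""
        max_chars = max_tokens * 4  # ~4 characters per token is a rough rule of thumb for English
        sections = re.split(r"\n(?=Chapter\b)", text)
        chunks, current = [], ""
        for section in sections:
            if current and len(current) + len(section) > max_chars:
                chunks.append(current)
                current = ""
            current += section  # an oversized single section still becomes its own chunk
        if current:
            chunks.append(current)
        return chunks

    # Summarize each chunk on its own, then summarize the summaries:
    # chunk_summaries = [summarize(chunk) for chunk in chunks]
    # final_summary = summarize("\n".join(chunk_summaries))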

    So local LLMs may not be what you’re looking for. If you are willing to go enterprise, then Claude Sonnet and DeepSeek R1 might be good, especially if you set up an API interface.
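
    If you do go the API route, a minimal sketch with the openai Python client pointed at an OpenAI-compatible endpoint looks something like this (the base URL, model name, and key are placeholders you’d swap for whichever provider you pick):

    from openai import OpenAI

    # Placeholder endpoint, key, and model name; most hosted LLM providers
    # expose an OpenAI-compatible API you can point this client at.
    client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

    def summarize(text):
        response = client.chat.completions.create(
            model="provider-model-name",
            messages=[
                {"role": "system", "content": "Summarize the following text faithfully and concisely."},
                {"role": "user", "content": text},
            ],
        )
        return response.choices[0].message.content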




  • I rocked a rechargeable electric arc lighter with hemp wick back when I still did combustion; I believe it’s a pretty solid combo if you’re gonna smoke traditionally. Zippo has electric arc inserts, but the cheap no-name ones work okay too.

    Burning hemp wick still adds soot and extra carcinogens, but it’s arguably better than inhaling butane.

    Vaporizers effectively cut out the need for hemp wick entirely, and electric induction heaters paired with something like a DynaVap make it so there’s no need for a butane/gas flame.





  • Ultimately it’s about marketing, and most stoners/growers aren’t exactly nerds who care about scientific accuracy. As a nerd, my suggestion would be to have an actual lexicon attaching each terp + pheno to a specific word, append the grower’s last name/company name, and force the industry to standardize. I think it’s fine to have strains like “spicy lemon Smith” as long as each of those words is rigorously attached to a specific meaning, like spice is the cinnamony terp, lemon is limonene, and Smith is the company/grower name.

    All crap nondescript hype words like “ak” “og” “kush” “diesel” go right in the fucking bin IMO.

    Of course this is pure theorycrafting. You and I both know 99% of stoners don’t actually care about name accuracy enough to make the industry self-police, as long as it gets them where they want to go. Anyone who does care about exacts buys based off lab-testing result data sheets.



  • I don’t have a lot of knowledge on the topic, but I’m happy to point you in a good direction for reference material. I first heard about tensor layer offloading from here a few months ago. That post links to another one on MoE expert layer offloading, which the tensor approach was based off; I highly recommend you read through both posts.

    The gist of the tensor override strategy is: instead of offloading entire layers with --gpulayers, you use --overridetensors to keep specific large tensors (particularly FFN tensors) on the CPU while moving everything else to the GPU.

    This works because:

    • Attention tensors: Small, benefit greatly from GPU parallelization
    • FFN tensors: Large, can be efficiently processed on CPU with basic matrix multiplication

    You need to figure out exactly which tensors need to stay on the CPU for your model by looking at the weights and cooking up a regex, as described in the post.

    Here’s an example of KoboldCpp startup flags for doing this. The key part is the --overridetensors flag and the regex contained in it:

    python ~/koboldcpp/koboldcpp.py --threads 10 --usecublas --contextsize 40960 --flashattention --port 5000 --model ~/Downloads/MODELNAME.gguf --gpulayers 65 --quantkv 1 --overridetensors "\.[13579]\.ffn_up|\.[1-3][13579]\.ffn_up=CPU"
    ...
    [18:44:54] CtxLimit:39294/40960, Amt:597/2048, Init:0.24s, Process:68.69s (563.34T/s), Generate:56.27s (10.61T/s), Total:124.96s
    

    The exact specifics of how you determine which tensors to override for each model, and the associated regex, are a little beyond my knowledge, but the people who wrote the tensor post did a good job explaining that process in detail. Hope this helps.
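
    As a starting point for writing that regex, the gguf Python package that ships with llama.cpp can list the tensor names straight from the model file; a rough sketch (treat the API details as an assumption and check the package docs):

    from gguf import GGUFReader

    reader = GGUFReader("MODELNAME.gguf")
    # Print the FFN up-projection tensors so you can see which block indices
    # exist before writing an --overridetensors pattern like the one above.
    for tensor in reader.tensors:
        if "ffn_up" in tensor.name:
            print(tensor.name)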


  • I would recommend you get a cheap wattage meter that plugs in between the wall outlet and the PSU powering your cards, for $10-15 (the $30 name-brand Kill A Watts are overpriced and unneeded IMO). You can try to get rough approximations by doing some math with your cards’ listed TDP specs added together, but that doesn’t account for the motherboard, CPU, RAM, drives, and so on, or the real change between idle and load. With a meter you can just watch the total power draw with all that stuff factored in, take note of the increase and the maximum as your rig inferences a bit, and have the comfort of being reasonably confident in the actual numbers. Then you can plug the values into a calculation.
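
    Once you have measured figures off the meter, the calculation itself is simple; a quick sketch where the wattage, hours, and electricity rate are made-up example numbers you’d replace with your own:

    # Example numbers only; substitute your measured draw and local rate.
    idle_watts = 120      # whole-rig draw at idle, read off the meter
    load_watts = 390      # whole-rig draw while inferencing, read off the meter
    hours_per_day = 4     # time spent under load each day
    rate_per_kwh = 0.15   # electricity price in $/kWh

    daily_kwh = (load_watts - idle_watts) * hours_per_day / 1000
    print(f"Extra energy per day: {daily_kwh:.2f} kWh")
    print(f"Extra cost per month: ${daily_kwh * 30 * rate_per_kwh:.2f}")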