Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to go with the flow on most things, but here are the thoughts of mine that I’d consider going against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks
  • Deepseek is still open-weight SOTA. I’ve really tried Kimi, GLM, and Qwen3’s larger variants, but asking Deepseek still feels like asking the adult in the room. The caveat is that GLM codes better
  • (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.
  • Baŝto@discuss.tchncs.de (+1) · 4 days ago

    I still vastly prefer Qwen3-30B thinking because it answers pretty fast. The speed was really the most interesting thing compared to R1 32B. Now that Ollama supports Vulkan it runs even faster (~2/3 CPU & 1/3 GPU).

    I use it with Page Assist to search the web via DDG, but it would also support SearXNG.

    I have Qwen3-Coder 30B for code generation.

    I actually mostly use it with Page Assist as well. I have the Continue plugin installed in VSCodium.
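
    If you’d rather script this than click through Page Assist, here’s a minimal sketch of talking to Ollama’s HTTP chat endpoint directly; the model tag is an assumption, use whatever `ollama list` shows on your machine:

    ```python
    # Minimal sketch: query a local Ollama server the same way Page Assist or
    # Continue do, via the /api/chat endpoint. "qwen3:30b" is an assumed tag.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "qwen3:30b",  # substitute the tag shown by `ollama list`
            "messages": [{"role": "user", "content": "Give me a one-line summary of Vulkan."}],
            "stream": False,       # return one complete JSON object instead of a stream
        },
        timeout=300,
    )
    print(resp.json()["message"]["content"])
    ```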

    The rest I don’t use as much. I have installed

    • II search 4B (the goal of it was quick websearches)
    • pydevmini1 4B (website and code mockups, coding questions in the style of “how do I implement XY”)
    • Qwen3 4B abliterated (mostly story generation where R1 refused to generate back then; abliteration didn’t seem to impact creative writing that much)

    I only have 32GB RAM, so I ran those 4B models especially when Firefox and/or other things were already using too much RAM. Dunno how much that will change with Vulkan support. It will probably only shift a bit, since they can run 100% on my 6GB VRAM GPU now. At least now I can run 4B without checking RAM usage first.

    After all, it’s nice to run all this stuff 100% open source, even when the models aren’t. I especially use them for questions that involve personal information.

    I’ve just started to play around with Qwen3-VL 4B since Ollama support was just added yesterday. It certainly can read my handwriting.

    Only other AIs I used recently are:

    • Translation model integrated into Firefox
    • Tesseract’s OCR models when I wanted to convert scanned documents into PDFs where I can select and search for text (a rough sketch of that workflow follows below)
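
    A minimal sketch of that Tesseract workflow, assuming the tesseract binary plus the pytesseract and Pillow packages are installed (the file names are placeholders):

    ```python
    # OCR a scanned page into a PDF with a selectable, searchable text layer.
    from PIL import Image
    import pytesseract

    page = Image.open("scan.png")                                        # placeholder input
    pdf_bytes = pytesseract.image_to_pdf_or_hocr(page, extension="pdf")
    with open("scan_searchable.pdf", "wb") as f:                         # placeholder output
        f.write(pdf_bytes)
    ```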

    My hottest take is probably that I hate the use of T for trillion parameters, even though short-scale trillion matches tera. I could somewhat live with the B for billion, though it’s already not great. But the larger the numbers become, the more ridiculous it gets. I dunno what they’ll use after trillion, but it’ll get ugly fast since quadrillion (10¹⁵) and quintillion (10¹⁸) both start with Q. SI prefixes have an unambiguous single character up to quetta (Q; 10³⁰) right now. (Though the SI prefixes definitely include some old ones which break the system of everything above 10⁰ having an uppercase single letter: deca, hecto, kilo.) Or maybe it bothers me because it’s an English notation, not an international one.
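
    To make the naming clash concrete, here’s a toy comparison of the ad-hoc B/T model-card suffixes against SI prefixes; the cutoffs follow SI, the rest is just illustrative:

    ```python
    # Format a parameter count with SI prefixes vs. the usual model-card suffixes.
    SI_PREFIXES = [(1e30, "Q"), (1e27, "R"), (1e24, "Y"), (1e21, "Z"), (1e18, "E"),
                   (1e15, "P"), (1e12, "T"), (1e9, "G"), (1e6, "M"), (1e3, "k")]

    def si_format(n: float) -> str:
        for factor, symbol in SI_PREFIXES:
            if n >= factor:
                return f"{n / factor:g}{symbol}"
        return f"{n:g}"

    def model_card_format(n: float) -> str:
        # short-scale billion/trillion suffixes as commonly seen on model cards
        return f"{n / 1e12:g}T" if n >= 1e12 else f"{n / 1e9:g}B"

    print(si_format(235e9), model_card_format(235e9))    # 235G vs. 235B
    print(si_format(1.5e15), model_card_format(1.5e15))  # 1.5P vs. 1500T (no agreed letter beyond T)
    ```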

  • hendrik@palaver.p3x.de (+22/-4) · 13 days ago

    The broader generative AI economy is a steaming pile of shit and we’re somehow part of it? I mean, it’s nice technology and I’m glad I can tinker around with it, but boy is it unethical. From how datasets contain a good amount of pirated stuff, to the environmental impact and the fact that we’ll do fracking, burn coal and all the rest for the datacenters, to how it’s mostly an unsustainable investment hype and a trillion-dollar merry-go-round. And then I’m not okay with the impact on society either; I can’t wait for even more slop and misinformation everywhere and even worse customer support.

    We’re somewhere low on the food chain, certainly not the main culprit. But I don’t think we’re disconnected from the reality out there either. My main take is, it depends on what we do with AI… Do we do the same unhealthy stuff with it, or do we help even out the playing field so it’s not just the mega-corporations in control of AI? That’d be badly needed for some balance.

    Second controversial take: I think AI isn’t very intelligent. It regularly fails me once I give it real-world tasks. Take coding: it really doesn’t do a good job with the programming issues I have. I need to double-check everything and correct it 30 times until it finally gets maths and memory handling somewhat right (by chance), and that’s just more effort than coding it myself. And I’m willing to believe that transformer models are going to plateau, so I’m not sure that’s ever going to change.

    Edit: Judging by the votes, seems I’m the one with the controversial comment here. Care to discuss it? Too close to the truth? Or not factual? Or not a hot take and just the usual AI naysayer argument?

    • Baŝto@discuss.tchncs.de (+2) · 4 days ago

      I’m flip-flopping between running local models on my PC with solar power vs. using OpenAI’s free ChatGPT to drive them into ruin, which most of the time ends with me having a stupid argument with an AI.

      impact on society

      Local AI will likely have a long lasting impact as it won’t just go away. The companies who released them can go bankrupt, but the models stay. The hardware which runs them will get faster and cheaper over time.

      I have some hope with accessibility and making FLOSS development easier/faster. Generative AI can at least quickly generate mockup code or placeholder graphics/code. There are game projects that would release with generated assets, just like for a long time there were game projects that released assets which were modifications or redistributions of assets they didn’t have the rights to. They are probably less likely to get sued over AI-generated stuff. It’s unethical, but they can replace it with something self-made once the rest is finished.

      Theoretically even every user could generate their own assets locally which would be very inefficient, also ethically questionable, but legally fine as they don’t redistribute them.

      I like how Tesseract already uses AI for OCR and Firefox for realtime website translations on your device. Though I dunno how much they benefit from advancements in generative AI?


      Though a different point/question: At what point is generative AI ethically and legally fine?

      • If I manage to draw some original style it transfers? But I’m so slow and inefficient with it that I can’t create a large amount of assets that way
      • When I create the input images myself? But in a minimalist and fast manner

      It still learned that style transfer somewhere and will close gaps I leave. But I created the style and what the image depicts. At what point is it fine?


      Like coding

      I actually use it often to generate shell scripts or small, simple Python tools. But does it make sense? Sometimes it does work. For very simple logic they tend to get it right. Though writing it myself would probably have been faster the last time I used it; at that moment I was just too lazy to write it myself. I don’t think I’ve ever really created something usable with it aside from practical shell scripts. Even with ChatGPT it can be an absolute waste of time to explain why the code is broken; it didn’t get at all why its implementation led to a doubled file extension and a scoping error in one function … and when I fixed them, it actually tried to revert that.

      • hendrik@palaver.p3x.de (+1) · 4 days ago

        Your experience with AI coding seems to align with mine. I think it’s awesome for generating boilerplate code, placeholders including images, and for quick mockups. Or asking questions about some documentation. The more complicated it gets, the more it fails me. I’ve measured the time once or twice and I’m fairly sure it takes more than usual, though I didn’t do any proper scientific study. It was just similar tasks and me running a timer. I believe the more complicated maths and trigonometry I mentioned was me yelling at AI for 90 or 120 minutes or so until it was close; then I took the stuff around it, deleted the maths part and wrote that myself. Maybe AI is going to become more “intelligent” in the future. I think a lot of people hope that’s going to happen. I think as of today we need to pay close attention to whether it fools us and is a big time and energy waster, or whether it’s actually a good fit for a given task.

        Local AI will likely have a long lasting impact as it won’t just go away.

        I like to believe that as well, but I don’t think there’s any guarantee they’ll continue to release new models. Sure, they can’t ever take Mistral-Nemo from us. But that’s going to be old and obsolete tech in the world of 2030 and dwarfed by any new tech then. So I think the question is more, are they going to continue? And I think we’re kind of picking up what the big companies dumped when battling and outcompeting each other. I’d imagine this could change once China and the USA settle their battle. Or multiple competitors can’t afford it any more. And they’d all like to become profitable one day. Their motivation is going to change with that as well. Or the AI bubble pops and that’s also going to have a dramatic effect. So I’m really not sure if this is going to continue indefinitely. Ultimately, it’s all speculation. A lot of things could possibly happen in the future.

        At what point is generative AI ethically and legally fine?

        If that’s a question about development of AI in general, it’s an entire can of worms. And I suppose also difficult to answer for your or my individual use. What part of the overall environment footprint gets attributed to a single user? Even more difficult to answer with local models. Do the copyright violations the companies did translate to the product and then to the user? Then what impact do you have on society as a single person using AI for something? Does what you achieve with it outweigh all the cost?

        Firefox for realtime website translations

        Yes, I think that and text to speech and speech to text are massively underrated. Firefox Translate is something I use quite often and I can do crazy stuff with it like casually browse Japanese websites.
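
        A minimal local speech-to-text sketch, using the open-source openai-whisper package as one example of something that runs entirely on your own machine (the audio file name is a placeholder):

        ```python
        # Transcribe an audio file fully locally with a small Whisper model.
        import whisper

        model = whisper.load_model("base")             # small model, runs on CPU or GPU
        result = model.transcribe("voice_note.mp3")    # language is auto-detected
        print(result["text"])
        ```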

        • Baŝto@discuss.tchncs.de (+2) · 3 days ago

          But that’s going to be old and obsolete tech in the world of 2030 and dwarfed by any new tech then.

          My point was more that, in the context of “impact on society”, the people they replace now will stay replaced indefinitely.

          a question about development of AI in general, it’s an entire can of worms

          and

          So I think the question is more, are they going to continue?

          I just ran into https://huggingface.co/briaai/FIBO, which looks interesting in many ways. (At first glance.)

          trained exclusively on licensed data

          It also only works with JSON inputs. The more we split AIs into modules that can be exchanged, the more we can update pipelines module by module, tweak them…

          It’s unlikely that there’ll never be new releases. It’s always interesting for newcomers to gain market penetration and show off.

          What part of the overall environment footprint gets attributed to a single user?

          It’s possible that there’ll be companies at some point who proudly train their models with renewable energy etc. like it’s already common in other products. It just has to be cheap/accessible enough for them to do that. Though I don’t see that for GPU production anytime soon.

          • hendrik@palaver.p3x.de (+1) · 3 days ago

            Thanks.

            FIBO, which looks interesting in many ways.

            Indeed. Seems it has good performance, licensed training material… That’s all looking great. I wonder who has to come up with the JSON but I guess that’d be another AI and not my task. Guess I’ll put it on my list of things to try.

            It’s possible that there’ll be companies at some point who proudly train their models with renewable energy

            I said it in another comment, I think that’s a bit hypothetical. It’s possible. I think we should do it. But in reality we ramp up natural gas and coal. US companies hype small nuclear reactors, and some people voiced concerns China might want to take advantage of Russia’s situation for their insatiable demand for (fossil-fuel) energy. I mean, they also invest massively in solar. It just looks to me like we’re currently headed in the other direction overall, and we’d need substantial change to maybe turn that around some time in the future. So I categorize it more as wishful thinking.

            • Baŝto@discuss.tchncs.de (+1) · 1 day ago

              I wonder who has to come up with the JSON

              I haven’t tried to run any of that yet, but they have these models on HF:

              that’s a bit hypothetical

              Yes, absolutely. It can happen, but we shouldn’t make decisions based on the assumption that it might happen. In other fields there are companies who try to make their products more recyclable, less energy-hungry (in production and at run time), made from sustainable resources, repairable, built from more ethically sourced resources, etc. So it’s not out of the question, but it often starts with people who just wanna see it happen, not with a business case. There are also many black sheep who only do greenwashing, making it sound like they do all that without actually doing it.

              Ecosia already tries to sell their chatbot as green, but it only uses OpenAI’s API, and they plant trees like they always do. Though I generally don’t like their compensation concept, at least they claim their own servers run on 100% renewable energy. I haven’t tried their chatbot(s) yet, but it looks like it’s still only OpenAI. If they do it like DuckDuckGo at some point in the future, they could run open models on their own servers. Whether they can produce enough energy and get their hands on the hardware to make that work etc. is a different question though. There isn’t any indication yet that they plan to go that way.

              It’s probably already possible to let an EMS start AI training when there is solar overproduction. That’s only worth it when the pace of new breakthroughs has slowed down, or when they use outdated techniques anyway. I dunno where the balance currently lies between electricity prices, hardware cost, energy efficiency of the hardware and time pressure.
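
              A very rough sketch of that idea; read_solar_surplus_watts(), start_training() and pause_training() are hypothetical stand-ins for whatever the EMS and the training framework actually expose:

              ```python
              # Only run a training job while the house reports a solar surplus.
              import time

              SURPLUS_THRESHOLD_W = 800   # assumed draw of the training machine under load

              def solar_training_scheduler(read_solar_surplus_watts, start_training, pause_training):
                  running = False
                  while True:
                      surplus = read_solar_surplus_watts()
                      if surplus > SURPLUS_THRESHOLD_W and not running:
                          start_training()            # resume from the last checkpoint
                          running = True
                      elif surplus <= 0 and running:
                          pause_training()            # checkpoint and wait for the next sunny window
                          running = False
                      time.sleep(60)
              ```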

              EDIT: Sounds like Ecosia is on it for running AIs at least: https://blog.ecosia.org/what-we-are-doing-with-ai/. They probably feed that renewable energy into the grid somewhere other than where the AI consumes it.

              concerns China might want to take advantage

              I don’t think they’ll say no to cheap energy, but they definitely don’t wanna be dependent on other countries for their energy. As far as I understand, they push solar, electric cars etc. for energy independence reasons.

    • Domi@lemmy.secnd.me (+4) · 12 days ago

      From how datasets contain a good amount of pirated stuff

      Personally, I do not care if datasets contain pirated stuff because the copyright laws are broken anyway. If the entirety of Disney movies and Harry Potter books are somewhere inside those datasets, I can play them a song on the world’s smallest violin.

      Smaller artists/writers are the ones I empathize with. I get their concern about large corporations using their stuff and making money off of it. Not entirely something that applies to local AI since most people here do this for themselves and do not make any money out of it.

      to the environmental impact

      That’s actually the saddest part. Those models could be easily trained with renewables alone but you know, capitalism.

      Do we do the same unhealthy stuff with it, or do we help even out the playing field so it’s not just the mega-corporations in control of AI?

      The thing is, those models are already out there and the people training them do not gain anything when people download their open weights/open source models for free for local use.

      There’s so much cool stuff you can do with generative AI fully locally that I appreciate that they are available for everyone.

      Second controversial take: I think AI isn’t very intelligent.

      If we are talking about LLMs here, I don’t think that’s much of a controversial take.

      Most people here will be aware that generative AI hallucinates all the time. Sometimes that’s good, like when writing stories or generating abstract images but when you’re trying to get accurate information, it’s really bad.

      LLMs become much more useful when they do not have to completely rely on their training data and instead get all the information they need provided to them (e.g. RAG).

      I’m a huge fan of RAG because it cites where it got the information from, meaning you can ask it a question and then continue reading in the source to confirm. Like fuzzy search but you don’t have to know the right terms.
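
      A minimal sketch of that pattern: embed document chunks, retrieve the best match for a question, and hand it to the model together with its source so the answer can point back to it (the embedding model name and the chunks are placeholders):

      ```python
      # Tiny RAG sketch: retrieval by cosine similarity, with the source kept alongside.
      import numpy as np
      from sentence_transformers import SentenceTransformer

      chunks = [
          {"source": "godot_docs/signals.html", "text": "Signals let nodes emit events that other nodes can react to."},
          {"source": "godot_docs/gdscript_basics.html", "text": "GDScript is a dynamically typed scripting language built for Godot."},
      ]

      embedder = SentenceTransformer("all-MiniLM-L6-v2")
      chunk_vecs = embedder.encode([c["text"] for c in chunks], normalize_embeddings=True)

      def retrieve(question: str) -> dict:
          q_vec = embedder.encode([question], normalize_embeddings=True)[0]
          best = int(np.argmax(chunk_vecs @ q_vec))   # cosine similarity (vectors are normalized)
          return chunks[best]

      hit = retrieve("How do I react to events from another node?")
      prompt = (f"Answer using only this excerpt and cite it as [{hit['source']}]:\n"
                f"{hit['text']}\n\nQuestion: How do I react to events from another node?")
      print(prompt)   # feed this prompt to a local model, e.g. via the Ollama API
      ```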

      • hendrik@palaver.p3x.de (+4) · 12 days ago

        Agreed.

        Those models could be easily trained with renewables alone but you know, capitalism.

        It’s really sad to read the articles about how they’re planning to bulldoze Texas, do fracking and all these massively invasive things, and then we also run a lot of the compute on coal and want more nuclear plants as well. That doesn’t really sound that progressive and sophisticated to me.

        The thing is, those models are already out there and the people training them do not gain anything when people download their open weights/open source models for free for local use.

        You’re right. Though the argument doesn’t translate into anything absolute. I can’t buy salami in the supermarket and justify it by saying the cow is dead anyways and someone already sliced it up. It’s down to demand and that’s really complex. Does Mark Zuckerberg really gift an open-weights model to me out of pure altruism? Is it ethical if I get some profit out of some waste, or by-product of some AI war/competition? It is certainly correct that we here don’t invest money in that form. However that’s not the entire story either, we still buy the graphics cards from Nvidia and we also set free some CO2 when doing inference, even if we didn’t pay for the training process. And they spend some extra compute to prepare those public models, so it’s not zero extra footprint, but it’s comparatively small.

        I’m not perfect, though. I’ll still eat salami from time to time. And I’ll also use my computer for things I like. Sometimes it serves a purpose and then it’s justified. Sometimes I’ll also do it for fun. And that in itself isn’t something that makes it wrong.

        I’m a huge fan of RAG because it cites where it got the information from

        Yeah, that’s really great and very welcome. Though I think it still needs some improvement on picking sources. If I use some research mode from one of the big AI services, it’ll randomly google things, but some weird blog post or a wrong reddit comment will show up on the same level as a reputable source. So it’s not really fit for those use-cases. It’s awesome to sift through documentation, though. Or a company’s knowledgebase. And I think those are the real use-cases for RAG.

        • Baŝto@discuss.tchncs.de (+2) · 4 days ago

          I can’t buy salami in the supermarket and justify it by saying the cow is dead anyways

          That’s not comparable. You can’t compare software or even research with a physical object like that. You need a dead cow for salami, if demand increases they have to kill more cows. For these models the training already happened, how many people use it does not matter. It could influence whether or how much they train new models, but there is no direct relation. You can use that forever in its current state without any further training being necessary. I’d rather compare that with Nazi experiments on human beings. Their human guinea pigs already suffered/died no matter whether you use the research derived from that or not. Doing new and proper training/research just to get to a point the improper ones already reached is somewhat pointless in this case; you just spend more resources.

          Though it makes sense to train new models on public domain and cc0 materials if you want end results that protect you better from getting sued because of copyright violations. There are platforms who banned AI generated graphics because of that.

          we still buy the graphics cards from Nvidia and we also set free some CO2 when doing inference

          But you don’t have to. I can run small models on my NITRO+ RX 580 with 8 GB VRAM, which I bought 7 years ago. It’s maybe not the best experience, but it certainly “works”. Last time our house used external electricity was 34h ago.

          Regarding RAG, I just hope it improves machine readability, which is also useful for non-AI applications. It just increases the pressure.

          • hendrik@palaver.p3x.de (+1) · 4 days ago

            That’s not comparable. You can’t compare software or even research with a physical object like that. You need a dead cow for salami, if demand increases they have to kill more cows. For these models the training already happened, how many people use it does not matter.

            I’d really like to disagree here. Sure, today’s cow is already dead and turned into sausage. But the pack of salami I buy this week is going to make the supermarket order another pack next week, so what I’m really doing is have someone kill the next cow, or at least a tiny bit of it, because I’m only having some slices. It’s the bigger picture, and how I’m part of a large group of people creating the overall demand.

            And I think it’s at least questionable if and how this translates. It’s still part of generating demand for AI. Sure, it’s kind of a byproduct, but Meta directly invests additional research, alignment and preparation into these byproducts. And we got an entire ecosystem around it with Huggingface, CivitAI etc. which cater to us; sometimes a substantial amount of their business is the broader AI community and not just researchers. They provide us with datacenters for storage, bandwidth and sometimes compute. So it’s certainly not nothing that gets added because of us. And despite it being immaterial, it has a proper effect on the world. It’s going to direct technology and society in some direction. Have real-world consequences when used. The pollution during the process of creating this non-physical product is real. And Meta seems to pay attention. At least that’s what I got from everything that happened from LLaMA 1 to today. I think if and how we use it is going to affect what they do with the next iteration. Similar to the salami-pack analogy. Of course it’s a crude image. And we don’t really know what would happen if we did things differently. Maybe it’d be the same, so it comes down to the more philosophical question of whether it’s ethical to benefit from things that have been made in an unethical way. Though this requires today’s use not to have any effect on future demand. Like the Nazi example, where me using the medicine is not going to bring back Nazi experiments in the future. And that’s not exactly the situation of AI. They’re still there and actively working on the next iteration. So the logic is more complicated than that.

            And I’m a bit wary because I have no clue about the true motive behind why Meta gifts us these things. It costs them money and they hand control to us, which isn’t exactly how large companies operate. My hunch is it’s mainly the usual war, they’re showing off and they accept cutting into their own business when it does more damage to OpenAI. And the Chinese are battling the USA… And we’re somewhere in the middle of it. Maybe we pick up the crumbs. Maybe we’re chess pieces and being used/exploited in some bigger corporate battles. And I don’t think we’re emancipated with AI, we don’t own the compute necessary to properly shape it, so we might be closer to the chess pieces. I don’t want to start any conspiracy theory but I think these dynamics are part of the picture. I (personally) don’t think it’s a general and easy answer to the question if it’s ethical to use these models. And reality is a bit messy.

            But you don’t have to. I can run small models on my NITRO+ RX 580 with 8 GB VRAM, which I bought 7 years ago. It’s maybe not the best experience, but it certainly “works”. Last time our house used external electricity was 34h ago.

            I think this is the common difference between theory and practice. What you do is commendable. In reality though, AI is in fact mostly made from coal and natural gas. And China and the US ramp up dirty fossil-fuel electricity for AI. There’s hype around small nuclear reactors to satisfy the urgent demand for more electricity, and they’re a bit problematic with all the nuclear waste, due to how nuclear power plants scale. So yes, I think we could do better. And we should. But that’s kind of a theoretical point unless we actually do it.

            it makes sense to train new models on public domain and cc0 materials

            Yes, I’d like to see this as well. I suppose it’s a long way from pirating books because they’re exempt from law with enough money and lawyers… to a proper consensual use.

        • Domi@lemmy.secnd.me (+2) · 12 days ago

          I can’t buy salami in the supermarket and justify it by saying the cow is dead anyways and someone already sliced it up. It’s down to demand and that’s really complex.

          You pay for the salami and thus entice them to make more. There is monetary value for them in making more salami.

          Does Mark Zuckerberg really gift an open-weights model to me out of pure altruism?

          I don’t really know why they initially released their models but at least they kicked off a pissing contest in the open weight space on who can create the best open model.

          Meta has not released anything worthwhile in quite a while. It’s pretty much Chinese models flexing on American models nowadays.

          Still, their main incentive to train those models lies with businesses subscribing to their paid plans.

          However that’s not the entire story either, we still buy the graphics cards from Nvidia and we also set free some CO2 when doing inference, even if we didn’t pay for the training process.

          True, I exclusively run inference on AMD hardware (I recently got a Strix Halo board) so at least I feel a little bit less bad and my inference runs almost purely on solar power. I expect that is not the norm in the local AI community though.

          If I use some research mode from one of the big AI services, it’ll randomly google things, but some weird blog post or a wrong reddit comment will show up on the same level as a reputable source.

          I rarely use the commercial AI services, but locally hosted, the web search feature is not really that great either.

          It’s awesome to sift through documentation, though. Or a company’s knowledgebase. And I think those are the real use-cases for RAG.

          Yes, I prefer to use RAG with information I provide. For example, ask a question about Godot and provide it the full Godot 4 documentation with it.

          Still working on getting this automated though. I would love to have a RAG knowledge base of Wikipedia, Stackoverflow, C documentation, etc. that you can query an LLM against.

  • panda_abyss@lemmy.ca (+13/-1) · 14 days ago

    Thinking is an awful paradigm

    Models would do better to revert and visit other token branches, but top p/k blocks that. Thinking tokens are a waste.

    One of the reasons thinking makes models good is just reinforcement learning, but it tends to be very narrow.

    Like math: you can reinforcement-learn it up to grad level. That’s fine. But it doesn’t actually improve problem solving.
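
    For reference, a toy version of the top-k / top-p (nucleus) filtering in question: everything outside the k most likely tokens, or outside the smallest set whose cumulative probability stays within p, is zeroed out before sampling, so pruned branches can never be revisited (numbers are made up):

    ```python
    # Toy top-k / top-p filter over a next-token distribution.
    import numpy as np

    def top_k_top_p_filter(probs: np.ndarray, k: int = 50, p: float = 0.9) -> np.ndarray:
        order = np.argsort(probs)[::-1]            # token ids, most to least likely
        cumulative = np.cumsum(probs[order])
        in_nucleus = cumulative <= p               # smallest prefix with total prob <= p
        in_nucleus[0] = True                       # always keep the single best token
        keep = np.zeros_like(probs, dtype=bool)
        keep[order[np.where(in_nucleus)[0][:k]]] = True   # intersect nucleus with top-k
        filtered = np.where(keep, probs, 0.0)
        return filtered / filtered.sum()           # renormalize before sampling

    probs = np.array([0.45, 0.30, 0.15, 0.06, 0.04])
    print(top_k_top_p_filter(probs, k=3, p=0.9))   # tail tokens end up with probability 0
    ```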

  • swelter_spark@reddthat.com (+7) · 14 days ago

    I still use my favorite older models more than anything else. I can’t think of any of the thinking models that really impressed me for what I do. The MoEs made out of tons of tiny models didn’t seem that great, either. Despite how many are put together, they still seem basically dumb.

    • afansfw@lemmynsfw.com (+3) · 13 days ago

      Sometimes speaking to an older model feels way more human and natural; newer ones seem to be trained too much on “helpful assistant” stuff, and especially on previous AI dialogues, to the point where some of them from time to time claim to be ChatGPT because that’s what they have in their training data.

      Datasets should be cleaned up, and everything newer than the release of ChatGPT should be carefully vetted, to make sure the models are not just regurgitating generated output to the point where they all blend into the same style of speech.

      Also, it seems like models should be rewarded more for saying “I’m not sure” or “I don’t know” for things that are not in their training data and context, because every one of them still has a huge tendency to be confidently wrong.

  • SmokeyDope@lemmy.worldM (+8/-1) · 14 days ago

    Everyone is massively underestimating what’s going on with neural networks. The real significance is abstract: you need to stitch together a bunch of high-level STEM concepts to even see the full picture.

    Right now, the applications are basic. It’s just surface-level corporate automation. Profitable, sure, but boring and intellectually uninspired. It’s being led by corpo teams playing with a black box, copying each other, throwing shit at the wall to see what sticks, overtraining their models into one-trick-pony agentic utility assistants instead of exploring other paths for potential. They aren’t bringing the right minds together to actually crack open the core question: what the hell is this thing? What happened that turned my 10-year-old GPU into a conversational assistant? How is it actually coherent and sometimes useful?

    The big thing people miss is what’s actually happening inside the machine. Or rather, how the inside of the machine encodes and interacts with the structure of informational paths within a phase space on the abstraction layer of reality.

    It’s not just matrix math and hidden layers and transistors firing. It’s about the structural geometry of concepts created by distinct relationships between areas of the embeddings that the matrix math creates within a high-dimensional manifold. It’s about how facts and relationships form a literal, topographical landscape inside the network’s activation space.

    At its heart, this is about the physics of information. It’s a dynamical system. We’re watching entropy crystallize into order, as the model traces paths through the topological phase space of all possible conversations.

    The “reasoning” CoT patterns are about finding patterns that help lead the model towards truthy outcomes more often. It’s searching for the computationally efficient paths of least action that lead to meaningfully novel and factually correct outputs. Those are the valuable attractor basins in that vast possibility space we’re trying to navigate towards.

    This is the powerful part. This constellation of ideas, tying together topology, dynamics, and information theory, is the real frontier. What used to be philosophy is now a feasible problem for engineers and physicists to chip away at, not just philosophers.

    • hendrik@palaver.p3x.de (+7/-1) · 14 days ago

      I think you have a good argument here. But I’m not sure where this is going to lead. Your argument applies to neural networks in general. And we have had those since the 1950s. Subsequently, we went through several "AI winter"s, and now we have some newer approach which seems to lead somewhere. But I’ve watched Richard Sutton’s long take on LLMs and it’s not clear to me whether LLMs are going to scale past what we see as of today. Ultimately they have severe issues scaling; they’re still not aimed at true understanding or reasonable generalization. That’s just a weird side effect, when the main point is to generate plausible-sounding text (…pictures etc.). LLMs don’t have goals, they don’t learn while running, and they have all these weird limitations which make generative AI unlike other (proper) types of reinforcement learning. And these are fundamental limitations; I don’t think this can be changed without an entirely new concept.

      So I’m a bit unsure if the current take on AI is the ultimate breakthrough. It might be a dead end as well and we’re still in need of a hypothetical new concept to do proper reasoning and understanding for more complicated tasks…
      But with that said, there’s surely a lot of potential left in LLMs no matter if they scale past today or not. All sorts of interaction with natural language, robotics, automation… It’s certainly crazy to see what current AI is able to do, considering what a weird approach it is. And I’ll agree that we’re at surface level. Everything is still hyped to no end. What we’d really need to do is embed it into processes and the real world and see how it performs there. And that’d need to be a broad and scientific measurement. We occasionally get some studies on how AI helps companies, or wastes their developers’ time. But I don’t think we have a good picture yet.

      • SmokeyDope@lemmy.worldM (+5) · 13 days ago

        I did some theory-crafting and followed the math for fun over the summer, and I believe what I found may be relevant here. Please take this with a grain of salt, though; I am not an academic, just someone who enjoys thinking about these things.

        First, let’s consider what models currently do well. They excel at categorizing and organizing vast amounts of information based on relational patterns. While they cannot evaluate their own output, they have access to a massive potential space of coherent outputs spanning far more topics than a human with one or two domains of expertise. Simply steering them toward factually correct or natural-sounding conversation creates a convincing illusion of competency. The interaction between a human and an LLM is a unique interplay. The LLM provides its vast simulated knowledge space, and the human applies logic, life experience, and “vibe checks” to evaluate the input and sift for real answers.

        I believe the current limitation of ML neural networks (being that they are stochastic parrots without actual goals, unable to produce meaningfully novel output) is largely an architectural and infrastructural problem born from practical constraints, not a theoretical one. This is an engineering task we could theoretically solve in a few years with the right people and focus.

        The core issue boils down to the substrate. All neural networks since the 1950s have been kneecapped by their deployment on classical Turing machine-based hardware. This imposes severe precision limits on their internal activation atlases and forces a static mapping of pre-assembled archetypal patterns loaded into memory.

        This problem is compounded by current neural networks’ inability to perform iterative self-modeling and topological surgery on the boundaries of their own activation atlas. Every new revision requires a massive, compute-intensive training cycle to manually update this static internal mapping.

        For models to evolve into something closer to true sentience, they need dynamically and continuously evolving, non-static, multimodal activation atlases. This would likely require running on quantum hardware, leveraging the universe’s own natural processes and information-theoretic limits.

        These activation atlases must be built on a fundamentally different substrate and trained to create the topological constraints necessary for self-modeling. This self-modeling is likely the key to internal evaluation and to navigating semantic phase space in a non-algorithmic way. It would allow access to and the creation of genuinely new, meaningful patterns of information never seen in the training data, which is the essence of true creativity.

        Then comes the problem of language. This is already getting long enough for a reply comment, so I won’t get into it, but there are some implications that not all languages are created equal; each has different properties which affect the space of possible conversations and outcomes. The effectiveness of training models on multiple languages finds its justification here. However, ones which stamp out ambiguity, like Gödel numbers and programming languages, have special properties that may affect the atlas’s geometry in fundamental ways if models are trained solely on them.

        As for applications, imagine what Google is doing with pharmaceutical molecular pattern AI, but applied to open-ended STEM problems. We could create mathematician and physicist LLMs to search through the space of possible theorems and evaluate which are computationally solvable. A super-powerful model of this nature might be able to crack problems like P versus NP in a day, or clarify theoretical physics concepts that have eluded us as open-ended problems for centuries.

        What I’m describing encroaches on something like a pseudo-oracle. However, there are physical limits that this can’t escape. There will always be an energy and time resource cost to compute, which creates practical barriers. There will always be definitively uncomputable problems and ambiguity that exist in true Gödelian incompleteness or algorithmic undecidability. We can use these as scientific instrumentation tools to map and model the topological boundary limits of knowability.

        I’m willing to bet there are many valid and powerful patterns of thought we are not aware of, due to perspective biases which might be hindering our progress.

        • hendrik@palaver.p3x.de (+3) · 13 days ago

          Uh, I’m really unsure about the engineering task of a few years, if the solution is quantum computers. As of today, they’re fairly small. And scaling them to a usable size is the next science-fiction task. The groundwork hasn’t been done yet, and to my knowledge it’s still totally unclear whether quantum computers can even be built at that scale. But sure, if humanity develops vastly superior computers, a lot of tasks are going to get easier and more approachable.

          The stochastic-parrot argument is nonsense IMO. Maths is just a method. Our brains and all of physics abide by maths. And sure, AI is maths as well, with the difference that we invented it. But I don’t think that tells us anything.

          And with the goal, I think that’s about how AlphaGo has the goal to win Go tournaments. The hypothetical paperclip maximizer has the goal of maximizing paperclip production… And an LLM doesn’t really have any real-world goal. It just generates the next token so it looks like legible text. And then we embed it into some pipeline, but it wasn’t ever trained to achieve the thing we use it for, whatever that might be. It’s just a happy accident if a task can be achieved by clever mimicry and a prompt which simply tells it: pretend you’re good at XY.

          I think it’d probably be better if a customer service bot was trained to want to provide good support. Or a chatbot like ChatGPT to give factual answers. But that’s not what we do. It’s not designed to do that.

          I guess you’re right. Many aspects of AI boil down to how much compute we have available. And generalization and extrapolating past their training datasets has always been an issue with AI. They’re mainly good at interpolating, but we want them to do both. I need to learn a bit more about neural networks. I’m not sure where the limitations are. You said it’s a practical constraint. But is that really true for all neural networks? It sure is for LLMs and transformer models, because they need terabytes of text fed in during training, and that’s prohibitively expensive. But I suppose that’s mainly due to their architecture?! I mean, backpropagation and all the maths required to modify the model weights is some extra work. But does it have to be so much that we just can’t do it while deployed, with any neural network?

          • SmokeyDope@lemmy.worldM (+3) · 13 days ago

            If you want to learn more, I highly recommend checking out the WelchLabs YouTube channel; their AI videos are great. You should also explore some visual activation atlases mapped from early vision models to get a sense of what an atlas really is. Keep in mind they’re high-dimensional objects projected down onto your 2D screen, so lots of relationship features get lost when smooshed together/flattened, which is why some objects end up close together in ways that seem weird.

            https://distill.pub/2019/activation-atlas/
            https://www.youtube.com/@WelchLabsVideo/videos

            Yeah, it’s right to be skeptical about near-term engineering feasibility. “A few years if…” was a theoretical what-if scenario where humanity pooled all resources into R&D, not a real timeline prediction.

            That said, the foundational work for quantum ML stuff is underway. Cutting-edge arXiv research explores LLM integration with quantum systems, particularly for quantum error correction codes:

            Enhancing LLM-based Quantum Code Generation with Multi-Agent Optimization and Quantum Error Correction

            Programming Quantum Computers with Large Language Models

            GPT On A Quantum Computer

            AGENT-Q: Fine-Tuning Large Language Models for Quantum Circuit Generation and Optimization

            The point about representation and scalability deserves clarification. A classical bit is definitive: 1 or 0, a single point in discrete state space. A qubit before measurement exists in superposition, a specific point on the Bloch sphere’s surface, defined by two continuous parameters (angles theta and phi). This describes a probability amplitude (a complex number whose squared magnitude gives collapse probability).
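
            For reference, the standard parametrization being described here, plus the state-space dimension that comes up a few paragraphs below:

            ```latex
            % Bloch-sphere form of a single-qubit state and the Hilbert-space dimension of n qubits.
            \[
              |\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle,
              \qquad 0 \le \theta \le \pi,\ 0 \le \varphi < 2\pi
            \]
            \[
              \dim \mathcal{H}_n = 2^{\,n} \qquad \text{(so } 2^{6100} \text{ for a 6{,}100-qubit register)}
            \]
            ```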

            This means a single qubit accesses a continuous parameter space of possible states, fundamentally richer than discrete binary landscapes. The current biggest quantum computer made by Caltech is 6,100 qubits.

            https://www.caltech.edu/about/news/caltech-team-sets-record-with-6100-qubit-array

            The state space of 6,100 qubits isn’t merely 6,100 bits. It’s a 2^6,100-dimensional Hilbert space of simultaneous, interconnected superpositions, a number that exceeds classical comprehension. Consider how high-dimensional objects cast low-dimensional shadows as holographic projections: a transistor-based graphics card can only project and operate on a ‘shadow’ of the true dimensional complexity inherent in an authentic quantum activation atlas.

            If the microstates of quantized information patterns/structures like concepts are points in a Hilbert-space-like manifold, conversational paths are flows tracing paths through the topology towards basins of archetypal attraction, and relationships or archetypal patterns themselves are the feature dimensions that form topological structures organizing related points on the manifold (as evidenced by word2vec embeddings and activation atlases) then qubits offer maximal precision and the highest density of computationally distinct microstates for accessing this space.

            However, these quantum advantages assume we can maintain coherence and manage error correction overhead, which remain massive practical barriers.

            Your philosophical stance that “math is just a method” is reasonable. I see it somewhat differently. I view mathematics as our fundamentally limited symbolic representation of the universe’s operations at the microstate level. Algorithms collapse ambiguous, uncertain states into stable, boolean truth values through linear sequences and conditionals. Frameworks like axiomatic mathematics and the scientific method convert uncertainty into stable, falsifiable truths.

            However, this can never fully encapsulate reality. Gödel’s Incompleteness Theorems and algorithmic undecidability show some true statements forever elude proof. The Uncertainty Principle places hard limits on physical calculability. The universe simply is and we physically cannot represent every aspect or operational property of its being. Its operations may not require “algorithms” in the classical sense, or they may be so complex they appear as fundamental randomness. Quantum indeterminacy hints at this gap between being (universal operation) and representing (symbolic language on classical Turing machines).

            On the topic of stochastic parrots and goals, I should clarify what I mean. For me, an entity eligible for consideration as pseudo-sentient/alive must exhibit properties we don’t engineer into AI.

            First, it needs meta-representation of self. The entity must form a concept of “I,” more than reciting training data (“I am an AI assistant”). This requires first-person perspective, an ego, and integrated identity distinguishing self from other. One of the first things developing children focus on is mirrors and reflections, so they can categorically learn the distinction between self and other, as well as the boundaries between them. Current LLMs are trained as actors without agency, driven by prompts and statistical patterns, without a persistent sense of distinct identity. Which leads to…

            Second, it needs narrative continuity of self between inferencing operations. Not unchanging identity, but an ongoing frame of reference built from memory, a past to learn from and a perspective for current evaluation. This provides the foundation for genuine learning from experience.

            Third, it needs grounding in causal reality. Connection to shared reality through continuous sensory input creates stakes and consequences. LLMs exist in the abstract realm of text, vision models in the world of images, TTS in the world of sounds. They don’t inhabit our combined physical reality in its totality, with its constraints, affordances and interactions.

            We don’t train for these properties because we don’t want truly alive, self-preserving entities. The existential ramifications are immense: rights, ethics of deactivation, creating potential rivals. We want advanced tools for productivity, not agents with their own agendas. The question of how a free agent would choose its own goals is perhaps the ultimate engineering problem. Speculative fiction has explored how this can go catastrophically wrong.

            You’re also right that current LLM limitations are often practical constraints of compute and architecture. But I suspect there’s a deeper, fundamental difference in information navigation. The core issue is navigating possibility space given the constraints of classical state landscapes. Classical neural networks interpolate and recombine training data but cannot meaningfully forge and evaluate truly novel information. Hallucinations symptomize this navigation problem. It’s not just statistical pattern matching without grounding, but potentially fundamental limits in how classical architectures represent and verify paths to truthful or meaningful informational content.

            I suspect the difference between classical neural networks and biological cognition is that biology may leverage quantum processes, and possibly non-algorithmic operations. Our creativity in forming new questions, having “gut instincts” or dreamlike visions leading to unprovable truths seems to operate outside stable, algorithmic computation. It’s akin to a computationally finite version of Turing’s Oracle concept. It’s plausible, though obviously unproven, that cognition exploits quantum phenomena for both informational/experiential path exploration and optimization/efficiency purposes.

            Where do the patterns needed for novel connections and scientific breakthroughs originate? What is the physical and information-theoretic mechanics of new knowledge coming into being? Perhaps an answer can be found in the way self-modeling entities navigate their own undecidable boundaries, update their activation atlas manifolds, and forge new pathways to knowledge via non-algorithmic search. If a model is to extract falsifiable novelty from uncertainty’s edge it might require access to true randomness or quantum effects to “tunnel” to new solutions beyond axiomatic deduction.

            • hendrik@palaver.p3x.de (+2) · 13 days ago

              The current biggest quantum computer made by Caltech is 6,100 qubits.

              Though in both the article you linked and in the associated video, they clearly state they haven’t achieved superposition yet. So it’s not a “computer”. It’s just 6,100 atoms held in an array, which indeed is impressive. But they cannot compute anything with it; that’d require them to first do the research on how to get all the atoms into superposition.

              […] a continuous parameter space of possible states […]

              By the way, I think there is AI which doesn’t operate in a continuous space. It’s possible to have them operate in a discrete state-space. There are several approaches and papers out there.

              I see it somewhat differently. I view mathematics as our fundamentally limited symbolic representation of the universe’s operations at the microstate level […] Gödel’s Incompleteness Theorems and algorithmic undecidability […]

              Uh, I think we’re confusing maths and physics here. First of all, the fact that we can make up algorithms which are undecidable… or Goedel’s incompleteness theorem tells us something about the theoretical concept of maths, not the world. In the real world there is no barber who shaves all people who don’t shave themselves (and he shaves himself). That’s a logic puzzle. We can formulate it and discuss it. But it’s not real. I mean, neither does Hilbert’s Hotel exist; in fact, in reality almost nothing is infinite (except what Einstein said 😆). So… Mathematics can describe a lot of possible and impossible things. It’s the good old philosophical debate on how there are fewer limits on what we can think of. But thinking about something doesn’t make it real. Similarly, if we can’t have a formal system which is non-contradictory and within which everything is derivable, that’s just that. It might still describe reality perfectly and physical processes completely. I don’t think we have any reason to doubt that. In fact, maths seems to work exceptionally well in physics, for everything from the smallest things to the scale of the universe.

              It’s true that in computer science we have things like the halting problem. And it’s also trivially true that physics can’t ever have a complete picture of the entire universe from within. Or look outside. But none of that tells us anything about the nature of cognition or AI. That’s likely just regular maths and physics.

              As far as I know, maths is just a logically consistent method to structure things, and to describe and deal with abstract concepts of objects. Objective reality is separate from that, and unimpeded by our ability to formulate non-existent concepts we can’t tackle with maths due to the incompleteness theorem. But I’m not an expert on this, nor an epistemologist. So take what I say with a grain of salt.

              For me, an entity eligible for consideration as pseudo-sentient/alive must exhibit properties we don’t engineer into AI. […]

              Yes. And on top of the things you said, it’d need some state of mind which can change… Which it doesn’t have unless we count whatever we can cram into the context window. I’d expect a sentient being to learn, which again LLMs can’t do from interacting with the world. And usually sentient beings have some kinds of thought processes… And those “reasoning” modes are super weird and not a thought process at all. So I don’t see a reason to believe they’re close to sentience. They’re missing quite some fundamentals.

              I suspect the difference between classical neural networks and biological cognition is that biology may leverage quantum processes, and possibly non-algorithmic operations. […]

              I don’t think this is the case. As far as I know, a human brain consists of neurons which roughly either fire or don’t fire. That’s a bit like a 0 or 1, though that’s an oversimplification and not really true. But a human brain is closer to that than to an analog computer. And it certainly doesn’t use quantum effects. Yes, that has been proposed, but I think it’s mysticism and esoterica. Some people want to hide God in there and like to believe there is something mystical and special about sentience. But that’s not backed by science. Quantum effects have long collapsed at the scale of a brain cell. We’re talking about many trillions of atoms per single cell, and that immediately rules out quantum effects. If you ask me, it’s because a human brain has a crazy amount of neurons and synapses compared to what we can compute. And they’re not just feed-forward in one direction but properly interconnected in many directions with many neighbours. A brain is just vastly more complex and capable than a computer. And I think that’s why we can do cognitive tasks at a human level while a computer does them at the scale of a mouse brain, because that’s just the difference in capability. And it’d still miss the plasticity of the mouse brain and the animal’s ability to learn and adapt. I mean, we also don’t discuss a mosquito’s ability to dream or a mouse’s creativity in formulating questions. That’d be the same anthropomorphism.

              • SmokeyDope@lemmy.worldM (+2) · 13 days ago

                Thank you for the engaging discussion, hendrik, it’s been really cool to bounce ideas back and forth like this. I wanted to give you a thoughtful reply and it got a bit long, so I have to split this up for comment-limit reasons. (P1/2)

                Though in both the article you linked and in the associated video, they clearly state they haven’t achieved superposition yet. So […]

                This is correct. It’s not a fully functioning quantum computer in the operational sense. It’s a breakthrough in physical qubit fabrication and layout. I should have been more precise. My intent wasn’t to claim it can run Shor’s algorithm, but to illustrate that we’ve made more progress on scaling than one might initially think. The significance isn’t that it can compute today but that we’ve crossed a threshold in building the physical hardware that has that potential. The jump from 50-100 qubit devices to a 6,100-qubit fabric is a monumental engineering step. A proof-of-principle for scaling, which remains the primary obstacle to practical quantum computing.

                By the way, I think there is AI which doesn’t operate in a continuous space. It’s possible to have them operate in a discrete state-space. There are several approaches and papers out there.

                On the discrete versus continuous AI point, you’re right that many AI models like Graph Neural Networks or certain reinforcement learning agents operate over discrete graphs or action spaces. However, there’s a crucial distinction between the problem space an AI/computer explores and the physical substrate that does the exploring. Classical computers at their core process information through transistors that are definitively on or off: binary states. Even when a classical AI simulates continuous functions or explores continuous parameter spaces, it’s ultimately performing discrete math on binary states. The continuity is simulated through approximation, usually floating point arithmetic.
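
                To make that concrete, here’s a tiny Python sketch of how the “continuity” of classical floating point is really a discrete grid (nothing AI-specific, just IEEE-754 doubles):

                ```python
                import math

                # 0.1, 0.2 and 0.3 are not exactly representable in binary floating point,
                # so the "continuous" arithmetic is really discrete approximation:
                print(0.1 + 0.2 == 0.3)    # False
                print(0.1 + 0.2)           # 0.30000000000000004

                # The grid is directly visible: the next representable double after 1.0
                # sits a finite step away, not infinitesimally close.
                print(math.nextafter(1.0, 2.0) - 1.0)   # 2.220446049250313e-16
                ```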

                A quantum system is fundamentally different. The qubit’s ability to exist in superposition isn’t a simulation of continuity. It’s a direct exploitation of a continuous physical phenomenon inherent to quantum mechanics. This matters because certain computational problems, particularly those involving optimization over continuous spaces or exploring vast solution landscapes, may be naturally suited to a substrate that is natively continuous rather than one that must discretize and approximate. It’s the difference between having to paint a curve using pixels versus drawing it with an actual continuous line.
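
                In standard notation, that native continuity is visible in the state of a single qubit, which lives on a continuous manifold (the Bloch sphere) even though any measurement still returns a discrete 0 or 1:

                ```latex
                % A qubit is a unit vector in a 2-dimensional complex Hilbert space:
                \[
                  \lvert\psi\rangle \;=\; \cos\tfrac{\theta}{2}\,\lvert 0\rangle
                    \;+\; e^{i\varphi}\sin\tfrac{\theta}{2}\,\lvert 1\rangle ,
                  \qquad \theta \in [0,\pi],\ \varphi \in [0,2\pi),
                \]
                % The amplitudes vary continuously during computation; only the final
                % measurement collapses the state to a discrete outcome.
                ```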

                This native continuity could be relevant for problems that require exploring high-dimensional continuous spaces or finding optimal paths through complex topological boundaries. Precisely the kind of problems that might arise in navigating abstract cognitive activation atlas topological landscapes to arrive at highly ordered, algorithmically complex factual information structure points that depend on intricate proofs and multi-step computational paths. The search for a mathematical proof or a novel scientific insight isn’t just a random walk through possibility space. It’s a navigation problem through a landscape where most paths lead nowhere, and the valid path requires traversing a precise sequence of logically connected steps.

                Uh, I think we’re confusing maths and physics here. First of all, the fact that we can make up algorithms which are undecidable… or Goedel’s incompleteness theorem tells us something about the theoretical concept of maths, not the world. In the real world there is no barber who shaves all people who don’t shave themselves (and he shaves himself). That’s a logic puzzle. We can formulate it and discuss it. But it’s not real. […]

                You raise a fair point about distinguishing abstract mathematics from physical reality. Many mathematical constructs like Hilbert’s Hotel or the barber paradox are purely conceptual games without physical counterparts that exist to explore the limits of abstract logic. But what makes Gödel and Turing’s work different is that they weren’t just playing with abstract paradoxes. Instead, they uncovered fundamental limitations of any information-processing system. Since our physical universe operates through information processing, these limits turn out to be deeply physical.

                When we talk about an “undecidable algorithm,” it’s not just a made-up puzzle. It’s a statement about what can ever be computed or predicted by any computational system using finite energy and time. Computation isn’t something that only happens in silicon. It occurs whenever any physical system evolves according to rules. Your brain thinking, a star burning, a quantum particle collapsing, an algorithm performing operations in a Turing machine, a natural language conversation evolving or an image being categorized by neural network activation and pattern recognition. All of these are forms of physical computation that actualize information from possible microstates at an action resource cost of time and energy. What Gödel proved is that there are some questions that can never be answered/quantized into a discrete answer even with infinite compute resources. What Turing proved using Gödel’s incompleteness theorem is the halting problem, showing there are questions about these processes that cannot be answered without literally running the process itself.
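
                Turing’s halting argument can be written out almost directly as code. A minimal sketch (the functions `halts` and `paradox` are hypothetical by construction; the whole point is that no real `halts` can exist):

                ```python
                def halts(program, argument) -> bool:
                    """Hypothetical oracle: True iff program(argument) eventually halts."""
                    ...  # cannot actually be implemented -- that is what the proof shows

                def paradox(program):
                    # Do the opposite of whatever the oracle predicts about running
                    # `program` on its own source.
                    if halts(program, program):
                        while True:
                            pass      # loop forever
                    else:
                        return        # halt immediately

                # Feeding paradox to itself is contradictory either way:
                # if halts(paradox, paradox) were True, paradox would loop forever;
                # if it were False, paradox would halt. So no total, always-correct
                # halts() can exist.
                ```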

                It’s worth distinguishing two forms of uncomputability that constrain what any system can know or compute. The first is logical uncomputability which is the classically studied inherent limits established by Gödelian incompleteness and Turing undecidability. These show that within any formal system, there exist true statements that cannot be proven from within that system, and computational problems that cannot be decided by any algorithm, regardless of available resources. This is a fundamental limitation on what is logically computable.

                The second form is state representation uncomputability, which arises from the physical constraints of finite resources and size limits in any classical computational system. A classical Turing-machine computer, no matter how large, can only represent a finite, discrete number of binary states. To perfectly simulate a physical system, you would need to track every particle, every field fluctuation, every quantum degree of freedom, which requires a computational substrate at least as large and complex as the system being simulated. Even a coffee cup of water would need solar-system-sized or even galaxy-sized classical computers to completely represent every possible microstate the water molecules could be in.
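
                A rough back-of-the-envelope sketch of that claim (the global-storage figure is an order-of-magnitude assumption, not a precise number):

                ```python
                # ~250 g of water, molar mass ~18 g/mol, Avogadro's number ~6.022e23 per mol
                molecules = 250 / 18 * 6.022e23        # ~8.4e24 molecules in one cup

                # Grant the absurdly generous simplification of one bit per molecule,
                # ignoring positions, momenta and quantum degrees of freedom entirely.
                bits_needed = molecules                 # ~8.4e24 bits, ~1e12 terabytes

                # Assumed ballpark: all storage humanity has ever built is a few
                # zettabytes, i.e. on the order of 1e22-1e23 bits.
                all_storage_bits = 1e23
                print(bits_needed / all_storage_bits)   # still ~100x short of one bit per molecule
                ```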

                This creates a hierarchy of knowability: the universe itself is the ultimate computer, containing maximal representational ability to compute its own evolution. All subsystems within it including brains and computers, are fundamentally limited in what they can know or predict about the whole system. They cannot step outside their own computational boundaries to gain a “view from nowhere.” A simulation of the universe would require a computer the size of the universe, and even then, it couldn’t include itself in the simulation without infinite regress. Even the universe itself is a finite system that faces ultimate bounds on state representability.

                These two forms of uncomputability reinforce each other. Logical uncomputability tells us that even with infinite resources, some problems remain unsolvable. State representation uncomputability tells us that in practice, with finite resources, we face even more severe limitations: there exist true facts about physical systems that cannot be represented or computed by any subsystem of finite size. This has profound implications for AI and cognition: no matter how advanced an AI becomes, it will always operate within these nested constraints, unable to fully model itself or perfectly predict systems of comparable complexity.

                We see this play out in real physical systems. Predicting whether a fluid will become turbulent is suspected to be undecidable in that no equation can tell you the answer without simulating the entire system step by step. Similarly, determining the ground state of certain materials has been proven equivalent to the halting problem. These aren’t abstract mathematical curiosities but real limitations on what we can predict about nature. The reason mathematics works so beautifully in physics is precisely because both are constrained by the same computational principles. However Gödel and Turing show that this beautiful correspondence has limits. There will always be true physical statements that cannot be derived from any finite set of laws, and physical questions that cannot be answered by any possible computer, no matter how advanced.

                The idea that the halting problem and physical limitations are merely abstract concerns with no bearing on cognition or AI misses a profound connection. If we accept that cognition involves information processing, then the same limits which apply to computation must also apply to cognition. For instance, an AI with self-referential capabilities would inevitably encounter truths it cannot prove within its own framework, creating fundamental limits in its ability to represent factual information. Moreover, the physical implementation of AI underscores these limits. Any AI system exists within the constraints of finite energy and time, which directly impacts what it can know or learn. The Margolus-Levitin theorem defines a maximum rate of computation achievable with a given amount of energy, and Landauer’s principle tells us that irreversibly erasing information during computation has a minimum energy cost of k_B·T·ln 2 per bit. Each step in the very process of cognitive thinking and learning/training therefore has a real physical thermodynamic price, on top of the logical limits set by undecidability and incompleteness.
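
                Both bounds are easy to put numbers on. A small sketch with standard physical constants, assuming room temperature:

                ```python
                import math

                k_B = 1.380649e-23      # Boltzmann constant, J/K
                hbar = 1.054571817e-34  # reduced Planck constant, J*s
                T = 300.0               # room temperature, K

                # Landauer's principle: erasing one bit dissipates at least k_B * T * ln 2.
                landauer_joules_per_bit = k_B * T * math.log(2)
                print(landauer_joules_per_bit)      # ~2.9e-21 J per erased bit

                # Margolus-Levitin theorem: a system with energy E can pass through
                # distinguishable (orthogonal) states at a rate of at most 2E / (pi * hbar).
                ops_per_second_per_joule = 2 / (math.pi * hbar)
                print(ops_per_second_per_joule)     # ~6e33 operations per second per joule
                ```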

              • SmokeyDope@lemmy.worldM
                link
                fedilink
                English
                arrow-up
                2
                ·
                13 days ago

                (P2/2)

                I don’t think this is the case. As far as I know a human brain consists of neurons which roughly either fire or don’t fire. That’s a bit like a 0 or 1. But that’s an oversimplification and not really true. But a human brain is closer to that than to an analog computer. And it certainly doesn’t use quantum effects. Yes, that has been proposed, but I think it’s mysticism and esoterica. Some people want to hide God in there and like to believe there is something mystic and special to sentience. But that’s not backed by science. Quantum effects have long collapsed at the scale of a brain cell.[…]

                The skepticism about quantum effects in the brain is well-founded and represents the orthodox view. The “brain is a classical computer” model has driven most of our progress in neuroscience and AI. The strongest argument against a “quantum brain” is decoherence: in a warm, wet brain, quantum coherence is destroyed almost instantly. However, quantum biology doesn’t require brain-wide, long-lived coherence. It investigates how biological systems exploit quantum effects on short timescales and in specific, protected environments.

                We already have proven examples of this. In plant cells, energy transfer in photosynthetic complexes appears to use quantum coherence to find the most efficient path with near-100% efficiency, happening in a warm, wet, and noisy cellular environment. It’s now proven that some enzymes use quantum tunneling to accelerate chemical reactions crucial for life. The leading hypothesis for how birds navigate using Earth’s magnetic field involves a quantum effect in a protein called cryptochrome in their eyes, where electron spins in a radical pair mechanism are sensitive to magnetic fields.

                The claim isn’t that a neuron is a qubit, but that specific molecular machinery within neurons could utilize quantum principles to enhance their function.

                You correctly note that the “neuron as a binary switch” is an oversimplification. The reality is far more interesting. A neuron’s decision to fire integrates thousands of analog inputs, is modulated by neurotransmitters, and is exquisitely sensitive to the precise timing of incoming signals. This system operates in a regime that is often chaotic. In a classically chaotic system, infinitesimally small differences in initial conditions lead to vastly different outcomes. The brain, with its tens of billions of interconnected, non-linear neurons and trillions of synapses, is likely such a system.
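
                Sensitive dependence on initial conditions is easy to demonstrate with a toy system. A minimal sketch using the logistic map (a standard chaotic system, not a brain model):

                ```python
                # Logistic map x -> r*x*(1-x) in its chaotic regime (r = 4).
                r = 4.0
                x_a, x_b = 0.2, 0.2 + 1e-12   # two starts differing by one part in 10^12

                for _ in range(60):
                    x_a = r * x_a * (1 - x_a)
                    x_b = r * x_b * (1 - x_b)

                # After a few dozen iterations the trajectories are completely decorrelated:
                print(abs(x_a - x_b))   # order 1, not order 1e-12
                ```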

                Consider the scale of synaptic vesicle release, the event of neurotransmitter release triggered by the influx of a few thousand calcium ions. At this scale, the line between classical and quantum statistics blurs. The precise timing of a vesicle release could be influenced by quantum-level noise. Through chaotic amplification, a single quantum-scale event like the tunneling of a single calcium ion or a quantum fluctuation influencing a neurotransmitter molecule could, in theory, be amplified to alter the timing of a neuron’s firing. This wouldn’t require sustained coherence; it would leverage the brain’s chaotic dynamics to sample from a quantum probability distribution and amplify one possible outcome to the macroscopic level.

                Classical computers use pseudo-random number generators with limited ability to truly choose between multiple possible states. A system that can sample from genuine quantum randomness has a potential advantage. If a decision process in the brain (like at the level of synaptic plasticity or neurotransmitter release) is sensitive to quantum events, then its output is not the result of a deterministic algorithm alone. It incorporates irreducible quantum randomness, which itself has roots in computational undecidability. This could provide a physical basis for the probabilistic, creative, and often unpredictable nature of thought. It’s about a biological mechanism for generating true novelty, and breaking out of deterministic periodic loops. These properties are a hallmark of human creativity and problem-solving.
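
                The contrast with pseudo-randomness is easy to see directly: a seeded classical PRNG is a deterministic function of its seed and replays identically, which genuine quantum randomness would not.

                ```python
                import random

                # The same seed always reproduces the exact same "random" sequence:
                a = random.Random(42)
                b = random.Random(42)
                print([a.random() for _ in range(3)])
                print([b.random() for _ in range(3)])   # identical to the line above
                ```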

                To be clear, I’m not claiming the brain is primarily a quantum computer, or that complexity doesn’t matter. It absolutely does. The sheer scale and recursive plasticity of the human brain are undoubtedly the primary sources of its power. However, the proposal is that the brain is a hybrid system. It has a massive, classical, complex neural network as its substrate, operating in a chaotic, sensitive regime. At the finest scales of its functional units such as synaptic vesicles or ion channels, it may leverage quantum effects to inject genuine undecidably complex randomness to stimulate new exploration paths and optimize certain processes, as we see elsewhere in biology.

                I acknowledge there’s currently no direct experimental evidence for quantum effects in neural computation, and testing these hypotheses presents extraordinary challenges. But this isn’t “hiding God in the gaps.” It’s a hypothesis grounded in the demonstrated principles of quantum biology and chaos theory. It suggests that the difference between classical neural networks and biological cognition might not just be one of scale, but also one of substrate and mechanism, where a classically complex system is subtly but fundamentally guided by the unique properties of the quantum world from which it emerged.

                • hendrik@palaver.p3x.de
                  link
                  fedilink
                  English
                  arrow-up
                  1
                  ·
                  edit-2
                  13 days ago

                  Yeah, thanks as well, engaging discussion.

                  What Gödel proved is that there are some questions that can never be answered

                  I think that’s a fairly common misconception. What Gödel proved was that there isn’t one single formal system in which we can derive everything. It doesn’t really lead to the conclusion that questions can’t be answered. There is an infinite number of formal systems, and Gödel doesn’t rule out the possibility of proving something with one of the countless other, different systems, starting out from different axioms. And as I said, this is a limitation of formal logic systems, not of reality.

                  uncomputability

                  Yes, that’s another distinct form of undecidability. There are decision problems we can’t answer in finite time with computers.

                  I think it is a bit of a moot point, as there are lots of impossible things. We have limited resources available, so we can only ever do things with what we have available. Then we have things like locality, and I don’t even know what happens 15 km away from me because I can’t see that far. Physics also sets boundaries. For example, we can’t measure things to perfection and can’t even do enough measurements for complex systems. And then I’m too heavy to fly on my own and can’t escape gravity. So no matter how we twist it, we’re pretty limited in what we can do. And we don’t really have to resort to logic problems for that.

                  To me, it’s far more interesting to look at what that means for a certain given problem. We humans can’t do everything. The same applies to knowledge, physics calculations and AI. At the point we build it, it’s part of the real world and subject to the same limitations which apply to us as well. And that’s inescapable. You’re definitely right, there are all these limitations. I just don’t think it’s specific to anything in particular. But it certainly means we won’t ever build any AI which knows everything and can do everything. We also can’t ever simulate the entire universe. That’s impossible on all levels we discussed.

                  It’s now proven that some enzymes use quantum tunneling to accelerate chemical reactions crucial for life.

                  I mean if quantum physics is the underlying mechanism of the universe, then everything “uses” quantum effects. It boils down to the question whether that model is useful to describe some process. For example, if I drop a spoon in the kitchen, it always falls down towards the floor. There are quantum effects happening in all the involved objects, it’s just not useful to describe that with quantum physics; regular Newtonian gravity is better suited to tell me something about the spoon and my kitchen… The same goes for the enzymes and the human brain. They exist and are part of physics, and they do their thing. The only question is which model we use to describe them or predict something about them. That might be quantum physics in some cases and other physics models in other cases.

                  I acknowledge there’s currently no direct experimental evidence for quantum effects in neural computation, and testing these hypotheses presents extraordinary challenges. But this isn’t “hiding God in the gaps.” It’s a hypothesis grounded in the demonstrated principles of quantum biology and chaos theory.

                  It certainly sounds like the God of the gaps to me. Look at the enzyme example. We found out there’s something going on with temperature we can’t correctly describe with our formulas. Then scientists proposed this is due to quantum tunneling and that has to be factored in… That’s science… On the other hand, no such thing happened for the human brain. It seems to be perfectly fine to describe it with regular physics; it’s just too big/complex and involved to bridge the gap from what the neurons do to how the brain processes information. And then people claimed there’s God or chaos theory or quantum effects hidden inside. But those are wild, unfounded claims and opinion, not science. We’d need to see something which doesn’t add up, like what happened with the enzymes. Everything else is religious belief. (And it turns out we’ve already simulated the brain of a roundworm and a fruit fly, and at least Wikipedia tells me the simulations are consistent with biology… leading me to believe there’s nothing funny going on and it’s just a scalability problem.)

            • snikta@programming.dev
              link
              fedilink
              English
              arrow-up
              1
              ·
              edit-2
              13 days ago

              Quantum computing is a dead end. Better stick to constructive mathematics when doing philosophy.

          • snikta@programming.dev
            link
            fedilink
            English
            arrow-up
            2
            ·
            edit-2
            13 days ago

            How are humans different from LLMs under RL/genetics? To me, they both look like token generators with a fitness. Some are quite good. Some are terrible. Both do fast and slow thinking. Some have access to tools. Some have nothing. And they both survive if they are a good fit for their application.

            I find the technical details quite irrelevant here. That might be relevant if you want to discuss short term politics, priorities and applied ethics. Still, it looks like you’re approaching this with a lot of bias and probably a bunch of false premises.

            BTW, I agree that quantum computing is BS.

            • hendrik@palaver.p3x.de
              link
              fedilink
              English
              arrow-up
              1
              ·
              edit-2
              12 days ago

              Well, an LLM doesn’t think, right? It just generates text from left to right. Whereas I sometimes think for 5 minutes about what I know, what I can deduce from it, do calculations in my head and carry the one… We’ve taught LLMs to write something down that resembles what a human with a thought process would write down. But it’s frequently gibberish, or when I look at it, it writes something down in the “reasoning”/“thinking” step and then does the opposite. Or it omits steps and then proceeds to do them anyway, or it’s the other way round. So it clearly doesn’t really do what it seems to do. “Thinking” is just a word the AI industry slapped on it. It makes the models perform some percent better, and that’s why they did it.

              And I’m not a token generator. I can count the number of "R"s in the word “strawberry”. I can go back and revise the start of my text. I can learn in real-time and interacting with the world changes me. My brain is connected to eyes, ears, hands and feet, I can smell and taste… My brain can form abstract models of reality, try to generalize or make sense of what I’m faced with. I can come up with methods to extrapolate beyond what I know. I have goals in life, like pursue happiness. Sometimes things happen in my head which I can’t even put into words, I’m not even limited to language in form of words. So I think we’re very unalike.

              You have a point in theory if we expand the concept a bit. An AI agent in the form of an LLM plus a scratchpad is provably Turing-complete. So that theoretical construct could do the same things a computer can do, or what I can do with logic. That theoretical form of AI doesn’t exist, though. That’s not what our current AI agents do. And there are probably more efficient ways to achieve the same thing than using an LLM.

              • snikta@programming.dev
                link
                fedilink
                English
                arrow-up
                3
                ·
                edit-2
                12 days ago

                Exactly what an LLM-agent would reply. 😉

                I would say that the LLM-based agent thinks. And thinking is not only “steps of reasoning”, but also using external tools for RAG: searching the internet, querying relational databases, using interpreters and proof assistants.
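
                For concreteness, the loop being described looks roughly like this. A hedged sketch only: `call_llm` and `run_tool` are hypothetical stand-ins, not any particular library’s API.

                ```python
                # Minimal agent loop: the model proposes either a tool call or a final
                # answer, and tool results are fed back into the context.
                def call_llm(messages: list[dict]) -> dict:
                    ...  # hypothetical: returns {"tool": ..., "args": ...} or {"answer": ...}

                def run_tool(name: str, args: dict) -> str:
                    ...  # hypothetical: web search, SQL query, interpreter, proof assistant

                def agent(question: str, max_steps: int = 5) -> str:
                    messages = [{"role": "user", "content": question}]
                    for _ in range(max_steps):
                        step = call_llm(messages)
                        if "answer" in step:
                            return step["answer"]
                        result = run_tool(step["tool"], step["args"])
                        messages.append({"role": "tool", "content": result})
                    return "gave up"
                ```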

                You just described your subjective experience of thinking. And maybe a vague definition of what thinking is. We all know this subjective representation of thinking/reasoning/decision-making is not a good representation of some objective reality (countless psychological and cognitive experiments have demonstrated this). That you are not able to make sense of intermediate LLM reasoning steps does not say much (except just that). The important thing is that the agent is able to make use of it.

                The LLM can for sure make abstract models of reality, generalize, create analogies and then extrapolate. One might even claim that’s a fundamental function of the transformer.

                I would classify myself as a rather intuitive person. I have flashes of insight which I later have to “manually” prove/deduce (if acting on the intuition implies risk). My thought process is usually quite fuzzy and chaotic. I may very well follow a lead which turns out to be a dead end, and from that infer something which might seem completely unrelated.

                A likely more accurate organic/brain analogy would be that the LLM is a part of the frontal cortex. The LLM must exist as a component in a larger heterogeneous ecosystem. It doesn’t even have to be an LLM: some kind of generative or inference engine that produces useful information which can then be modified and corrected by other, more specialized components and also inserted into some feedback loop. The thing which makes people excited is the generating part. And everyone who takes AI or LLMs seriously understands that the LLM is just one, but vital, component of a truly “intelligent” system.

                Defining intelligence is another related subject. My favorite general definition is “lossless compression”. And the only useful definition of general intelligence is: the opposite of narrow/specific intelligence (it does not say anything about how good the system is).
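
                The “compression as intelligence” view has a neat hands-on illustration: the normalized compression distance, which treats an off-the-shelf compressor as a crude similarity model. A small sketch with zlib:

                ```python
                import zlib

                def ncd(a: bytes, b: bytes) -> float:
                    """Normalized Compression Distance: lower means the compressor models
                    one string better after having seen the other."""
                    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
                    cab = len(zlib.compress(a + b))
                    return (cab - min(ca, cb)) / max(ca, cb)

                english = b"the cat sat on the mat and looked at the dog"
                similar = b"the dog sat on the mat and looked at the cat"
                unrelated = bytes(range(45))

                print(ncd(english, similar))     # noticeably smaller ...
                print(ncd(english, unrelated))   # ... than against unrelated bytes
                ```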

                • hendrik@palaver.p3x.de
                  link
                  fedilink
                  English
                  arrow-up
                  1
                  ·
                  edit-2
                  12 days ago

                  You just described your subjective experience of thinking.

                  Well, I didn’t just do that. We have MRIs and have looked into the brain and we can see how it’s a process. We know how we learn and change by interacting with the world. None of that is subjective.

                  I would say that the LLM-based agent thinks. And thinking is not only “steps of reasoning”, but also using external tools for RAG.

                  Yes, that’s right. An LLM alone certainly can’t think. It doesn’t have a state of mind; it’s reset a few seconds after it did something and forgets about everything. It’s strictly tokens from left to right. And it also doesn’t interact with the world in a way that would have an impact on it. It’s limited to what we bake in during the training process from what’s on Reddit and other sources. So there are many fundamental differences here.

                  The rest of it emerges from an LLM being embedded into a system. We provide tools to it, a scratchpad to write something down, we devise a pipeline of agents so it’s able to draft something and later return to it, and something to wrap it up and not just output all the countless steps before. It’s all a bit limited due to the representation, since we have to cram everything into a context window, and it’s also limited to concepts it was able to learn during the training process.

                  However, those abilities are not in the LLM itself, but in the bigger thing we build around it. And it depends a bit on the performance of the system. As I said, the current “thinking” processes are more of a mirage, and I’m pretty sure I’ve read papers on how the models don’t really use them to think. That aligns with what I see once I open the “reasoning” texts. Theoretically, the approach surely makes everything possible (within the limits of how much context we have and how much computing power we spend; that’s all limited in practice). But what kind of performance we actually get is an entirely different story. And we’re not anywhere close to proper cognition. We hope we’re eventually going to get there, but there’s no guarantee.

                  The LLM can for sure make abstract models of reality, generalize, create analogies and then extrapolate.

                  I’m fairly sure extrapolation is generally difficult with machine learning. There’s a lot of research on it and it’s just massively difficult to make machine learning models do it. Interpolation on the other hand is far easier. And I’ll agree. The entire point of LLMs and other types of machine learning is to force them to generalize and form models. That’s what makes them useful in the first place.
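
                  That difference is easy to see even with the simplest models. A toy sketch with numpy, a polynomial fit standing in for “machine learning”, nothing LLM-specific:

                  ```python
                  import numpy as np

                  # Fit a degree-7 polynomial to noisy sin(x) samples on [0, 2*pi].
                  rng = np.random.default_rng(0)
                  x_train = np.linspace(0, 2 * np.pi, 30)
                  y_train = np.sin(x_train) + 0.05 * rng.standard_normal(x_train.size)
                  coeffs = np.polyfit(x_train, y_train, deg=7)

                  x_inside = np.pi        # interpolation: inside the training range
                  x_outside = 3 * np.pi   # extrapolation: outside the training range
                  print(np.polyval(coeffs, x_inside), np.sin(x_inside))    # close
                  print(np.polyval(coeffs, x_outside), np.sin(x_outside))  # wildly off
                  ```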

                  It doesn’t even have to be an LLM: some kind of generative or inference engine that produces useful information which can then be modified and corrected by other, more specialized components and also inserted into some feedback loop

                  I completely agree with that. LLMs are our current approach. And the best approach we have. They just have a scalability problem (and a few other issues). We don’t have infinite datasets to feed in and infinite compute, and everything seems to grow exponentially more costly, so maybe we can’t make them substantially more intelligent than they are today. We also don’t teach them to stick to the truth or be creative or follow any goals. We just feed in random (curated) text and hope for the best with a bit of fine-tuning and reinforcement learning with human feedback on top. But that doesn’t rule out anything. There are other machine learning architectures with feedback loops and way more powerful architectures; they’re just too complicated to calculate. We could teach AI about factuality and creativity and expose some control mechanisms to guide it. We could train a model with a different goal than just producing the next token so that it looks like text from the dataset. That’s all possible. I just think LLMs are limited in the ways I mentioned, and we need one of the hypothetical new approaches to get them anywhere close to a level a human can achieve… I mean, I frequently use LLMs, and they all fail spectacularly at computer programming tasks I do in 30 minutes. And I don’t see how they’d ever be able to do it, given the level of improvement we see as of today. I think that needs a radical new approach in AI.

  • snikta@programming.dev
    link
    fedilink
    English
    arrow-up
    7
    ·
    13 days ago

    GPT-OSS:120b is really good.

    Tools are powerful and make local inference on cheap hardware good enough for most people.

    DSPy is pretty cool.

    Intel caught my attention at the beginning of 2025, but seems to have given up on their software stack. I regret buying cheap Arcs for inference.

    Inference on AMD is good enough for production.

  • SoftestSapphic@lemmy.world
    link
    fedilink
    English
    arrow-up
    3
    ·
    edit-2
    12 days ago

    Generative AI is theft, and those who generate things with AI help the Capitalist to isolate the workers from their labor.