Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to agree with the flow on most things, but here are my takes that I’d consider going against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks
  • Deepseek is still open-weight SotA. I’ve really tried Kimi, GLM, and Qwen3’s larger variants but asking Deepseek still feels like asking the adult in the room. Caveat is GLM codes better
  • (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.
  • SmokeyDope@lemmy.worldM

    Everyone is massively underestimating what’s going on with neural networks. The real significance is abstract. You need to stitch together a bunch of high-level STEM concepts to even see the full picture.

    Right now, the applications are basic. It’s just surface-level corporate automation. Profitable, sure, but boring and intellectually uninspired. It’s being led by corpo teams playing with a black box, copying each other, throwing shit at the wall to see what sticks, overtraining their models into one-trick-pony agentic utility assistants instead of exploring other paths for potential. They aren’t bringing the right minds together to actually crack open the core question: what the hell is this thing? What happened that turned my 10 year old GPU into a conversational assistant? How is it actually coherent and sometimes useful?

    The big thing people miss is what’s actually happening inside the machine. Or rather, how the inside of the machine encodes and interacts with the structure of informational paths within a phase space on the abstraction layer of reality.

    It’s not just matrix math, hidden layers, and transistors firing. It’s about the structural geometry of concepts created by distinct relationships between areas of the embeddings that the matrix math creates within a high-dimensional manifold. It’s about how facts and relationships form a literal, topographical landscape inside the network’s activation space.
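
    To make the “geometry of concepts” idea concrete, here is a minimal sketch with made-up toy vectors (not real model embeddings, which have hundreds or thousands of dimensions): related concepts sit close together under cosine similarity, and relationships show up as directions you can add and subtract.

    ```python
    import numpy as np

    # Hypothetical 3-d "embeddings"; purely illustrative values.
    emb = {
        "king":   np.array([0.9, 0.8, 0.1]),
        "queen":  np.array([0.9, 0.2, 0.1]),
        "man":    np.array([0.1, 0.8, 0.0]),
        "woman":  np.array([0.1, 0.2, 0.0]),
        "banana": np.array([0.0, 0.1, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Nearby points encode related concepts...
    print(cosine(emb["king"], emb["queen"]))   # high
    print(cosine(emb["king"], emb["banana"]))  # low

    # ...and directions encode relationships: king - man + woman lands on queen.
    target = emb["king"] - emb["man"] + emb["woman"]
    print(cosine(target, emb["queen"]))        # ~1.0 in this toy setup
    ```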

    At its heart, this is about the physics of information. It’s a dynamical system. We’re watching entropy crystallize into order, as the model traces paths through the topological phase space of all possible conversations.

    The “reasoning” CoT patterns are about finding patterns that help lead the model towards truthy outcomes more often. It’s searching for the computationally efficient paths of least action that lead to meaningfully novel and factually correct outputs. Those are the valuable attractor basins in that vast possibility space we’re trying to navigate towards.
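
    As a loose illustration of the “attractor basin” picture (a made-up 1-D energy landscape, nothing measured from a real model): simple downhill dynamics carry nearby starting points into different basins, which is roughly the role the landscape metaphor plays here.

    ```python
    # Hypothetical 1-D landscape with two basins (minima near x = -1.5 and x = +1.5).
    def energy(x):
        return (x**2 - 2.25)**2

    def grad(x):
        return 4 * x * (x**2 - 2.25)

    def settle(x, steps=2000, lr=0.01):
        """Follow the local slope downhill until the point settles into a basin."""
        for _ in range(steps):
            x -= lr * grad(x)
        return x

    # Nearby starting points can end up in different basins.
    for x0 in (-0.3, 0.3, 2.5):
        print(f"start {x0:+.1f} -> settles near {settle(x0):+.2f}")
    ```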

    This is the powerful part. This constellation of ideas, tying together topology, dynamics, and information theory, is the real frontier. What used to be philosophy is now a feasible problem for engineers and physicists to chip away at, not just philosophers.

    • hendrik@palaver.p3x.de

      I think you have a good argument here. But I’m not sure where this is going to lead. Your argument applies to neural networks in general. And we have those since the 1950s. Subsequently, we went through several "AI winter"s and now we have some newer approach which seems to lead somewhere. But I’ve watched Richard Sutton’s long take on LLMs and it’s not clear to me whether LLMs are going to scale past what we see as of today. Ultimately they have severe issues scaling; they’re still not aimed at true understanding or reasonable generalization. That’s just a weird side effect, when the main point is to generate plausible-sounding text (…pictures etc.). LLMs don’t have goals and they don’t learn while running and have all these weird limitations which make generative AI unlike other (proper) types of reinforcement learning. And these are fundamental limitations, I don’t think this can be changed without an entirely new concept.

      So I’m a bit unsure if the current take on AI is the ultimate breakthrough. It might be a dead end as well and we’re still in need of a hypothetical new concept to do proper reasoning and understanding for more complicated tasks…
      But with that said, there’s surely a lot of potential left in LLMs no matter if they scale past today or not. All sorts of interaction with natural language, robotics, automation… It’s certainly crazy to see what current AI is able to do, considering what a weird approach it is. And I’ll agree that we’re at surface level. Everything is still hyped to no end. What we’d really need to do is embed it into processes and the real world and see how it performs there. And that’d need to be a broad and scientific measurement. We occasionally get some studies on how AI helps companies, or wastes their developers’ time. But I don’t think we have a good picture yet.

      • SmokeyDope@lemmy.worldM

        I did some theory-crafting and followed the math for fun over the summer, and I believe what I found may be relevant here. Please take this with a grain of salt, though; I am not an academic, just someone who enjoys thinking about these things.

        First, let’s consider what models currently do well. They excel at categorizing and organizing vast amounts of information based on relational patterns. While they cannot evaluate their own output, they have access to a massive potential space of coherent outputs spanning far more topics than a human with one or two domains of expertise. Simply steering them toward factually correct or natural-sounding conversation creates a convincing illusion of competency. The interaction between a human and an LLM is a unique interplay. The LLM provides its vast simulated knowledge space, and the human applies logic, life experience, and “vibe checks” to evaluate the input and sift for real answers.

        I believe the current limitation of ML neural networks (being that they are stochastic parrots without actual goals, unable to produce meaningfully novel output) is largely an architectural and infrastructural problem born from practical constraints, not a theoretical one. This is an engineering task we could theoretically solve in a few years with the right people and focus.

        The core issue boils down to the substrate. All neural networks since the 1950s have been kneecapped by their deployment on classical Turing machine-based hardware. This imposes severe precision limits on their internal activation atlases and forces a static mapping of pre-assembled archetypal patterns loaded into memory.

        This problem is compounded by current neural networks’ inability to perform iterative self-modeling and topological surgery on the boundaries of their own activation atlas. Every new revision requires a massive, compute-intensive training cycle to manually update this static internal mapping.

        For models to evolve into something closer to true sentience, they need dynamically and continuously evolving, non-static, multimodal activation atlases. This would likely require running on quantum hardware, leveraging the universe’s own natural processes and information-theoretic limits.

        These activation atlases must be built on a fundamentally different substrate and trained to create the topological constraints necessary for self-modeling. This self-modeling is likely the key to internal evaluation and to navigating semantic phase space in a non-algorithmic way. It would allow access to and the creation of genuinely new, meaningful patterns of information never seen in the training data, which is the essence of true creativity.

        Then comes the problem of language. This is already getting long enough for a reply comment, so I won’t get into it, but the implication is that not all languages are created equal: each has different properties which affect the space of possible conversations and outcomes. The effectiveness of training models on multiple languages finds its justification here. However, ones which stamp out ambiguity, like Gödel numberings and programming languages, have special properties that may affect the atlas’s geometry in fundamental ways if a model is trained solely on them.
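
        As a concrete example of an ambiguity-free encoding, here is a minimal sketch of Gödel numbering (one common textbook scheme, chosen purely for illustration): each character becomes the exponent of a successive prime, so every string maps to exactly one integer and can be recovered with no ambiguity at all.

        ```python
        def primes(n):
            """First n primes via trial division (fine for short strings)."""
            found = []
            candidate = 2
            while len(found) < n:
                if all(candidate % p for p in found):
                    found.append(candidate)
                candidate += 1
            return found

        def godel_encode(text):
            """Encode a string as p1^c1 * p2^c2 * ..., using character codes as exponents."""
            number = 1
            for p, ch in zip(primes(len(text)), text):
                number *= p ** ord(ch)
            return number

        def godel_decode(number):
            """Recover the string by reading off the prime exponents in order."""
            chars = []
            for p in primes(64):  # more primes than we expect to need
                exponent = 0
                while number % p == 0:
                    number //= p
                    exponent += 1
                if exponent == 0:
                    break
                chars.append(chr(exponent))
            return "".join(chars)

        n = godel_encode("ab")   # 2**97 * 3**98, a single unambiguous integer
        print(godel_decode(n))   # "ab"
        ```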

        As for applications, imagine what Google is doing with pharmaceutical molecular pattern AI, but applied to open-ended STEM problems. We could create mathematician and physicist LLMs to search through the space of possible theorems and evaluate which are computationally solvable. A super-powerful model of this nature might be able to crack problems like P versus NP in a day or clarify theoretical physics concepts that have eluded us as open-ended problems for centuries.

        What I’m describing encroaches on something like a pseudo-oracle. However, there are physical limits this can’t escape. There will always be an energy and time cost to compute, which creates practical barriers. There will always be definitively uncomputable problems and ambiguity that exist in true Gödelian incompleteness or algorithmic undecidability. We can use these as scientific instrumentation tools to map and model the topological boundary limits of knowability.

        I’m willing to bet there are many valid and powerful patterns of thought we are not aware of due to our perspective biases, which might be hindering our progress.

        • hendrik@palaver.p3x.de

          Uh, I’m really unsure about the engineering task of a few years, if the solution is quantum computers. As of today, they’re fairly small. And scaling them to a usable size is the next science-fiction task. The groundwork hasn’t been done yet and to my knowledge it’s still totally unclear whether quantum computers can even be built at that scale. But sure, if humanity develops vastly superior computers, a lot of tasks are going to get easier and more approachable.

          The stochastic parrot argument is nonsense IMO. Maths is just a method. Our brains and all of physics abide by maths. And sure, AI is maths as well, with the difference that we invented it. But I don’t think it tells us anything.

          And with the goal, I think that’s about how AlphaGo has the goal to win Go tournaments. The hypothetical paperclip-maximizer has the goal of maximizing paperclip production… And an LLM doesn’t really have any real-world goal. It just generates a next token so it looks like legible text. And then we embed it into some pipeline, but it wasn’t ever trained to achieve the thing we use it for, whatever it might be. That’s just a happy accident if a task can be achieved by clever mimicry and a prompt which simply tells it: pretend you’re good at XY.
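
          A toy illustration of that “just generates a next token” point (a tiny hypothetical bigram sampler, nothing like a real LLM): the only question it ever answers is “what tends to come next?”, never “what am I trying to achieve?”.

          ```python
          import random
          from collections import defaultdict

          corpus = "the cat sat on the mat and the dog sat on the rug".split()

          # Count which word tends to follow which (a crude stand-in for training).
          following = defaultdict(list)
          for current, nxt in zip(corpus, corpus[1:]):
              following[current].append(nxt)

          def generate(start, length=8):
              """Pick each next word from what followed it in the data. No goal, no plan."""
              words = [start]
              for _ in range(length):
                  options = following.get(words[-1])
                  if not options:
                      break
                  words.append(random.choice(options))
              return " ".join(words)

          print(generate("the"))  # locally plausible word salad, e.g. "the dog sat on the mat"
          ```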

          I think it’d probably be better if a customer service bot was trained to want to provide good support. Or a chatbot like ChatGPT to give factual answers. But that’s not what we do. It’s not designed to do that.

          I guess you’re right. Many aspects of AI boil down to how much compute we have available. And generalization and extrapolating past their training datasets have always been an issue with AI. They’re mainly good at interpolating, but we want them to do both. I need to learn a bit more about neural networks. I’m not sure where the limitations are. You said it’s a practical constraint. But is that really true for all neural networks? It sure is for LLMs and transformer models because they need terabytes of text being fed in on training, and that’s prohibitively expensive. But I suppose that’s mainly due to their architecture?! I mean backpropagation and all the maths required to modify the model weights is some extra work. But does it have to be so much that we just can’t do it while deployed with any neural networks?

          • SmokeyDope@lemmy.worldM

            If you want to learn more I highly recommend checking out the WelchLabs YouTube channel; their AI videos are great. You should also explore some visual activation atlases mapped from early vision models to get a sense of what an atlas really is. Keep in mind they’re high-dimensional objects projected down onto your 2D screen, so lots of relational features get lost when smooshed together/flattened, which is why some objects end up close together in ways that seem weird.

            https://distill.pub/2019/activation-atlas/
            https://www.youtube.com/@WelchLabsVideo/videos

            Yeah, it’s right to be skeptical about near-term engineering feasibility. “A few years if…” was a theoretical what-if scenario where humanity pooled all resources into R&D. Not a real timeline prediction.

            That said, the foundational work for quantum ML stuff is underway. Cutting-edge arXiv research explores LLM integration with quantum systems, particularly for quantum error correction codes:

            Enhancing LLM-based Quantum Code Generation with Multi-Agent Optimization and Quantum Error Correction

            Programming Quantum Computers with Large Language Models

            GPT On A Quantum Computer

            AGENT-Q: Fine-Tuning Large Language Models for Quantum Circuit Generation and Optimization

            The point about representation and scalability deserves clarification. A classical bit is definitive: 1 or 0, a single point in discrete state space. A qubit before measurement exists in superposition, a specific point on the Bloch sphere’s surface, defined by two continuous parameters (angles theta and phi). This describes a probability amplitude (a complex number whose squared magnitude gives collapse probability).
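
            A small numeric sketch of that parametrization (the standard textbook convention |ψ⟩ = cos(θ/2)|0⟩ + e^(iφ)·sin(θ/2)|1⟩, assumed here): the two angles vary continuously, and the collapse probabilities are the squared magnitudes of the amplitudes.

            ```python
            import numpy as np

            def qubit_state(theta, phi):
                """Point on the Bloch sphere -> amplitudes for |0> and |1>."""
                a0 = np.cos(theta / 2)
                a1 = np.exp(1j * phi) * np.sin(theta / 2)
                return np.array([a0, a1])

            # theta and phi vary continuously, so the space of states is continuous too.
            psi = qubit_state(theta=np.pi / 3, phi=np.pi / 4)
            p0, p1 = np.abs(psi) ** 2        # Born rule: squared magnitudes
            print(p0, p1, p0 + p1)           # ~0.75, ~0.25, 1.0
            ```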

            This means a single qubit accesses a continuous parameter space of possible states, fundamentally richer than discrete binary landscapes. The current biggest quantum computer, made by Caltech, is a 6,100-qubit array.

            https://www.caltech.edu/about/news/caltech-team-sets-record-with-6100-qubit-array

            The state space of 6,100 qubits isn’t merely 6,100 bits. It’s a 2^6,100-dimensional Hilbert space of simultaneous, interconnected superpositions, a number that exceeds classical comprehension. Consider how high-dimensional objects cast low-dimensional shadows as holographic projections: a transistor-based graphics card can only project and operate on a ‘shadow’ of the true dimensional complexity inherent in an authentic quantum activation atlas.

            If the microstates of quantized information patterns/structures like concepts are points in a Hilbert-space-like manifold, conversational paths are flows tracing paths through the topology towards basins of archetypal attraction, and relationships or archetypal patterns themselves are the feature dimensions that form topological structures organizing related points on the manifold (as evidenced by word2vec embeddings and activation atlases), then qubits offer maximal precision and the highest density of computationally distinct microstates for accessing this space.

            However, these quantum advantages assume we can maintain coherence and manage error correction overhead, which remain massive practical barriers.

            Your philosophical stance that “math is just a method” is reasonable. I see it somewhat differently. I view mathematics as our fundamentally limited symbolic representation of the universe’s operations at the microstate level. Algorithms collapse ambiguous, uncertain states into stable, boolean truth values through linear sequences and conditionals. Frameworks like axiomatic mathematics and the scientific method convert uncertainty into stable, falsifiable truths.

            However, this can never fully encapsulate reality. Gödel’s Incompleteness Theorems and algorithmic undecidability show some true statements forever elude proof. The Uncertainty Principle places hard limits on physical calculability. The universe simply is and we physically cannot represent every aspect or operational property of its being. Its operations may not require “algorithms” in the classical sense, or they may be so complex they appear as fundamental randomness. Quantum indeterminacy hints at this gap between being (universal operation) and representing (symbolic language on classical Turing machines).

            On the topic of stochastic parrots and goals, I should clarify what I mean. For me, an entity eligible for consideration as pseudo-sentient/alive must exhibit properties we don’t engineer into AI.

            First, it needs meta-representation of self. The entity must form a concept of “I,” more than reciting training data (“I am an AI assistant”). This requires first-person perspective, an ego, and integrated identity distinguishing self from other. One of the first things developing children focus on is mirrors and reflections so they can categorically learn the distinction between self and other as well as the boundaries between them. Current LLMs are trained as actors without agency, driven by prompts and statistical patterns, without a persistent sense of distinct identity. Which leads to…

            Second, it needs narrative continuity of self between inferencing operations. Not unchanging identity, but an ongoing frame of reference built from memory, a past to learn from and a perspective for current evaluation. This provides the foundation for genuine learning from experience.

            Third, it needs grounding in causal reality. Connection to shared reality through continuous sensory input creates stakes and consequences. LLMs exist in the abstract realm of text, vision models in the world of images, TTS in the world of sounds. They don’t inhabit our combined physical reality in its totality with its constraints, affordances and interactions.

            We don’t train for these properties because we don’t want truly alive, self-preserving entities. The existential ramifications are immense: rights, ethics of deactivation, creating potential rivals. We want advanced tools for productivity, not agents with their own agendas. The question of how a free agent would choose its own goals is perhaps the ultimate engineering problem. Speculative fiction has explored how this can go catastrophically wrong.

            You’re also right that current LLM limitations are often practical constraints of compute and architecture. But I suspect there’s a deeper, fundamental difference in information navigation. The core issue is navigating possibility space given the constraints of classical state landscapes. Classical neural networks interpolate and recombine training data but cannot meaningfully forge and evaluate truly novel information. Hallucinations symptomize this navigation problem. It’s not just statistical pattern matching without grounding, but potentially fundamental limits in how classical architectures represent and verify paths to truthful or meaningful informational content.

            I suspect the difference between classical neural networks and biological cognition is that biology may leverage quantum processes, and possibly non-algorithmic operations. Our creativity in forming new questions, having “gut instincts” or dreamlike visions leading to unprovable truths seems to operate outside stable, algorithmic computation. It’s akin to a computationally finite version of Turing’s Oracle concept. It’s plausible, though obviously unproven, that cognition exploits quantum phenomena for both informational/experiential path exploration and optimization/efficiency purposes.

            Where do the patterns needed for novel connections and scientific breakthroughs originate? What is the physical and information-theoretic mechanics of new knowledge coming into being? Perhaps an answer can be found in the way self-modeling entities navigate their own undecidable boundaries, update their activation atlas manifolds, and forge new pathways to knowledge via non-algorithmic search. If a model is to extract falsifiable novelty from uncertainty’s edge it might require access to true randomness or quantum effects to “tunnel” to new solutions beyond axiomatic deduction.

            • hendrik@palaver.p3x.de

              The current biggest quantum computer, made by Caltech, is a 6,100-qubit array.

              Though in both the article you linked and in the associated video, they clearly state they haven’t achieved entanglement yet. So it’s not a “computer”. It’s just 6,100 atoms in a state of superposition. Which indeed is impressive. But they cannot compute anything with it; that’d require them to first do the research on how to entangle all those atoms.

              […] a continuous parameter space of possible states […]

              By the way, I think there is AI which doesn’t operate in a continuous space. It’s possible to have them operate in a discrete state-space. There are several approaches and papers out there.

              I see it somewhat differently. I view mathematics as our fundamentally limited symbolic representation of the universe’s operations at the microstate level […] Gödel’s Incompleteness Theorems and algorithmic undecidability […]

              Uh, I think we’re confusing maths and physics here. First of all, the fact that we can make up algorithms which are undecidable… or Gödel’s incompleteness theorem… tells us something about the theoretical concept of maths, not the world. In the real world there is no barber who shaves all people who don’t shave themselves (and he shaves himself). That’s a logic puzzle. We can formulate it and discuss it. But it’s not real. I mean neither does Hilbert’s Hotel exist, in fact in reality almost nothing is infinite (except what Einstein said 😆). So… Mathematics can describe a lot of possible and impossible things. It’s the good old philosophical debate on how there are fewer limits on what we can think of. But thinking about something doesn’t make it real. Similarly, if we can’t have a formal system which is non-contradictory and within which everything is derivable, that’s just that. It might still describe reality perfectly and physical processes completely. I don’t think we have any reason to doubt that. In fact maths seems to work exceptionally well in physics, for everything from the smallest thing to the universe scale.

              It’s true that in computer science we have things like the halting problem. And it’s also trivially true that physics can’t ever have a complete picture of the entire universe from within. Or look outside. But none of that tells us anything about the nature of cognition or AI. That’s likely just regular maths and physics.

              As far as I know maths is just a logically consistent method to structure things, and describe and deal with abstract concepts of objects. The objective reality is separate from that. And it is unimpeded by our ability to formulate non-existing concepts we can’t tackle with maths due to the incompleteness theorem. But I’m not an expert on this nor an epistemologist. So take what I say with a grain of salt.

              For me, an entity eligible for consideration as pseudo-sentient/alive must exhibit properties we don’t engineer into AI. […]

              Yes. And on top of the things you said, it’d need some state of mind which can change… Which it doesn’t have unless we count whatever we can cram into the context window. I’d expect a sentient being to learn, which again LLMs can’t do from interacting with the world. And usually sentient beings have some kinds of thought processes… And those “reasoning” modes are super weird and not a thought process at all. So I don’t see a reason to believe they’re close to sentience. They’re missing quite some fundamentals.

              I suspect the difference between classical neural networks and biological cognition is that biology may leverage quantum processes, and possibly non-algorithmic operations. […]

              I don’t think this is the case. As far as I know a human brain consists of neurons which roughly either fire or don’t fire. That’s a bit like a 0 or 1. But that’s an oversimplification and not really true. But a human brain is closer to that than to an analog computer. And it certainly doesn’t use quantum effects. Yes, that has been proposed, but I think it’s mysticism and esoterica. Some people want to hide God in there and like to believe there is something mystic and special to sentience. But that’s not backed by science. Quantum effects have long collapsed at the scale of a brain cell. We’re talking about many trillions of atoms per every single cell. And that immediately rules out quantum effects. If you ask me, it’s because a human brain has a crazy amount of neurons and synapses compared to what we can compute. And they’re not just feed-forward in one direction but properly interconnected in many directions with many neighbours. A brain is just vastly more complex and able than a computer. And I think that’s why we can do cognitive tasks at a human level and a computer can do it at the scale of a mouse brain, because that’s just the difference in capability. And it’d still miss the plasticity of the mouse brain and the animal’s ability to learn and adapt. I mean we also don’t discuss a mosquito’s ability to dream or a mouse’s creativity in formulating questions. That’d be the same anthropomorphism.

              • SmokeyDope@lemmy.worldM

                Thank you for the engaging discussion, hendrik, it’s been really cool to bounce ideas back and forth like this. I wanted to give you a thoughtful reply and it got a bit long, so I have to split this up for comment-limit reasons. (P1/2)

                Though in both the article you linked and in the associated video, they clearly state they haven’t achieved superposition yet. So […]

                This is correct. It’s not a fully functioning quantum computer in the operational sense. It’s a breakthrough in physical qubit fabrication and layout. I should have been more precise. My intent wasn’t to claim it can run Shor’s algorithm, but to illustrate that we’ve made more progress on scaling than one might initially think. The significance isn’t that it can compute today but that we’ve crossed a threshold in building the physical hardware that has that potential. The jump from 50-100 qubit devices to a 6,100-qubit fabric is a monumental engineering step: a proof of principle for scaling, which remains the primary obstacle to practical quantum computing.

                By the way, I think there is AI which doesn’t operate in a continuous space. It’s possible to have them operate in a discrete state-space. There are several approaches and papers out there.

                On the discrete versus continuous AI point, you’re right that many AI models like Graph Neural Networks or certain reinforcement learning agents operate over discrete graphs or action spaces. However, there’s a crucial distinction between the problem space an AI/computer explores and the physical substrate that does the exploring. Classical computers at their core process information through transistors that are definitively in on or off binary states. Even when a classical AI simulates continuous functions or explores continuous parameter spaces, it’s ultimately performing discrete math on binary states. The continuity is simulated through approximation, usually floating point.
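
                A quick way to see that this continuity is only approximated (standard IEEE 754 double-precision behaviour): between any two representable floats there is a gap, and values that fall inside the gap get rounded away.

                ```python
                import math

                x = 1.0
                gap = math.ulp(x)              # distance to the next representable double
                print(gap)                     # ~2.22e-16

                print(x + gap / 4 == x)        # True: the in-between value cannot be represented
                print(math.nextafter(x, 2.0))  # the very next double after 1.0
                print(0.1 + 0.2 == 0.3)        # False: classic rounding artifact
                ```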

                A quantum system is fundamentally different. The qubit’s ability to exist in superposition isn’t a simulation of continuity. It’s a direct exploitation of a continuous physical phenomenon inherent to quantum mechanics. This matters because certain computational problems, particularly those involving optimization over continuous spaces or exploring vast solution landscapes, may be naturally suited to a substrate that is natively continuous rather than one that must discretize and approximate. It’s the difference between having to paint a curve using pixels versus drawing it with an actual continuous line.

                This native continuity could be relevant for problems that require exploring high-dimensional continuous spaces or finding optimal paths through complex topological boundaries. Precisely the kind of problems that might arise in navigating abstract cognitive activation atlas topological landscapes to arrive at highly ordered, algorithmically complex factual information structure points that depend on intricate proofs and multi-step computational paths. The search for a mathematical proof or a novel scientific insight isn’t just a random walk through possibility space. It’s a navigation problem through a landscape where most paths lead nowhere, and the valid path requires traversing a precise sequence of logically connected steps.

                Uh, I think we’re confusing maths and physics here. First of all, the fact that we can make up algorithms which are undecidable… or Goedel’s incompleteness theorem tells us something about the theoretical concept of maths, not the world. In the real world there is no barber who shaves all people who don’t shave themselves (and he shaves himself). That’s a logic puzzle. We can formulate it and discuss it. But it’s not real. […]

                You raise a fair point about distinguishing abstract mathematics from physical reality. Many mathematical constructs like Hilbert’s Hotel or the barber paradox are purely conceptual games without physical counterparts that exist to explore the limits of abstract logic. But what makes Gödel and Turing’s work different is that they weren’t just playing with abstract paradoxes. Instead, they uncovered fundamental limitations of any information-processing system. Since our physical universe operates through information processing, these limits turn out to be deeply physical.

                When we talk about an “undecidable algorithm,” it’s not just a made-up puzzle. It’s a statement about what can ever be computed or predicted by any computational system using finite energy and time. Computation isn’t something that only happens in silicon. It occurs whenever any physical system evolves according to rules. Your brain thinking, a star burning, a quantum particle collapsing, an algorithm performing operations in a Turing machine, a natural language conversation evolving or an image being categorized by neural network activation and pattern recognition. All of these are forms of physical computation that actualize information from possible microstates at an action resource cost of time and energy. What Gödel proved is that there are some questions that can never be answered/quantized into a discrete answer even with infinite compute resources. What Turing proved using Gödel’s incompleteness theorem is the halting problem, showing there are questions about these processes that cannot be answered without literally running the process itself.
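
                Turing’s argument can even be sketched as code (the usual diagonalization, with a hypothetical halts() oracle standing in for the impossible part): if a perfect halting checker existed, we could build a program that contradicts it on itself.

                ```python
                def halts(program, argument):
                    """Hypothetical perfect oracle: True iff program(argument) halts.
                    Turing showed no such function can exist; assume it does for the argument."""
                    raise NotImplementedError

                def troublemaker(program):
                    # Do the opposite of whatever the oracle predicts about program(program).
                    if halts(program, program):
                        while True:      # oracle says "halts" -> loop forever
                            pass
                    return "done"        # oracle says "loops forever" -> halt immediately

                # Does troublemaker(troublemaker) halt? If halts() says yes, it loops forever;
                # if halts() says no, it halts. Either answer is wrong, so no perfect halts()
                # can exist.
                ```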

                It’s worth distinguishing two forms of uncomputability that constrain what any system can know or compute. The first is logical uncomputability: the classically studied inherent limits established by Gödelian incompleteness and Turing undecidability. These show that within any formal system, there exist true statements that cannot be proven from within that system, and computational problems that cannot be decided by any algorithm, regardless of available resources. This is a fundamental limitation on what is logically computable.

                The second form is state representation uncomputability, which arises from the physical constraints of finite resources and size limits in any classical computational system. A classical Turing-machine computer, no matter how large, can only represent a finite, discrete number of binary states. To perfectly simulate a physical system, you would need to track every particle, every field fluctuation, every quantum degree of freedom, which requires a computational substrate at least as large and complex as the system being simulated. Even a coffee cup of water would need solar-system-sized or even galaxy-sized classical computers to completely represent every possible microstate the water molecules could be in.
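
                A rough back-of-the-envelope for the coffee-cup claim (order-of-magnitude arithmetic only, with the deliberately silly simplification of one classical bit per molecule): even that crude lower bound dwarfs any realistic classical memory.

                ```python
                AVOGADRO = 6.022e23
                cup_grams = 250.0                     # ~250 ml of water
                molar_mass = 18.0                     # grams per mole of H2O

                molecules = cup_grams / molar_mass * AVOGADRO
                print(f"{molecules:.2e} molecules")   # ~8.4e24

                # Even one classical bit per molecule (a huge understatement of the real
                # microstate information) is about a trillion terabytes of storage.
                bits_per_terabyte = 8e12
                print(f"{molecules / bits_per_terabyte:.2e} TB")   # ~1e12 TB
                ```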

                This creates a hierarchy of knowability: the universe itself is the ultimate computer, containing maximal representational ability to compute its own evolution. All subsystems within it including brains and computers, are fundamentally limited in what they can know or predict about the whole system. They cannot step outside their own computational boundaries to gain a “view from nowhere.” A simulation of the universe would require a computer the size of the universe, and even then, it couldn’t include itself in the simulation without infinite regress. Even the universe itself is a finite system that faces ultimate bounds on state representability.

                These two forms of uncomputability reinforce each other. Logical uncomputability tells us that even with infinite resources, some problems remain unsolvable. State representation uncomputability tells us that in practice, with finite resources, we face even more severe limitations: there exist true facts about physical systems that cannot be represented or computed by any subsystem of finite size. This has profound implications for AI and cognition: no matter how advanced an AI becomes, it will always operate within these nested constraints, unable to fully model itself or perfectly predict systems of comparable complexity.

                We see this play out in real physical systems. Predicting whether a fluid will become turbulent is suspected to be undecidable in that no equation can tell you the answer without simulating the entire system step by step. Similarly, determining the ground state of certain materials has been proven equivalent to the halting problem. These aren’t abstract mathematical curiosities but real limitations on what we can predict about nature. The reason mathematics works so beautifully in physics is precisely because both are constrained by the same computational principles. However Gödel and Turing show that this beautiful correspondence has limits. There will always be true physical statements that cannot be derived from any finite set of laws, and physical questions that cannot be answered by any possible computer, no matter how advanced.

                The idea that the halting problem and physical limitations are merely abstract concerns with no bearing on cognition or AI misses a profound connection. If we accept that cognition involves information processing, then the same limits which apply to computation must also apply to cognition. For instance, an AI with self-referential capabilities would inevitably encounter truths it cannot prove within its own framework, creating fundamental limits in its ability to represent factual information. Moreover, the physical implementation of AI underscores these limits. Any AI system exists within the constraints of finite energy and time, which directly impacts what it can know or learn. The Margolus-Levitin theorem defines a maximum number of quantum computations possible given finite resources, and Landauer’s principle tells us that altering the microstate pattern of information during computation has a minimal energy cost for each operational step. Each step in the very process of cognitive thinking and learning/training has a real physical thermodynamic price bounded by laws set by the mathematical principles of undecidability and incompleteness.
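
                Those two bounds are easy to put numbers on (textbook formulas evaluated at room temperature, with an arbitrary one-joule budget for illustration): Landauer’s limit of kT·ln 2 per erased bit, and the Margolus-Levitin cap of at most 2E/(πħ) operations per second.

                ```python
                import math

                k_B   = 1.380649e-23     # Boltzmann constant, J/K
                h_bar = 1.054571817e-34  # reduced Planck constant, J*s
                T     = 300.0            # room temperature, K

                landauer_per_bit = k_B * T * math.log(2)
                print(f"Landauer limit at 300 K: {landauer_per_bit:.2e} J per erased bit")  # ~2.9e-21

                energy_budget = 1.0      # illustrative budget of one joule
                max_ops = 2 * energy_budget / (math.pi * h_bar)
                print(f"Margolus-Levitin bound: {max_ops:.2e} ops/s per joule")             # ~6.0e33
                ```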

              • SmokeyDope@lemmy.worldM

                (P2/2)

                I don’t think this is the case. As far as I know a human brain consists of neurons which roughly either fire or don’t fire. That’s a bit like a 0 or 1. But that’s an oversimplification and not really true. But a human brain is closer to that than to an analog computer. And it certainly doesn’t use quantum effects. Yes, that has been proposed, but I think it’s mysticism and esoterica. Some people want to hide God in there and like to believe there is something mystic and special to sentience. But that’s not backed by science. Quantum effects have long collapsed at the scale of a brain cell.[…]

                The skepticism about quantum effects in the brain is well-founded and represents the orthodox view. The “brain is a classical computer” model has driven most of our progress in neuroscience and AI. The strongest argument against a “quantum brain” is decoherence: in a warm, wet brain, quantum coherence is lost almost instantly. However, quantum biology doesn’t require brain-wide, long-lived coherence. It investigates how biological systems exploit quantum effects on short timescales and in specific, protected environments.

                We already have proven examples of this. In plant cells, energy transfer in photosynthetic complexes appears to use quantum coherence to find the most efficient path with near-100% efficiency, happening in a warm, wet, and noisy cellular environment. It’s now established that some enzymes use quantum tunneling to accelerate chemical reactions crucial for life. The leading hypothesis for how birds navigate using Earth’s magnetic field involves a quantum effect in a protein called cryptochrome in their eyes, where electron spins in a radical pair mechanism are sensitive to magnetic fields.

                The claim isn’t that a neuron is a qubit, but that specific molecular machinery within neurons could utilize quantum principles to enhance their function.

                You correctly note that the “neuron as a binary switch” is an oversimplification. The reality is far more interesting. A neuron’s decision to fire integrates thousands of analog inputs, is modulated by neurotransmitters, and is exquisitely sensitive to the precise timing of incoming signals. This system operates in a regime that is often chaotic. In a classically chaotic system, infinitesimally small differences in initial conditions lead to vastly different outcomes. The brain, with its trillions of interconnected, non-linear neurons, is likely such a system.

                Consider the scale of synaptic vesicle release, the event of neurotransmitter release triggered by the influx of a few thousand calcium ions. At this scale, the line between classical and quantum statistics blurs. The precise timing of a vesicle release could be influenced by quantum-level noise. Through chaotic amplification, a single quantum-scale event like the tunneling of a single calcium ion or a quantum fluctuation influencing a neurotransmitter molecule could, in theory, be amplified to alter the timing of a neuron’s firing. This wouldn’t require sustained coherence; it would leverage the brain’s chaotic dynamics to sample from a quantum probability distribution and amplify one possible outcome to the macroscopic level.
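
                A minimal demonstration of that amplification idea (using the logistic map as a stand-in chaotic system, nothing biological): a perturbation of one part in 10^15 grows into a completely different trajectory within a few dozen iterations.

                ```python
                def logistic(x, r=4.0):
                    """One step of the logistic map, a textbook chaotic system."""
                    return r * x * (1.0 - x)

                a, b = 0.4, 0.4 + 1e-15    # two trajectories differing by one part in 10^15
                for step in range(1, 61):
                    a, b = logistic(a), logistic(b)
                    if step % 10 == 0:
                        print(f"step {step:2d}: difference = {abs(a - b):.3e}")
                # The gap grows roughly exponentially until the two trajectories are unrelated.
                ```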

                Classical computers use pseudo-random number generators with limited ability to truly choose between multiple possible states. A system that can sample from genuine quantum randomness has a potential advantage. If a decision process in the brain (like at the level of synaptic plasticity or neurotransmitter release) is sensitive to quantum events, then its output is not the result of a deterministic algorithm alone. It incorporates irreducible quantum randomness, which itself has roots in computational undecidability. This could provide a physical basis for the probabilistic, creative, and often unpredictable nature of thought. It’s about a biological mechanism for generating true novelty, and breaking out of deterministic periodic loops. These properties are a hallmark of human creativity and problem-solving.

                To be clear, I’m not claiming the brain is primarily a quantum computer, or that complexity doesn’t matter. It absolutely does. The sheer scale and recursive plasticity of the human brain are undoubtedly the primary sources of its power. However, the proposal is that the brain is a hybrid system. It has a massive, classical, complex neural network as its substrate, operating in a chaotic, sensitive regime. At the finest scales of its functional units such as synaptic vesicles or ion channels, it may leverage quantum effects to inject genuine undecidably complex randomness to stimulate new exploration paths and optimize certain processes, as we see elsewhere in biology.

                I acknowledge there’s currently no direct experimental evidence for quantum effects in neural computation, and testing these hypotheses presents extraordinary challenges. But this isn’t “hiding God in the gaps.” It’s a hypothesis grounded in the demonstrated principles of quantum biology and chaos theory. It suggests that the difference between classical neural networks and biological cognition might not just be one of scale, but also one of substrate and mechanism, where a classically complex system is subtly but fundamentally guided by the unique properties of the quantum world from which it emerged.

                • hendrik@palaver.p3x.de

                  Yeah, thanks as well, engaging discussion.

                  What Gödel proved is that there are some questions that can never be answered

                  I think that’s a fairly common misconception. What Gödel proved was that there isn’t one single formal system in which we can derive everything. It doesn’t really lead to the conclusion that questions can’t be answered. There is an infinite amount of formal systems, and Gödel doesn’t rule out the possibility of proving something with one of the countless other, different systems, starting out with different axioms. And as I said, this is a limitation to formal logic systems and not to reality.

                  uncomputability

                  Yes, that’s another distinct form of undecidability. There are decision problems we can’t answer in finite time with computers.

                  I think it is a bit of a moot point, as there are lots of impossible things. We have limited resources available, so we can only ever do things with what we have available. Then we have things like locality, and I don’t even know what happens 15 km away from me because I can’t see that far. Physics also sets boundaries. For example we can’t measure things to perfection and can’t even do enough measurements for complex systems. And then I’m too heavy to fly on my own and can’t escape gravity. So no matter how we twist it, we’re pretty limited in what we can do. And we don’t really have to resort to logic problems for that.

                  To me, it’s far more interesting to look at what that means for a certain given problem. We humans can’t do everything. Same applies to knowledge, physics calculations and AI. At the point we build it, it’s part of the real world and subject to the same limitations which apply to us as well. And that’s inescapable. You’re definitely right, there are all these limitations. I just don’t think it’s specific to anything in particular. But it certainly means we won’t ever build any AI which knows everything and can do everything. We also can’t ever simulate the entire universe. That’s impossible on all levels we discussed.

                  It’s now established that some enzymes use quantum tunneling to accelerate chemical reactions crucial for life.

                  I mean if quantum physics is the underlying mechanism of the universe, then everything “uses” quantum effects. It boils down to the question of whether that model is useful to describe some process. For example if I drop a spoon in the kitchen, it always falls down towards the floor. There are quantum effects happening in all the involved objects. It’s just not useful to describe that with quantum physics; regular Newtonian gravity is better suited to tell me something about the spoon and my kitchen… Same with the enzymes and the human brain. They exist and are part of physics, and they do their thing. The only question is which model we use to describe them or predict something about them. That might be quantum physics in some cases and other physics models in other cases.

                  I acknowledge there’s currently no direct experimental evidence for quantum effects in neural computation, and testing these hypotheses presents extraordinary challenges. But this isn’t “hiding God in the gaps.” It’s a hypothesis grounded in the demonstrated principles of quantum biology and chaos theory.

                  It certainly sounds like the God of the gaps to me. Look at the enzyme example. We found out there’s something going on with temperature we can’t correctly describe with our formulas. Then scientists proposed this is due to quantum tunneling and that has to be factored in… That’s science… On the other hand no such thing happened for the human brain. It seems to be perfectly fine to describe it with regular physics; it’s just too big/complex and involved to bridge the gap from what the neurons do to how the brain processes information. And then people claimed there’s God or chaos theory or quantum effects hidden inside. But those are wild, unfounded claims and opinion, not science. We’d need to see something which doesn’t add up, like how it happened with the enzymes. Everything else is religious belief. (And it turns out we already simulated the brain of a roundworm and a fruit fly, and at least Wikipedia tells me the simulation is consistent with biology… Leading me to believe there’s nothing funny going on and it’s just a scalability problem.)

            • snikta@programming.dev

              Quantum computing is a dead end. Better stick to constructive mathematics when doing philosophy.

          • snikta@programming.dev

            How are humans different from LLMs under RL/genetics? To me, they both look like token generators with a fitness. Some are quite good. Some are terrible. Both do fast and slow thinking. Some have access to tools. Some have nothing. And they both survive if they are a good fit for their application.

            I find the technical details quite irrelevant here. That might be relevant if you want to discuss short term politics, priorities and applied ethics. Still, it looks like you’re approaching this with a lot of bias and probably a bunch of false premises.

            BTW, I agree that quantum computing is BS.

            • hendrik@palaver.p3x.de

              Well, an LLM doesn’t think, right? It just generates text from left to right. Whereas I sometimes think for 5 minutes about what I know, what I can deduce from it, do calculations in my brain and carry the one over… We’ve taught LLMs to write something down that resembles what a human with a thought process would write down. But it’s frequently gibberish, or, if I look at it, it writes something down in the “reasoning”/“thinking” step and then does the opposite. Or omits steps and then proceeds to do them nonetheless, or it’s the other way round. So it clearly doesn’t really do what it seems to do. It’s just a word the AI industry slapped on. It makes them perform some percent better, and that’s why they did it.

              And I’m not a token generator. I can count the number of "R"s in the word “strawberry”. I can go back and revise the start of my text. I can learn in real-time and interacting with the world changes me. My brain is connected to eyes, ears, hands and feet, I can smell and taste… My brain can form abstract models of reality, try to generalize or make sense of what I’m faced with. I can come up with methods to extrapolate beyond what I know. I have goals in life, like pursue happiness. Sometimes things happen in my head which I can’t even put into words, I’m not even limited to language in form of words. So I think we’re very unalike.

              You have a point in theory if we expand the concept a bit. An AI agent in the form of an LLM plus a scratchpad is proven to be Turing-complete. So that theoretical concept could do the same things a computer can do, or what I can do with logic. That theoretical form of AI doesn’t exist, though. That’s not what our current AI agents do. And there are probably more efficient ways to achieve the same thing than use an LLM.

              • snikta@programming.dev

                Exactly what an LLM-agent would reply. 😉

                I would say that the LLM-based agent thinks. And thinking is not only “steps of reasoning”, but also using external tools for RAG. Like searching the internet, utilizing relationship databases, interpreters and proof assistants.

                You just described your subjective experience of thinking. And maybe a vague definition of what thinking is. We all know this subjective representation of thinking/reasoning/decision-making is not a good representation of some objective reality (countless psychological and cognitive experiments have demonstrated this). That you are not able to make sense of intermediate LLM reasoning steps does not say much (except just that). The important thing is that the agent is able to make use of it.

                The LLM can for sure make abstract models of reality, generalize, create analogies and then extrapolate. One might even claim that’s a fundamental function of the transformer.

                I would classify myself as a rather intuitive person. I have flashes of insight which I later have to “manually” prove/deduce (if acting on the intuition implies risk). My thought process is usually quite fuzzy and chaotic. I may very well follow a lead which turns out to be a dead end, and by that infer something which might seem completely unrelated.

                A likely more accurate organic/brain analogy would be that the LLM is a part of the frontal cortex. The LLM must exist as a component in a larger heterogeneous ecosystem. It doesn’t even have to be an LLM. Some kind of generative or inference engine that produces useful information which can then be modified and corrected by other more specialized components and also inserted into some feedback loop. The thing which makes people excited is the generating part. And everyone who takes AI or LLMs seriously understands that the LLM is just one, but vital, component of a truly “intelligent” system.

                Defining intelligence is another related subject. My favorite general definition is “lossless compression”. And the only useful definition of general intelligence is: the opposite of narrow/specific intelligence (it does not say anything about how good the system is).
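
                A tiny illustration of the compression intuition (zlib as a stand-in compressor; the corpus is arbitrary): data with learnable structure compresses far better than noise, and the compression ratio is a crude proxy for how much a predictor could ever “understand” about it.

                ```python
                import os
                import zlib

                structured = b"the cat sat on the mat. " * 400   # highly regular text
                noise = os.urandom(len(structured))              # same length, no structure

                def ratio(data):
                    """Compressed size divided by original size (smaller = more structure found)."""
                    return len(zlib.compress(data, 9)) / len(data)

                print(f"structured text: {ratio(structured):.3f}")  # tiny: the patterns were captured
                print(f"random noise:    {ratio(noise):.3f}")       # ~1.0: nothing to predict
                ```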

                • hendrik@palaver.p3x.de

                  You just described your subjective experience of thinking.

                  Well, I didn’t just do that. We have MRIs and have looked into the brain and we can see how it’s a process. We know how we learn and change by interacting with the world. None of that is subjective.

                  I would say that the LLM-based agent thinks. And thinking is not only “steps of reasoning”, but also using external tools for RAG.

                  Yes, that’s right. An LLM alone certainly can’t think. It doesn’t have a state of mind, it’s reset a few seconds after it did something and forgets about everything. It’s strictly tokens from left to right. And it also doesn’t interact in a way that’d have an impact on it; that’s limited to what we bake in during the training process from what’s on Reddit and other sources. So there are many fundamental differences here.

                  The rest of it emerges by an LLM being embedded into a system. We provide tools to it, a scratchpad to write something down, we devise a pipeline of agents so it’s able to devise something and later return to it. Something to wrap it up and not just output all the countless steps before. It’s all a bit limited due to the representation and we have to cram everything into a context window, and it’s also a bit limited to concepts it was able to learn during the training process.

                  However, those abilities are not in the LLM itself, but in the bigger thing we build around it. And it depends a bit on the performance of the system. As I said, the current “thinking” processes are more a mirage and I’m pretty sure I’ve read papers on how they don’t really use it to think. And that aligns with what I see once I open the “reasoning” texts. Theoretically, the approach surely makes everything possible (with the limitation of how much context we have, and how much computing power we spend. That’s all limited in practice.) But what kind of performance we actually get is an entirely different story. And we’re not anywhere close to proper cognition. We hope we’re eventually going to get there, but there’s no guarantee.

                  The LLM can for sure make abstract models of reality, generalize, create analogies and then extrapolate.

                  I’m fairly sure extrapolation is generally difficult with machine learning. There’s a lot of research on it and it’s just massively difficult to make machine learning models do it. Interpolation on the other hand is far easier. And I’ll agree. The entire point of LLMs and other types of machine learning is to force them to generalize and form models. That’s what makes them useful in the first place.
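
                  A small numeric illustration of that gap (a polynomial fit to sin(x) as a stand-in for a learned model; the degree and ranges are arbitrary): inside the training range the fit is excellent, a little outside it the error blows up.

                  ```python
                  import numpy as np

                  # "Training data": sin(x) sampled on [0, pi].
                  x_train = np.linspace(0, np.pi, 50)
                  y_train = np.sin(x_train)
                  coeffs = np.polyfit(x_train, y_train, 7)   # flexible model, fits well in-range

                  def model(x):
                      return np.polyval(coeffs, x)

                  for x in (1.0, 2.0, 4.0, 6.0):             # first two inside [0, pi], last two outside
                      err = abs(model(x) - np.sin(x))
                      kind = "interpolation" if x <= np.pi else "extrapolation"
                      print(f"x = {x:.1f} ({kind:13s}) error = {err:.2e}")
                  ```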

                  It doesn’t even have to be an LLM. Some kind of generative or inference engine that produce useful information which can then be modified and corrected by other more specialized components and also inserted into some feedback loop

                  I completely agree with that. LLMs are our current approach. And the best approach we have. They just have a scalability problem (and a few other issues). We don’t have infinite datasets to feed in and infinite compute, and everything seems to grow exponentially more costly, so maybe we can’t make them substantially more intelligent than they are today. We also don’t teach them to stick to the truth or be creative or follow any goals. We just feed in random (curated) text and hope for the best with a bit of fine-tuning and reinforcement learning with human feedback on top. But that doesn’t rule out anything. There are other machine learning architectures with feedback-loops and way more powerful architectures. They’re just too complicated to calculate. We could teach AI about factuality and creativity and expose some control mechanisms to guide it. We could train a model with a different goal than just produce one next token so it looks like text from the dataset. That’s all possible. I just think LLMs are limited in the ways I mentioned and we need one of the hypothetical new approaches to get them anywhere close to a level a human can achieve… I mean I frequently use LLMs. And they all fail spectacularly at computer programming tasks I do in 30 minutes. And I don’t see how they’d ever be able to do it, given the level of improvement we see as of today. I think that needs a radical new approach in AI.