The only fundamental issue with the CPU and tensors is the L2-to-L1 cache bus width. That bus cannot be widened without sacrificing speed. This is not a real issue in the grand scheme of things; it is only an issue within the total design cycle. Don’t get sucked into the little world of marketing nonsense surrounding specific fab nodes and whatever spin the sales fools are peddling. Real hardware takes 10 years from initial concept to first market availability. Nvidia was lucky because their plans happened to align with the AI boom. They could make a few minor packaging tweaks to tailor the designs already in the pipeline to the present market, but they had no prescient genius about how AI would explode the way it has over the last two years. That premise would require them to have started the 40 series design around 2012 already anticipating the AI boom, nearly four years before OpenAI was even founded.
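To see why bus width, not ALU count, is the binding constraint, here is a rough back-of-the-envelope sketch. All of the figures (64 bytes per cycle, 4 GHz, the tensor workload's arithmetic intensity) are illustrative assumptions, not vendor specs:

```python
# Back-of-the-envelope: why the L2->L1 bus caps tensor throughput on a CPU.
# Every number below is an illustrative assumption, not a vendor-confirmed spec.

def cache_bandwidth_gbs(bus_bytes_per_cycle: float, clock_ghz: float) -> float:
    """Peak transfer rate across a cache bus in GB/s."""
    return bus_bytes_per_cycle * clock_ghz  # bytes/cycle * 1e9 cycles/s = GB/s

# Assume a 64-byte-per-cycle L2->L1 path on a 4 GHz core.
l2_l1 = cache_bandwidth_gbs(64, 4.0)  # 256 GB/s per core

# Suppose a tensor workload sustaining 100 TFLOP/s at an assumed
# arithmetic intensity of 100 FLOPs per byte of operand traffic:
required = 100e12 / 100 / 1e9  # 1000 GB/s of data movement

print(l2_l1, required)  # the bus, not the arithmetic units, runs out first
```

Under these assumed numbers the bus delivers a quarter of what the math units could consume, which is the author's point: widening that path without losing clock speed is the hard part.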
The FPGA does not work for AI. It does not scale the way you assume, and the power required is untenable. You can find information about well-funded Intel/Altera AI researchers who traversed this path before the constraints were discovered. You need a simpler architecture with a lower transistor count. This is like the issue with static RAM versus DRAM: static RAM is functionally superior in nearly every way, but it simply can’t scale due to power and space requirements.
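The SRAM/DRAM comparison comes down to transistors per bit: a standard SRAM cell uses six transistors, while a DRAM cell uses one transistor plus one capacitor. A quick sketch of what that means at capacity:

```python
# Why SRAM can't scale like DRAM: transistor count per bit.
# Standard cell designs: SRAM = 6 transistors/bit, DRAM = 1 transistor + 1 cap/bit.
SRAM_T_PER_BIT = 6
DRAM_T_PER_BIT = 1

def transistors_for(num_bytes: int, t_per_bit: int) -> int:
    """Total transistors needed to store num_bytes at t_per_bit transistors per bit."""
    return num_bytes * 8 * t_per_bit

gib = 1 << 30  # one GiB
sram = transistors_for(gib, SRAM_T_PER_BIT)  # ~51.5 billion transistors
dram = transistors_for(gib, DRAM_T_PER_BIT)  # ~8.6 billion transistors

print(sram / dram)  # 6x the transistors, before counting area and leakage power
```

The same logic is the author's argument against FPGAs for AI: the programmable routing fabric spends far more transistors per useful operation than a fixed-function tensor unit does.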
With tensors, all that is needed is throughput, and that is a solvable problem. Single-thread speed in CPUs is a sales gimmick and nothing more. Your brain is a much more powerful biological computer, and it operates on 3 main clocks, the fastest of which is only around 100 Hz. Parallelism can be used to create an even faster and richer user experience than the present one. This is the future. The dual-processor paradigm was tried before, in the 286–386 era, and it failed because data centers rejected it in favor of slightly better single-purpose hardware that was nearly good enough. This is the reality of the present too. Any hardware that is good enough to do both workloads will be adopted by data centers and therefore by the market. This is where the real design edge is established, and it is from data center hardware that all consumer products are derived.
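The brain comparison can be made concrete with rough arithmetic. The neuron and synapse counts below are commonly cited estimates (~86 billion neurons, a conservative ~1,000 synapses each), not precise measurements, and the CPU figures are an idealized fast serial core:

```python
# Rough arithmetic: a slow clock with massive parallelism can out-throughput
# a fast serial clock. Biological figures are commonly cited estimates.
NEURONS = 86e9       # ~86 billion neurons (commonly cited estimate)
SYNAPSES_PER = 1e3   # conservative synapses per neuron
BRAIN_HZ = 100       # the ~100 Hz fastest clock mentioned above

CPU_HZ = 5e9         # an idealized 5 GHz single thread
OPS_PER_CYCLE = 4    # generous superscalar issue width

brain_events = NEURONS * SYNAPSES_PER * BRAIN_HZ  # ~8.6e15 synaptic events/s
cpu_ops = CPU_HZ * OPS_PER_CYCLE                  # 2e10 ops/s

print(brain_events / cpu_ops)  # parallelism wins by over five orders of magnitude
```

Even with deliberately conservative biological numbers, the wide-and-slow design beats the narrow-and-fast one by a factor of hundreds of thousands, which is the throughput-over-clock-speed argument in a nutshell.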
None of Nvidia’s current products will be relevant 8 years from now. They are a temporary hack. This is why they must use their enormous capital to buy a new future beyond the GPU, and they will.