Every day, I’m glad I dumped Windows on all of my personal machines for Linux.
I would rather fight a wonky config file for 4 hours than run Windows.
If it’s learning based on screenshots, it can only learn to play really slow games. FPS games would require video.
You’re massively underestimating the power of big data. Think about a dataset of millions of screenshot sequences.
You could have said something very similar about LLMs learning semantic meaning while being trained on basically random garbage text from the internet.
Also, it wouldn’t need to use actual video. It could just log button presses, mouse inputs, and which app you’re in, with an occasional screencap. If a high-DPI mouse can be used to listen in on you, this can work too.
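Purely as a sketch of what that kind of input-event dataset might look like (all names and fields here are invented for illustration, not from any real project), each captured event could be a small record serialized one-per-line:

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical record for one input event; field names are made up.
@dataclass
class InputEvent:
    t_ms: int        # timestamp in milliseconds
    kind: str        # "key", "mouse_move", or "click"
    detail: str      # key name, button, or "dx,dy" movement delta
    active_app: str  # foreground application at the time

# A short synthetic sequence standing in for real captured data.
events = [
    InputEvent(0, "mouse_move", "12,-3", "game.exe"),
    InputEvent(16, "click", "left", "game.exe"),
    InputEvent(48, "key", "w", "game.exe"),
]

# Serialize to JSON lines, one event per line, as a compact training input.
lines = [json.dumps(asdict(e)) for e in events]
print(lines[1])
```

A stream like this is far cheaper to store and train on than raw video, which is the point being made above.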