

I am not sure, I have tried to avoid this whole situation in the last few years :-) IIRC it can have its own CUDA version, but double check that.




The CUDA version is what matters most (assuming you are on NVIDIA). Later CUDA versions have optimizations that earlier ones don’t, and this may in turn dictate the actual driver version you can use.
I guess some models will simply deactivate certain optimizations if you don’t have an appropriate version, though in my experience they mostly just fail in that case :-/
If you compare a model running on CUDA 11 against one running on CUDA 12, people may point out that the comparison could be unfair, though that is generally nitpicky.
If you are worried that your performance is not optimal, look in the log for messages like “<fast attention scheme XYZ> was deactivated because <cudaSuperOptimizedMegaSparseMatMult> was not available”.
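One way to automate that check is a quick scan of the server log. The exact wording of these warnings varies by framework and model, so the log excerpt and pattern below are just an illustrative sketch:

```python
import re

# Hypothetical log excerpt -- real messages vary by framework and model.
log = """\
Loading model weights...
WARNING: <fast attention scheme XYZ> was deactivated because <cudaSuperOptimizedMegaSparseMatMult> was not available
Inference started.
"""

# Scan the log for lines reporting a disabled optimization.
pattern = re.compile(r"deactivated because", re.IGNORECASE)
disabled = [line for line in log.splitlines() if pattern.search(line)]
for line in disabled:
    print(line)
```

Case-insensitive matching helps because some frameworks log “Deactivated” or “DISABLED”; adjust the pattern to whatever your stack actually emits.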
I found it not very compliant; I asked it about French politics and it fared worse than other 70B models I tested through OpenRouter.
Mistral 24B gave better answers.
I like the group behind it, though. EPFL and ETH are among the best research institutes in Europe. Catching up with the big corporate models is hard, and it is nice that they are giving it a shot. Don’t expect them to be there yet on the first try, though.