Are you saying that you’re running an Nvidia/AMD multi-GPU system, and they can work together during inference? So, your LLM-relevant VRAM is 10 GB (3080) + 20 GB (7900 XT)?
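For what it's worth, here's a minimal sketch of how that kind of mixed-vendor split can look in practice, assuming llama.cpp (via llama-cpp-python) built against the Vulkan backend, which can see both cards as generic Vulkan devices. The thread doesn't confirm the exact tooling, so the model path, split ratio, and build flag below are illustrative, not the poster's actual setup:

```python
# Hypothetical sketch: pooling a 10 GB RTX 3080 and a 20 GB RX 7900 XT for one model
# with llama-cpp-python built for Vulkan, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
# (build flag and device ordering can vary by version/driver; check your install.)
from llama_cpp import Llama

llm = Llama(
    model_path="models/example-model.gguf",  # hypothetical path
    n_gpu_layers=-1,                         # offload all layers to the GPUs
    tensor_split=[0.33, 0.67],               # roughly proportional to 10 GB vs 20 GB of VRAM;
                                             # tune to your actual devices and their order
    n_ctx=8192,
)

out = llm("Q: Why does the sky look blue?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```

The layers get distributed across both cards, so the usable pool is roughly the sum of both VRAM amounts (minus context/overhead), at the cost of the slower card and the interconnect setting the pace.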
Wow, I had no idea! Nor did I know that Vulkan performs so well. I’ll have to read more, because this could really simplify my planned build.
Count me as someone who would be interested in a post!