Being able to run AI models locally is one of the coolest tech things 2023 brought to the table. But given the extremely heterogeneous software stacks for AI accelerators, doing so efficiently on your own laptop is not always as easy as it should be.
For instance, if you’re running macOS on an Apple Silicon machine, you can easily build llama.cpp with its Metal backend and offload the inference work to the M-based GPU.
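As a rough sketch, the workflow looks something like this (the exact build flags, binary names and model file below are assumptions and vary between llama.cpp releases; on recent versions the Metal backend is enabled by default on Apple Silicon):

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_METAL=1   # build with the Metal backend (default on recent Apple Silicon builds)

# -ngl offloads that many model layers to the GPU; 99 effectively means "all of them".
# The model path is just an example.
./main -m models/llama-2-7b.Q4_K_M.gguf -ngl 99 -p "Hello"
```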
It’s been almost a year since I transitioned from the Virtualization team to the Automotive team at Red Hat, with the goal of ensuring RHIVOS ships with a powerful Virtualization stack. While there’s a large overlap between a Virtualization stack for Servers and one for Automotive platforms, the latter is much more demanding in one particular aspect: GPU acceleration.
For me, personally, that meant having to delve into the Linux graphics stack, both kernel (DRM, GEM, KMS…) and userspace (Mesa, virglrenderer…), an area of which, until then, I had only a superficial knowledge.