Enabling containers to access the GPU on macOS
Being able to run AI models locally is one of the coolest tech things 2023 brought to the table. But given the extremely heterogeneous software stack for AI accelerators, doing that in an efficient way on your own laptop is not always as easy as it should be.
For instance, if you’re running macOS on an Apple Silicon machine, you can easily build llama.cpp with its Metal backend and offload the inference work to the M-based GPU. But can you do that from a container?
A Virtualization Game
Running Linux containers on macOS implies the use of some form of Virtualization and/or Emulation. That is, software tools such as Podman Desktop, while hiding it to provide a better UX, are actually running the containers inside a Linux Virtual Machine.
This means that if we want containers to be able to offload AI workloads to the host’s GPU, we need to introduce some mechanism in the Virtual Machine that allows the guest to communicate with this device without the host losing access to it (so device passthrough is off the table).
Luckily, virtio-gpu already provides the two main mechanisms we need to build a transport for the GPU between the guest and the host: shared memory and a communication channel. Additionally, we can leverage Venus to serialize Vulkan commands and MoltenVK to translate Vulkan shaders to MSL (Metal Shading Language).
So it looks like we have all the pieces we need, it should be just a matter of assembling them, shouldn’t it? Well, as you can imagine, things are rarely so simple.
libkrun-efi, an Open Source alternative to Virtualization.framework
The Virtualization stack on macOS is composed of two software packages: Hypervisor.framework and Virtualization.framework. The first is a low-level interface to interact with the kernel’s virtualization facilities (akin to KVM on Linux), while the second is roughly a Virtual Machine Monitor implementation (equivalent to libkrun or QEMU).
Virtualization.framework is a nice library that covers many virtualization use cases but, being closed source, it can’t be extended in any significant way. That means you can’t implement new emulated devices to be exposed to guests, nor alter the functionality of the ones already implemented.
Here’s where libkrun enters the game. This library provides a modern, Rust-based Virtual Machine Monitor that links directly against Hypervisor.framework to create Virtual Machines on macOS. Being fully Open Source (Apache 2.0), we can extend it as much as we need.
So far, libkrun has been focused on the microVM use case (as provided by krunvm and podman’s crun-krun), but here we need to run a full VM with its own kernel and initrd, and a relatively large set of devices.
To cover this use case, we decided to introduce a new flavor of the library, libkrun-efi, which features a functionality level very close to what Virtualization.framework provides. On top of that, we added the Venus-capable virtio-gpu device we needed.
Integrating libkrun-efi with Podman
Podman recently gained the ability to use Virtualization.framework on macOS for managing the Linux VM where containers are spawned, and it does that through vfkit, a command-line interface to the latter.
To simplify testing and ensure an easy transition path for Podman, Tyler Fanelli has created krunkit, which mimics vfkit’s operation to the point of being able to act as a drop-in replacement for it, but links against libkrun-efi instead of Virtualization.framework.
Trying out libkrun-efi and krunkit with Podman
Install or upgrade podman from brew (you'll need a recent version, 4.8.x or newer, for the applehv provider):
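On a standard Homebrew setup, that boils down to:

    brew install podman

or, if podman is already installed:

    brew upgrade podman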
Configure podman-machine to use applehv
In case you haven't already, you'll need to configure podman-machine to use applehv as the machine provider. You can do this by editing ~/.config/containers/containers.conf and setting provider = "applehv" in the [machine] section:
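With that change, the relevant part of containers.conf looks like this:

    [machine]
    provider = "applehv"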
If you don't want to make the change persistent, you can export the CONTAINERS_MACHINE_PROVIDER=applehv environment variable instead:
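For example, in the shell session you'll run podman from:

    export CONTAINERS_MACHINE_PROVIDER=applehv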
If this is your first time using the applehv machine provider, you’ll also need to initialize the machine. If you were already using applehv with vfkit, you can skip this step, as the disk image is fully compatible with krunkit.
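With the default machine name, that's simply:

    podman machine init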
Install libkrun-efi and krunkit:
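At the time of writing, both were distributed through a Homebrew tap rather than the core formulae. The commands below are a sketch with an assumed tap name, so check the krunkit README for the authoritative instructions:

    # Tap and formula names are assumptions; see the krunkit README for the exact ones.
    brew tap slp/krunkit
    brew install krunkit    # should pull in libkrun-efi as a dependency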
Replace vfkit with krunkit (this step will eventually be dropped)
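One way to do it is sketched below, under the assumption that both binaries live under your Homebrew prefix and that podman resolves vfkit from your PATH; verify the actual paths on your setup before running this:

    # Sketch: back up the vfkit binary podman uses and point it at krunkit instead.
    # Paths are assumptions; double-check where podman actually looks for vfkit.
    VFKIT_PATH=$(which vfkit)
    mv "$VFKIT_PATH" "$VFKIT_PATH.orig"
    ln -s "$(which krunkit)" "$VFKIT_PATH"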
Verify it’s working properly
The easiest way to ensure you're using krunkit instead of vfkit is to start the VM with podman-machine and verify that the GPU render nodes are present:
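A minimal check, assuming the default machine name:

    # Start the VM and look for DRM render nodes inside the guest
    podman machine start
    podman machine ssh
    # ...then, from the shell inside the guest:
    ls -l /dev/dri
    # You should see render nodes such as renderD128 if the virtio-gpu device is present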
Running an AI workload offloaded to the GPU
To make it easy to test offloading AI workloads to the GPU, we prepared a container image bundling llama.cpp (built with its Vulkan backend) with a patched mesa package (it needs a couple of minor fixes that aren't yet upstream).
First, you need to download a compatible model. For testing, I recommend using the Mistral-7B-Instruct model. If you want to use a different model, please make sure it doesn’t require more than 8GB of GPU RAM (this is a temporary hard limitation).
Assuming the model has been downloaded to ~/Downloads, you can run it this way (make sure you have enough free RAM in your machine to load the model):
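The invocation would look roughly like the sketch below; the image tag, binary name and model filename are assumptions, so adapt them to whatever you actually pulled and downloaded:

    # Sketch: expose the GPU render nodes and the models directory to the container,
    # then ask llama.cpp to offload all layers to the GPU (-ngl 99).
    # Image tag, binary name and model filename are assumptions.
    podman run --rm -it \
        --device /dev/dri \
        -v ~/Downloads:/models \
        quay.io/slopezpa/fedora-vgpu-llama \
        main -m /models/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
        -ngl 99 -p "Tell me a story about virtual machines"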
Preparing a custom AI workload
If you want to prepare a custom AI workload, you can either use the quay.io/slopezpa/fedora-vgpu image as a base, or use a regular fedora:39 image combined with the patched mesa package you can find in this COPR repo.
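As an illustration, a Containerfile based on the first option might look like the sketch below; the installed packages and entrypoint are placeholders for whatever your workload actually needs:

    # Sketch of a custom GPU-enabled workload image.
    # Package names and entrypoint are placeholders.
    FROM quay.io/slopezpa/fedora-vgpu
    RUN dnf install -y python3 python3-pip && dnf clean all
    COPY my-inference-app/ /opt/my-inference-app/
    ENTRYPOINT ["/opt/my-inference-app/run.sh"]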
Final remarks
If you happen to find any bugs while testing this, please report them to either the libkrun or krunkit issue trackers. Pull requests are, of course, more than welcome. Over the next months we'll also be working on further improving the integration with Podman and upstreaming all the missing bits.
Our goal is clear: to provide a modern, extensible Open Source alternative to Virtualization.framework. If this is something that entices you, come join us!