Enabling containers to access the GPU on macOS

Being able to run AI models locally is one of the coolest tech things 2023 brought to the table. But given the extremely heterogeneous software stack for AI accelerators, doing that in an efficient way on your own laptop is not always as easy as it should be.

For instance, if you’re running macOS on an Apple Silicon machine, you can easily build llama.cpp with its Metal backend and offload the inference work to the M-based GPU. But can you do that from a container?


A Virtualization Game

Running Linux containers on macOS implies the use of some form of virtualization and/or emulation. That is, software tools such as Podman Desktop, while hiding it to provide a better UX, are actually running the containers inside a Linux Virtual Machine.

This means that if we want containers to be able to offload AI workloads to the host’s GPU, we need to introduce some mechanism in the Virtual Machine that would allow the guest to communicate with this device without losing access to it from the host (device passthrough is off the table).

Luckily, virtio-gpu already provides the two main mechanisms we need to build a transport for the GPU between the guest and the host: shared memory and a communication channel. Additionally, we can leverage Venus to serialize Vulkan commands and MoltenVK to translate Vulkan shaders to MSL (Metal Shading Language).

So it looks like we have all the pieces we need, it should be just a matter of assembling them, shouldn’t it? Well, as you can imagine, things are rarely so simple.


libkrun-efi, an Open Source alternative to Virtualization.framework

The Virtualization stack on macOS is composed of two software packages: Hypervisor.framework and Virtualization.framework. The first is a low-level interface to interact with the kernel’s virtualization facilities (akin to KVM on Linux), while the second is roughly a Virtual Machine Monitor implementation (equivalent to libkrun or QEMU).

Virtualization.framework is a nice library that covers many virtualization use cases but, being closed-source, it can’t be extended in any significant way. That means you can’t implement new emulated devices to be exposed to guests, nor alter the functionality of those already implemented.

Here’s where libkrun enters the game. This library provides a modern, Rust-based Virtual Machine Monitor that links directly against Hypervisor.framework to create Virtual Machines on macOS. Being fully Open Source (Apache 2.0), we can extend it as much as we need.

So far, libkrun has focused on the microVM use case (as provided by krunvm and podman’s crun-krun), but here we need to run a full VM with its own kernel and initrd, and a relatively large set of devices.

To cover this use case, we decided to introduce a new flavor of the library, libkrun-efi, which features a functionality level that’s very close to what Virtualization.framework provides. And, on top of that, we added the Venus-capable virtio-gpu device we needed, leading to this:

$ vulkaninfo --summary
==========
VULKANINFO
==========

(...)

Devices:
========
GPU0:
	apiVersion         = 1.2.0
	driverVersion      = 23.3.3
	vendorID           = 0x106b
	deviceID           = 0xe0203ef
	deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
	deviceName         = Virtio-GPU Venus (Apple M1)
	driverID           = DRIVER_ID_MESA_VENUS
	driverName         = venus
	driverInfo         = Mesa 23.3.3
	conformanceVersion = 1.3.0.0
	deviceUUID         = 7c236e60-e863-d8c5-870a-bfb16e1f882b
	driverUUID         = 725e9387-92e6-0ef4-e948-726a3b821ec1

Integrating libkrun-efi with Podman

Podman recently gained the ability to use Virtualization.framework on macOS for managing the Linux VM where containers are spawned, and it does that through vfkit, a command-line interface to the latter.

To simplify testing and ensure an easy transition path for Podman, Tyler Fanelli has created krunkit, which mimics vfkit’s operation to the point of being able to act as a drop-in replacement for it, but linking against libkrun-efi instead of Virtualization.framework.


Trying out libkrun-efi and krunkit with Podman

Install or upgrade podman from brew (the steps below assume version 4.9.x):

$ brew install podman

or

$ brew upgrade podman
$ podman --version
podman version 4.9.3

Configure podman-machine to use applehv

In case you haven’t already, you’ll need to configure podman-machine to use applehv as the machine provider. You can do this by editing ~/.config/containers/containers.conf and setting provider = "applehv" in the [machine] section:

[machine]
  provider = "applehv"

If you don’t want to make the change persistent, you can export the CONTAINERS_MACHINE_PROVIDER=applehv environment variable instead:

$ export CONTAINERS_MACHINE_PROVIDER=applehv

If this is your first time using the applehv machine provider, you’ll also need to initialize the machine. If you were already using applehv with vfkit, you can skip this step, as the disk image is fully compatible with krunkit.

$ podman machine init

Install krunkit and libkrun-efi:
$ brew tap slp/krunkit
$ brew install krunkit

Replace vfkit with krunkit (this step will eventually be dropped):
$ cd /opt/homebrew/Cellar/podman/4.9.3/libexec/podman/
$ mv vfkit vfkit.bak
$ ln -s /opt/homebrew/bin/krunkit vfkit

Verify it’s working properly

The easiest way to ensure you’re using krunkit instead of vfkit is to start the VM using podman-machine and verify the GPU render nodes are present:

$ podman machine start
Starting machine "podman-machine-default"
$ podman machine ssh
Connecting to vm podman-machine-default. To close connection, use `~.` or `exit`
Fedora CoreOS 39.20240210.2.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

Last login: Tue Feb 27 00:10:53 2024 from 192.168.127.1
core@localhost:~$ ls /dev/dri
by-path  card0  renderD128
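You can also run the same checks non-interactively; recent podman versions let `podman machine ssh` execute a single command, and on the host side the VM monitor process should be krunkit rather than vfkit:

```shell
# List the DRI devices inside the VM without opening an interactive shell.
podman machine ssh ls /dev/dri
# A Venus-capable GPU shows up as a render node, e.g. renderD128.

# On the host, confirm the running VM monitor process is krunkit.
pgrep -fl krunkit
```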

Running an AI workload offloaded to the GPU

To be able to easily test the ability to offload AI workloads to the GPU, we prepared a container image bundling llama.cpp (built with its Vulkan backend) with a patched mesa package (needs a couple of minor fixes that aren’t yet upstream).

First, you need to download a compatible model. For testing, I recommend using the Mistral-7B-Instruct model. If you want to use a different model, please make sure it doesn’t require more than 8GB of GPU RAM (this is a temporary hard limitation).
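One way to fetch the model is with curl against a GGUF conversion hosted on Hugging Face. The repository and filename below (TheBloke’s community conversion) are an assumption on my part; verify them before downloading, and note the file is around 4GB:

```shell
# Example download of a Q4_0 quantization of Mistral-7B-Instruct.
# Repository URL and filename are assumptions; adjust as needed.
curl -L -o ~/Downloads/mistral-7b-instruct-v0.2.Q4_0.gguf \
  "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_0.gguf"
```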

Assuming the model has been downloaded to ~/Downloads, you can run it this way (make sure you have enough free RAM in your machine to load the model):

$ podman run --rm -ti --device /dev/dri -v ~/Downloads:/models:Z quay.io/slopezpa/fedora-vgpu-llama
[root@b5f4915ee057 /]# main -m models/mistral-7b-instruct-v0.2.Q4_0.gguf -b 512 -ngl 99 -p "Tell me a story"

Preparing a custom AI workload

If you want to prepare a custom AI workload, you can either use the quay.io/slopezpa/fedora-vgpu image as a base, or use a regular fedora:39 image combined with the patched mesa package you can find in this COPR repo.
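As a minimal sketch, a custom image built on the article’s base image could look like this; the package installed in the RUN step is a placeholder of my own, not something the article prescribes:

```shell
# Sketch: build a custom workload image on top of the GPU-enabled base.
cat > Containerfile <<'EOF'
FROM quay.io/slopezpa/fedora-vgpu
# Hypothetical extras; replace with whatever your workload needs.
RUN dnf install -y python3-pip && dnf clean all
EOF
podman build -t my-vgpu-workload .
```

Remember to run the resulting image with --device /dev/dri, as in the example above, so the container can reach the GPU render node.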


Final remarks

If you happen to find any bugs while testing this, please report them to either the libkrun or krunkit issue trackers. Pull requests are of course more than welcome. Over the coming months we’ll also be working on further improving the integration with Podman and upstreaming all the missing bits.

Our goal is clear: to provide a modern, extensible, open-source alternative to Virtualization.framework. If this is something that entices you, come join us!