There’s this interesting idea of adding support for running Wasm/WASI
payloads in libkrun, which is something we could easily achieve by
simply embedding a Wasm runtime, statically built for Linux, into
initrd.
Now, the problem with this approach is that, despite having a
payload (the Wasm runtime) with well-known behavior, we would
still be using a complete Linux kernel (albeit built with a minimal
config) while only needing a small part of its functionality. In
other words, the workload’s TCB would not be optimal.
But, what if the Wasm runtime was also the kernel?
Wait, do you really need Virtualization for running a Wasm workload?
Yes and no. In most cases, no. The isolation provided by the Wasm
runtime, combined with container isolation (namespaces, cgroups,
selinux…) for the runtime itself, provides an excellent degree of
security.
But there’s a scenario where Virtualization is not optional, and
that’s when you want to protect the workload with Confidential
Computing technologies such as SEV or TDX, as both of them are built
on top of the existing Virtualization capabilities provided by the
hardware.
Choosing a Unikernel + WASM Runtime combo
The first idea that came to my mind was to use
RustyHermit, which is
supported as a target by the Rust toolchain, to build a Rust-based
runtime, such as Wasmer or
Wasmtime.
After giving it a quick try, I noticed that both runtimes pull in a
number of dependencies containing platform-dependent code that would
need to be ported to RustyHermit. Since I didn’t have much time to
invest in this experiment, I decided to look for a simpler solution.
And the simplest one would’ve probably been using OSv. OSv is able to
run unmodified, dynamically linked Linux binaries, by playing this
cool trick in which their linker resolves the symbols of some
well-known libraries to kernel-provided functions, so it still behaves
like a unikernel.
While this one was tempting, the main goal of this experiment was to
find out how small the TCB can get for this use case, so OSv’s
approach wasn’t really a good fit, since you still have a potentially
larger kernel than what you really need.
Finally, I came across Unikraft, which really does behave like a
library OS, which is exactly what I was looking for in the context of
this experiment. It’s also well documented, and the project looks
pretty active on GitHub.
Now we just need a Wasm runtime written in C to pair it up with
Unikraft.
The low-hanging fruit
There’s certainly quite a number of Wasm runtimes out there, but the
one that first caught my attention was Wasm3, as it’s small, simple,
and written in C. The build process is so simple that they even
provide a one-liner to build it manually with gcc. Identifying which
source files and headers need to get built wasn’t going to be a
problem.
No sooner said than done: in very little time I got it running a
simple Wasm hello world program:
[ 0.000000] Info: [libkvmplat] <setup.c @ 472> Entering from KVM (x86)...
[ 0.000000] Info: [libkvmplat] <setup.c @ 473> multiboot: 0
[ 0.000000] Info: [libkvmplat] <setup.c @ 407> HEAP area @ 400000000 - 63b56a000 (9585467392 bytes)
[ 0.000000] Info: [libkvmplat] <setup.c @ 499> initrd: 0x1b1000
[ 0.000000] Info: [libkvmplat] <setup.c @ 501> heap start: 0x400000000
[ 0.000000] Info: [libkvmplat] <setup.c @ 505> stack top: 0x63b56a000
[ 0.000000] Info: [libkvmplat] <setup.c @ 532> Switch from bootstrap stack to stack @0x63b57a000
[ 0.000000] Info: [libukboot] <boot.c @ 199> Unikraft constructor table at 0x167000 - 0x167028
[ 0.000000] Info: [libuklibparam] <param.c @ 113> libname: netdev, 72
[ 0.000000] Info: [libuklibparam] <param.c @ 113> libname: vfs, 96
[ 0.000000] Info: [libuklibparam] <param.c @ 594> No library arguments found
[ 0.000000] Info: [libukboot] <boot.c @ 213> Found 0 library args
[ 0.000000] Info: [libukboot] <boot.c @ 221> Initialize memory allocator...
[ 0.000000] Info: [libukallocregion] <region.c @ 202> Initialize allocregion allocator @ 0x400000000,2
[ 0.000000] Info: [libukboot] <boot.c @ 264> Initialize IRQ subsystem...
[ 0.000000] Info: [libukboot] <boot.c @ 271> Initialize platform time...
[ 0.000000] Info: [libkvmplat] <tscclock.c @ 253> Calibrating TSC clock against i8254 timer
[ 0.100001] Info: [libkvmplat] <tscclock.c @ 274> Clock source: TSC, frequency estimate is 2592089840z
[ 0.104876] Info: [libukschedcoop] <schedcoop.c @ 232> Initializing cooperative scheduler
[ 0.111274] Warn: [libpthread-embedded] <pte_osal.c @ 215> Thread 0x4000000d0 created without libpthr.
[ 0.123340] Info: [libuksched] <thread.c @ 180> Thread "Idle": pointer: 0x4000000d0, stack: 0x40001000
[ 0.131926] Warn: [libpthread-embedded] <pte_osal.c @ 215> Thread 0x4000203d8 created without libpthr.
[ 0.143563] Info: [libuksched] <thread.c @ 180> Thread "main": pointer: 0x4000203d8, stack: 0x40003000
[ 0.150728] Info: [libukboot] <boot.c @ 95> Init Table @ 0x167028 - 0x167058
[ 0.157502] Info: [libukswrand] <swrand.c @ 86> Initialize random number generator...
[ 0.164264] Info: [libukbus] <bus.c @ 134> Initialize bus handlers...
[ 0.168237] Info: [libukbus] <bus.c @ 136> Probe buses...
[ 0.171996] Info: [liblwip] <init.c @ 152> Initializing lwip
[ 0.175937] Info: [libuksched] <thread.c @ 180> Thread "lwip": pointer: 0x400040fc0, stack: 0x40005000
[ 0.193270] Info: [libvfscore] <rootfs.c @ 98> Mount ramfs to /...
[ 0.201977] Info: [libvfscore] <mount.c @ 122> VFS: mounting ramfs at /
[ 0.208725] Info: [libvfscore] <rootfs.c @ 106> Extracting initrd @ 0x1b1000 (136704 bytes) to /...
[ 0.217546] Info: [libukcpio] <cpio.c @ 233> Extracting /main.aot (136428 bytes)
Powered by
o. .o _ _ __ _
Oo Oo ___ (_) | __ __ __ _ ' _) :_
oO oO ' _ `| | |/ / _)' _` | |_| _)
oOo oOO| | | | | (| | | (_) | _) :_
OoOoO ._, ._:_:_,\_._, .__,_:_, \___)
Phoebe 0.10.0~9bf6e63-custom
[ 0.245898] Info: [libukboot] <boot.c @ 125> Pre-init table at 0x1763f0 - 0x1763f0
[ 0.251770] Info: [libukboot] <boot.c @ 136> Constructor table at 0x1763f0 - 0x1763f0
[ 0.257999] Info: [libukboot] <boot.c @ 146> Calling main(2, ['build/wasm3_kvm-x86_64', 'main.wasm'])
[ 0.264377] Warn: [libukmmap] <mmap.c @ 196> __uk_syscall_r_mprotect() stubbed
[ 0.270256] Warn: [libukmmap] <mmap.c @ 190> __uk_syscall_r_madvise() stubbed
Hello, Unikraft + Wasm3!
[ 0.278786] Info: [libukboot] <boot.c @ 155> main returned 0, halting system
[ 0.294673] Info: [libkvmplat] <shutdown.c @ 35> Unikraft halted
Gotta go fast!
Wasm3 is a neat piece of software and a good starting point, but it’s
just an interpreter. It doesn’t have a JIT, nor the ability to run AOT
code. So I started looking for some other option.
Soon I came across WAMR, which is also small, written in C, and pretty
portable. Its build process is a bit more complex, but looking at a
regular build log generated on Linux I was able to figure out which
source files, headers and flags I needed for building it with
Unikraft.
Another interesting aspect of WAMR is that it provides an AOT
compiler, based on LLVM, to compile the Wasm payload into native
code. And you can also tune the runtime build process to include just
the code to load and run AOT code, leaving out the interpreter and the
JIT, leading to a pretty small TCB.
After a bit of tweaking, I was able to build the Unikraft + WAMR
bundle, and ran it with the Wasm hello world program:
[ 0.000000] Info: [libkvmplat] <setup.c @ 472> Entering from KVM (x86)...
[ 0.000000] Info: [libkvmplat] <setup.c @ 473> multiboot: 0
[ 0.000000] Info: [libkvmplat] <setup.c @ 407> HEAP area @ 400000000 - 43f5dc000 (1063108608 bytes)
[ 0.000000] Info: [libkvmplat] <setup.c @ 499> initrd: 0x1b1000
[ 0.000000] Info: [libkvmplat] <setup.c @ 501> heap start: 0x400000000
[ 0.000000] Info: [libkvmplat] <setup.c @ 505> stack top: 0x43f5dc000
[ 0.000000] Info: [libkvmplat] <setup.c @ 532> Switch from bootstrap stack to stack @0x43f5ec000
[ 0.000000] Info: [libukboot] <boot.c @ 199> Unikraft constructor table at 0x167000 - 0x167028
[ 0.000000] Info: [libuklibparam] <param.c @ 113> libname: netdev, 72
[ 0.000000] Info: [libuklibparam] <param.c @ 113> libname: vfs, 96
[ 0.000000] Info: [libuklibparam] <param.c @ 594> No library arguments found
[ 0.000000] Info: [libukboot] <boot.c @ 213> Found 0 library args
[ 0.000000] Info: [libukboot] <boot.c @ 221> Initialize memory allocator...
[ 0.000000] Info: [libukallocregion] <region.c @ 202> Initialize allocregion allocator @ 0x400000000,8
[ 0.000000] Info: [libukboot] <boot.c @ 264> Initialize IRQ subsystem...
[ 0.000000] Info: [libukboot] <boot.c @ 271> Initialize platform time...
[ 0.000000] Info: [libkvmplat] <tscclock.c @ 253> Calibrating TSC clock against i8254 timer
[ 0.100001] Info: [libkvmplat] <tscclock.c @ 274> Clock source: TSC, frequency estimate is 2592141140z
[ 0.106867] Info: [libukschedcoop] <schedcoop.c @ 232> Initializing cooperative scheduler
[ 0.114889] Warn: [libpthread-embedded] <pte_osal.c @ 215> Thread 0x4000000d0 created without libpthr.
[ 0.125951] Info: [libuksched] <thread.c @ 180> Thread "Idle": pointer: 0x4000000d0, stack: 0x40001000
[ 0.132773] Warn: [libpthread-embedded] <pte_osal.c @ 215> Thread 0x4000203d8 created without libpthr.
[ 0.142529] Info: [libuksched] <thread.c @ 180> Thread "main": pointer: 0x4000203d8, stack: 0x40003000
[ 0.148912] Info: [libukboot] <boot.c @ 95> Init Table @ 0x167028 - 0x167058
[ 0.154532] Info: [libukswrand] <swrand.c @ 86> Initialize random number generator...
[ 0.160847] Info: [libukbus] <bus.c @ 134> Initialize bus handlers...
[ 0.164679] Info: [libukbus] <bus.c @ 136> Probe buses...
[ 0.168079] Info: [liblwip] <init.c @ 152> Initializing lwip
[ 0.171448] Info: [libuksched] <thread.c @ 180> Thread "lwip": pointer: 0x400040fc0, stack: 0x40005000
[ 0.178030] Info: [libvfscore] <rootfs.c @ 98> Mount ramfs to /...
[ 0.181563] Info: [libvfscore] <mount.c @ 122> VFS: mounting ramfs at /
[ 0.192104] Info: [libvfscore] <rootfs.c @ 106> Extracting initrd @ 0x1b1000 (136704 bytes) to /...
[ 0.204603] Info: [libukcpio] <cpio.c @ 233> Extracting /main.aot (136428 bytes)
Powered by
o. .o _ _ __ _
Oo Oo ___ (_) | __ __ __ _ ' _) :_
oO oO ' _ `| | |/ / _)' _` | |_| _)
oOo oOO| | | | | (| | | (_) | _) :_
OoOoO ._, ._:_:_,\_._, .__,_:_, \___)
Phoebe 0.10.0~9bf6e63-custom
[ 0.233750] Info: [libukboot] <boot.c @ 125> Pre-init table at 0x1763f0 - 0x1763f0
[ 0.239519] Info: [libukboot] <boot.c @ 136> Constructor table at 0x1763f0 - 0x1763f0
[ 0.245658] Info: [libukboot] <boot.c @ 146> Calling main(2, ['build/wamr_kvm-x86_64', 'main.aot'])
[ 0.252124] Warn: [libukmmap] <mmap.c @ 196> __uk_syscall_r_mprotect() stubbed
AOT module instantiate failed: mmap memory failed
[ 0.261049] Info: [libukboot] <boot.c @ 155> main returned 0, halting system
[ 0.266754] Info: [libkvmplat] <shutdown.c @ 35> Unikraft halted
… but it failed with “AOT module instantiate failed: mmap memory
failed”. What’s happening here?
Tracking down the error message in WAMR’s source code we get to this
section from aot_runtime.c:
/* Totally 8G is mapped, the opcode load/store address range is 0 to 8G:
 *   ea = i + memarg.offset
 * both i and memarg.offset are u32 in range 0 to 4G
 * so the range of ea is 0 to 8G
 */
if (!(p = mapped_mem =
          os_mmap(NULL, map_size, MMAP_PROT_NONE, MMAP_MAP_NONE))) {
    set_error_buf(error_buf, error_buf_size, "mmap memory failed");
    return NULL;
}
So WAMR needs to mmap() an 8GB chunk of anonymous memory, but Unikraft
does not yet support on-demand paging (though it looks like, after
merging PR#338, they’re pretty close to having it). So it seems like
we’ve hit a wall, haven’t we?
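To make the 8GB figure concrete, here is a minimal sketch of the arithmetic from that comment, plus the kind of "reserve address space only" mapping it implies. It is written against plain Linux mmap(), not WAMR's os_mmap() wrapper, and the helper names (max_effective_address, reservation_size, reserve_address_space) are made up for illustration:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Worst case for ea = i + memarg.offset: both operands are u32,
 * so the sum needs 33 bits and must be computed in 64 bits. */
static uint64_t max_effective_address(void)
{
    return (uint64_t)UINT32_MAX + (uint64_t)UINT32_MAX;
}

/* Size of a reservation that covers every possible effective address. */
static uint64_t reservation_size(void)
{
    return 8ULL << 30; /* 8 GiB */
}

/* Reserve address space without making any page accessible.
 * Assumption: a Linux host, where PROT_NONE + MAP_NORESERVE only
 * consumes virtual address space. Returns NULL on failure. */
static void *reserve_address_space(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

On a kernel with on-demand paging such a reservation is nearly free. Without it, the whole range has to be backed by real memory up front, which is why the mmap() fails in a guest with less than 8GB.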
Well, while Unikraft does not have on-demand paging, our host system
does! Which means we can simply create a VM with more than 8GB of RAM
and comment out the memset in ukmmap/mmap.c to avoid touching every
page from that region in advance. (NOTE: this hack wouldn’t work in a
SEV/TDX TEE, since the VM’s memory is pre-allocated and pinned in
those cases; I’m just using it to be able to continue with the
experiment.)
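The reason skipping the memset helps can be sketched from the host’s point of view: an anonymous mapping only gets backed by physical pages once something writes to it. The snippet below is an illustration for a Linux host, not code from libkrun or Unikraft; the helper names are hypothetical, and it uses mincore() to count resident pages before and after touching part of a region:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) that are resident in memory.
 * addr must be page-aligned. Returns -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    long count = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        count += vec[i] & 1; /* bit 0 set means "resident" */
    free(vec);
    return count;
}

/* Map len bytes of anonymous memory without populating it. */
static void *map_lazy(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte every stride bytes, faulting in only those pages.
 * A memset over the whole range would fault in all of them. */
static void touch_pages(void *addr, size_t len, size_t stride)
{
    volatile unsigned char *p = addr;

    for (size_t off = 0; off < len; off += stride)
        p[off] = 1;
}
```

Comparing resident_pages() right after map_lazy() and again after touch_pages() shows how much of the region the host actually had to back, which is the same effect that keeps the VMM’s RSS small as long as the guest never touches most of its RAM.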
And, after doing so, it works:
[ 0.000000] Info: [libkvmplat] <setup.c @ 472> Entering from KVM (x86)...
[ 0.000000] Info: [libkvmplat] <setup.c @ 473> multiboot: 0
[ 0.000000] Info: [libkvmplat] <setup.c @ 407> HEAP area @ 400000000 - 63b56a000 (9585467392 bytes)
[ 0.000000] Info: [libkvmplat] <setup.c @ 499> initrd: 0x1b1000
[ 0.000000] Info: [libkvmplat] <setup.c @ 501> heap start: 0x400000000
[ 0.000000] Info: [libkvmplat] <setup.c @ 505> stack top: 0x63b56a000
[ 0.000000] Info: [libkvmplat] <setup.c @ 532> Switch from bootstrap stack to stack @0x63b57a000
[ 0.000000] Info: [libukboot] <boot.c @ 199> Unikraft constructor table at 0x167000 - 0x167028
[ 0.000000] Info: [libuklibparam] <param.c @ 113> libname: netdev, 72
[ 0.000000] Info: [libuklibparam] <param.c @ 113> libname: vfs, 96
[ 0.000000] Info: [libuklibparam] <param.c @ 594> No library arguments found
[ 0.000000] Info: [libukboot] <boot.c @ 213> Found 0 library args
[ 0.000000] Info: [libukboot] <boot.c @ 221> Initialize memory allocator...
[ 0.000000] Info: [libukallocregion] <region.c @ 202> Initialize allocregion allocator @ 0x400000000,2
[ 0.000000] Info: [libukboot] <boot.c @ 264> Initialize IRQ subsystem...
[ 0.000000] Info: [libukboot] <boot.c @ 271> Initialize platform time...
[ 0.000000] Info: [libkvmplat] <tscclock.c @ 253> Calibrating TSC clock against i8254 timer
[ 0.100001] Info: [libkvmplat] <tscclock.c @ 274> Clock source: TSC, frequency estimate is 2592107600z
[ 0.105883] Info: [libukschedcoop] <schedcoop.c @ 232> Initializing cooperative scheduler
[ 0.113099] Warn: [libpthread-embedded] <pte_osal.c @ 215> Thread 0x4000000d0 created without libpthr.
[ 0.123215] Info: [libuksched] <thread.c @ 180> Thread "Idle": pointer: 0x4000000d0, stack: 0x40001000
[ 0.130177] Warn: [libpthread-embedded] <pte_osal.c @ 215> Thread 0x4000203d8 created without libpthr.
[ 0.140264] Info: [libuksched] <thread.c @ 180> Thread "main": pointer: 0x4000203d8, stack: 0x40003000
[ 0.146943] Info: [libukboot] <boot.c @ 95> Init Table @ 0x167028 - 0x167058
[ 0.152705] Info: [libukswrand] <swrand.c @ 86> Initialize random number generator...
[ 0.158711] Info: [libukbus] <bus.c @ 134> Initialize bus handlers...
[ 0.162330] Info: [libukbus] <bus.c @ 136> Probe buses...
[ 0.165668] Info: [liblwip] <init.c @ 152> Initializing lwip
[ 0.169172] Info: [libuksched] <thread.c @ 180> Thread "lwip": pointer: 0x400040fc0, stack: 0x40005000
[ 0.180469] Info: [libvfscore] <rootfs.c @ 98> Mount ramfs to /...
[ 0.189926] Info: [libvfscore] <mount.c @ 122> VFS: mounting ramfs at /
[ 0.196627] Info: [libvfscore] <rootfs.c @ 106> Extracting initrd @ 0x1b1000 (136704 bytes) to /...
[ 0.205720] Info: [libukcpio] <cpio.c @ 233> Extracting /main.aot (136428 bytes)
Powered by
o. .o _ _ __ _
Oo Oo ___ (_) | __ __ __ _ ' _) :_
oO oO ' _ `| | |/ / _)' _` | |_| _)
oOo oOO| | | | | (| | | (_) | _) :_
OoOoO ._, ._:_:_,\_._, .__,_:_, \___)
Phoebe 0.10.0~9bf6e63-custom
[ 0.232435] Info: [libukboot] <boot.c @ 125> Pre-init table at 0x1763f0 - 0x1763f0
[ 0.238147] Info: [libukboot] <boot.c @ 136> Constructor table at 0x1763f0 - 0x1763f0
[ 0.243980] Info: [libukboot] <boot.c @ 146> Calling main(2, ['build/wamr_kvm-x86_64', 'main.aot'])
[ 0.250338] Warn: [libukmmap] <mmap.c @ 196> __uk_syscall_r_mprotect() stubbed
[ 0.256167] Warn: [libukmmap] <mmap.c @ 190> __uk_syscall_r_madvise() stubbed
Hello, Unikraft + WAMR!
[ 0.264487] Info: [libukboot] <boot.c @ 155> main returned 0, halting system
[ 0.270064] Info: [libkvmplat] <shutdown.c @ 35> Unikraft halted
Now give me the numbers
With the option “Drop unused functions and data” enabled in
Unikraft’s config, the size of the stripped binary for the WAMR
unikernel is 642K:
[slopezpa@toolbox wamr]$ ls -l build/wamr_kvm-x86_64
-rwxr-xr-x. 1 slopezpa slopezpa 656856 Sep 27 17:05 build/wamr_kvm-x86_64
That’s kind of nice, but it could be better. Right now, I’m building
WAMR with Unikraft using the POSIX compatibility layer, which means
including a number of external libraries (newlib, pthread-embedded,
lwip) into the build. If, instead, we ported WAMR to support
Unikraft’s libraries directly, we would significantly reduce the size
of the unikernel (and, with it, the TCB).
Now, let’s take a look at the memory consumption of our unikernel
while running the example Wasm payload:
[slopezpa@mhamilton libkrunfw.wamr]$ ps -axuww |grep chroot_vm
slopezpa 71854 0.9 0.0 9517320 13828 pts/8 Sl+ 17:04 0:00 ./chroot_vm
That’s less than 14MB of RSS, and that’s including the VMM (libkrun)
internal structures, the guest’s memory usage, and without
discounting shared pages. Not bad, I guess… ;-)
Where to go from here
I think from this experiment we can conclude that it is, indeed,
feasible to build a Wasm runtime in a unikernel form factor in a
reasonable amount of time, and that doing so would come with
significant benefits in TCB reduction and, perhaps, in performance
(yet to be tested).
Some things I’d like to do next (if I manage to find the time):
- Clean up both Wasm3 and WAMR build repositories and see if it’s
  worth getting them upstream (there’s already a port for WAMR,
  albeit quite old, so in this case it’d be just a PR updating it).
- Evaluate the best way to integrate Unikraft-based unikernels into
  libkrunfw. Right now, Unikraft only supports the multiboot
  specification, while libkrun only provides a Linux Zero Page. For
  this experiment I’ve hacked the needed values manually into
  Unikraft’s setup.c, but of course we need a more reasonable
  solution.
- Implement support for libkrun’s TSI (Transparent Socket
  Impersonation) into Unikraft. This would mean implementing support
  for virtio vsock first, and then writing a library to support socket
  semantics using TSI+vsock. I think TSI is a very good fit for this
  use case, as it would give us network support with very little code
  (we won’t even need a TCP/IP stack!) and good performance.
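As a rough idea of what a less hacky answer to the multiboot vs. Zero Page mismatch could look like, here is a sketch that pulls the values a unikernel’s early setup code needs (initrd location, command line, usable RAM) out of a Linux Zero Page. The field offsets are the ones documented in the Linux/x86 boot protocol; the struct boot_info type and the parse_zero_page() helper are hypothetical and not part of Unikraft or libkrun. It also assumes a little-endian machine, which holds on x86:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field offsets inside the 4 KiB zero page (Linux/x86 boot protocol). */
#define ZP_SIZE          4096
#define ZP_E820_ENTRIES  0x1e8 /* u8: number of e820 entries */
#define ZP_RAMDISK_IMAGE 0x218 /* u32: initrd load address */
#define ZP_RAMDISK_SIZE  0x21c /* u32: initrd size in bytes */
#define ZP_CMD_LINE_PTR  0x228 /* u32: address of the command line */
#define ZP_E820_TABLE    0x2d0 /* array of packed 20-byte entries */
#define E820_ENTRY_SIZE  20
#define E820_MAX_ENTRIES 128
#define E820_TYPE_RAM    1

/* Hypothetical container for what the early boot code needs. */
struct boot_info {
    uint32_t initrd_addr;
    uint32_t initrd_size;
    uint32_t cmdline_addr;
    uint64_t ram_base; /* start of the largest usable RAM region */
    uint64_t ram_size; /* size of that region, 0 if none was found */
};

/* Unaligned-safe little-endian reads (host assumed little-endian). */
static uint32_t rd32(const uint8_t *zp, size_t off)
{
    uint32_t v;
    memcpy(&v, zp + off, sizeof(v));
    return v;
}

static uint64_t rd64(const uint8_t *zp, size_t off)
{
    uint64_t v;
    memcpy(&v, zp + off, sizeof(v));
    return v;
}

/* Returns 0 on success, -1 if the e820 entry count is out of range. */
static int parse_zero_page(const uint8_t *zp, struct boot_info *out)
{
    uint8_t n = zp[ZP_E820_ENTRIES];

    if (n > E820_MAX_ENTRIES)
        return -1;

    memset(out, 0, sizeof(*out));
    out->initrd_addr = rd32(zp, ZP_RAMDISK_IMAGE);
    out->initrd_size = rd32(zp, ZP_RAMDISK_SIZE);
    out->cmdline_addr = rd32(zp, ZP_CMD_LINE_PTR);

    /* Each entry is { u64 addr; u64 size; u32 type }. Keep the
     * largest region of type RAM as the candidate heap area. */
    for (uint8_t i = 0; i < n; i++) {
        size_t off = ZP_E820_TABLE + (size_t)i * E820_ENTRY_SIZE;
        uint64_t addr = rd64(zp, off);
        uint64_t size = rd64(zp, off + 8);
        uint32_t type = rd32(zp, off + 16);

        if (type == E820_TYPE_RAM && size > out->ram_size) {
            out->ram_base = addr;
            out->ram_size = size;
        }
    }
    return 0;
}
```

A platform setup routine could call something like this when it is not entered through multiboot, and then derive the heap, stack and initrd regions from the result instead of from hard-coded values.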
Sounds like fun! ;-)