Messing with your initramfs - Hare edition

This is part one of a series exploring the initramfs and how to use Hare to write code running in it.

When Linux boots, it usually uses a small system image called “initramfs” (or sometimes “initrd”) as a helper. The initramfs is quite special. It has to set up everything your real system takes for granted - the file system, devices, etc. - from absolute scratch. The initramfs is usually provided by the distribution. In fact, it usually gets generated (e.g. after kernel updates) by the distribution-specific initramfs generator. There are also some distribution-agnostic initramfs generators (e.g. dracut or booster).

Most initramfs images are pretty much small embedded Linux systems. A lot of them are built on busybox, some even on systemd. As long as the system setup is performed properly, you can in fact run just about anything in an initramfs. Most initramfs generators provide “hooks” or “modules” that you can use to add custom stuff to your initramfs. Those will usually run after most of the system setup has happened. The exceptions are those that serve a narrowly defined purpose, such as decrypting the root file system. But the real fun is of course: how do we get to that point in the first place?

Hare, being a systems programming language, should make for a good tool to handle this. Its statically linked binaries do not need any dependencies and it offers access to many low-level facilities, like system calls. So let’s take a closer look at an initramfs, how we can get our own files in there, and whether we can get some Hare code to execute in there!

Initramfs basics

I am using Arch Linux with GRUB for demonstration purposes, but most other distros and/or boot loaders should work equally well. You may of course have to slightly adapt some of the instructions.

Let’s start with some basics. When the system boots, the boot loader is run. It usually loads a kernel and an initramfs. The initramfs is essentially a small system image. On a running system, you should be able to see the initramfs file that was used to bring up the system by inspecting the kernel command line:

conrad@serotonin ~ $ cat /proc/cmdline | grep -oE 'initrd=[^ ]*'
initrd=\intel-ucode.img
initrd=\initramfs-linux.img

Note how there are actually two of those; we’ll get to that in a moment. You should be able to find the files referenced in the command line in the /boot directory. An initramfs file is a CPIO archive, usually a compressed one. The compression type depends on your system; you can find out what it is with file:

conrad@serotonin ~ $ file /boot/initramfs-linux.img
/boot/initramfs-linux.img: Zstandard compressed data (v0.8+), Dictionary ID: None

Knowing this, we can examine the contents of the initramfs:

conrad@serotonin ~ $ zstdcat /boot/initramfs-linux.img | cpio --quiet -it | head
bin
buildconfig
config
consolefont.psf
dev/
etc/
etc/fstab
etc/initrd-release
etc/ld.so.cache
etc/ld.so.conf
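
Since the compression type differs between systems (and microcode images are typically plain, uncompressed CPIO archives), the steps above can be combined into a small helper that works for whatever images the kernel was booted with. This is only a minimal sketch: the function names initrd_paths, decompressor_for and list_initramfs are made up for illustration, and it assumes the images named on the command line live directly in /boot (override with BOOT_DIR).

```sh
#!/bin/sh
# Read a kernel command line on stdin and print the path of every initrd=
# entry, assuming the images live directly in ${BOOT_DIR:-/boot}.
initrd_paths() {
	grep -oE 'initrd=[^ ]*' |
		sed -e 's/^initrd=//' -e 's|\\|/|g' -e "s|^/*|${BOOT_DIR:-/boot}/|"
}

# Print the command that decompresses FILE to stdout, chosen by the first
# four bytes (the magic number) of the file.
decompressor_for() {
	magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
	case "$magic" in
		28b52ffd) echo "zstd -dc" ;;
		1f8b*)    echo "gzip -dc" ;;
		fd377a58) echo "xz -dc" ;;
		425a68*)  echo "bzip2 -dc" ;;
		02214c18) echo "lz4 -dc" ;;
		30373037) echo "cat" ;;  # "0707": uncompressed CPIO archive
		*) echo "unknown format: $magic" >&2; return 1 ;;
	esac
}

# List the contents of an initramfs image regardless of its compression.
list_initramfs() {
	dec=$(decompressor_for "$1") || return 1
	$dec "$1" | cpio --quiet -it
}

# Usage:
#   for img in $(initrd_paths < /proc/cmdline); do
#       list_initramfs "$img" | head
#   done
```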

Looks familiar? Yep, there’s a complete Linux system in the initramfs! So how does this work? The kernel does a bare minimum of setup work, then executes /init in the initramfs (with PID 1). The expectation is that /init performs the remaining setup, including mounting the real root file system, “switches” to the new root, and executes /sbin/init in the new root (still with PID 1).

So how does this /init work? Well, we can simply inspect it:

conrad@serotonin ~ $ zstdcat /boot/initramfs-linux.img | cpio --quiet -i init
conrad@serotonin ~ $ head init
#!/usr/bin/ash

udevd_running=0
mount_handler=default_mount_handler
init=/sbin/init
rd_logmask=0

. /init_functions

mount_setup

A “simple” shell script! Note that this may potentially be different for you, depending on your distribution and configuration. But many are in fact based on busybox and utilize shell scripts, to make the setup easier.
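
To make the sequence described earlier more concrete, here is a heavily simplified sketch of the steps such an /init has to perform. It is not the Arch script: a real one also loads modules, starts udev, parses root= from the kernel command line, runs hooks and so on. The root device /dev/vda1 is an assumption for illustration, and the run helper and DRY_RUN switch are made up so the script only prints the commands by default.

```sh
#!/bin/sh
# Simplified sketch of an initramfs /init. ROOT_DEV is an assumed value;
# a real script derives it from root= on the kernel command line.
ROOT_DEV=${ROOT_DEV:-/dev/vda1}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or only print it when DRY_RUN=1 (the default).
run() {
	if [ "$DRY_RUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

init_steps() {
	# 1. Mount the kernel's virtual file systems.
	run mount -t proc proc /proc
	run mount -t sysfs sys /sys
	run mount -t devtmpfs dev /dev
	# 2. Mount the real root file system.
	run mount -o ro "$ROOT_DEV" /new_root
	# 3. Switch to the new root and hand over to the real init. The exec
	#    keeps the real init running as PID 1.
	run exec switch_root /new_root /sbin/init
}

init_steps
```

With DRY_RUN=1 this merely prints the five commands in order. Setting DRY_RUN=0 only makes sense when the script actually runs as PID 1 inside an initramfs.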

One last important fact before we get to work: you may have heard that you can use the init= kernel command line parameter to override the executable to run as init process on the real root file system. But more interestingly, you can use the rdinit= parameter to override the executable run as init process in the initramfs. Armed with that knowledge, let’s have some fun!

NOTE: either try the following in a VM or in a non-permanent way, e.g. by pressing e in GRUB to modify the boot entry without making the changes permanent, so you don’t accidentally brick your system.

Modify your regular boot loader entry to add rdinit=/bin/sh to the kernel command line and boot. You should be greeted with something like this:

Loading Linux linux ...
Loading initial ramdisk ...
/bin/sh: can't access tty; job control turned off
/ #

That’s it - you have a shell in your initramfs! The message about the tty should already tell us that something might be a bit unusual here, though. Let’s take a look around:

/ # ls
VERSION         hareinit        new_root        tmp
bin             hooks           proc            usr
buildconfig     init            root            var
config          init_functions  run
dev             lib             sbin
etc             lib64           sys
/ # ls /dev/
console
/ # ls /proc/
/ #

Look Ma, no devfs! As pointed out earlier, at this point the kernel has done the absolute bare minimum to get us going. Feel free to poke around. You’ll notice that many things don’t quite work as “expected” yet (try e.g. ps).

There is even a graceful way out of here. The shell is PID 1 (verify with echo $$). So all we need to do is continue with what was supposed to happen - execute /init. We just have to make sure to use exec, so that it also gets run as PID 1 and not in a sub-shell:

/ # exec /init
Starting version 251.4-1-arch
/dev/vda1: clean, 46901/262144 files, 328628/1048320 blocks

Arch Linux 5.19.11-arch1-1 (ttyS0)

initiator login:

Magic!

Put some Hare in it!

I think by now it is clear what needs to be done to write an initramfs in Hare. However, getting the system up and running is a lot of work. Maybe we can try something smaller for demonstration purposes? For sure. Remember the two initramfs that showed up in my kernel command line? Most boot loaders support passing multiple initramfs images to the kernel. The kernel unpacks them one after the other into the same root file system, so they effectively get merged. So let’s try the bare minimum: sneak a Hare executable into the initramfs (by adding a second image with just one file in it), have the kernel execute it, and simply delegate the system setup to the existing /init by exec-ing it from Hare code.

A first try could look like this:

use fmt;
use os::exec;

export fn main() void = {
	fmt::println("Hello from the Hare initramfs!")!;
	let cmd = exec::cmd("/init")!;
	fmt::println("Bye from the Hare initramfs!")!;
	exec::exec(&cmd);
};

Build it and create a compressed CPIO archive:

conrad@serotonin ~/hareinit $ echo hareinit | cpio --quiet -H newc -o | zstd > hareinit.img

NOTE: hareinit is the name of the executable created by hare build, hareinit.img is our initramfs file. You can use whatever names you want, but modify the following instructions accordingly.
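
Because a broken image only shows itself at boot time, it can be worth checking the archive before rebooting. The following is a minimal sketch that wraps the command above and adds two sanity checks. The function name pack_initramfs is made up for illustration. It refuses non-executable files and changes into the executable's directory first, so the file ends up at the root of the archive, which is where rdinit=/hareinit expects it.

```sh
#!/bin/sh
# Pack a single executable into a zstd-compressed newc CPIO archive.
# Usage: pack_initramfs EXECUTABLE OUTPUT_IMAGE
pack_initramfs() {
	exe=$1
	out=$2
	if [ ! -x "$exe" ]; then
		echo "$exe is missing or not executable" >&2
		return 1
	fi
	# cpio reads the file names to archive from stdin. Running it inside
	# the executable's directory stores the file without a path prefix.
	( cd "$(dirname "$exe")" && basename "$exe" | cpio --quiet -H newc -o ) |
		zstd -q > "$out"
}

# Usage: build the image, then list it to see what the kernel will unpack.
#   pack_initramfs ./hareinit hareinit.img
#   zstdcat hareinit.img | cpio --quiet -it
```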

Copy the resulting hareinit.img to your boot partition. Modify your regular boot loader entry to add rdinit=/hareinit to the kernel command line. Also change the existing initrd directive to include the new image. In GRUB it should look somewhat like this:

initrd	initramfs-linux-fallback.img hareinit.img

Boot. Contrary to what you may have expected, it will not work:

Loading Linux linux ...
Loading initial ramdisk ...
Hello from the Hare initramfs!
Bye from the Hare initramfs!
/usr/bin/ash: can't open '/dev/fd/3': No such file or directory

Potentially accompanied by a little kernel panic, since PID 1 is not supposed to exit. So what happened? Well, remember our little investigation in the shell earlier? There is no devfs at this point, so there is no /dev/fd/3. Hare’s os::exec::exec() is a high-level wrapper around the execveat() syscall. It does a few things for you that are in fact very convenient, but unfortunately not suited for this low-level use-case: it passes the file to be executed as a file descriptor. Since /init is a shell script, the kernel then hands the interpreter a path of the form /dev/fd/N to open, and without devfs ash cannot find it.

However, Hare’s strength lies in the fact that you are free to assume control over everything and simply do the low-level syscall yourself. It, too, is part of the standard library. We’ll have to do some of the setup ourselves, but that is a small price to pay. Here is a modified version:

use errors;
use fmt;
use os;
use os::exec;
use rt;
use strings;

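export fn main() void = {
	fmt::println("Hello from the Hare initramfs!")!;

	// NUL-terminated path of the distribution's init inside the initramfs.
	let real_init = strings::to_c("/init");

	// argv: the program name followed by a terminating null pointer.
	let args: []nullable *const char = alloc([real_init, null], 2);
	let argv = args: *[*]nullable *const char;

	// envp: pass the current environment on, again null-terminated.
	let cur_env = os::getenvs();
	let env: []nullable *const char = alloc([], len(cur_env) + 1);
	for (let i = 0z; i < len(cur_env); i += 1) {
		append(env, strings::to_c(cur_env[i]));
	};
	append(env, null);
	let envp = env: *[*]nullable *const char;

	fmt::println("Bye from the Hare initramfs!")!;
	// The path is absolute, so the dirfd argument (0) is ignored.
	// This call only returns if the exec failed.
	let err = errors::errno(rt::execveat(0, real_init, argv, envp, 0));
};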

Compile, re-create the initramfs as described above, and reboot. Make sure your boot loader entry still (or again) includes the two initrd files and the kernel command line includes the rdinit=/hareinit parameter.

And at last, you should see:

Loading Linux linux ...
Loading initial ramdisk ...
Hello from the Hare initramfs!
Bye from the Hare initramfs!
Starting version 251.4-1-arch
/dev/vda1: clean, 46901/262144 files, 328630/1048320 blocks

Arch Linux 5.19.11-arch1-1 (ttyS0)

initiator login:

Wrapping it up

Let’s re-iterate what we have achieved here:

  • We used an approach based on two initramfs files to start playing with a custom initramfs without having to do all the work that an initramfs usually requires.
  • We’ve written a small Hare program that is perfectly capable of being executed as an initramfs init process.
  • Even though some functionality may not work at the earliest stages of system setup, Hare provides the low-level primitives to work around this.

None of this should come as a surprise to anyone who has used Hare a bit. It is also worth noting that the problem with os::exec::exec() is a bit artificial and mostly due to the fact that we leave system setup to the “real” init process. One could for example just mount devfs, which would probably be enough to fix the issue. But it was also a good demonstration of how Hare can work at lower levels when required.

I hope this was a good introduction to hacking your initramfs without messing with your distribution’s generator. In the next installment, we will explore more of what the initramfs does, why it gets re-generated so often, and how we can apply what we learn to add more functionality to our custom code.