For some months now I’ve been working on patches to Consfigurator to add support for Linux containers. My goal is to make Consfigurator capable both of performing the initial setup of a container and of entering the running container to apply configuration. For the case of unprivileged LXCs running as non-root, my work-in-progress branch can now do both of these things. As Consfigurator enters the container directly using system calls, it should be decently fast at configuring multiple containers on a host, and it will also be possible to have it do this in parallel. The initial setup for the container uses Consfigurator’s existing support for building root filesystems, and it should be easy to extend that to support arbitrary GNU/Linux distributions by teaching Consfigurator how to invoke bootstrapping tools other than debootstrap(8).

Here’s an example:

(defhost lxc1.silentflame.com ()
  (os:debian-stable "bullseye" :amd64)
  (basic-props)
  (apt:installed "systemd" "netcat")
  (apache:https-vhost ...))

(defhost lxctest.laptop.silentflame.com ()
  (os:debian-stable "bullseye" :amd64)
  (apt:proxy "http://192.168.122.1:3142")
  (basic-props)
  (apt:installed "linux-image-amd64" "lxc")

  (lxc:usernet-usable-by "spwhitton" "lxcbr0")
  (lxc:user-containers-autostart "spwhitton")
  (lxc:user-container-for '(:additional-lines
                            ("lxc.net.0.type = veth"
                             "lxc.net.0.flags = up"
                             "lxc.net.0.link = lxcbr0"
                             ...))
                          "spwhitton"
                          lxc1.silentflame.com))

(defhost laptop.silentflame.com ()
  ...
  (libvirt:kvm-boots-chroot-for '(:always-deploys t)
                                lxctest.laptop.silentflame.com))

This code is a simplified definition of my testing setup for this work. It defines three hosts: a container lxc1, a container host lxctest, and my laptop. When Consfigurator is asked to deploy the laptop, it will set up the root filesystem for lxctest and then boot it as a KVM virtual machine. Preparing that root filesystem will include setting up the root filesystem for lxc1, too, including shifting the ownership and ACLs to match the user namespace LXC will use when booting the container. Thus, once the deployment of the laptop is finished, it will be possible to boot the lxctest VM, connect to it as the user spwhitton, and start lxc1.

Consfigurator includes only minimal support for setting up container networking, as there are so many different ways in which you might want to do it. In my own consfig I’ve been developing properties to connect containers directly to my tinc VPN. A single tinc daemon runs on the container host, and other tinc daemons route a whole subnet, containing the addresses for each of the containers, to the container host’s tinc daemon. As the LXCs Consfigurator sets up run as non-root, some sort of setuid facility is required to configure this networking. Consfigurator’s ability to dump executable Lisp images is helping here. I define a function which runs as root to set up the networking:

(defun route-athenet-container-veth (host)
  (let ((user (getenv "USERV_USER"))
        (peer (getenv "USERV_U_PEER"))
        (ip (car (uiop:command-line-arguments))))
    (unless (string-prefix-p (format nil "veth~D_" (getenv "USERV_UID")) peer)
      (error "~A does not belong to requester." peer))
    (unless (member (cons user ip) (get-hostattrs 'veth-ips host) :test #'equal)
      (error "~A does not have permission to route ~A." user ip))
    (flet ((r (&rest args)
             ;; Explicitly passing nil means UIOP will not invoke a shell.
             (run-program args :force-shell nil)))
      (eswitch ((getenv "USERV_U_HOOK_TYPE") :test #'string=)
        ("up"
         (apply #'r
                "sysctl" "-w"
                "net.ipv6.conf.all.forwarding=1"
                ...)
         (r "ip" "addr" "flush" "dev" peer "scope" "link")
         (r "ip" "-6" "addr" "add" "fe80::1/64" "dev" peer)
         (r "ip" "-6" "route" "add" (strcat ip "/128") "dev" peer)
         ...)
        ("down"
         ...
         (r "ip" "-6" "route" "del" (strcat ip "/128") "dev" peer))))))

and then apply the following property to lxctest to dump an image which will call this function and then exit:

(image-dumped
 "/usr/lib/userv/route-athenet-container-veth"
 `(route-athenet-container-veth ,(intern (string-upcase (get-hostname)))))
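To see why the hostname is upcased and interned there, here is a minimal sketch in plain Common Lisp, with no Consfigurator loaded. EXAMPLE-HOSTNAME is a hypothetical stand-in for GET-HOSTNAME, which I assume here returns the hostname as a string. The point is only that the interned symbol is the same one the reader produces when the hostname is written directly in a DEFHOST form, so the form stored in the image names that host:

```lisp
;; Hypothetical stand-in for GET-HOSTNAME; assumed to return a string.
(defun example-hostname ()
  "lxctest.laptop.silentflame.com")

;; Build the form the same way the IMAGE-DUMPED example does.  The reader
;; upcases unescaped symbol names by default, so STRING-UPCASE followed by
;; INTERN yields the same symbol as writing the hostname in source code.
(defparameter *example-form*
  `(route-athenet-container-veth
    ,(intern (string-upcase (example-hostname)))))

;; True under the default readtable case, provided *PACKAGE* is the same
;; when the form above is evaluated and when this line is read.
(eq (second *example-form*) 'lxctest.laptop.silentflame.com)
```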

I’m using GNU userv to enable ordinary users to run this image as root, so there’s a small script which converts LXC’s LXC_HOOK_* environment variables into appropriate command line arguments to userv(1) such that the function above is able to access that information from its environment (the USERV_U_* variables above). You could just as easily do this with sudo, by giving permission for the relevant LXC_HOOK_* environment variables to survive the switch to root.

What’s particularly nice about this is that there’s no need to write any code to keep a config file updated, specifying which users are allowed to route which IPs to their containers. ROUTE-ATHENET-CONTAINER-VETH receives a HOST value for the container host and can just look at the metadata established by properties for particular containers. Each time this metadata is updated and lxctest is deployed, a fresh image is dumped containing the updated metadata.
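The permission check itself is just a list lookup. Here is a minimal sketch of it in plain Common Lisp, assuming the metadata is a list of (user . ip) pairs as the MEMBER call in the function above suggests. The variable *EXAMPLE-VETH-IPS*, the function MAY-ROUTE-P and the addresses are all hypothetical, standing in for whatever (get-hostattrs 'veth-ips host) returns in a real deployment:

```lisp
;; Hypothetical metadata: (user . ip) pairs, using documentation-range
;; IPv6 addresses rather than anything from a real consfig.
(defparameter *example-veth-ips*
  '(("spwhitton" . "2001:db8::10")
    ("alice" . "2001:db8::20")))

(defun may-route-p (user ip &optional (veth-ips *example-veth-ips*))
  "Return T if the pair (USER . IP) appears in VETH-IPS, else NIL."
  ;; The freshly made cons is never EQL to a list element, so the default
  ;; test would always fail; EQUAL compares the strings inside the pairs.
  (and (member (cons user ip) veth-ips :test #'equal) t))

;; A user may route only the address recorded for them.
(may-route-p "spwhitton" "2001:db8::10")
(may-route-p "alice" "2001:db8::10")
```

The first call succeeds and the second does not, because the pair must match on both the user and the address.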

This work has provided opportunities to make various other improvements to Consfigurator, especially with regard to dumping and reinvoking images. Making SBCL capable of entering user namespaces required a change upstream, which made it into the recent SBCL 2.1.8 release. I’m very grateful to the SBCL developers for their engagement with my project. I’ve been able to add a workaround so that Consfigurator can still enter user namespaces when run on the version of SBCL included in Debian stable. I also discovered that deploying all of my laptop, lxctest and lxc1 at once generates enough output to fill up a pipe, thus revealing a deadlock in Consfigurator’s IPC, which it was good to become aware of and fix. That involved writing my first multi-threaded Lisp, as there are two pipes that need to be kept from filling up, and to my surprise it worked first time. Take that, Haskell :)
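For anyone who hasn’t run into this kind of deadlock before, here is a generic sketch of the problem and the usual cure. It is not Consfigurator’s actual IPC code. It assumes SBCL with thread support and UIOP, and RUN-DRAINING-BOTH-PIPES is a hypothetical name. If the parent reads one pipe to the end before touching the other, a child which fills the second pipe blocks forever, so a second thread drains it concurrently:

```lisp
(require :asdf) ; provides UIOP

(defun run-draining-both-pipes (command)
  "Run COMMAND, a list of strings, without a shell.
Return its standard output, its standard error and its exit code."
  (let* ((process (uiop:launch-program command
                                       :output :stream
                                       :error-output :stream))
         ;; Drain stderr in a second thread, so that the child can never
         ;; block writing to it while we are busy reading stdout.
         (stderr-thread
           (sb-thread:make-thread
            (lambda ()
              (uiop:slurp-stream-string
               (uiop:process-info-error-output process)))
            :name "stderr drain"))
         ;; Meanwhile, drain stdout in the calling thread.
         (stdout (uiop:slurp-stream-string
                  (uiop:process-info-output process)))
         ;; JOIN-THREAD returns the value of the thread's function.
         (stderr (sb-thread:join-thread stderr-thread)))
    (values stdout stderr (uiop:wait-process process))))
```

For example, (run-draining-both-pipes '("sh" "-c" "echo out; echo err >&2")) returns the two streams’ contents as separate strings, followed by the exit code.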