Setting up our Raspberry Pi Cluster at DockerCon 2019

One of InfoSiftr's core strengths as a company is multi-architecture support and the images used in such environments. To showcase that skill set (as well as our partnership with Arm) at DockerCon in San Francisco, we used a Raspberry Pi cluster built on Kubernetes 1.14.1. Here we'll talk about the steps we used to build the cluster, so you can recreate this demo on your own.

We started off by going back to Kasper Nissen’s excellent work documenting building a Kubernetes 1.11 cluster on Raspberry Pis (https://kubecloud.io/setting-up-a-kubernetes-1-11-raspberry-pi-cluster-using-kubeadm-952bbda329c8), and many people trying to build on Pis used this as a starting point. Unfortunately, we found that as Kubernetes and the various components have evolved over the past year, there are some issues with it today, as written. Pretty much all of those issues stemmed from Raspbian only having a 32-bit kernel (armv7l) and what we believe to be insufficient integration testing and end-to-end testing in the arm32 space by those creating the various components we were using. We’ll describe the issues we ran into near the end of this article.
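
If you're not sure whether a given board is running a 32-bit or 64-bit kernel, a quick check from a shell on the Pi makes it obvious:

# Check the running kernel architecture on a Pi
uname -m
# armv7l  -> 32-bit kernel (stock Raspbian at the time)
# aarch64 -> 64-bit kernel (e.g. the Ubuntu 18.04 arm64 image we used below)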

For the physical layout, we had a set of five Raspberry Pi 3 boards, each with a 16GB U1 microSDHC card, mounted in an enclosure with a switch and power. The enclosure also had one set of USB and HDMI connectors, which we had needed for the initial installations in the past, but not in this version. The uplink from the switch connected to our laptop via a USB hub with an Ethernet port, to keep the ensemble as portable as possible.

Last time we documented a build like this, we commented on potentially choosing a different OS, and this time it became necessary. Due to the problems we encountered with Raspbian, we chose a classic Ubuntu 18.04 image for arm64/Pi 3B (https://wiki.ubuntu.com/ARM/RaspberryPi) as the OS we would use on the Pis.

Next, we needed to get the OS onto the boards. Writing the image to the SD card was simply a matter of correctly using dd. Our laptop conveniently had an SD card slot, which was /dev/sdb in our case, so all we had to do was the following, five times:

umount /dev/sdb1; umount /dev/sdb2; time sudo dd bs=1M if=./ubuntu-18.04.2-preinstalled-server-arm64+raspi3.img of=/dev/sdb conv=fsync
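
One word of caution: dd will happily overwrite whatever device you point it at, so if your card reader doesn't show up as /dev/sdb, double-check the device name before writing:

# Sanity check: confirm which block device is the SD card before running dd
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT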

Since the cards had previously been used for burning another OS image, there were two partitions to unmount, so watch out for that. Once the images were burned to the cards, we got ready to boot.

One key component of this setup was having some form of router involved to keep all of our networking self-contained (simulating an air-gapped environment). We opted for an Untangle VM running on the laptop over a physical router to minimize what we needed to bring to the show floor. This conveniently gave us a UI on the laptop to see the initial DHCP leases (we already knew which MAC was which Pi), so we could ssh over to each of the nodes using the default Ubuntu logins. Having this access from the laptop also allowed us to manage all 5 Pis at once, by using a terminal program with group broadcast capabilities (Terminator, in our case). We also used ssh-agent to load an ssh key so we could connect easily to the ubuntu and root users.
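
For reference, loading a key into ssh-agent is only a couple of commands; the key path here is just an example, and should be whichever key you copied out to the nodes:

# Start an agent for this shell and load the key used for the Pi logins
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa   # example path; use whichever key you distributed to the ubuntu/root users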

For the next part of the networking, we wanted all static IPs, to ensure we had no issues with the generated TLS certificates. So, the five nodes (master, and node1 – node4) were 172.23.0.100 – .104, the laptop itself was given an address on this inside network by Untangle’s DHCP (.151, that’ll be important later), and the Untangle VM on the laptop was the gateway at 172.23.0.1. Note that if you’re used to earlier Ubuntu versions, networking in Ubuntu 18.04 is configured with a YAML file in /etc/netplan. The only other thing we added was some DNS for the nodes, which amounted to simply adding entries to /etc/hosts on each of the nodes.
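
As a rough sketch of what that looks like on the master node (the netplan file name and the eth0 interface name are assumptions here; check what your image actually uses), along with the /etc/hosts entries:

# Sketch of a static-IP netplan config for the master (adjust the address per node);
# the file name and interface name are examples, not necessarily what your image uses.
cat <<'EOF' | sudo tee /etc/netplan/01-static.yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: no
      addresses: [172.23.0.100/24]
      gateway4: 172.23.0.1
EOF
sudo netplan apply

# The /etc/hosts entries we added on every node:
cat <<'EOF' | sudo tee -a /etc/hosts
172.23.0.100 master
172.23.0.101 node1
172.23.0.102 node2
172.23.0.103 node3
172.23.0.104 node4
EOF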

With the IP addressing chosen, we booted all five Pis, and started the Kubernetes system install. We followed the Kubernetes docs instructions for installing kubeadm (https://kubernetes.io/docs/setup/independent/install-kubeadm/) on all five nodes, then got ready for the actual cluster installation. We started with a basic 'sudo kubeadm init --pod-network-cidr=10.244.0.0/16', followed by 'sudo kubeadm join [token information]' on all the worker nodes.
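
For reference, the kubeadm install from the linked docs boiled down to roughly the following on every node at the time (the repo and package details may have shifted since, so defer to the current docs):

# Roughly the kubeadm install steps from the docs at the time (run on every node, Docker already installed)
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl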

Once this is set up, we have a running cluster; however, it’s not really usable due to missing network functionality. In Kubernetes 1.14, the networking plugin defaults to CNI, so without a CNI plugin even basic access functionality will fail, and all the nodes sit in 'NotReady' status. We ended up using Flannel for our networking (we were going to use Calico, but there were some conflicts between the currently shipping version of Calico and the currently shipping version of CoreDNS). The '--pod-network-cidr=' flag from kubeadm init is necessary for your CNI plugin to work correctly, and the defaults are all listed in the Kubernetes docs (https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/, about halfway down the page).
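
Installing Flannel was a single kubectl apply against Flannel's manifest; the URL below is the one commonly used at the time and may have moved since, so check the Flannel project for the current location:

# Apply the Flannel CNI manifest (URL as of the time of writing; check the Flannel repo for the latest)
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Nodes should move from NotReady to Ready once the flannel pods come up
kubectl get nodes -w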

Once the networking was in place, we had a functional cluster we could deploy apps to. We tested with a simple NGINX deployment behind a NodePort service to make sure things were working smoothly (spoiler: it all worked great). Next, we had to deploy the application found in our GitHub repo (https://github.com/infosiftr/kubecon-multiarch-demo/tree/kubectl), but first we needed to build the images from that repo.
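
The smoke test itself was nothing fancy; something along these lines is enough to confirm scheduling and service networking end to end (the names here are just examples):

# Quick smoke test: an NGINX deployment plus a NodePort service
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --type=NodePort --port=80
kubectl get svc nginx   # note the assigned NodePort
# then hit http://<any-node-ip>:<nodeport> from the laptop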

First, we set up an insecure registry running only on the internal IP with docker run -d -p 172.23.0.151:5000:5000 --restart always --name registry registry. (Remember how I said that internal DHCP IP would be important later?) We also had to add the following to /etc/docker/daemon.json on the laptop as well as all the nodes: { "insecure-registries" : ["172.23.0.151:5000"] }. Next we performed a docker build and docker push on both the laptop and one of the Pis to the insecure registry, and built a manifest list (https://github.com/estesp/manifest-tool) from the two images.
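
The build-and-stitch step looked roughly like this; the blinky image name is a placeholder, and manifest-tool's from-args mode fills the ARCH placeholder in the template per platform:

# Per-arch build and push (amd64 side on the laptop, arm64 side on one of the Pis);
# the "blinky" image name is a placeholder
docker build -t 172.23.0.151:5000/blinky:amd64 .
docker push 172.23.0.151:5000/blinky:amd64
# ...same build/push with an :arm64 tag on one of the Pis...

# Stitch the two images into one manifest list (--insecure because the registry is plain HTTP)
manifest-tool --insecure push from-args \
  --platforms linux/amd64,linux/arm64 \
  --template 172.23.0.151:5000/blinky:ARCH \
  --target 172.23.0.151:5000/blinky:latest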

Now that we had an air-gapped demo environment, including the registry and images for all the architectures, it was on to actually performing the demo. Every time the images (and manifest list) were rebuilt, the result was pushed to the registry, and the cluster would pull the newest version any time the demo scaled up or down.

The visible action of our “Blinky Lights” demo was saturating network traffic on the switch, but the heart of it was the multi-arch support, so we also needed to deploy a version on an AMD64 node. That meant one additional Kubernetes node, running in a VM on the laptop, joined the cluster. The build process took care of both architectures together, combined under a manifest list, and we had our images ready.

Once the images were built, we managed the image distribution by scaling a DaemonSet that had a nodeSelector (app: blinky-node vs. blinky-nope), dynamically relabeling the individual nodes, and keeping 'imagePullPolicy: Always' in place. This let us demonstrate the newer image build in fewer steps than building a webhook would have taken. The UI that managed all this was running on the laptop, and was part of the same image that was generating traffic on the Pis.
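
A minimal sketch of that pattern, assuming a DaemonSet and image named blinky (the label values match what we used; everything else here is illustrative):

# Illustrative DaemonSet pinned to nodes labeled app=blinky-node; names and image are placeholders
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: blinky
spec:
  selector:
    matchLabels:
      name: blinky
  template:
    metadata:
      labels:
        name: blinky
    spec:
      nodeSelector:
        app: blinky-node
      containers:
      - name: blinky
        image: 172.23.0.151:5000/blinky:latest
        imagePullPolicy: Always
EOF

# "Scaling" is then just relabeling nodes into or out of the selector
kubectl label node node1 app=blinky-node --overwrite   # add node1 to the demo
kubectl label node node2 app=blinky-nope --overwrite   # pull node2 back out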

Issues We Avoided: We had previously run into significant issues around DNS and NTP. Keeping everything effectively air-gapped on the show floor removed any DNS concerns (image pulls, apt-get, etc), and we made sure to give NTP a slight kick anytime the Pis had been unplugged, with a simple bash script. Also, if you’re interested in the CoreDNS-vs-Calico networking issues we avoided by using Flannel, you can find many discussions of it on GitHub.
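
The NTP kick can be as simple as restarting the time-sync service over ssh; this is a sketch assuming systemd-timesyncd (the Ubuntu 18.04 default), not necessarily the exact script we ran:

#!/bin/bash
# Sketch: nudge time sync on every node after the Pis have been unplugged;
# assumes systemd-timesyncd (Ubuntu 18.04 default)
for host in master node1 node2 node3 node4; do
  ssh ubuntu@"$host" 'sudo systemctl restart systemd-timesyncd && timedatectl | head -n 5'
done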

This version took slightly longer than previous iterations, simply because we hit some snags in how other projects and teams are handling their multi-architecture support and testing (HINT, HINT, call us!). Following this guide, however, should have you running in a couple of hours or less.

Weird 32-bit issues: As mentioned earlier, there was some weird behavior using the 32-bit Raspbian. Most of the weirdness of the 32-bit kernel revolved around the CNI plugins. It took a while to notice that Calico was only delivering arm64, so the containers kept crashing on arm32. Flannel also had some strangeness around sometimes giving a NIC multiple IPs, but that went away with the 64-bit version. CoreDNS also had some problems pulling the wrong nameservers, which put a hard stop on our ability to test, but in the end, a 64-bit kernel fixed almost all of our Kubernetes issues. We would like to point out that our arm32 systems otherwise worked without any hiccups at all (tested in bare Docker), and all the issues appeared to be upstream. It’s also worth noting that of the major CNI plugins, only Flannel and Weave work on arm32.

So, TL;DR: We dynamically built an image to run on both Arm and AMD64, deployed to a freshly built Kubernetes cluster of Raspberry Pis and AMD64 VMs, and made a whole lot of lights blink, in a fairly short amount of time… and you can too.

If you would like to discuss more about how InfoSiftr can help you with multi-architecture support, image builds, general application design, or anything else along your containerization journey, please contact us at [email protected].

 
