Our colleagues at Raspberry Pi bring us a great tutorial about setting up your own SBC ARM cluster:

Below we retrace the steps from the original post for bringing up a cluster of Raspberry Pi boards; they also helped us develop our own ARM Raspberry Pi CM4 server in a U rack:

What we’re going to build

We’re going to put together an eight-node cluster connected to a single managed switch. One of the nodes will be the so-called “head” node: this node will have a second Gigabit Ethernet connection out to the LAN/WAN via a USB3 Ethernet dongle, and an external 1TB SSD mounted via a USB3-to-SATA connector. While the head node will boot from an SD card as normal, the other seven nodes — the “compute” nodes — will be configured to network boot, with the head node acting as the boot server and the OS images being stored on the external disk. As well as serving as the network boot volume, the 1TB disk will also host a scratch partition that is shared to all the compute nodes in the cluster.


All eight of our Raspberry Pi boards will have a Raspberry Pi PoE+ HAT attached. This means that, since we’re using a PoE+ enabled switch, we only need to run a single Ethernet cable to each of our nodes and don’t need a separate USB hub to power them.

Wiring diagram for the cluster

Raspberry Pi cluster is a low-cost, versatile system

What you’ll need

Shopping list

  • 8 x Raspberry Pi 4
  • 8 x Raspberry Pi PoE+ HAT
  • 8-port Gigabit PoE-enabled switch
  • USB 3 to Gigabit Ethernet adaptor
  • USB 3 to SATA adaptor
  • SSD SATA drive
  • 8 x Ethernet cables
  • 16 GB SD card
  • Cluster case

We won’t go into the construction details of the case or fans; you can see the complete process in the original publication: https://www.raspberrypi.com/tutorials/cluster-raspberry-pi-tutorial/

Configuring the Raspberry Pi operating system

We’re going to bring up the head node from an SD card. The easiest, and recommended, way to install Raspberry Pi OS is to use Raspberry Pi Imager. So go ahead and install Imager on your laptop, and then grab a microSD card (minimum 16GB) and an adapter if you need one, and start the installation process.

Raspberry Pi Imager running under macOS

Click on the “CHOOSE OS” button and select “Raspberry Pi OS (other)” and then “Raspberry Pi OS Lite (32-bit)”. Then click on “CHOOSE STORAGE” and select your SD card from the drop-down list.

Setting “Advanced” options

Next hit Ctrl-Shift-X, or click on the Cog Wheel which appeared after you selected your OS, to open the “Advanced” menu. This will let you set the hostname (I went with “cluster”), as well as enable the SSH server and set up the default user — I went with “pi” for simplicity — along with configuring the wireless interface so your head node will pop up on your home LAN.

Afterwards, click on the “SAVE” button and then the “WRITE” button to write your operating system to the card.

Building your head node

Head node with SSD disk and external Ethernet dongle connected

The exact way you plug things together is going to depend on your cluster components and whether you picked up a case, or more likely what sort of case you have. I’m going to slot my head node into the far left-hand side of my case. This lets me mount the SSD drive against one wall of the case using a mounting screw to secure it in place.

View of the head node from the other side, showing the SSD disk attached to the cluster frame

Connecting over wireless

We configured the head node to know about our local wireless network during setup, so we should just be able to ssh directly into the head node using the name we gave it during setup:

$ ssh pi@cluster.local
pi@cluster.local's password:
$

If we take a look at the network configuration

$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.253.7 netmask 255.255.0.0 broadcast 169.254.255.255
inet6 fe80::6aae:4be3:322b:33ce prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:90 txqueuelen 1000 (Ethernet)
RX packets 15 bytes 2150 (2.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 29 bytes 4880 (4.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 14 bytes 1776 (1.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14 bytes 1776 (1.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.120 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::acae:64b:43ea:8b4f prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:91 txqueuelen 1000 (Ethernet)
RX packets 81 bytes 12704 (12.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 66 bytes 11840 (11.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$

You can see that wlan0 is connected to our local network with a 192.168.* address, while eth0, which we’ve plugged into our switch, has a self-assigned 169.254.* address. We get this self-assigned (link-local) address because nothing on the cluster side of the network is handing out IP addresses yet. We’ll resolve this later in the project by turning our head node into a DHCP server that will assign an IP address to each of the compute nodes, as well as to our managed switch.

Adding a second Ethernet connection

We’ve been able to reach our head node over the network because we configured our wireless interface wlan0 when we set up our SD card. However, it would be good to hardwire our cluster to the network rather than rely on wireless, because we might want to transfer large files back and forth, and wired interfaces are a lot more stable.

To do that we’re going to need an additional Ethernet connection, so I’m going to add a USB 3-to-Gigabit Ethernet adaptor to the head node. We’ll leave the onboard Ethernet socket (eth0) connected to our PoE switch to serve as the internal connection to the cluster, while we use the second Ethernet connection (eth1) to talk to the outside world.

We’ll therefore configure eth1 to pick up an IP address from our LAN’s DHCP server. Go ahead and create a new file called /etc/network/interfaces.d/eth1 which should look like this:

auto eth1
allow-hotplug eth1
iface eth1 inet dhcp

We’ll leave eth0, the onboard Ethernet socket, connected to the Ethernet switch to serve as the internal connection to the cluster. Internally we’ll allocate 192.168.50.* addresses to the cluster, with our head node having the IP address 192.168.50.1.

Create a new file called /etc/network/interfaces.d/eth0 which, this time, should look like this:

auto eth0
allow-hotplug eth0
iface eth0 inet static
address 192.168.50.1
netmask 255.255.255.0
network 192.168.50.0
broadcast 192.168.50.255
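Note that recent releases of Raspberry Pi OS manage interfaces through the dhcpcd daemon rather than classic ifupdown, so on some images the files above may not take effect. If that happens on your system, an equivalent static configuration (a hedged alternative, not part of the original walkthrough) can go in /etc/dhcpcd.conf:

```
# /etc/dhcpcd.conf -- static address for the cluster-facing interface
interface eth0
static ip_address=192.168.50.1/24
```

Either way, the goal is the same: eth0 fixed at 192.168.50.1, with eth1 left on DHCP.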

Afterwards, reboot. Then, if everything has gone to plan, you should see something like this:

$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.50.1 netmask 255.255.255.0 broadcast 192.168.50.255
inet6 fe80::6aae:4be3:322b:33ce prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:90 txqueuelen 1000 (Ethernet)
RX packets 14 bytes 840 (840.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 37 bytes 5360 (5.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.166 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::9350:f7d2:8ccd:151f prefixlen 64 scopeid 0x20<link>
ether 00:e0:4c:68:1d:da txqueuelen 1000 (Ethernet)
RX packets 164 bytes 26413 (25.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 95 bytes 15073 (14.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 14 bytes 1776 (1.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 14 bytes 1776 (1.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.120 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::acae:64b:43ea:8b4f prefixlen 64 scopeid 0x20<link>
ether dc:a6:32:6a:16:91 txqueuelen 1000 (Ethernet)
RX packets 120 bytes 22780 (22.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0

TX packets 38 bytes 5329 (5.2 KiB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$

Configuring the DHCP server

Now that we have a second Gigabit Ethernet connection out to the world via eth1, and our onboard Ethernet is configured with a static IP address, it’s time to make our Raspberry Pi into a DHCP server for the cluster on eth0.

Start by installing the DHCP server itself

$ sudo apt install isc-dhcp-server

and then edit the /etc/dhcp/dhcpd.conf file as follows:

ddns-update-style none;
authoritative;
log-facility local7;

# No service will be given on this subnet
subnet 192.168.1.0 netmask 255.255.255.0 {
}

# The internal cluster network
group {
option broadcast-address 192.168.50.255;
option routers 192.168.50.1;
default-lease-time 600;
max-lease-time 7200;
option domain-name "cluster";
option domain-name-servers 8.8.8.8, 8.8.4.4;
subnet 192.168.50.0 netmask 255.255.255.0 {
range 192.168.50.20 192.168.50.250;

# Head Node
host cluster {
hardware ethernet dc:a6:32:6a:16:90;
fixed-address 192.168.50.1;
}

}
}

Then edit the /etc/default/isc-dhcp-server file to reflect our new server setup

DHCPDv4_CONF=/etc/dhcp/dhcpd.conf
DHCPDv4_PID=/var/run/dhcpd.pid
INTERFACESv4="eth0"

as well as the /etc/hosts file

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

127.0.1.1 cluster

192.168.50.1 cluster

and then you can reboot the head node to start the DHCP service.

We’ve set things up so that hosts that aren’t known are allocated an IP address starting from 192.168.50.20. Once we know the MAC addresses of our compute nodes we can add them to the /etc/dhcp/dhcpd.conf file so they grab static IP addresses going forward rather than getting a random one as they come up.

Log back into your head node after the reboot. If you have a managed switch for your cluster, like the NETGEAR switch I’m using, which will grab an IP address of its own, you can check that your DHCP service is working:

$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC IP hostname valid until manufacturer
==================================================================================
80:cc:9c:94:53:35 192.168.50.20 GS308EPP 2021-12-06 14:19:52 NETGEAR
$

Otherwise, you’ll have to wait until you add your first node as unmanaged switches won’t request their own address.

However, if you do have a managed switch, you might well want to give it a static IP address inside the cluster by adding one to the /etc/dhcp/dhcpd.conf and /etc/hosts files in a similar fashion to the head node. I went with switch as the hostname,

192.168.50.1 cluster
192.168.50.254 switch

and 192.168.50.254 as the allocated IP address.

subnet 192.168.50.0 netmask 255.255.255.0 {
range 192.168.50.20 192.168.50.250;

# Head Node
host cluster {
hardware ethernet dc:a6:32:6a:16:90;
fixed-address 192.168.50.1;
}

# NETGEAR Switch
host switch {
hardware ethernet 80:cc:9c:94:53:35;
fixed-address 192.168.50.254;
}
}

Adding an external disk

If we’re going to network boot our compute nodes, we’re going to need a bit more space. You could do this by plugging a flash stick into one of the USB ports on the head node, but I’m going to use a USB 3 to SATA Adaptor Cable to attach a 1TB SSD that I had on the shelf in the lab to give the cluster plenty of space for data.

Plugging the disk into one of the USB 3 sockets on the head node, I’m going to format it with a GUID partition table and create a single ext4 partition on the disk.

$ sudo parted -s /dev/sda mklabel gpt
$ sudo parted -a optimal /dev/sda mkpart primary ext4 0% 100%
$ sudo mkfs -t ext4 /dev/sda1
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 244175218 4k blocks and 61046784 inodes
Filesystem UUID: 1a312035-ffdb-4c2b-9149-c975461de8f2
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
$

We can then mount the disk manually to check everything is okay,

$ sudo mkdir /mnt/usb
$ sudo mount /dev/sda1 /mnt/usb

and then make sure it will automatically mount on boot by adding the following to the /etc/fstab file.

/dev/sda1 /mnt/usb auto defaults,user 0 1

You should ensure that you can mount the disk manually before rebooting, as adding it as an entry in the /etc/fstab file might cause the Raspberry Pi to hang during boot if the disk isn’t available.
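If you’d rather the head node still boot when the disk is absent, one common safeguard (an optional variant, not from the original tutorial) is the nofail mount option, which tells the system not to treat a missing device as a fatal boot error:

```
/dev/sda1 /mnt/usb auto defaults,user,nofail 0 1
```

Bear in mind that with nofail the NFS and TFTP services below will simply find an empty mount point if the disk fails, so it trades a boot hang for services that silently serve nothing.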

Making the disk available to the cluster

We’re going to want to make the disk available across the cluster. You’ll need to install the NFS server software,

$ sudo apt install nfs-kernel-server

create a mount point which we can share,

$ sudo mkdir /mnt/usb/scratch
$ sudo chown pi:pi /mnt/usb/scratch
$ sudo ln -s /mnt/usb/scratch /scratch

and then edit the /etc/exports file to add a list of IP addresses from which you want to be able to mount your disk.

/mnt/usb/scratch 192.168.50.0/24(rw,sync)

Here we’re exporting it to 192.168.50.0/24, which is CIDR shorthand for the whole 192.168.50.* subnet (all the addresses from 192.168.50.0 through 192.168.50.255).

After doing this you should enable, and then start, both the rpcbind and nfs-server services,

$ sudo systemctl enable rpcbind.service
$ sudo systemctl start rpcbind.service
$ sudo systemctl enable nfs-server.service
$ sudo systemctl start nfs-server.service

and then reboot.

$ sudo reboot

Adding the first node

We’re going to set up our compute nodes to network boot from the head node. How to do this differs between Raspberry Pi models; for the Raspberry Pi 4, the board needs to be booted once from an SD card so that the boot order can be configured using the raspi-config command-line tool.

Enabling for network boot

The easiest way to proceed is to use the Raspberry Pi Imager software to burn a second SD card with Raspberry Pi OS Lite (32-bit). There isn’t any need to specially configure this installation before booting the board as we did for the head node, except to enable SSH.

Next boot the board attached to the cluster switch.

A second Raspberry Pi 4 powered using PoE+ next to our original head node.

The board should come up and be visible on the cluster subnet after it gets given an IP address by the head node’s DHCP server, and we can look at the cluster network from the head node using dhcp-lease-list.


$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC IP hostname valid until manufacturer
===============================================================================================
dc:a6:32:6a:16:87 192.168.50.21 raspberrypi 2021-12-07 11:54:29 Raspberry Pi Ltd
$

We can now go ahead and SSH into the new board and enable network booting using raspi-config from the command line.

$ ssh pi@192.168.50.21
$ sudo raspi-config

Choose “Advanced Options,” then “Boot Order,” then “Network Boot.” You’ll then need to reboot the device for the change to the boot order to be programmed into the bootloader EEPROM.


If you get an error complaining that “No EEPROM bin file found” when trying to enable network boot, you need to update the firmware on your Raspberry Pi before proceeding:

$ sudo apt install rpi-eeprom
$ sudo rpi-eeprom-update -d -a
$ sudo reboot

and then after the node comes back up from its reboot, try to set up network boot once again.


Once the Raspberry Pi has rebooted, check the boot order using vcgencmd:

$ vcgencmd bootloader_config
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0

[all]
BOOT_ORDER=0xf21
$

The BOOT_ORDER should now read 0xf21, which is interpreted right to left: try the SD card first (0x1), then the network (0x2), then start again (0xf). Before proceeding any further, we need to take a note of both the Ethernet MAC address and the serial number of the Raspberry Pi.

$ ethtool -P eth0
Permanent address: dc:a6:32:6a:16:87
$ grep Serial /proc/cpuinfo | cut -d ' ' -f 2 | cut -c 9-16
6a5ef8b0
$

Afterwards, you can shut down the board, at least for now, and remove the SD card.

Setting up the head node as a boot server

We now need to configure our head node to act as a boot server. There are several options here, but we’re going to use our existing DHCP server along with a standalone TFTP server. Install the server (plus kpartx, which we’ll need when we build the boot images), and create a directory for it to serve:

$ sudo apt install tftpd-hpa
$ sudo apt install kpartx
$ sudo mkdir /mnt/usb/tftpboot
$ sudo chown tftp:tftp /mnt/usb/tftpboot

edit the /etc/default/tftpd-hpa file:

TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/mnt/usb/tftpboot"
TFTP_ADDRESS=":69"
TFTP_OPTIONS="--secure --create"

and restart the service.

$ sudo systemctl restart tftpd-hpa

We then need to set up our boot image, and we’re going to need to create one image per client. The first step is to grab the latest image from the web and mount it so we can make some changes, and then mount the partitions inside the image so we can copy the contents to our external disk.

$ sudo su
# mkdir /tmp/image
# cd /tmp/image
# wget -O raspbian_lite_latest.zip https://downloads.raspberrypi.org/raspbian_lite_latest
# unzip raspbian_lite_latest.zip
# rm raspbian_lite_latest.zip
# kpartx -a -v *.img
# mkdir bootmnt
# mkdir rootmnt
# mount /dev/mapper/loop0p1 bootmnt/
# mount /dev/mapper/loop0p2 rootmnt/
# mkdir -p /mnt/usb/rpi1
# mkdir -p /mnt/usb/tftpboot/6a5ef8b0
# cp -a rootmnt/* /mnt/usb/rpi1
# cp -a bootmnt/* /mnt/usb/rpi1/boot

Afterwards, we can customise the root file system:

# touch /mnt/usb/rpi1/boot/ssh
# sed -i /UUID/d /mnt/usb/rpi1/etc/fstab
# echo "192.168.50.1:/mnt/usb/tftpboot /boot nfs defaults,vers=4.1,proto=tcp 0 0" >> /mnt/usb/rpi1/etc/fstab
# echo "console=serial0,115200 console=tty root=/dev/nfs nfsroot=192.168.50.1:/mnt/usb/rpi1,vers=4.1,proto=tcp rw ip=dhcp rootwait" > /mnt/usb/rpi1/boot/cmdline.txt

add it to the /etc/fstab and /etc/exports files on the head node:

# echo "/mnt/usb/rpi1/boot /mnt/usb/tftpboot/6a5ef8b0 none defaults,bind 0 0" >> /etc/fstab
# echo "/mnt/usb/rpi1 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports

and then clean up after ourselves.

# systemctl restart rpcbind
# systemctl restart nfs-server
# umount bootmnt/
# umount rootmnt/
# cd /tmp; rm -rf image
# exit
$

Finally, we need to edit the /etc/dhcp/dhcpd.conf file as follows:

ddns-update-style none;
authoritative;
log-facility local7;
option option-43 code 43 = text;
option option-66 code 66 = text;

# No service will be given on this subnet
subnet 192.168.1.0 netmask 255.255.255.0 {
}

# The internal cluster network
group {
option broadcast-address 192.168.50.255;
option routers 192.168.50.1;
default-lease-time 600;
max-lease-time 7200;
option domain-name "cluster";
option domain-name-servers 8.8.8.8, 8.8.4.4;
subnet 192.168.50.0 netmask 255.255.255.0 {
range 192.168.50.20 192.168.50.250;

# Head Node
host cluster {
hardware ethernet dc:a6:32:6a:16:90;
fixed-address 192.168.50.1;
}

# NETGEAR Switch
host switch {
hardware ethernet 80:cc:9c:94:53:35;
fixed-address 192.168.50.254;
}

host rpi1 {
option root-path "/mnt/usb/tftpboot/";
hardware ethernet dc:a6:32:6a:16:87;
option option-43 "Raspberry Pi Boot";
option option-66 "192.168.50.1";
next-server 192.168.50.1;
fixed-address 192.168.50.11;
option host-name "rpi1";
}

}
}

and reboot our Raspberry Pi.

$ sudo reboot

Network booting our node

Make sure you’ve removed the SD card from the compute node, and plug the Raspberry Pi back into your switch. If you’ve got a spare monitor handy it might be a good idea to plug it into the HDMI port so you can watch the diagnostics screen as the node boots.

Network booting our first compute node for the first time. It’s connected to a display for debugging.

If all goes to plan the board should boot up without incident. Although there are a few things we will need to tidy up, you should now be able to SSH directly into the compute node.

$ ssh 192.168.50.11
pi@192.168.50.11's password:
$

If you were watching the boot messages on a monitor, or if you check in the logs, you can see that our image didn’t come up entirely cleanly. If you log back into the compute node you can make sure that doesn’t happen in future by turning off the feature where the Raspberry Pi tries to resize its filesystem on the first boot, and also by uninstalling the swap daemon.

$ sudo systemctl disable resize2fs_once.service
$ sudo apt remove dphys-swapfile

Next, we can make things slightly easier on ourselves, so that we don’t have to use the IP address of our compute and head nodes every time, by adding our current and future compute nodes to the /etc/hosts file on both our head and compute nodes.

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

127.0.1.1 cluster

192.168.50.1 cluster
192.168.50.254 switch

192.168.50.11 rpi1
192.168.50.12 rpi2
192.168.50.13 rpi3
192.168.50.14 rpi4
192.168.50.15 rpi5
192.168.50.16 rpi6
192.168.50.17 rpi7

Finally, we should change the hostname from the default raspberrypi to rpi1 using the raspi-config command-line tool.

$ sudo raspi-config

Select “Network Options,” then “Hostname” to change the hostname of the compute node, and select “Yes” to reboot.

Mounting the scratch disk

Normally if we were mounting a network disk we’d make use of autofs rather than adding it as an entry directly in the /etc/fstab file. However here, with our entire root filesystem mounted via the network, that seems like unnecessary effort.

After it reboots, log back into your compute node and add a mount point:

$ sudo mkdir /scratch
$ sudo chown pi:pi /scratch

and edit the /etc/fstab file there to add the scratch disk.

192.168.50.1:/mnt/usb/scratch /scratch nfs defaults 0 0

Then reboot the compute node.

$ sudo reboot

Secure shell without a password

It’s going to get pretty tiresome secure-shelling between the cluster head node and the compute nodes and having to type your password each time. So let’s enable secure shell without a password by generating a public/private key pair.

On the compute node you should edit the /etc/ssh/sshd_config file to enable public key login:

PubkeyAuthentication yes
PasswordAuthentication yes
PermitEmptyPasswords no

and then restart the sshd server.

$ sudo systemctl restart ssh

Then going back to the head node we need to generate our public/private key pair and distribute the public key to the compute node. Use a blank passphrase when asked.

$ ssh-keygen -t rsa -b 4096 -C "pi@cluster"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pi/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pi/.ssh/id_rsa
Your public key has been saved in /home/pi/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:XdaHog/sAf1QbFiZj7sS9kkFhCJU9tLN0yt8OvZ52gA pi@cluster
The key's randomart image is:
+---[RSA 4096]----+
| ...o *+o |
| ...+o+*o . |
| .o.=.B++ .|
| = B.ooo |
| S * Eoo |
| .o+o= |
| ..+=o. |
| ..+o +.|
| . +o.|
+----[SHA256]-----+
$ ssh-copy-id -i /home/pi/.ssh/id_rsa.pub pi@rpi1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/pi/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
pi@rpi1's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'pi@rpi1'"
and check to make sure that only the key(s) you wanted were added.
$

Afterwards, you should be able to log in to the compute node without having to type your password.

Access to the outside world

One thing our compute node doesn’t have right now is access to the LAN: it can only see the head node and, once we add them, the rest of the compute nodes. But we can fix that! On the head node, edit the /etc/sysctl.conf file and uncomment the line saying

net.ipv4.ip_forward=1

After activating forwarding we’ll need to configure iptables:

$ sudo apt install iptables
$ sudo iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
$ sudo iptables -A FORWARD -i eth1 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
$ sudo iptables -A FORWARD -i eth0 -o eth1 -j ACCEPT
$ sudo sh -c "iptables-save > /etc/iptables.ipv4.nat"

and then add a line to the /etc/rc.local file, just above the exit 0 line, to load the tables on boot:

_IP=$(hostname -I) || true
if [ "$_IP" ]; then
printf "My IP address is %s\n" "$_IP"
fi

iptables-restore < /etc/iptables.ipv4.nat

exit 0

and reboot.

$ sudo reboot

Note that if you still have the compute node running, you should log on to that first and shut it down, as the root filesystem for that lives on a disk attached to our head node.

Adding the next compute node

Adding the rest of the compute nodes is going to be much more straightforward than adding our first node as we can now use our customised image and avoid some of the heavy lifting we did for the first compute node.

Go ahead and grab your SD card again and boot your next Raspberry Pi attached to the cluster switch.

Booting the second compute node.

The board should come up and be visible on the cluster subnet after it gets given an IP address by the head node’s DHCP server, and we can look at the cluster network from the head node using dhcp-lease-list.

$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC IP hostname valid until manufacturer
===============================================================================================
dc:a6:32:6a:15:e2 192.168.50.21 raspberrypi 2021-12-08 21:15:00 Raspberry Pi Ltd
$

We can now go ahead and SSH into the new board and again enable network booting for this board using raspi-config from the command line:

$ rm /home/pi/.ssh/known_hosts
$ ssh pi@192.168.50.21
$ sudo raspi-config

Choose “Advanced Options,” then “Boot Order,” then “Network Boot.” You’ll then need to reboot the device for the change to the boot order to be programmed into the bootloader EEPROM.

Once the Raspberry Pi has rebooted, check the boot order using vcgencmd:

$ vcgencmd bootloader_config
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0

[all]
BOOT_ORDER=0xf21
$

The BOOT_ORDER should now read 0xf21, which is interpreted right to left: try the SD card first (0x1), then the network (0x2), then start again (0xf). Before proceeding any further, we need to take a note of both the Ethernet MAC address and the serial number of the Raspberry Pi.

$ ethtool -P eth0
Permanent address: dc:a6:32:6a:15:e2
$ grep Serial /proc/cpuinfo | cut -d ' ' -f 2 | cut -c 9-16
54e91338
$

Afterwards, you can shut down the board, at least for now, and remove the SD card.

Moving back to our head node we can use our already configured image as the basis of the operating system for the next compute node.

$ sudo su
# mkdir -p /mnt/usb/rpi2
# cp -a /mnt/usb/rpi1/* /mnt/usb/rpi2
# mkdir -p /mnt/usb/tftpboot/54e91338
# echo "/mnt/usb/rpi2/boot /mnt/usb/tftpboot/54e91338 none defaults,bind 0 0" >> /etc/fstab
# echo "/mnt/usb/rpi2 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
# exit
$

Then we need to edit the /mnt/usb/rpi2/boot/cmdline.txt, replacing “rpi1” with “rpi2”:

console=serial0,115200 console=tty root=/dev/nfs nfsroot=192.168.50.1:/mnt/usb/rpi2,vers=4.1,proto=tcp rw ip=dhcp rootwait

and similarly for /mnt/usb/rpi2/etc/hostname.

rpi2

Finally, we need to edit the /etc/dhcp/dhcpd.conf file on the head node:

host rpi2 {
option root-path "/mnt/usb/tftpboot/";
hardware ethernet dc:a6:32:6a:15:e2;
option option-43 "Raspberry Pi Boot";
option option-66 "192.168.50.1";
next-server 192.168.50.1;
fixed-address 192.168.50.12;
option host-name "rpi2";
}

and reboot our head node.

$ sudo reboot

Afterwards, you should see that both rpi1 and rpi2 are up and running. If you’re interested, you can get a better look at the cluster network by installing nmap on the head node.

$ sudo apt install nmap
$ nmap 192.168.50.0/24
Starting Nmap 7.80 ( https://nmap.org ) at 2021-12-09 11:40 GMT
Nmap scan report for cluster (192.168.50.1)
Host is up (0.0018s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
2049/tcp open nfs

Nmap scan report for rpi1 (192.168.50.11)
Host is up (0.0017s latency).
Not shown: 999 closed ports
PORT STATE SERVICE
22/tcp open ssh

Nmap scan report for rpi2 (192.168.50.12)
Host is up (0.00047s latency).
Not shown: 999 closed ports
PORT STATE SERVICE
22/tcp open ssh

Nmap scan report for switch (192.168.50.254)
Host is up (0.014s latency).
Not shown: 999 filtered ports
PORT STATE SERVICE
80/tcp open http

Nmap done: 256 IP addresses (4 hosts up) scanned in 6.91 seconds
$

Adding the rest of the nodes

The final Bramble

Adding the remaining five compute nodes is now more or less a mechanical process. You’ll need to follow the process we went through for rpi2 for rpi3, rpi4, rpi5, rpi6, and rpi7, substituting the appropriate MAC address, serial number, and hostname for each of the new compute nodes.

Hostname MAC Address Serial Number
rpi1 dc:a6:32:6a:16:87 6a5ef8b0
rpi2 dc:a6:32:6a:15:e2 54e91338
rpi3 dc:a6:32:6a:15:16 6124b5e4
rpi4 dc:a6:32:6a:15:55 52cddb85
rpi5 dc:a6:32:6a:16:1b a0f55410
rpi6 dc:a6:32:6a:15:bb c5fb02d3
rpi7 dc:a6:32:6a:15:4f f57fbb98
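That mechanical process is easy to script. The sketch below is my own summary of the rpi2 steps, with hypothetical example values for NODE and SERIAL (shown here for rpi3); it defaults to a dry run that only prints the commands, so you can review them before setting DRY_RUN=0 and executing for real as root on the head node:

```shell
#!/bin/sh
# Sketch: replicate the rpi1 image and boot config for another compute node.
# NODE and SERIAL are hypothetical example values -- substitute each node's
# real hostname and serial number from the table above.
NODE=${NODE:-rpi3}
SERIAL=${SERIAL:-6124b5e4}
DRY_RUN=${DRY_RUN:-1}   # default: print the commands instead of running them

run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "+ $*"          # dry run: show the command only
  else
    "$@"                 # live run: execute it (run as root)
  fi
}

provision_node() {
  # Copy the rpi1 root filesystem for the new node
  run mkdir -p /mnt/usb/$NODE
  run cp -a /mnt/usb/rpi1/. /mnt/usb/$NODE
  run mkdir -p /mnt/usb/tftpboot/$SERIAL
  # Point the kernel at this node's NFS root and set its hostname
  run sed -i "s/rpi1/$NODE/" /mnt/usb/$NODE/boot/cmdline.txt
  run sh -c "echo $NODE > /mnt/usb/$NODE/etc/hostname"
  # Bind-mount the node's boot partition into the TFTP tree, export its root
  run sh -c "echo '/mnt/usb/$NODE/boot /mnt/usb/tftpboot/$SERIAL none defaults,bind 0 0' >> /etc/fstab"
  run sh -c "echo '/mnt/usb/$NODE 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)' >> /etc/exports"
}

provision_node
```

You still need to add the matching host block (MAC address, fixed IP, hostname) to /etc/dhcp/dhcpd.conf by hand before rebooting the head node.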

The compute nodes

When bringing the last compute node up I also went ahead and plugged the two remaining franken-cables into the final node to power the right-most fans in my case.

Controlling your Raspberry Pi cluster

Now we have all our nodes up and running, we need some cluster control tools. One of my favourites is the parallel-ssh toolkit. You can install this on the head node from the command line,

$ sudo apt install pssh

and, along with an excellent Python library that lets you build your own cluster automation, this will install a number of command-line tools: parallel-ssh, parallel-scp, parallel-rsync, parallel-slurp, and parallel-nuke. These tools can help you run and control jobs, and move and copy files, between the head node and the compute nodes.

To use the command-line tools you’ll need to create a hosts file listing all the compute nodes. I saved mine as .pssh_hosts in my home directory.

$ cat .pssh_hosts
rpi1
rpi2
rpi3
rpi4
rpi5
rpi6
rpi7
$
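Rather than typing the list out by hand, you can generate it, assuming the seven-node rpi1…rpi7 naming used above (HOSTS_FILE is a hypothetical override; by default this writes ~/.pssh_hosts):

```shell
# Generate the pssh hosts file for compute nodes rpi1..rpi7.
HOSTS_FILE=${HOSTS_FILE:-"$HOME/.pssh_hosts"}
seq -f 'rpi%g' 1 7 > "$HOSTS_FILE"
cat "$HOSTS_FILE"
```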

After creating the file we can use the command line tools to, amongst other things, execute a command on all seven of our compute nodes.

$ parallel-ssh -i -h .pssh_hosts free -h
[1] 12:10:15 [SUCCESS] rpi4
total used free shared buff/cache available
Mem: 3.8Gi 56Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
[2] 12:10:15 [SUCCESS] rpi1
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
[3] 12:10:15 [SUCCESS] rpi2
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
[4] 12:10:15 [SUCCESS] rpi7
total used free shared buff/cache available
Mem: 3.8Gi 56Mi 3.7Gi 8.0Mi 97Mi 3.6Gi
Swap: 0B 0B 0B
[5] 12:10:15 [SUCCESS] rpi3
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 16Mi 104Mi 3.6Gi
Swap: 0B 0B 0B
[6] 12:10:15 [SUCCESS] rpi5
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 16Mi 72Mi 3.6Gi
Swap: 0B 0B 0B
[7] 12:10:15 [SUCCESS] rpi6
total used free shared buff/cache available
Mem: 3.8Gi 55Mi 3.7Gi 8.0Mi 64Mi 3.7Gi
Swap: 0B 0B 0B
$

Note that the results will come back in an arbitrary order, depending on how quickly the command completed on each of the compute nodes.

Project Website:  https://www.raspberrypi.com/tutorials/cluster-raspberry-pi-tutorial/