Our colleagues at Raspberry Pi bring us a great tutorial about setting up your own ARM SBC cluster:
We'll retrace the steps from the original post used to bring up a cluster of Raspberry Pi boards; those same steps helped us develop our own ARM Raspberry Pi CM4 server in a rack unit:
What we’re going to build
We’re going to put together an eight-node cluster connected to a single managed switch. One of the nodes will be the so-called “head” node: this node will have a second Gigabit Ethernet connection out to the LAN/WAN via a USB3 Ethernet dongle, and an external 1TB SSD mounted via a USB3-to-SATA connector. While the head node will boot from an SD card as normal, the other seven nodes — the “compute” nodes — will be configured to network boot, with the head node acting as the boot server and the OS images being stored on the external disk. As well as serving as the network boot volume, the 1TB disk will also host a scratch partition that is shared to all the compute nodes in the cluster.
All eight of our Raspberry Pi boards will have a Raspberry Pi PoE+ HAT attached. This means that, since we’re using a PoE+ enabled switch, we only need to run a single Ethernet cable to each of our nodes and don’t need a separate USB hub to power them.
Wiring diagram for the cluster
A Raspberry Pi cluster is a low-cost, versatile system
What you’ll need
Shopping list
- 8 x Raspberry Pi 4
- 8 x Raspberry Pi PoE+ HAT
- 8-port Gigabit PoE-enabled switch
- USB 3 to Gigabit Ethernet adaptor
- USB 3 to SATA adaptor
- SSD SATA drive
- 8 x Ethernet cables
- 16 GB SD card
- Cluster case
We will not go into the construction details of the case or fans; you can refer to the original publication for the complete process: https://www.raspberrypi.com/tutorials/cluster-raspberry-pi-tutorial/
Configuring the Raspberry Pi operating system
We’re going to bring up the head node from an SD card. The easiest, and recommended, way to install Raspberry Pi OS is to use Raspberry Pi Imager. So go ahead and install Imager on your laptop, and then grab a microSD card (minimum 16GB) and an adapter if you need one, and start the installation process.
Raspberry Pi Imager running under macOS
Click on the “CHOOSE OS” button and select “Raspberry Pi OS (other)” and then “Raspberry Pi OS Lite (32-bit)”. Then click on “CHOOSE STORAGE” and select your SD card from the drop-down list.
Setting “Advanced” options
Next hit Ctrl-Shift-X, or click on the Cog Wheel which appeared after you selected your OS, to open the “Advanced” menu. This will let you set the hostname (I went with “cluster”), as well as enable the SSH server and set up the default user — I went with “pi” for simplicity — along with configuring the wireless interface so your head node will pop up on your home LAN.
Afterwards, click on the “SAVE” button and then the “WRITE” button to write your operating system to the card.
Building your head node
Head node with SSD disk and external Ethernet dongle connected
The exact way you plug things together is going to depend on your cluster components and whether you picked up a case, or more likely what sort of case you have. I’m going to slot my head node into the far left-hand side of my case. This lets me mount the SSD drive against one wall of the case using a mounting screw to secure it in place.
View of the head node from the other side, showing the SSD disk attached to the cluster frame
Connecting over wireless
We configured the head node to know about our local wireless network during setup, so we should just be able to ssh directly into the head node using the name we gave it during setup:
$ ssh pi@cluster.local
pi@cluster.local's password:
$
If we take a look at the network configuration:

$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 169.254.253.7  netmask 255.255.0.0  broadcast 169.254.255.255
        inet6 fe80::6aae:4be3:322b:33ce  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:6a:16:90  txqueuelen 1000  (Ethernet)
        RX packets 15  bytes 2150 (2.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 29  bytes 4880 (4.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 14  bytes 1776 (1.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 1776 (1.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.120  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::acae:64b:43ea:8b4f  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:6a:16:91  txqueuelen 1000  (Ethernet)
        RX packets 81  bytes 12704 (12.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 66  bytes 11840 (11.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
$
You can see that wlan0 is connected to our local network with a 192.168.* address, while eth0, which we've plugged into our switch, has a self-assigned 169.254.* address. We get this self-assigned address because nothing on the cluster network is handing out IP addresses yet. We'll resolve this later in the project by turning our head node into a DHCP server that will assign an IP address to each of the compute nodes, as well as to our managed switch.
Adding a second Ethernet connection
We’ve been able to reach our head node over the network because we configured our wireless interface wlan0 when we set up our SD card. However, it would be good to hardwire our cluster to the network rather than rely on wireless, because we might want to transfer large files back and forth, and wired interfaces are a lot more stable.
To do that we’re going to need an additional Ethernet connection, so I’m going to add a USB 3-to-Gigabit Ethernet adaptor to the head node. We’ll leave the onboard Ethernet socket (eth0) connected to our PoE switch to serve as the internal connection to the cluster, while we use the second Ethernet connection (eth1) to talk to the outside world.
We’ll therefore configure eth1 to pick up an IP address from our LAN’s DHCP server. Go ahead and create a new file called /etc/network/interfaces.d/eth1 which should look like this:
auto eth1
allow-hotplug eth1
iface eth1 inet dhcp
We’ll leave eth0, the onboard Ethernet socket, connected to the Ethernet switch to serve as the internal connection to the cluster. Internally we’ll allocate 192.168.50.* addresses to the cluster, with our head node having the IP address 192.168.50.1.
Create a new file called /etc/network/interfaces.d/eth0 which, this time, should look like this:
auto eth0
allow-hotplug eth0
iface eth0 inet static
    address 192.168.50.1
    netmask 255.255.255.0
    network 192.168.50.0
    broadcast 192.168.50.255
Afterwards, reboot. Then, if everything has gone to plan, you should see something like this:
$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.50.1  netmask 255.255.255.0  broadcast 192.168.50.255
        inet6 fe80::6aae:4be3:322b:33ce  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:6a:16:90  txqueuelen 1000  (Ethernet)
        RX packets 14  bytes 840 (840.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 37  bytes 5360 (5.2 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.166  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::9350:f7d2:8ccd:151f  prefixlen 64  scopeid 0x20<link>
        ether 00:e0:4c:68:1d:da  txqueuelen 1000  (Ethernet)
        RX packets 164  bytes 26413 (25.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 95  bytes 15073 (14.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 14  bytes 1776 (1.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 1776 (1.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.120  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::acae:64b:43ea:8b4f  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:6a:16:91  txqueuelen 1000  (Ethernet)
        RX packets 120  bytes 22780 (22.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 38  bytes 5329 (5.2 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
$
Configuring the DHCP server
Now that we have a second Gigabit Ethernet connection out to the world via eth1, and our onboard Ethernet is configured with a static IP address, it’s time to make our Raspberry Pi into a DHCP server for our cluster on eth0.
Start by installing the DHCP server itself
$ sudo apt install isc-dhcp-server
and then edit the /etc/dhcp/dhcpd.conf file as follows:
ddns-update-style none;
authoritative;
log-facility local7;

# No service will be given on this subnet
subnet 192.168.1.0 netmask 255.255.255.0 {
}

# The internal cluster network
group {
   option broadcast-address 192.168.50.255;
   option routers 192.168.50.1;
   default-lease-time 600;
   max-lease-time 7200;
   option domain-name "cluster";
   option domain-name-servers 8.8.8.8, 8.8.4.4;

   subnet 192.168.50.0 netmask 255.255.255.0 {
      range 192.168.50.20 192.168.50.250;

      # Head Node
      host cluster {
         hardware ethernet dc:a6:32:6a:16:90;
         fixed-address 192.168.50.1;
      }
   }
}
Then edit the /etc/default/isc-dhcp-server file to reflect our new server setup
DHCPDv4_CONF=/etc/dhcp/dhcpd.conf
DHCPDv4_PID=/var/run/dhcpd.pid
INTERFACESv4="eth0"
as well as the /etc/hosts file
127.0.0.1	localhost
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters
127.0.1.1	cluster

192.168.50.1	cluster
and then you can reboot the head node to start the DHCP service.
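Before (or instead of) rebooting, it's worth sanity-checking the new configuration. This step isn't in the original tutorial, but dhcpd's -t flag will parse the config file and report errors without starting the server, and the service can then be restarted in place:

$ sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf
$ sudo systemctl restart isc-dhcp-server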
We’ve set things up so that hosts that aren’t known to the server are allocated an IP address starting from 192.168.50.20. Once we know the MAC addresses of our compute nodes, we can add them to the /etc/dhcp/dhcpd.conf file so they grab static IP addresses going forward, rather than getting an arbitrary one as they come up.
Log back into your head node after the reboot. If you have a managed switch for your cluster, like the NETGEAR switch I’m using, it will grab an IP address of its own, and you can check that your DHCP service is working.
$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC                IP              hostname    valid until          manufacturer
==================================================================================
80:cc:9c:94:53:35  192.168.50.20   GS308EPP    2021-12-06 14:19:52  NETGEAR
$
Otherwise, you’ll have to wait until you add your first node as unmanaged switches won’t request their own address.
However, if you do have a managed switch, you might well want to give it a static IP address inside the cluster by adding entries to the /etc/dhcp/dhcpd.conf and /etc/hosts files in a similar fashion to the head node. I went with switch as the hostname and 192.168.50.254 as the allocated IP address. The /etc/hosts entries now look like this:

192.168.50.1	cluster
192.168.50.254	switch

while the subnet block in /etc/dhcp/dhcpd.conf becomes:

subnet 192.168.50.0 netmask 255.255.255.0 {
   range 192.168.50.20 192.168.50.250;

   # Head Node
   host cluster {
      hardware ethernet dc:a6:32:6a:16:90;
      fixed-address 192.168.50.1;
   }

   # NETGEAR Switch
   host switch {
      hardware ethernet 80:cc:9c:94:53:35;
      fixed-address 192.168.50.254;
   }
}
Adding an external disk
If we’re going to network boot our compute nodes, we’re going to need a bit more space. You could do this by plugging a flash stick into one of the USB ports on the head node, but I’m going to use a USB 3 to SATA Adaptor Cable to attach a 1TB SSD that I had on the shelf in the lab to give the cluster plenty of space for data.
Plugging the disk into one of the USB 3 sockets on the head node, I’m going to format it with a GUID partition table and create a single ext4 partition on the disk.
$ sudo parted -s /dev/sda mklabel gpt
$ sudo parted -a optimal /dev/sda mkpart primary ext4 0% 100%
$ sudo mkfs -t ext4 /dev/sda1
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 244175218 4k blocks and 61046784 inodes
Filesystem UUID: 1a312035-ffdb-4c2b-9149-c975461de8f2
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

$
We can then mount the disk manually to check everything is okay,
$ sudo mkdir /mnt/usb
$ sudo mount /dev/sda1 /mnt/usb
and then make sure it will automatically mount on boot by adding the following to the /etc/fstab file.
/dev/sda1 /mnt/usb auto defaults,user 0 1
You should ensure that you can mount the disk manually before rebooting, as adding it as an entry in the /etc/fstab file might cause the Raspberry Pi to hang during boot if the disk isn’t available.
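As a more defensive variation (our suggestion, not part of the original tutorial), you can mount the partition by its UUID, which keeps working even if the disk enumerates as something other than /dev/sda, and add the nofail option so the boot carries on even when the disk is absent. Using the filesystem UUID reported by mkfs above, the /etc/fstab entry would become:

UUID=1a312035-ffdb-4c2b-9149-c975461de8f2 /mnt/usb ext4 defaults,user,nofail 0 1

You can recover the UUID at any time with sudo blkid /dev/sda1.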
Making the disk available to the cluster
We’re going to want to make the disk available across the cluster. You’ll need to install the NFS server software,
$ sudo apt install nfs-kernel-server
create a mount point which we can share,
$ sudo mkdir /mnt/usb/scratch
$ sudo chown pi:pi /mnt/usb/scratch
$ sudo ln -s /mnt/usb/scratch /scratch
and then edit the /etc/exports file to add a list of IP addresses from which you want to be able to mount your disk.
/mnt/usb/scratch 192.168.50.0/24(rw,sync)

Here we’re exporting it to 192.168.50.0/24, which is shorthand for all of the addresses on our 192.168.50.* cluster subnet.
After doing this you should enable, and then start, both the rpcbind and nfs-server services,
$ sudo systemctl enable rpcbind.service
$ sudo systemctl start rpcbind.service
$ sudo systemctl enable nfs-server.service
$ sudo systemctl start nfs-server.service
and then reboot.
$ sudo reboot
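One tip for later: when we add more entries to /etc/exports while setting up network boot, a full reboot isn't strictly necessary, as exportfs can re-read the file in place:

$ sudo exportfs -ra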
Adding the first node
We’re going to set up our compute node to network boot from our head node. To do that we’re first going to have to configure our nodes for network boot. How to do this is different between Raspberry Pi models. However, for Raspberry Pi 4 the board will need to be booted a single time from an SD card and the boot order configured using the raspi-config command-line tool.
Enabling network boot
The easiest way to proceed is to use the Raspberry Pi Imager software to burn a second SD card with Raspberry Pi OS Lite (32-bit). There isn’t any need to specially configure this installation before booting the board as we did for the head node, except to enable SSH.
Next boot the board attached to the cluster switch.
A second Raspberry Pi 4 powered using PoE+ next to our original head node.
The board should come up and be visible on the cluster subnet after it gets given an IP address by the head node’s DHCP server, and we can look at the cluster network from the head node using dhcp-lease-list.
$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC                IP              hostname       valid until          manufacturer
===============================================================================================
dc:a6:32:6a:16:87  192.168.50.21   raspberrypi    2021-12-07 11:54:29  Raspberry Pi Ltd
$
We can now go ahead and SSH into the new board and enable network booting using raspi-config from the command line.
$ ssh pi@192.168.50.21
$ sudo raspi-config
Choose “Advanced Options,” then “Boot Order,” then “Network Boot.” You’ll then need to reboot the device for the change to the boot order to be programmed into the bootloader EEPROM.
If you get an error when trying to enable network boot complaining that “No EEPROM bin file found” then you need to update the firmware on your Raspberry Pi before proceeding. You should do this,
$ sudo apt install rpi-eeprom
$ sudo rpi-eeprom-update -d -a
$ sudo reboot
and then after the node comes back up from its reboot, try to set up network boot once again.
Once the Raspberry Pi has rebooted, check the boot order using vcgencmd:
$ vcgencmd bootloader_config
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0

[all]
BOOT_ORDER=0xf21
$
This should now show that BOOT_ORDER is 0xf21, indicating that the Raspberry Pi will try to boot from an SD card first, followed by the network. Before proceeding any further, we need to take a note of both the Ethernet MAC address and the serial number of the Raspberry Pi.
$ ethtool -P eth0
Permanent address: dc:a6:32:6a:16:87
$ grep Serial /proc/cpuinfo | cut -d ' ' -f 2 | cut -c 9-16
6a5ef8b0
$
Afterwards, you can shut down the board, at least for now, and remove the SD card.
Setting up the head node as a boot server
We now need to configure our head node to act as a boot server. There are several options here, but we’re going to use our existing DHCP server along with a standalone TFTP server. Install the TFTP server, along with kpartx (which we’ll need shortly to mount the OS image), and create a directory for it to serve:
$ sudo apt install tftpd-hpa
$ sudo apt install kpartx
$ sudo mkdir /mnt/usb/tftpboot
$ sudo chown tftp:tftp /mnt/usb/tftpboot
edit the /etc/default/tftpd-hpa file:
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/mnt/usb/tftpboot"
TFTP_ADDRESS=":69"
TFTP_OPTIONS="--secure --create"
and restart the service.
$ sudo systemctl restart tftpd-hpa
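As an optional sanity check we've added, you can confirm the daemon is healthy and listening on UDP port 69 before moving on:

$ sudo systemctl status tftpd-hpa
$ ss -uln | grep ':69'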
We then need to set up our boot image, and we’re going to need to create one image per client. The first step is to grab the latest image from the web and mount it so we can make some changes, and then mount the partitions inside the image so we can copy the contents to our external disk.
$ sudo su
# mkdir /tmp/image
# cd /tmp/image
# wget -O raspbian_lite_latest.zip https://downloads.raspberrypi.org/raspbian_lite_latest
# unzip raspbian_lite_latest.zip
# rm raspbian_lite_latest.zip
# kpartx -a -v *.img
# mkdir bootmnt
# mkdir rootmnt
# mount /dev/mapper/loop0p1 bootmnt/
# mount /dev/mapper/loop0p2 rootmnt/
# mkdir -p /mnt/usb/rpi1
# mkdir -p /mnt/usb/tftpboot/6a5ef8b0
# cp -a rootmnt/* /mnt/usb/rpi1
# cp -a bootmnt/* /mnt/usb/rpi1/boot
Afterwards, we can customise the root file system:
# touch /mnt/usb/rpi1/boot/ssh
# sed -i /UUID/d /mnt/usb/rpi1/etc/fstab
# echo "192.168.50.1:/mnt/usb/tftpboot /boot nfs defaults,vers=4.1,proto=tcp 0 0" >> /mnt/usb/rpi1/etc/fstab
# echo "console=serial0,115200 console=tty root=/dev/nfs nfsroot=192.168.50.1:/mnt/usb/rpi1,vers=4.1,proto=tcp rw ip=dhcp rootwait" > /mnt/usb/rpi1/boot/cmdline.txt
add it to the /etc/fstab and /etc/exports files on the head node:
# echo "/mnt/usb/rpi1/boot /mnt/usb/tftpboot/6a5ef8b0 none defaults,bind 0 0" >> /etc/fstab
# echo "/mnt/usb/rpi1 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
and then clean up after ourselves.
# systemctl restart rpcbind
# systemctl restart nfs-server
# umount bootmnt/
# umount rootmnt/
# cd /tmp; rm -rf image
# exit
$
Finally, we need to edit the /etc/dhcp/dhcpd.conf file as follows. The option-43 string and the option-66/next-server address are what the Raspberry Pi bootloader looks for to confirm that network boot is on offer and to find its TFTP server:
ddns-update-style none;
authoritative;
log-facility local7;

option option-43 code 43 = text;
option option-66 code 66 = text;

# No service will be given on this subnet
subnet 192.168.1.0 netmask 255.255.255.0 {
}

# The internal cluster network
group {
   option broadcast-address 192.168.50.255;
   option routers 192.168.50.1;
   default-lease-time 600;
   max-lease-time 7200;
   option domain-name "cluster";
   option domain-name-servers 8.8.8.8, 8.8.4.4;

   subnet 192.168.50.0 netmask 255.255.255.0 {
      range 192.168.50.20 192.168.50.250;

      # Head Node
      host cluster {
         hardware ethernet dc:a6:32:6a:16:90;
         fixed-address 192.168.50.1;
      }

      # NETGEAR Switch
      host switch {
         hardware ethernet 80:cc:9c:94:53:35;
         fixed-address 192.168.50.254;
      }

      host rpi1 {
         option root-path "/mnt/usb/tftpboot/";
         hardware ethernet dc:a6:32:6a:16:87;
         option option-43 "Raspberry Pi Boot";
         option option-66 "192.168.50.1";
         next-server 192.168.50.1;
         fixed-address 192.168.50.11;
         option host-name "rpi1";
      }
   }
}
and reboot our Raspberry Pi.
$ sudo reboot
Network booting our node
Make sure you’ve removed the SD card from the compute node, and plug the Raspberry Pi back into your switch. If you’ve got a spare monitor handy it might be a good idea to plug it into the HDMI port so you can watch the diagnostics screen as the node boots.
Network booting our first compute node for the first time. It’s connected to a display for debugging.
If all goes to plan the board should boot up without incident. Although there are a few things we will need to tidy up, you should now be able to SSH directly into the compute node.
$ ssh 192.168.50.11
pi@192.168.50.11's password:
$
If you were watching the boot messages on a monitor, or if you check in the logs, you can see that our image didn’t come up entirely cleanly. If you log back into the compute node you can make sure that doesn’t happen in future by turning off the feature where the Raspberry Pi tries to resize its filesystem on the first boot, and also by uninstalling the swap daemon.
$ sudo systemctl disable resize2fs_once.service
$ sudo apt remove dphys-swapfile
Next, we can make things slightly easier on ourselves, so that we don’t have to use the IP address of our compute and head nodes every time, by adding our current and future compute nodes to the /etc/hosts file on both our head and compute nodes.
127.0.0.1	localhost
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters
127.0.1.1	cluster

192.168.50.1	cluster
192.168.50.254	switch

192.168.50.11	rpi1
192.168.50.12	rpi2
192.168.50.13	rpi3
192.168.50.14	rpi4
192.168.50.15	rpi5
192.168.50.16	rpi6
192.168.50.17	rpi7
Finally, we should change the hostname from the default raspberrypi to rpi1 using the raspi-config command-line tool.
$ sudo raspi-config
Select “Network Options,” then “Hostname” to change the hostname of the compute node, and select “Yes” to reboot.
Mounting the scratch disk
Normally, if we were mounting a network disk, we’d make use of autofs rather than adding it as an entry directly in the /etc/fstab file. Here, though, with our entire root filesystem already mounted via the network, that seems like unnecessary effort.
After it reboots, log back into your compute node and add a mount point:
$ sudo mkdir /scratch
$ sudo chown pi:pi /scratch
and edit the /etc/fstab file there to add the scratch disk.
192.168.50.1:/mnt/usb/scratch /scratch nfs defaults 0 0
Then reboot the compute node.
$ sudo reboot
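Once the node is back up, a quick check (our addition) is to run df on it; the scratch share should be listed with 192.168.50.1:/mnt/usb/scratch as its filesystem:

$ df -h /scratch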
Secure shell without a password
It’s going to get pretty tiresome secure-shelling between the cluster head node and the compute nodes and having to type your password each time. So let’s enable secure shell without a password by generating a public/private key pair.
On the compute node you should edit the /etc/ssh/sshd_config file to enable public key login:
PubkeyAuthentication yes
PasswordAuthentication yes
PermitEmptyPasswords no
and then restart the sshd server.
$ sudo systemctl restart ssh
Then going back to the head node we need to generate our public/private key pair and distribute the public key to the compute node. Use a blank passphrase when asked.
$ ssh-keygen -t rsa -b 4096 -C "pi@cluster"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pi/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pi/.ssh/id_rsa
Your public key has been saved in /home/pi/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:XdaHog/sAf1QbFiZj7sS9kkFhCJU9tLN0yt8OvZ52gA pi@cluster
The key's randomart image is:
+---[RSA 4096]----+
|    ...o *+o     |
|   ...+o+*o .    |
|   .o.=.B++ .    |
|     = B.ooo     |
|    S * Eoo      |
|     .o+o=       |
|     ..+=o.      |
|     ..+o +.     |
|      . +o.      |
+----[SHA256]-----+
$ ssh-copy-id -i /home/pi/.ssh/id_rsa.pub pi@rpi1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/pi/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
pi@rpi1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'pi@rpi1'"
and check to make sure that only the key(s) you wanted were added.
$
Afterwards, you should be able to log in to the compute node without having to type your password.
Access to the outside world
One thing our compute node doesn’t have right now is access to the LAN. Right now the compute node can only see the head node and eventually, once we add them, the rest of the compute nodes. But we can fix that! On the head node go and edit the /etc/sysctl.conf file by uncommenting the line saying,
net.ipv4.ip_forward=1
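You can either reboot for the change to take effect or (a small shortcut we've added) reload the settings immediately:

$ sudo sysctl -p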
After activating forwarding we’ll need to configure iptables:
$ sudo apt install iptables
$ sudo iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
$ sudo iptables -A FORWARD -i eth1 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
$ sudo iptables -A FORWARD -i eth0 -o eth1 -j ACCEPT
$ sudo sh -c "iptables-save > /etc/iptables.ipv4.nat"
and then add a line to the /etc/rc.local file, just above the exit 0 line, to load the tables on boot:
_IP=$(hostname -I) || true
if [ "$_IP" ]; then
  printf "My IP address is %s\n" "$_IP"
fi

iptables-restore < /etc/iptables.ipv4.nat

exit 0
and reboot.
$ sudo reboot
Note that if you still have the compute node running, you should log on to that first and shut it down, as the root filesystem for that lives on a disk attached to our head node.
Adding the next compute node
Adding the rest of the compute nodes is going to be much more straightforward than adding our first node as we can now use our customised image and avoid some of the heavy lifting we did for the first compute node.
Go ahead and grab your SD card again and boot your next Raspberry Pi attached to the cluster switch.
Booting the second compute node.
The board should come up and be visible on the cluster subnet after it gets given an IP address by the head node’s DHCP server, and we can look at the cluster network from the head node using dhcp-lease-list.
$ dhcp-lease-list
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC                IP              hostname       valid until          manufacturer
===============================================================================================
dc:a6:32:6a:15:e2  192.168.50.21   raspberrypi    2021-12-08 21:15:00  Raspberry Pi Ltd
$
We can now go ahead and SSH into the new board and again enable network booting for this board using raspi-config from the command line:
$ rm /home/pi/.ssh/known_hosts
$ ssh pi@192.168.50.21
$ sudo raspi-config
Choose “Advanced Options,” then “Boot Order,” then “Network Boot.” You’ll then need to reboot the device for the change to the boot order to be programmed into the bootloader EEPROM.
Once the Raspberry Pi has rebooted, check the boot order using vcgencmd:
$ vcgencmd bootloader_config
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0

[all]
BOOT_ORDER=0xf21
$
This should now show that BOOT_ORDER is 0xf21, indicating that the Raspberry Pi will try to boot from an SD card first, followed by the network. Before proceeding any further, we need to take a note of both the Ethernet MAC address and the serial number of the Raspberry Pi.
$ ethtool -P eth0
Permanent address: dc:a6:32:6a:15:e2
$ grep Serial /proc/cpuinfo | cut -d ' ' -f 2 | cut -c 9-16
54e91338
$
Afterwards, you can shut down the board, at least for now, and remove the SD card.
Moving back to our head node we can use our already configured image as the basis of the operating system for the next compute node.
$ sudo su
# mkdir -p /mnt/usb/rpi2
# cp -a /mnt/usb/rpi1/* /mnt/usb/rpi2
# mkdir -p /mnt/usb/tftpboot/54e91338
# echo "/mnt/usb/rpi2/boot /mnt/usb/tftpboot/54e91338 none defaults,bind 0 0" >> /etc/fstab
# echo "/mnt/usb/rpi2 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
# exit
$
Then we need to edit the /mnt/usb/rpi2/boot/cmdline.txt, replacing “rpi1” with “rpi2“:
console=serial0,115200 console=tty root=/dev/nfs nfsroot=192.168.50.1:/mnt/usb/rpi2,vers=4.1,proto=tcp rw ip=dhcp rootwait
and similarly for /mnt/usb/rpi2/etc/hostname.
rpi2
Finally, we need to edit the /etc/dhcp/dhcpd.conf file on the head node:
host rpi2 {
   option root-path "/mnt/usb/tftpboot/";
   hardware ethernet dc:a6:32:6a:15:e2;
   option option-43 "Raspberry Pi Boot";
   option option-66 "192.168.50.1";
   next-server 192.168.50.1;
   fixed-address 192.168.50.12;
   option host-name "rpi2";
}
and reboot our head node.
$ sudo reboot
Afterwards, you should see both rpi1 and rpi2 are up and running. If you’re interested, we can get a better look at our cluster network by installing nmap on the head node.
$ sudo apt install nmap
$ nmap 192.168.50.0/24
Starting Nmap 7.80 ( https://nmap.org ) at 2021-12-09 11:40 GMT
Nmap scan report for cluster (192.168.50.1)
Host is up (0.0018s latency).
Not shown: 997 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
2049/tcp open  nfs

Nmap scan report for rpi1 (192.168.50.11)
Host is up (0.0017s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
22/tcp open  ssh

Nmap scan report for rpi2 (192.168.50.12)
Host is up (0.00047s latency).
Not shown: 999 closed ports
PORT   STATE SERVICE
22/tcp open  ssh

Nmap scan report for switch (192.168.50.254)
Host is up (0.014s latency).
Not shown: 999 filtered ports
PORT   STATE SERVICE
80/tcp open  http

Nmap done: 256 IP addresses (4 hosts up) scanned in 6.91 seconds
$
Adding the rest of the nodes
The final Bramble
Adding the remaining five compute nodes is now a more or less mechanical process. You’ll need to follow the process we went through for rpi2 for rpi3, rpi4, rpi5, rpi6, and rpi7, substituting the appropriate MAC address, serial number, and hostname for each of the new compute nodes (see the helper-script sketch after the table).
Hostname | MAC Address       | Serial Number
---------|-------------------|--------------
rpi1     | dc:a6:32:6a:16:87 | 6a5ef8b0
rpi2     | dc:a6:32:6a:15:e2 | 54e91338
rpi3     | dc:a6:32:6a:15:16 | 6124b5e4
rpi4     | dc:a6:32:6a:15:55 | 52cddb85
rpi5     | dc:a6:32:6a:16:1b | a0f55410
rpi6     | dc:a6:32:6a:15:bb | c5fb02d3
rpi7     | dc:a6:32:6a:15:4f | f57fbb98
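Since the per-node steps on the head node are so repetitive, they lend themselves to a small script. The sketch below is our own addition rather than part of the original tutorial (the add_node.sh name and its two arguments are invented for illustration); it assumes the rpi1 template image already exists, and it still leaves you to add the matching host stanza to /etc/dhcp/dhcpd.conf by hand:

#!/bin/bash
# add_node.sh -- hypothetical helper; run as root on the head node.
# Usage: ./add_node.sh <hostname> <serial>   e.g. ./add_node.sh rpi3 6124b5e4
set -e
NODE="$1"
SERIAL="$2"

# Clone the template root filesystem we prepared for rpi1
mkdir -p /mnt/usb/"${NODE}"
cp -a /mnt/usb/rpi1/* /mnt/usb/"${NODE}"

# Bind-mount the node's boot directory into the TFTP tree under its serial
mkdir -p /mnt/usb/tftpboot/"${SERIAL}"
echo "/mnt/usb/${NODE}/boot /mnt/usb/tftpboot/${SERIAL} none defaults,bind 0 0" >> /etc/fstab

# Export the node's root filesystem over NFS
echo "/mnt/usb/${NODE} 192.168.50.0/24(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports

# Point the kernel command line and the hostname file at the new node
sed -i "s/rpi1/${NODE}/" /mnt/usb/"${NODE}"/boot/cmdline.txt
echo "${NODE}" > /mnt/usb/"${NODE}"/etc/hostname

# Apply the new bind mount and NFS export without a reboot
mount -a
exportfs -ra
echo "Now add a 'host ${NODE}' stanza to /etc/dhcp/dhcpd.conf"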
The compute nodes
When bringing the last compute node up I also went ahead and plugged the two remaining franken-cables into the final node to power the right-most fans in my case.
Controlling your Raspberry Pi cluster
Now we have all our nodes up and running, we need some cluster control tools. One of my favourites is the parallel-ssh toolkit. You can install this on the head node from the command line,
$ sudo apt install pssh
and, along with an excellent Python library that lets you build your own cluster automation, this will install a number of command-line tools: parallel-ssh, parallel-scp, parallel-rsync, parallel-slurp, and parallel-nuke. These tools can help you run and control jobs, and move and copy files, between the head node and the compute nodes.
To use the command-line tools you’ll need to create a hosts file listing all the compute nodes. I saved mine as .pssh_hosts in my home directory.
$ cat .pssh_hosts
rpi1
rpi2
rpi3
rpi4
rpi5
rpi6
rpi7
$
After creating the file we can use the command line tools to, amongst other things, execute a command on all seven of our compute nodes.
$ parallel-ssh -i -h .pssh_hosts free -h
[1] 12:10:15 [SUCCESS] rpi4
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi        56Mi       3.7Gi       8.0Mi        64Mi       3.7Gi
Swap:             0B          0B          0B
[2] 12:10:15 [SUCCESS] rpi1
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi        55Mi       3.7Gi       8.0Mi        64Mi       3.7Gi
Swap:             0B          0B          0B
[3] 12:10:15 [SUCCESS] rpi2
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi        55Mi       3.7Gi       8.0Mi        64Mi       3.7Gi
Swap:             0B          0B          0B
[4] 12:10:15 [SUCCESS] rpi7
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi        56Mi       3.7Gi       8.0Mi        97Mi       3.6Gi
Swap:             0B          0B          0B
[5] 12:10:15 [SUCCESS] rpi3
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi        55Mi       3.7Gi        16Mi       104Mi       3.6Gi
Swap:             0B          0B          0B
[6] 12:10:15 [SUCCESS] rpi5
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi        55Mi       3.7Gi        16Mi        72Mi       3.6Gi
Swap:             0B          0B          0B
[7] 12:10:15 [SUCCESS] rpi6
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi        55Mi       3.7Gi       8.0Mi        64Mi       3.7Gi
Swap:             0B          0B          0B
$
Do take note, though, that the results will come back in an arbitrary order, depending on how quickly the command completed on each of the compute nodes.
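The other tools follow the same pattern; for instance, to push a file out to every compute node in one go (a usage sketch, with myjob.sh standing in for whatever file you want to distribute):

$ parallel-scp -h .pssh_hosts myjob.sh /home/pi/myjob.sh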
Project Website: https://www.raspberrypi.com/tutorials/cluster-raspberry-pi-tutorial/