View Full Version : Building your own computational cluster server
Dirk Broer
12-31-2020, 03:14 PM
By far the cheapest way to get introduced into building your own cluster server is via the good-old Raspberry Pi platform.
https://www.sossolutions.nl/media/catalog/product/cache/5df5c040ed8cd3972c59a8e190e44350/c/l/clusterhat_3_of_3_1024x1024-420x318.jpg
This is a Raspberry Pi 3+ Model B fitted with a Cluster HAT Kit, which lets you combine the Raspberry Pi 3+ with up to four Raspberry Pi Zeros.
Performance-wise this will not be impressive, but it doesn't break the bank and is meant to teach you all the ins and outs of cluster servers.
The next step would be to combine more capable Raspberry Pi's in a cluster, either by stacking full form Pi's (https://ubuntu.com/blog/building-a-raspberry-pi-cluster-with-microk8s), or by combining Raspberry Compute Modules (https://turingpi.com/v1/).
It all depends on how many Raspberries you have to start with and how much money you are willing to spend on your cluster. And you don't need to stop at Raspberries: you can use nVidia Jetson Xavier NX modules (https://www.cnx-software.com/2020/12/11/jetson-mate-cluster-box-takes-four-jetson-nano-xavier-nx-modules/) too.
NeoGen
12-31-2020, 03:18 PM
I've wanted to tinker and play with a tiny cluster of RPIs for a long time, but my lazy self starts thinking "it's going to be so much work" and I talk myself out of it every time :icon_lol:
Dirk Broer
01-01-2021, 11:41 AM
Though a bit cheesy in its humor, this is an excellent series to introduce you to Raspberry Pi clustering (https://www.youtube.com/watch?v=kgVz4-SEhbE) by means of the Compute Module 3/3+ and the Turing Pi 1. And if you'd rather read and/or print out the info: here is his blog (https://www.jeffgeerling.com/blog/2020/raspberry-pi-cluster-episode-1-introduction-clusters).
vaughan
01-05-2021, 03:34 AM
Better to go with Odroids from Hardkernel.com
More bang per buck.
Dirk Broer
01-07-2021, 01:24 PM
Better to go with Odroids from Hardkernel.com
More bang per buck.
More bang per buck? That depends... Compared to the Raspberry Pi 3 or the Compute Module 3 (CM3), yes. But my four-core Raspberry Pi 4's, when clocked to 2000 MHz, score about the same level of credit as the six-core Odroid-N2+'s, which have four Cortex-A73 cores clocked at 2400 MHz and two Cortex-A53 cores at 2016 MHz. The Pi's are cheaper to buy and run at 5V/3A, while the Odroid-N2+ runs at 12V/2A; I haven't yet measured the actual power draw. The new Raspberry Compute Module 4 can be overclocked to 2300 MHz (https://www.tomshardware.com/how-to/raspberry-pi-4-23-ghz-overclock), and four of those would fit in a Turing Pi 2 (https://linuxgizmos.com/turing-pi-2-clusters-four-raspberry-pi-cm4-modules/)...
Cooling-wise the Odroid-N2+ (with its 80mm fan) trounces the Raspberry: I haven't had one above 40 degrees Celsius yet, while the Raspberries go up to 53 degrees, even under 52Pi Ice Tower coolers.
WEP-M+2 has my 2400/2016 MHz N2+ at:
Measured 'floating point' speed: 3179.34 million ops/sec
Measured 'integer' speed: 86157.67 million ops/sec
(averages per core; the values for the four Cortex-A73's are held back by their two weaker Cortex-A53 siblings)
The Pi 4 @ 2000 MHz:
Measured 'floating point' speed: 2770.71 million ops/sec
Measured 'integer' speed: 79748.31 million ops/sec
(averages per core)
In the end the 4GB N2+ is the Hardkernel champion, but a 4GB Pi 4 or CM4 is half the price.
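To put the per-core figures above side by side, here is a quick Python sketch using only the WEP-M+2 numbers quoted in this post (the per-day ratios it prints follow directly from those numbers, nothing else is assumed):

```python
# Per-core benchmark averages quoted above (million ops/sec)
n2plus = {"float": 3179.34, "int": 86157.67}  # Odroid-N2+ @ 2400/2016 MHz, 6 cores
pi4    = {"float": 2770.71, "int": 79748.31}  # Raspberry Pi 4 @ 2000 MHz, 4 cores

# Per-core comparison
for kind in ("float", "int"):
    ratio = n2plus[kind] / pi4[kind]
    print(f"{kind}: N2+ is {ratio:.2f}x the Pi 4 per core")

# Whole-board integer throughput: 6 cores vs 4 cores
board_ratio = (6 * n2plus["int"]) / (4 * pi4["int"])
print(f"whole board (integer): N2+ is {board_ratio:.2f}x the Pi 4")
```

So per core the N2+ leads by roughly 8-15%, and per board by around 60%; whether that justifies twice the price is the "bang per buck" question.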
Dirk Broer
01-12-2021, 12:42 AM
Let's play a bit further with the idea of building an ARM computational cluster server.
So far we have for a 64-bit computational cluster server:
The Turing Pi 2, with four 4GB or 8GB CM4's, preferably running from eMMC or even better: SSD.
The Turing Pi 1, with seven 4GB or 8GB CM4's via CM4 to CM3 adapter, also preferably running 64-bit Raspberry Pi OS from eMMC, or SSD -if possible.
A stack of four (or more) Odroid-N2's and/or N2+'s.
Another possibility is a nVidia Jetson Mate cluster (https://linuxgizmos.com/199-kit-clusters-four-jetson-nano-modules/) with Nano, TX2 NX, Xavier NX or Orin NX modules as you -or your wallet- see fit.
If you want a 32-bit computational cluster server:
The Turing Pi 1, with seven CM3's or (preferably) CM3+'s, running 32-bit Raspberry Pi OS from eMMC or -if possible- SSD.
A stack of Odroid-HC1's or HC2's: basically a miniature Odroid-XU4 featuring the bare minimum of I/O and a gigantic heatsink that eliminates the noisy XU4 cooler, plus provision for a 2.5-inch HDD/SSD (HC1) or a 3.5-inch or 2.5-inch HDD/SSD (HC2).
A stack of Tinker Board S's.
The boards used in the Turing Pi's and the Jetson Nano Cluster take away much of the sheer mess of cables that one associates with ARM cluster servers.
The 'more bang per buck' in the 32-bit category goes to Hardkernel for sure. Their boards are eight-core (four high-performance Cortex-A15's plus four medium-performance Cortex-A7's), while the rest of the competition are quad-core. The Turing Pi 1 suffers from a CPU frequency limitation (1200 MHz) due to its 200-pin SODIMM interface, and the quad-core Tinker Board S costs about 100 euros apiece.
You could of course try a stack of Raspberry Pi 3+ boards, but you can also consider Raspberry Pi 4's, either the 1GB model (now discontinued) or the 2GB model, which is enough for a 32-bit OS.
Two Raspberry Pi 4 2GB Model B's cost about the same as one Odroid HC1 or HC2, but the latter stack onto each other, forming their own casing, and only need a 120mm fan for additional cooling.
https://www.cnx-software.com/wp-content/uploads/2017/08/ODROID-HC1-Stacked.jpghttps://cdn.hardkernel.com/wp-content/uploads/2018/10/hc2-%EB%B3%B4%EB%93%9C-1.jpg
Left: Six HC1's in various combinations. Right: The HC2 is even broader, to accommodate a 3.5" disk, and offers even more cooling surface.
Cooling and stacking the Raspberries might be more expensive. This could be circumvented by using 2GB CM4's in the Turing Pi 2, though pricing for the Turing Pi 2 is as yet unknown.
The 'more bang per buck' in the 64-bit category goes, tentatively, to the nVidia Jetson cluster, because it can effortlessly be upgraded: e.g. Xavier NX modules replacing the 4GB Nano's, or 4GB Nano's replacing the 2GB Nano's. This might change with the advent of a future Raspberry Pi 5 or Odroid-N3 of course, but for now the only disadvantage of the nVidia solution is that, BOINC-wise, its CUDA cores sit idle. Kudos to the future developer who changes that.
https://linuxgizmos.com/files/seeed_jetsonmate_rear-sm.jpghttps://linuxgizmos.com/files/seeed_jetsonmate_case-sm.jpg
Left: One Xavier NX with three Nano's. Right: Casing, including RGB cooling.
Dirk Broer
04-10-2021, 01:05 AM
Hardkernel have at the moment the fastest SBC in the shape of their 2400 MHz Odroid-N2+, but they can't rest on their laurels.
Even though the N2+ can maintain a CPU clock of 2400 MHz on its four Cortex-A73's while staying below 40 degrees centigrade (by virtue of the massive heatsink and the, I'd almost say mandatory, 80mm fan-and-standoff system), perhaps leaving room for an even higher top speed, the competition is steadily creeping closer.
E.g. the 8GB Raspberry CM4, when properly cooled, can reach 2300 MHz (https://www.tomshardware.com/how-to/raspberry-pi-4-23-ghz-overclock). Considering the possibilities of the Raspberry Pi 4 Compute Module I/O board, that's one fierce opponent.
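For reference, the overclock in the linked guide comes down to a couple of lines in /boot/config.txt. The values below are illustrative, assuming adequate cooling; the exact over_voltage a given board needs varies (silicon lottery), so follow the guide and test stability yourself:

```ini
# /boot/config.txt -- CM4/Pi 4 overclock sketch (illustrative values)
over_voltage=8     ; raises core voltage; assumption, tune per board
arm_freq=2300      ; target CPU clock in MHz
```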
What is Hardkernel likely to do? I'd say more cores, more RAM and more I/O. One way would be to replace the SoC with one based on the Rockchip RK3588 (https://www.cnx-software.com/2020/11/26/rockchip-rk3588-specifications-revealed-8k-video-6-tops-npu-pcie-3-0-up-to-32gb-ram/), an octa-core with 4x Cortex-A76 and 4x Cortex-A55 cores in 'dynamIQ (https://www.androidauthority.com/arm-dynamiq-need-to-know-770349/)' configuration, an Arm Mali 'Odin' MP4 GPU and a 6 TOPS NPU 3.0 (Neural Processing Unit). 8GB RAM should be the lower limit for this new N3, with an optional 16GB model to beat the 8GB Raspberry CM4's ratio of RAM per core.
I/O-wise the board should be enlarged to accommodate cameras, M.2 SSDs, WiFi, Bluetooth, etc.: all the things the competition has as standard (or at least partly so), plus an even bigger fan on the underside.
Dirk Broer
09-23-2021, 11:20 PM
The Turing Pi 2 isn't going to come cheap: $200 for the board plus $10 for each CM4 adapter. nVidia Jetson modules (Jetson Nano 2 or 4 GB, Jetson TX2-NX, Jetson Xavier NX or Jetson Orin-NX) fit right in, as will the coming octa-core Turing RK3588 (four Cortex-A76 plus four Cortex-A55) modules with up to 32 GB each.
https://pbs.twimg.com/media/E_9HVTUWUAQ2ueW?format=jpg&name=large
Dirk Broer
06-07-2022, 08:03 AM
The Jetson Mate and the Turing Pi v2 compared:
Boards: [attachment 419]
Modules: [attachment 420]
As you can see the Turing Pi v2 has some advantages, such as the standard form factor (any ITX case will do), the standard ATX power connector, the more versatile SoC/SoM options and the ease of connecting extra storage.
The Turing Pi v2 gives you the opportunity to build a 32-core cruncher with up to 128 GB of RAM (with four Turing RK-1 modules) or with 32 ARM Cortex-A78 cores (with four Jetson Orin NX 8-core/16 GB modules), while the Jetson Mate only has the latter choice: still a very potent 32-core, 64 GB cruncher with four nVidia Ampere GPUs.
If you want to build a multi-core ARM cruncher that uses the utmost minimum of watts you can opt for four Jetson Nano modules and set them to use only 5 Watt each, so you have 16 Cortex-A57 cores running at a mere 20 Watt TDP. If your power bill can stand 25 Watt you can up that to 16 Cortex-A72 cores (four Raspberry Pi CM4's) in the case of the Turing Pi v2. Even four RK-1 modules, if and when they arrive and are as good as the claims have them, will only use 28 Watt: that's 16 Cortex-A76 cores PLUS 16 Cortex-A55 cores, and four Mali-G610 GPUs with as yet largely unknown capabilities.
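Those power budgets are easy to tabulate. A small Python sketch, using the whole-cluster wattages claimed above; the electricity tariff is an assumed example value, not from the post:

```python
# Whole-cluster power budgets as quoted above (Watt)
configs = {
    "4x Jetson Nano (5 W mode)": 20,
    "4x Raspberry Pi CM4":       25,
    "4x Turing RK-1 (claimed)":  28,
}

PRICE_PER_KWH = 0.30  # assumed example tariff in EUR; adjust for your own bill

for name, watts in configs.items():
    kwh_per_day = watts * 24 / 1000           # energy per day in kWh
    cost_per_day = kwh_per_day * PRICE_PER_KWH
    print(f"{name}: {watts} W -> {kwh_per_day:.2f} kWh/day, "
          f"EUR {cost_per_day:.2f}/day")
```

Even the heaviest of the three runs around two thirds of a kWh per day at full tilt, which is what makes these ARM clusters attractive for 24/7 crunching.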
Powered by vBulletin® Version 4.2.3 Copyright © 2025 vBulletin Solutions, Inc. All rights reserved.