I ordered a set of 10 Compute Blades in April 2023 (two years ago), and they just arrived a few weeks ago. In that time Raspberry Pi upgraded the CM4 to a CM5, so I ordered a set of 10 16GB CM5 Lite modules for my blade cluster. That should give me 160 GB of total RAM to play with.
This was the biggest Pi cluster I’ve built, and it set me back around $3,000, shipping included.
There’s another Pi-powered blade computer, the Xerxes Pi. It’s smaller and cheaper, but it just wrapped up its own Kickstarter. Will it ship in less than two years? Who knows, but I’m a sucker for crowdfunded blade computers, so of course I backed it!
But my main question, after sinking a substantial amount of money into it: are Pi clusters even worth it anymore? There’s no way this cluster could beat the $8,000, 4-node Framework Desktop cluster in performance. But what about in price per gigaflop, or in efficiency or compute density?
There’s only one way to find out.
Compute Blade Cluster Build
I made a video going over everything in this blog post, including the entire cluster build (and rebuild, and rebuild again) process. You can watch it on YouTube.
But if you’re on the blog, you’re probably not the type to sit through a video anyway. So moving on…
Clustering means doing everything over n times
In the course of going from ‘everything’s in the box’ to ‘running AI and HPC benchmarks reliably’, I rebuilt the cluster basically three times:
- First, my hodgepodge of random NVMe SSDs lying around the office was unreliable. Some drives wouldn’t work with the Pi 5’s PCIe bus, it seems, and others were a little flaky (there’s a reason these were spares sitting around the place, and not in use!)
- After I replaced all the SSDs with Patriot P300s, storage was more reliable, but the CM5s would throttle under load
- I put CM5 heatsinks on without screwing them in… then realized they would pop off sometimes, so I took all the blades out again and screwed the heatsinks into the CM5s/Blades so they were secure for the long term
Compute Blade Cluster HPL Top500 Test
The first benchmark I ran was my top500 High Performance Linpack cluster benchmark. This is my favorite cluster benchmark, because it’s the traditional benchmark they’d run on massive supercomputers to get on the Top500 supercomputer list.
Before I installed heatsinks, the cluster got 275 Gflops, which is an 8.5x speedup over a single 8 GB CM5. Not bad, but I noticed the cluster was only using 105W of power during the run. Definitely more headroom available.
After fixing the thermals, the cluster did not throttle, and used around 130W. At full power, I got 325 Gflops, which is a 10x performance improvement (for 10x 16GB CM5s) over a single 8 GB CM5.
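If you want to check my math, here’s a quick back-of-the-envelope sketch using only the figures quoted above (nothing here is a new measurement):

```python
# HPL back-of-the-envelope, using only the figures quoted above.
throttled_gflops = 275.0   # before heatsinks (an 8.5x speedup over one CM5)
full_gflops = 325.0        # after fixing thermals
full_watts = 130.0         # measured during the full-power run

single_cm5_gflops = throttled_gflops / 8.5       # ~32.4 Gflops for one 8 GB CM5

print(f"{full_gflops / full_watts:.2f} Gflops/W")         # ~2.50
print(f"{full_gflops / single_cm5_gflops:.1f}x one CM5")  # ~10.0x
```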
Compared to the $8,000 Framework Cluster I benchmarked last month, this cluster is about 4 times slower.
But the Pi cluster is slightly more energy efficient, on a Gflops/W basis.
But what about price?
The Pi is a little less cost-effective for HPC applications than a Framework Desktop running AMD’s fastest APU. So discounting the fact we’re only talking CPUs, I don’t think any hyperscalers are looking to swap out a few thousand AMD EPYC systems for 10,000+ Raspberry Pis 🙂
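Putting rough numbers on that, derived only from the prices and the ~4x HPL gap above (an estimate, not a separate measurement):

```python
# Rough price-per-Gflop comparison, derived from the figures above.
pi_cost, pi_gflops = 3000, 325
fw_cost = 8000
fw_gflops = 4 * pi_gflops    # "about 4 times slower" per the HPL results

print(f"Pi cluster:        ${pi_cost / pi_gflops:.2f}/Gflop")   # ~$9.23
print(f"Framework cluster: ${fw_cost / fw_gflops:.2f}/Gflop")   # ~$6.15
```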
But what about AI use cases?
Compute Blade Cluster AI Test
With 160 GB of total RAM, shared by the CPU and iGPU, this could be a small, efficient AI Cluster, right? Well, you’d think.
But no: currently, llama.cpp can’t accelerate inference using Vulkan on the Pi 5’s iGPU. That means we have 160 GB of RAM, but only CPU-powered inference, on pokey Arm Cortex-A76 cores with 10 GB/sec or so of memory bandwidth.
A small model (Llama 3.2:3B) running on a single Pi isn’t horrible; you get about 6 tokens per second. But that is pretty weak compared to even an Intel N100 (much less a single Framework Desktop).
You could have 10 nodes running 10 models, and that might be a very niche use case, but the real test would be running a larger AI model across all nodes. So I switched tracks to Llama 3.3:70B, which is a 40 GB model. It has to run across multiple Pis, since no single Pi has more than 16 GB of RAM.
Just as with the Framework cluster, llama.cpp RPC was very slow: it splits the model’s layers across all the cluster members, then goes round-robin, asking each node in turn to perform its share of prompt processing, then token generation.
The Pi cluster couldn’t even make it to token generation (tg) on my default settings, so I had to dial things back and only generate 16 tokens at a time to allow it to complete.
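For the curious, here’s a minimal sketch of how that kind of run can be orchestrated. The hostnames and model path are made up, and the rpc-server binary and --rpc flag follow llama.cpp’s RPC example (built with -DGGML_RPC=ON), which may differ by version; treat this as a starting point, not a recipe:

```python
#!/usr/bin/env python3
# Sketch: fan out llama.cpp RPC workers to each blade over SSH, then run one
# prompt with the model's layers split across all of them. Hostnames, port,
# and model path are hypothetical -- adjust for your own cluster.
import subprocess

BLADES = [f"blade{i:02d}.local" for i in range(1, 11)]  # hypothetical hostnames
PORT = 50052
MODEL = "/models/Llama-3.3-70B-Instruct-Q4_K_M.gguf"    # hypothetical path

# Start an RPC worker on every blade.
workers = [
    subprocess.Popen(["ssh", host, f"rpc-server --host 0.0.0.0 --port {PORT}"])
    for host in BLADES
]

try:
    # Run from the head node, offloading layers to all workers.
    # "-n 16" caps generation at 16 tokens, like the dialed-back run above.
    rpc_list = ",".join(f"{host}:{PORT}" for host in BLADES)
    subprocess.run([
        "llama-cli", "-m", MODEL,
        "--rpc", rpc_list,
        "-p", "Are Pi clusters worth it?",
        "-n", "16",
    ], check=True)
finally:
    for w in workers:
        w.terminate()
```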
And after all that? Only 0.28 tokens per second, which is 25x slower than the Framework Cluster, running the same model (except on AI Max iGPUs with Vulkan).
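That figure is roughly what a memory-bandwidth-bound estimate predicts: with layer splitting, only one node is streaming its share of the weights at any moment during token generation, so the whole cluster sees about one node’s bandwidth. A quick sanity check, assuming ~2 GB for the quantized 3B model (both inputs are rough):

```python
# CPU token generation is memory-bandwidth-bound: tokens/s lands near
# (memory bandwidth) / (bytes read per token), and the bytes read per token
# is roughly the model's size on disk.
node_bw_gb_s = 10   # rough Pi 5 memory bandwidth, per above

# Single Pi, Llama 3.2:3B (assuming ~2 GB of quantized weights):
print(f"~{node_bw_gb_s / 2:.1f} tokens/s")    # ~5.0 -- measured: ~6

# 10 Pis, Llama 3.3:70B (40 GB) over RPC: layers are split, but only one node
# streams its weights at a time, so the cluster still sees ~one node's bandwidth:
print(f"~{node_bw_gb_s / 40:.2f} tokens/s")   # ~0.25 -- measured: 0.28
```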
I also tried Exo and distributed-llama. Exo had trouble running even a small 3B model on a 2 or 3 node Pi cluster configuration, so I stopped trying to get it working.
distributed-llama worked, but only with up to 8 nodes for the 70B model (it wants a power-of-two node count, so 8 of the 10 blades was the max). Doing that, I got a more useful 0.85 tokens/s, but that’s still 5x slower than the Framework cluster (and it was a bit more fragile than llama.cpp RPC; the tokens were sometimes gibberish).
You can find all my AI cluster benchmarking results in the issue “Test various AI clustering setups on 10 node Pi 5 cluster” over on GitHub.
Gateworks and Conclusion
Bottom line: this cluster’s not a powerhouse. And dollar for dollar, if you’re spending over $3k on a compute cluster, it’s not the best value.
It is efficient, quiet, and compact. So if density is important, and if you need lots of small, physically separate nodes, this could actually make sense.
Really, the only real-world use case besides learning is CI jobs or high-security edge deployments, where you’re not allowed to run multiple things on one server.
That’s what Unredacted Labs is building Pi clusters for: they’re building Tor exit relays on blades, after finding the Pi was the most efficient way to run massive numbers of nodes. If your goal is efficiency and node density, this does win, ever so slightly.
But for 99% of you reading this: this is not the cluster you’re looking for.
Two years ago, when I originally ordered the Blades, Gateworks reached out. They were selling a souped-up version of the Compute Blade, made to an industrial spec. The GBlade is around Pi 4 levels of performance, but with 10 gig networking, along with a 1 gig management interface.
But… it’s discontinued. It doesn’t look like any type of compute blade really lit the world on fire, and like the Blade movie series, the Compute Blade is more of a cult classic than a mainstream hit.
It’s not a bad cluster, except for maybe blade 9, which dies every time I run a benchmark. But I will keep it going, knowing it’s definitely easier to maintain than the 1,050-node Pi cluster at UC Santa Barbara, which to my knowledge is still the world’s largest!
Before I go, I just wanted to give a special thanks to everyone who supports me on Patreon, GitHub, YouTube Memberships, and Floatplane. It really helps when I take on these months- (or years!) long projects.
Parts Used
You might not want to replicate my cluster setup — but I always get asked what parts I used (especially the slim Ethernet cables… everyone asks about those!), so here’s the parts list: