What is the architecture of an AI Cluster and how does it differ from other distributed computing architectures? These are my notes, a work in progress.

Schwartz (or someone who advised him) articulated that there were three super-architectures: Virtualisation Hosts, Web Farm servers, and Super-computers. In my article on IS Strategy on LinkedIn (April 2020), I described them as follows:

Today I describe the systems as business systems, community systems and High Performance Computing.

I hadn’t realised that it was that old. It would seem that I now have to add AI clusters as a fourth super-architecture.

I asked ChatGPT, which now, it seems, serves AI with ads, and it said:

A supercomputer is engineered to solve physics equations about reality with extreme precision, while an AI cluster node is engineered to multiply giant matrices as fast as physically possible to train neural networks.

And

  1. “AI doesn’t usually need extreme numerical precision — it needs throughput.” (The sketch after this list makes this concrete.)
  2. They use different interconnects: HPC clusters typically carry MPI traffic over fabrics such as InfiniBand, while AI clusters add dedicated GPU-to-GPU links such as NVLink within a node.
  3. Power and power density differ: AI systems draw more power at a higher density, and thus require more expensive cooling.
  4. Different RAM and chips: AI nodes are built around accelerators with high-bandwidth memory (HBM), rather than CPU-centric designs with conventional DDR memory.
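
Here is that sketch, assuming NumPy (the matrix sizes and dtypes are purely illustrative): it measures how much accuracy a reduced-precision matrix multiply gives up against a float64 baseline, which is the accuracy AI hardware trades away to win throughput.

```python
import numpy as np

# Illustrative only: compare an "HPC-grade" float64 matrix multiply
# with the same product in float32 and float16, the reduced precisions
# AI accelerators use to trade accuracy for throughput.
rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512))
b = rng.standard_normal((512, 512))

ref = a @ b  # float64 reference, what a physics solver would insist on

for dtype in (np.float32, np.float16):
    approx = (a.astype(dtype) @ b.astype(dtype)).astype(np.float64)
    rel_err = np.abs(approx - ref).max() / np.abs(ref).max()
    print(f"{np.dtype(dtype).name}: max relative error ~ {rel_err:.1e}")
```

Errors in the third or fourth decimal place are tolerable when training a neural network, but not when integrating a climate or CFD model, which is part of why the two architectures diverge.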

I looked for white papers and found these:

  1. https://insideainews.com/2019/11/27/supercomputers-and-machine-learning-a-perfect-match/
  2. https://www.ibm.com/think/topics/hpc
  3. https://www.glennklockwood.com/garden/HPC-vs-AI; Glenn works for VAST Data and was previously at Microsoft.

To understand the software models, I asked Gemini, and it answered with this. More to read.
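
Pending that reading, my working mental model of the dominant AI-cluster software pattern is data-parallel training built on collective operations. A minimal sketch, assuming PyTorch's torch.distributed: it runs as a single local process here, but the same all_reduce call is what scales across thousands of GPUs when launched with torchrun using the NCCL backend.

```python
import os
import torch
import torch.distributed as dist

# Single-process demo of the collective-communication pattern at the
# heart of data-parallel training. On a real AI cluster this would be
# launched with torchrun across many nodes, using the NCCL backend
# over NVLink/InfiniBand rather than gloo over local TCP.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

grad = torch.randn(4)  # stand-in for a worker's local gradients

# all_reduce sums this tensor across every worker in the job; with
# world_size=1 it is a no-op, but the call is identical at any scale.
dist.all_reduce(grad, op=dist.ReduceOp.SUM)
grad /= dist.get_world_size()  # average the gradients

print(grad)
dist.destroy_process_group()
```

Seen this way, the exotic interconnects in point 2 above exist largely to make that one call fast.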
