Facebook’s parent company, Meta, has announced the AI Research SuperCluster (RSC), which it believes is among the fastest AI supercomputers running today and will be the fastest AI supercomputer in the world once fully built out in mid-2022.
Meta (then Facebook) has been working on a range of AI technologies since 2013, including self-supervised learning, in which algorithms learn from vast numbers of unlabeled examples, and transformers, which allow AI models to reason more effectively by focusing on certain areas of their input. To fully realize the benefits of advanced AI, domains such as vision, speech, and language will require training increasingly large and complex models.
Meta began designing this new computing infrastructure in early 2020, stating that “developing the next generation of AI will require powerful supercomputers capable of quintillions of operations per second.” RSC will help Meta’s AI researchers build better AI models that can learn from trillions of examples; work across hundreds of different languages; analyze text, images, and video together; develop new augmented reality tools; and more.
The first generation of this infrastructure, designed in 2017, has 22,000 NVIDIA V100 Tensor Core GPUs in a single cluster and performs 35,000 training jobs a day. Until now, this infrastructure has set the bar for Meta’s researchers in terms of performance, reliability, and productivity. In 2020, however, Meta started afresh with new GPU and network-fabric technology, aiming for an infrastructure able to train models with more than a trillion parameters on data sets as large as an exabyte, the equivalent of 36,000 years of high-quality video.
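A quick back-of-the-envelope check (a sketch, assuming the decimal definition of an exabyte, 10^18 bytes) shows what video bitrate makes one exabyte correspond to roughly 36,000 years of footage:

```python
# Sanity check: what video bitrate makes 1 exabyte
# equal roughly 36,000 years of footage?
EXABYTE_BYTES = 10**18                 # 1 EB, decimal definition (assumption)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds = 36_000 * SECONDS_PER_YEAR
bytes_per_second = EXABYTE_BYTES / seconds
megabits_per_second = bytes_per_second * 8 / 1e6

print(f"{megabits_per_second:.1f} Mbit/s")  # about 7 Mbit/s
```

That works out to roughly 7 Mbit/s, which is indeed in the range of high-quality HD streaming video, so the comparison is internally consistent.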
Early benchmarks on RSC, compared with Meta’s legacy production and research infrastructure, have shown that it runs computer vision workflows up to 20 times faster. It also runs the NVIDIA Collective Communication Library (NCCL) more than nine times faster, and trains large-scale NLP models three times faster. That means a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before.
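The reported speedups and the resulting training-time reduction can be summarized in a short sketch (all figures are those cited in the article; the dictionary keys are just illustrative labels):

```python
# Reported RSC speedups over Meta's legacy cluster (from the article).
speedups = {
    "computer vision workflows": 20.0,
    "NCCL communication": 9.0,
    "large-scale NLP training": 3.0,
}

# A model with tens of billions of parameters previously took 9 weeks;
# a 3x training speedup brings that down to 3 weeks.
baseline_weeks = 9
nlp_weeks = baseline_weeks / speedups["large-scale NLP training"]
print(f"NLP training: {baseline_weeks} weeks -> {nlp_weeks:.0f} weeks")
```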
Meta is now in phase two of building RSC; by its completion in mid-2022, the number of GPUs will grow from 6,080 to 16,000, which Meta expects to increase AI training performance by more than 2.5x. The work done with RSC will ultimately pave the way toward building technologies for what the company believes is its “next major computing platform” — the metaverse.
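The phase-two scale-up ratio lines up with the performance claim: 16,000 GPUs is about 2.63x the current 6,080, consistent with the stated “more than 2.5x” figure if training throughput scales near-linearly with GPU count (an assumption in this sketch, not a claim from the article):

```python
# Scale-up ratio of the phase-2 GPU expansion (figures from the article).
phase1_gpus = 6_080
phase2_gpus = 16_000

scale_factor = phase2_gpus / phase1_gpus
print(f"{scale_factor:.2f}x more GPUs")  # 2.63x
```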
You can find more information about Meta and RSC, which will soon be the world’s fastest AI supercomputer, on the company’s website.
Stay up to date with the most recent automation, computer vision, machine vision and robotics news on Automate Pro Europe, CVPro, MVPro and RBPro.