As large, powerful AI models like OpenAI’s text-generating GPT-3 come into wider use, the hardware traditionally used to train these models (mostly GPUs) will have to be replaced with more efficient solutions. In a 2020 study, MIT researchers found that reducing the error rate on a popular image classification benchmark, ImageNet, would require 10^5 (100,000) times more compute power versus two years ago. OpenAI has estimated, in fact, that the amount of compute used to train AI systems doubled at a rate seven times faster from 2012 to 2019 than during the AI field’s nascent years (1959 to 2012).
With AI systems on current hardware costing millions of dollars to train and producing hundreds of tons of greenhouse gases, the hunt is on for alternatives. (GPT-3 alone produced the equivalent of the yearly carbon dioxide emissions of 58 U.S. homes during training.) One technology that has gained currency is the accelerator chip, which is purpose-built for the types of mathematical operations involved in training AI systems. For example, Google has invested heavily in its line of tensor processing units (TPUs), liquid-cooled chips that deliver over 100 petaflops of compute and power services like Google Search and Google Translate.
It’s not just Google. Among other startups, Graphcore is developing accelerator chips that it claims can speed up the training of AI systems while reducing power consumption. During a press briefing this week, the company detailed the latest generation of its “intelligence processing unit” (IPU) — the Bow — with up to 40% higher performance and 16% better power efficiency (in terms of performance-per-Watt) compared with the previous generation. Perhaps more intriguingly, Graphcore announced that it plans to develop an AI supercomputer, called the Good Computer, that the company claims will be among the most powerful of its kind when it comes online in 2024.
AI accelerators like Graphcore’s are designed to speed up particular types of AI applications including artificial neural networks, deep learning, and machine learning. They’re often multicore in design and focus on low-precision arithmetic or in-memory computing, both of which can boost the performance of large AI algorithms and lead to state-of-the-art results in natural language processing, computer vision, and other domains.
Graphcore’s Bow IPU is no different. A “3D wafer-on-wafer” processor, the Bow powers Graphcore’s new lineup of hardware systems for AI workloads, which includes the Bow Pod 16, Bow Pod 32, Bow Pod 64, Bow Pod 256, and Bow Pod 1024. The numbers in the product names correspond to the number of IPUs in the system; the Bow Pod 16 has 16 IPUs, for example.
“The new Bow-2000 IPU machine – the building block of every Bow Pod system — is based on the same robust systems architecture as our second-generation IPU-M2000 machine, but now with four powerful Bow IPU processors, delivering 1.4 petaflops of AI compute,” Graphcore explained in a blog post shared with VentureBeat earlier this week. “Fully backward compatible with existing IPU-Pod systems, the Bow-2000’s high speed, low-latency IPU fabric, and flexible … form factor all stay the same.”
The Bow’s 3D wafer-on-wafer design — which comes from chip manufacturer TSMC — connects chips on two silicon wafers, one stacked on top of the other, through a special connection called a through-silicon via. Stacking silicon might normally lead to problems with heat density, but Graphcore claims that the Bow IPU’s low power draw mitigates this.
“With wafer-on-wafer on the Bow IPU, two wafers are bonded together to generate a new 3D die: one wafer for AI processing which is architecturally compatible with the GC200 IPU processor with 1,472 independent IPU-Core tiles, capable of running more than 8,800 threads, with 900MB of in-processor memory, and a second wafer with power delivery die,” Graphcore said in the blog post. “By adding deep trench capacitors in the power delivery die, right next to the processing cores and memory, we are able to deliver power much more efficiently — enabling 350 teraflops of AI compute.”
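The figures Graphcore quotes fit together with simple arithmetic: 350 teraflops per Bow IPU, times the four IPUs in a Bow-2000 machine, gives the 1.4 petaflops cited for that machine. The Python sketch below reproduces that calculation and extends it to whole Pod systems. The extension assumes that peak compute scales linearly with IPU count, which Graphcore’s statements don’t confirm; the function names are made up for illustration.

```python
# Figures quoted by Graphcore in its blog post.
TERAFLOPS_PER_BOW_IPU = 350   # peak AI compute of one Bow IPU
IPUS_PER_BOW_2000 = 4         # Bow IPUs in one Bow-2000 machine


def peak_petaflops(num_ipus: int) -> float:
    """Peak AI compute in petaflops.

    Assumption (not from Graphcore): peak compute scales linearly
    with the number of IPUs in the system.
    """
    return num_ipus * TERAFLOPS_PER_BOW_IPU / 1000


def bow_2000_machines(num_ipus: int) -> int:
    """Number of Bow-2000 machines needed to house the given IPU count."""
    return num_ipus // IPUS_PER_BOW_2000


if __name__ == "__main__":
    # One Bow-2000 machine: 4 IPUs.
    print("Bow-2000:", peak_petaflops(IPUS_PER_BOW_2000), "petaflops")
    # A few Pod sizes; the number in the product name is the IPU count.
    for ipus in (16, 64, 256, 1024):
        print(
            f"Bow Pod {ipus}: {bow_2000_machines(ipus)} machines, "
            f"{peak_petaflops(ipus)} petaflops peak"
        )
```

Under this assumption, a Bow Pod 16 works out to four Bow-2000 machines and 5.6 petaflops of peak compute. Real-world throughput depends on the workload and will generally be lower than peak.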
Graphcore says that Bow-based Pod systems deliver up to 40% better performance in “a wide range” of applications versus Pod systems with the company’s second-generation IPU — specifically the IPU-M2000. Moreover, Graphcore claims that the Bow Pod 16 delivers over five times better performance than a comparable Nvidia DGX A100 system at around half the price. (DGX A100 systems start at $199,000.)
Graphcore’s partners began selling Bow Pod systems today. Early Bow Pod adopters include the U.S. Department of Energy’s Pacific Northwest National Laboratory (PNNL), which is using Graphcore’s accelerators for applications including cybersecurity and computational chemistry. Cloud service provider Cirrascale is also deploying the new Bow Pod systems, which it’s making available today to customers as part of its Graphcloud IPU bare metal service.
Sutanay Choudhury, codirector of PNNL’s Computational and Theoretical Chemistry Institute, said that Graphcore’s technology allowed the lab to “significantly” reduce both training and inferencing times from days to hours. “This speedup shows promise in helping us incorporate the tools of machine learning into our research mission in meaningful ways,” Choudhury added in a statement.
G-Core Labs, a cloud provider based in Europe, plans to launch Bow IPU-based cloud instances in Q2 2022.
The Bow IPU is designed to work with Graphcore’s bespoke Poplar, a graph toolchain optimized for AI and machine learning. It integrates with Google’s TensorFlow framework and the Open Neural Network Exchange (an ecosystem for interchangeable AI models), in the latter’s case providing a full training runtime. Preliminary compatibility with Facebook’s PyTorch arrived in Q4 2019, with full feature support following in early 2020.
The growing compute requirements of AI systems have given rise to a cohort of supercomputers designed specifically for AI training. Microsoft two years ago announced that it created a 10,000-GPU AI supercomputer running on its Azure platform primarily for the benefit of OpenAI. Nvidia has its own in-house supercomputer, Selene, that it uses for AI research including training natural language and computer vision models. And Meta (formerly Facebook) recently announced what it’s calling the AI Research SuperCluster, which the company claims will enable its AI models to learn from trillions of examples.
Not to be outdone, Graphcore hopes to build an AI supercomputer of its own using the next generation of its IPU technology. The machine is called the Good Supercomputer in honor of Jack Good, the first person to describe a machine that could exceed the capacity of the human brain. Graphcore says that the supercomputer, if built, would deliver over 10 exaflops of AI floating-point compute and up to 4 petabytes of memory with a bandwidth of over 10 petabytes per second. That’s over four times the performance (in exaflops) of the forthcoming Frontier supercomputer, which can theoretically hit 2.4 exaflops at peak. (An exaflop is equal to one quintillion floating-point operations per second.)
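As a rough check on those headline numbers, the Python sketch below works out the exaflop ratio against Frontier and how long one full pass over the machine’s memory would take at the quoted bandwidth. It treats Graphcore’s “over” and “up to” figures as exact values, which is a simplification, and the function names are illustrative only.

```python
def performance_ratio(a_exaflops: float, b_exaflops: float) -> float:
    """How many times more peak compute system A has than system B."""
    return a_exaflops / b_exaflops


def memory_sweep_seconds(memory_petabytes: float, bandwidth_pb_per_s: float) -> float:
    """Seconds needed to read all of memory once at full bandwidth.

    This is a best-case figure: it assumes the quoted bandwidth is
    sustained for the entire pass.
    """
    return memory_petabytes / bandwidth_pb_per_s


if __name__ == "__main__":
    # Quoted figures, treated as exact for the sake of the estimate.
    good_exaflops = 10.0          # "over 10 exaflops"
    frontier_peak_exaflops = 2.4  # theoretical peak cited above
    memory_pb = 4.0               # "up to 4 petabytes"
    bandwidth_pb_per_s = 10.0     # "over 10 petabytes per second"

    ratio = performance_ratio(good_exaflops, frontier_peak_exaflops)
    sweep = memory_sweep_seconds(memory_pb, bandwidth_pb_per_s)
    print(f"Compute ratio vs. Frontier: {ratio:.2f}x")
    print(f"One pass over memory: {sweep:.1f} s")
```

The ratio comes to a little under 4.2, consistent with “over four times,” and a full pass over 4 petabytes at 10 petabytes per second would take about 0.4 seconds.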
Graphcore believes that the Good Supercomputer could support AI models in sizes exceeding 500 trillion parameters, or more than 2,800 times the size of GPT-3, which has 175 billion parameters. In machine learning, parameters are the part of the model that’s learned from historical training data. The correlation between the number of parameters and sophistication has held up remarkably well, generally speaking.
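Putting the memory and parameter figures side by side gives a sense of the budget involved. The Python sketch below divides the quoted 4 petabytes by 500 trillion parameters, then estimates how much of that memory the weights alone would occupy at common numeric precisions. It assumes decimal petabytes (10^15 bytes) and treats both quoted figures as exact; Graphcore hasn’t said how the memory would actually be allocated, and the function names are illustrative only.

```python
PETABYTE = 10**15   # decimal petabyte, in bytes (an assumption)
TRILLION = 10**12


def bytes_per_parameter(memory_bytes: int, num_parameters: int) -> float:
    """Average memory available per model parameter."""
    return memory_bytes / num_parameters


def weights_petabytes(num_parameters: int, bytes_per_value: int) -> float:
    """Memory needed just to store the weights at a given precision."""
    return num_parameters * bytes_per_value / PETABYTE


if __name__ == "__main__":
    memory = 4 * PETABYTE         # "up to 4 petabytes"
    parameters = 500 * TRILLION   # "exceeding 500 trillion parameters"

    print("Bytes per parameter:", bytes_per_parameter(memory, parameters))
    # 16-bit values take 2 bytes each; 32-bit values take 4 bytes each.
    for label, size in (("16-bit", 2), ("32-bit", 4)):
        print(f"Weights at {label}: {weights_petabytes(parameters, size)} PB")
```

On these numbers, the machine would have 8 bytes of memory per parameter. Storing the weights alone would take 1 petabyte at 16-bit precision or 2 petabytes at 32-bit precision, leaving the remainder for everything else a training run needs to hold in memory.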
The Good Supercomputer would, theoretically, support larger models than AI chip developer Cerebras’ recently announced system, which the company has said can power AI models with 120 trillion parameters. Cerebras’ system clusters 192 of the company’s CS-2s together; Graphcore didn’t reveal how many of its next-generation IPUs the Good Supercomputer might contain, but said that the AI supercomputer would cost around $120 million to build depending on the configuration.
It’s not clear if the Good Supercomputer will ever come to fruition as described. Graphcore notes in a blog post that it’s “keen to engage” with collaborators to make the AI supercomputer a reality, but that it hasn’t yet secured the necessary partnerships.
Whether a machine of the Good Supercomputer’s compute capacity is even necessary is another matter. While some researchers argue that “scaling up” AI models is the only way to achieve more capable, human-like systems, others say that innovations at the algorithmic level could lessen — or even eliminate — the need for ultra-powerful hardware. For example, Alphabet-backed research lab DeepMind’s recent language model — RETRO — can beat others 25 times its size by using “external memory” techniques.
Still, Graphcore appears to be committed to the idea, with plans to provide updates on the Good Supercomputer “in the coming quarters.” We’ll report on new developments as they happen.
Graphcore, which was founded in 2016 by Simon Knowles and Nigel Toon, has raised over $682 million to date from Robert Bosch Venture Capital, Samsung, Dell Technologies Capital, BMW, Microsoft, and AI luminaries Arm cofounder Hermann Hauser and DeepMind cofounder Demis Hassabis. Its first commercial product was a 16-nanometer PCI Express card — C2 — that became available in 2018, and it’s this package that launched on Microsoft Azure in November 2019. (Microsoft is also using Graphcore’s products internally for various AI initiatives.)
Beyond Cerebras, Graphcore’s competitors include Intel-owned Habana Labs, whose chips for AI training power Amazon Web Services’ recently launched DL1 instances. SambaNova is another major competitor, having raised $676 million for its AI training and inferencing chips.