The story of artificial intelligence (AI) development over the past five years has been dominated by scale. Huge progress has been made in natural language processing (NLP), image understanding, voice recognition and more by taking strategies that were developed in the mid-2010s and putting more computing power and more data behind them. This has brought about an interesting power dynamic in the usage and distribution of AI systems, one that makes AI look a lot like the electrical grid.
For NLP, bigger really is better
The current state-of-the-art in NLP is being powered by neural networks with billions of parameters trained on terabytes of text. Simply holding these networks in memory requires multiple cutting-edge GPUs, and training these networks requires supercomputer clusters well beyond the reach of all but the largest organizations.
One could, using the same techniques, train a significantly smaller neural network on significantly less text, but its performance would be significantly worse. So much worse, in fact, that it becomes a difference in kind rather than a difference of degree: there are tasks, such as text classification, summarization and entity extraction, at which large language models excel and small language models perform no better than chance.
As someone who has been working with neural networks for about a decade, I am genuinely surprised by this development. It’s not obvious from a technical standpoint that increasing the number of parameters in a neural network would lead to such a drastic improvement in capability. However, here we are in 2022, training neural networks nearly identical to architectures first published in 2017, but with orders of magnitude more compute, and getting better results.
This points to a new and interesting dynamic in the field. State-of-the-art models are too computationally expensive for nearly any company – let alone an individual – to create or even deploy. To make use of such a model, a company needs to use one created and hosted by someone else – similar to the way electricity is generated and distributed today.
Sharing AI like it’s a metered utility
Every office building needs electricity, but no office building can house the required infrastructure to generate its own power. Instead, they get hooked up to a centralized power grid and pay for the power they use.
In the same way, a multitude of companies can benefit from integrating NLP into their operations, though few have the resources to build their own AI models. This is exactly why some companies have created large AI models and made them available via an easy-to-use API. By offering businesses a way to “hook up” to the proverbial NLP power grid, these providers amortize the cost of training large-scale, state-of-the-art models across many customers, who gain access to this cutting-edge technology without the cutting-edge infrastructure.
To give a concrete example, let’s say a company that stores legal documents wants to display a summary of each document in its possession. It could hire a few law students to read and summarize every document by hand, or it could leverage a neural network. A large-scale neural network working in tandem with the law students’ workflow would drastically increase the efficiency of summarization. Training one from scratch, though, would cost orders of magnitude more than simply hiring more law students. But if the company had access to a state-of-the-art neural network via a network-based API, it could just hook up to the AI “power grid” and pay for the summarization it uses.
This analogy has some interesting implications if we follow it to its logical extreme. Electricity is a utility, like water and transportation infrastructure. These services are so crucial to the functioning of our society that in Ontario (from where I am writing) they are successfully maintained by crown corporations (owned and regulated by the federal or provincial governments). These crown corporations are responsible for not only infrastructure and distribution, but also evaluation and quality assurance, such as water-quality testing.
Regulating the use of AI is also key
Furthermore, just like electricity, this technology can be misused, and it has been shown to have several limitations. There has been a lot of scholarship on how these models can cause harm via astroturfing and the propagation of biases. Given the way this technology is poised to fundamentally transform the way we operate, how it is governed and regulated is important to consider. Building on that earlier scholarship, several providers of these NLP APIs have recently released a set of best practices for deploying these models, but this is obviously just a first step.
Andrew Ng famously said that “AI is the new electricity.” I believe he meant that it will power a wave of progress and innovation, becoming crucial to the functioning of our economy with an impact on the same scale as the introduction of electricity. The statement is perhaps a bit hyperbolic, but it may be more apt than I originally thought. If AI is the new electricity, then it will need to be enabled by a new set of power plants.
Nick Frosst is a cofounder at Cohere.