Should AI Be Scaled Down?
The case for maximizing AI models’ efficiency—not size
A driving force of the competition between AI companies is the belief that bigger is better. GPT-4, the model that powers the most advanced version of ChatGPT, contains an estimated 1.8 trillion parameters, or variables that determine how it responds to inputs. That’s roughly ten times more than the 175 billion parameters possessed by its predecessor, GPT-3—and 1,200 times more than those contained in GPT-2. The datasets used to train these models also continue to grow. OpenAI used 570GB of Internet text data to train GPT-3—a massive expansion beyond the 8 million web pages used to train GPT-2.
Bigger large language models (LLMs) have gotten better at producing humanlike, coherent, and contextually appropriate text. But this improvement has come with costs. Researchers estimated that training GPT-3 consumed roughly as much energy as 120 American households use in a year. A study published in October 2023 projected that by 2027, the AI sector could consume roughly as much energy each year as the Netherlands.
The exponential growth of LLMs’ size—and energy consumption—isn’t likely to stop any time soon. As OpenAI, Google, Meta, and other companies race to develop better models, they’ll probably rely on expanding datasets and adding more parameters. But some researchers question the rationale behind this competition. At the first session of the Harvard Efficient ML Seminar Series, researcher Sara Hooker likened the pursuit of larger and larger models to “building a ladder to the moon”: costly and inefficient, with no realistic endpoint.
“Why do we need [these models] so big in the first place?” asks Hooker, head of the nonprofit research group Cohere For AI. “What is this scale giving us?” Some benefits of size can also be achieved through other techniques, researchers say, such as efficient parameterizations (activating only the relevant parameters for a given input) and meta-learning (teaching models to learn independently). Instead of pouring immense resources into the pursuit of ever-bigger models, AI companies could contribute to research developing these efficiency methods.
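To make “activating only the relevant parameters” concrete, here is a minimal, hypothetical Python sketch of a toy mixture-of-experts layer, one common form of efficient parameterization: a small router picks a single expert for each input, so only a fraction of the layer’s total parameters is used on any given forward pass. The sizes, weights, and the moe_forward function are illustrative assumptions, not code from Cohere For AI or any model named above.

```python
# Toy mixture-of-experts layer: the router selects one expert per input,
# so only that expert's weights participate in the computation.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts = 8, 16, 4

# Router: produces one score per expert for a given input vector.
router_w = rng.normal(size=(d_model, n_experts))

# Experts: independent weight matrices; only the selected one is used.
experts = [rng.normal(size=(d_model, d_hidden)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route the input to a single expert (top-1 routing)."""
    scores = x @ router_w            # one score per expert
    chosen = int(np.argmax(scores))  # pick the best-scoring expert
    # Only 1/n_experts of the expert parameters are touched for this input,
    # even though the layer's total parameter count is n_experts times larger.
    return x @ experts[chosen]

x = rng.normal(size=(d_model,))
print(moe_forward(x).shape)  # (16,): output computed by just one expert
```

The design point is that total parameter count and per-input computation are decoupled: a model can hold many parameters while touching only a few of them for any single input.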
One draw of bigger models is that they seem to develop “emergent properties,” or behaviors that weren’t explicitly programmed into a system: the sudden ability to produce multilingual responses, for instance, or to solve math problems. When these abilities emerged from LLMs, it “was mind-blowing to a lot of people in the field,” says Jonathan Richard Schwarz, an AI researcher at Harvard Medical School and the seminar’s lead organizer. Earlier AI models had been explicitly programmed to learn from experience, such as image recognition systems that can identify new categories after being shown fewer than a dozen examples. But LLMs contained no algorithms designed to let them learn and adapt; they were simply trained to predict Internet text. The fact that they seemed to be learning from experience anyway was exciting—and many assumed this was happening because of their sheer size.
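As an illustration of that point, the hypothetical Python sketch below shows in-context (few-shot) learning: worked examples are placed directly in the prompt, and a model trained only to predict the next word can often complete the pattern without any update to its parameters. The prompt text and the generate() placeholder are assumptions for illustration, not an API described in the article.

```python
# In-context learning: the "experience" lives entirely in the prompt.
# No weights are updated; a next-word predictor completes the pattern.

few_shot_prompt = """Translate English to French.
sea otter -> loutre de mer
peppermint -> menthe poivree
cheese ->"""

def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to any text-completion model."""
    raise NotImplementedError("swap in a real LLM client here")

# With a real model behind generate(), the completion is typically "fromage":
# the apparent learning comes from the examples in the prompt, not training.
# print(generate(few_shot_prompt))
```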