Large Language Models and Machine Learning: The Way Ahead

Three years ago, the introduction of ChatGPT took the world by storm. All of a sudden, everyone could ask questions and hold dialogues in their own natural language with a computational system trained on the world’s knowledge, and the answers were indistinguishable from those an expert would give. This tool made artificial intelligence and machine learning usable by everyone and presented a “future shock” (Toffler, 1970) to our society. In this special issue, we discuss how this “future shock” affects the field of science as well as our society at large.

Interestingly, the beginnings of this future shock date back many years before ChatGPT, and even though it has changed our society in profound ways, there are still many challenges ahead. In 1992, when I began studying computer science, neural network algorithms were already a topic of instruction at universities and an active area of research. However, practical implementation of these algorithms was highly challenging for students and researchers due to several limitations, including insufficient computational power, a lack of real-world training data sets, and the absence of well-maintained software libraries for training and deploying neural networks. At the time, the Stuttgart Neural Network Simulator (SNNS) was the de facto standard, but it never achieved widespread distribution or adoption.

Fast forward to 2022, when specialized hardware for fast matrix and tensor operations, namely graphics processing units (GPUs) and tensor processing units (TPUs), has become a commodity, when the volume of training data reaches tens of terabytes, and when there is a vast array of well-supported software libraries such as PyTorch and TensorFlow. This is the world that ChatGPT entered as a large language model (LLM), trained on extensive data sets comprising both textual and image data from across the Internet, and made accessible to the public.
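To make the contrast concrete, here is a minimal sketch (using PyTorch, one of the libraries named above; the matrix sizes are arbitrary and chosen only for illustration) of the kind of accelerated tensor operation that required hand-written numerical code in the early 1990s:

```python
import torch

# Multiply two large random matrices, dispatching to a GPU when available.
# In 1992, this would have required hand-coded linear algebra on a CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b  # a single accelerated matrix multiplication
print(c.shape, c.device)
```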

The primary shift between 1992 and the present is the nature of the interface for deriving intelligent predictions from data. Previously, researchers engaged with linear algebra and low-level software implementations to encode training data as feature vectors for input into systems such as SNNS. Today, the interface has evolved to a more intuitive form, where practitioners can interact with AI systems through natural language prompts and receive textual outputs. This process, known as ‘zero-shot learning,’ has dramatically expanded the number of AI practitioners and has led to a shift in focus from academic AI research to widespread AI application in society and industry.
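As a concrete illustration of this new interface, the sketch below uses the Hugging Face transformers library (one of many possible tools, chosen here as an assumption rather than something discussed in this editorial) to obtain a zero-shot prediction from a pretrained model using nothing but natural-language inputs:

```python
from transformers import pipeline

# Zero-shot classification: a pretrained model assigns labels it was never
# explicitly trained on; no feature vectors or gradient updates are needed.
classifier = pipeline("zero-shot-classification")
result = classifier(
    "The new GPU cut our training time in half.",
    candidate_labels=["hardware", "politics", "cooking"],
)
print(result["labels"][0])  # the most likely label, here "hardware"
```

No linear algebra or feature engineering is visible to the user; the entire interaction happens through text.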

Alongside the disruptive impact of this transformation, one critical issue has gained prominence: energy consumption. Whereas AI research 30 years ago focused primarily on predictive accuracy, the applicability of modern LLMs is increasingly constrained by the energy required both to train these models and to make predictions. For instance, training a model like GPT-3 demands approximately 1.2 GWh of energy, and generating two image predictions via LLMs consumes energy equivalent to charging a modern smartphone (Luccioni, Jernite, & Strubell, 2024; Luccioni, Viguier, & Ligozat, 2022). Although these models have become more accessible to the general public, they require three to four orders of magnitude more energy than models from the 1990s. This presents a significant technical barrier, particularly for tasks of smaller scale or for industries with thin profit margins (e.g., search and advertising).
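To put the 1.2 GWh training figure into perspective, the following back-of-the-envelope calculation assumes a typical smartphone battery capacity of roughly 0.015 kWh (an assumed value, not a figure from the cited studies):

```python
# Back-of-the-envelope scale comparison; the battery capacity is an
# assumed typical value, not taken from the cited studies.
TRAINING_ENERGY_KWH = 1.2e6   # ~1.2 GWh reported for training GPT-3
PHONE_BATTERY_KWH = 0.015     # assumed smartphone battery (~15 Wh)

charges = TRAINING_ENERGY_KWH / PHONE_BATTERY_KWH
print(f"Training GPT-3 ~ {charges:,.0f} full smartphone charges")  # ~80 million
```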

Looking ahead, several technical challenges remain for LLMs. While the foundational algorithms have been established since the early 1990s, and computational power, data availability, and software have all become abundant, current LLMs still face notable limitations, particularly the issue of ‘hallucinations.’ Hallucinations refer to the generation of factually incorrect responses without any quantification of uncertainty. This issue arises from the inherent structure of LLMs, which represent both queries and responses as sequences of tokens (i.e., variable-length sequences of letters and symbols). These models predict the likelihood of each token in a fixed dictionary based on the preceding tokens and construct text from the most probable sequences. However, current LLMs account for only one form of uncertainty, namely the uncertainty in how well a token sequence represents human text (so-called aleatoric uncertainty). A second form, known as epistemic uncertainty, which stems from insufficient training data, is not currently modeled. For example, GPT-3 has approximately as many parameters as its training data set has tokens, implying, from an information-theoretic standpoint, that there is insufficient data to reliably estimate all parameters.
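The autoregressive mechanism described above can be sketched in a few lines. In the toy example below (the vocabulary and scores are invented for illustration), a softmax turns scores over a fixed dictionary into token probabilities, and the most probable token is chosen as the continuation. Note that these probabilities capture only the aleatoric uncertainty discussed above, not the epistemic uncertainty about the scores themselves:

```python
import numpy as np

# Toy next-token prediction over a fixed dictionary. In a real LLM, the
# scores (logits) come from a trained network conditioned on prior tokens.
vocab = ["the", "cat", "sat", "mat", "."]
logits = np.array([0.2, 1.5, 3.0, 0.7, 0.1])  # invented for illustration

probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax over the dictionary
next_token = vocab[int(np.argmax(probs))]        # greedy choice: "sat"
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```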

In conclusion, LLMs have fundamentally transformed the interface for obtaining accurate predictions, shifting from complex linear algebra and feature engineering to a more accessible paradigm of prompt engineering. This shift is a primary factor behind their widespread adoption in recent years. However, from a machine learning perspective, substantial work remains, including reducing the energy consumption of these models through new architectures and learning algorithms and addressing uncertainty quantification related to epistemic uncertainty. These challenges must be resolved to sustain and extend the impact of LLMs on society.

Ralf Herbrich has no financial or nonfinancial disclosures to share for this editorial.

Luccioni, A. S., Jernite, Y., & Strubell, E. (2024). Power hungry processing: Watts driving the cost of AI deployment? ArXiv. https://doi.org/10.48550/arXiv.2311.16863

Luccioni, A. S., Viguier, S., & Ligozat, A.-L. (2022). Estimating the carbon footprint of BLOOM, a 176B parameter language model. ArXiv. https://doi.org/10.48550/arXiv.2211.02001

Toffler, A. (1970). Future shock. Bantam Books.

©2025 Ralf Herbrich. This editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the editorial.
