I took six weeks off to raise a baby and everyone decided it was time to declare the AI revolution imminent. It’s hard not to take it personally.
The tick-tock of new developments, each more impressive than the last, and each arriving on the scene faster than the last, reached its apogee last week with the near-simultaneous announcement of Google’s Bard and Microsoft’s Bing Chat. Since then, we’ve seen every possible permutation of the discourse, from millenarian claims of an impending AI eschaton to dismissals of the entire field as glorified autocomplete.
I’m not here to settle that debate. Instead, if 2023 is the year AI changes everything, then early in that year is the time to dig a little deeper into what it is, how it works, and why it is what it is. And the best way to do that is to start talking about all those little terms that fall outside the mainstream because they’re “too high-tech.”
What the acronyms and key AI jargon really mean
Neural network
Neural networks are the fundamental technology at the heart of the AI boom. Think of them as the equivalent of the steam engine in the first Industrial Revolution: a general-purpose technology that can reach and transform countless different industries and use cases.
First conceived in the 1940s, neural networks began as an effort to model animal brains, which are made up of millions of simple neurons, each connected to many others. Each individual neuron is extremely simple, but quantity begets quality, and enough of them wired together can learn to perform complex tasks. And the same is true of artificial neural networks, although their neurons are purely algorithmic constructs rather than physical cells.
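To make that concrete, here is a minimal sketch, in Python with made-up numbers, of what a single artificial neuron computes: a weighted sum of its inputs, pushed through a simple activation function. Everything else comes from wiring enormous numbers of these together.

```python
# A minimal sketch of one artificial neuron. The inputs, weights, and bias
# below are invented for illustration; real networks learn them from data.
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term...
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...squashed into the range 0..1 by a sigmoid "activation".
    return 1 / (1 + math.exp(-total))

print(neuron([0.5, 0.2, 0.9], [0.4, -0.6, 1.1], bias=0.1))  # roughly 0.76
```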
Like the steam engine, it took decades to understand the true power of the invention. Neural networks only work with huge amounts of data and computing power, so for the better part of the last 70 years they were little more than curiosities. That changed around the turn of the millennium, and the AI era slowly began to dawn.
LLM
A “large language model,” or LLM, is one of the two major AI approaches behind the latest flurry of progress in the industry. The term describes neural networks trained on large collections of text data, such as OpenAI’s GPT series, Google’s PaLM, or Meta’s LLaMA. PaLM, for example, uses “high-quality web documents, books, Wikipedia, conversations, and GitHub code” to develop an understanding of language.
The question an LLM is trying to answer is simple: given a short section of text, what comes next? But performing that task well is incredibly powerful. For one thing, it’s recursive: once you’ve predicted what comes next, you have a new, slightly longer section of text, which you can feed back into the LLM and ask the question again, generating full sentences, paragraphs, articles, or books.
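A toy sketch of that loop might look like the following; the lookup table here is a stand-in for a real LLM, purely to show how repeatedly asking “what comes next?” grows a longer and longer piece of text.

```python
# Hypothetical stand-in for an LLM: it "predicts" the next word from a tiny
# lookup table, whereas a real model predicts from everything it has read.
toy_model = {
    "the cat": "sat",
    "cat sat": "on",
    "sat on": "the",
    "on the": "mat",
}

def predict_next_word(text):
    last_two = " ".join(text.split()[-2:])
    return toy_model.get(last_two)

text = "the cat"
while True:
    next_word = predict_next_word(text)
    if next_word is None:
        break
    text = text + " " + next_word  # feed the longer text back in and ask again

print(text)  # "the cat sat on the mat"
```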
The question is also general purpose. Predicting what comes next for a small piece of actual English text is different from predicting what comes next for a small piece of code, a question, a poem, a couple of translated sentences, or a logic puzzle, but the same approach seems to work quite well for all of those tasks. And the bigger the language model, the better the result: GPT-3 is about 1,500 times bigger than GPT-1, and we don’t seem to be close to finding the limit.
GAN
What LLMs have done for text, “generative adversarial networks” have done for images, video, music, and more. Strictly speaking, a GAN is two neural networks: one built to label, categorize, and rate, and the other built to create from scratch. By pairing them, you can create an AI that can generate content on demand.
Let’s say you want an AI that can make pictures. First, you do the hard work of creating the labelling AI, one that can look at an image and tell you what’s in it, by showing it millions of images that have already been labelled until it learns to recognize and describe “a dog”, “a bird”, or “a photograph of an orange cut in half, showing that its interior is that of an apple”. Then you take that program and use it to train a second AI to fool it. That second AI “wins” if it can create an image that the first AI will give the desired label to.
Once you’ve trained that second AI, you have what you set out to build: an AI that you can give a label to and get back a picture it thinks matches that label. Or a song. Or a video. Or a 3D model.
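For the technically curious, here is a heavily simplified sketch of that pairing in PyTorch. The “images” are just vectors of 64 random numbers and the networks are tiny, purely to show the shape of the training loop; a real GAN uses far larger networks and actual photos.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 64), nn.Tanh())        # noise -> fake "image"
discriminator = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())  # "image" -> probability it is real

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(100):
    real = torch.rand(32, 64)               # stand-in for a batch of real images
    fake = generator(torch.randn(32, 16))   # the generator's attempt at convincing fakes

    # 1. Train the discriminator to label real images 1 and fakes 0.
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2. Train the generator to fool it: the generator "wins" when the
    #    discriminator labels its output as real.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```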
Compute
Training a new AI model can be expensive. The final training run of GPT-3 required around $10 million of computing time, according to OpenAI research papers, and that figure doesn’t count however many failed attempts were needed before the final run went as planned. That hurdle, access to “compute”, or computing power, means that large general-purpose tools like LLMs tend to be the preserve of massive companies. As early as 2018, OpenAI warned that the amount of compute used in AI training runs was doubling every three and a half months. A year later, citing that trend, the company announced that it would move away from its non-profit model because of the need to “invest billions of dollars in the next few years in large-scale cloud computing.”
The UK is a world leader in AI research, thanks to the ‘golden triangle’ of Oxford, Cambridge and London. But academics often have limited access to the amount of computing they need to work at the cutting edge, which has led to commercial profits being captured by US and Chinese corporate giants with billions to invest. That has led to calls for a government-owned “BritGPT”, built with public funds to provide the computing that UK researchers lack.
Black box
Neural networks are often described as a “black box”: the more competent they get, the harder it is to figure out how they do what they do. GPT-3 contains 175 billion “parameters,” each of which describes how strongly or weakly one neuron affects another. But it is almost impossible to say what a given parameter does for the LLM as a whole.
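To get a feel for where numbers like 175 billion come from, here is a rough, illustrative count with invented layer sizes: every connection between two neurons carries one weight, each neuron has a bias, and those are the “parameters”.

```python
# Invented layer sizes, purely to show how quickly parameter counts add up.
layer_sizes = [1000, 4000, 4000, 1000]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out + n_out  # one weight per connection, one bias per neuron

print(f"{total:,} parameters")  # 24,009,000 -- GPT-3 has 175,000,000,000
```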
Even the general structure of neural networks is something of a mystery. Sometimes we can glimpse some order: the “T” in GPT stands for “Transformer”, a way of wiring up the neural network that lets it mimic short-term memory, which obviously makes sense for something that involves reading a sentence one word at a time. But other aspects of neural network design are more trial and error: it seems, for example, that forcing a neural network to “squeeze” its thinking through a bottleneck of just a few neurons can improve the quality of the output. Why? We don’t really know. It just… does.
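As an illustration of what such a bottleneck looks like structurally (the sizes here are invented), in PyTorch it is nothing more exotic than a narrow layer sandwiched between wide ones:

```python
import torch.nn as nn

# A wide representation is squeezed down to just 8 neurons and expanded
# back out again; the network has to decide what is worth keeping.
bottlenecked = nn.Sequential(
    nn.Linear(512, 8),   # squeeze 512 numbers down to 8
    nn.ReLU(),
    nn.Linear(8, 512),   # ...and expand them back out
)
```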
Fine-tuning
Not everything requires training an AI model from scratch. You can think of the $10 million spent on GPT-3 as the cost of teaching an AI to read and write perfect English. But if all you want is an AI that can, say, write good scientific papers, you don’t need to start from scratch when AIs that can read English already exist: you can instead “fine-tune” those AIs on the specific data you want them to learn from, teaching them hyper-specific skills for a fraction of the cost. But there’s a risk in doing so: the fine-tuning inevitably builds on the initial training, which may not have been under your control.
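A minimal sketch of that idea in PyTorch, assuming a hypothetical saved checkpoint and invented layer sizes: reuse the expensively pre-trained layers, freeze them, and train only a small new piece on your own data.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(768, 768), nn.ReLU(),  # general-purpose layers from pre-training
    nn.Linear(768, 768), nn.ReLU(),
    nn.Linear(768, 2),               # small new layer for your specific task
)

# "pretrained.pt" is a hypothetical checkpoint holding the expensive part.
model[:4].load_state_dict(torch.load("pretrained.pt"))

for param in model[:4].parameters():
    param.requires_grad = False      # freeze what was already learned

# Only the small new layer is trained, on your own data, for a fraction
# of the original cost.
optimizer = torch.optim.Adam(model[4].parameters(), lr=1e-4)
```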
Alignment
At one level, AI “alignment” is a simple question: have we actually trained the AI to do what we want it to do? If we want an AI that can predict which prisoners are likely to reoffend, but the AI uses racial profiling as a central part of its decision, we might describe it as “not aligned” with our wishes.
Sometimes the AI can be misaligned due to flawed training data, which bakes in biases and inaccuracies. If an AI is trained to spot reoffenders using a data set of prisoners, for example, it will never learn about the people who aren’t sent to prison in the first place; if it’s trained to speak English on a data set that includes all of Twitter, it might start spouting idiosyncratic beliefs about the links between Bill Gates, 5G, and covid vaccines.
Other times, the AI may be misaligned because we’ve asked it the wrong question. An LLM is designed to predict what text comes next, but that isn’t always what we actually want: sometimes we would rather have answers that are “true” than answers that are “probable”. Sometimes we would rather have responses that don’t repeat racial slurs, threaten the user, or provide bomb-making instructions. But that’s not the question we asked the AI.
And sometimes “alignment” is used to mean something more existential. Suppose you ask an AI to optimize your factory floor plan to maximize output per hour, and it decides that the most important thing it can do is make sure no one ever interrupts production for the next billion years, so it hides in its plans technology that would kill every form of organic life on the planet. That would also be a misaligned AI.
If you’d like to read the full version of the newsletter, sign up to get TechScape delivered to your inbox every Tuesday.