In the second installment of our five-part series, I’m going to explain how the technology actually works.
The artificial intelligences that power ChatGPT, Microsoft’s Bing chatbot, and Google’s Bard can carry on human-like conversations and write natural, flowing prose on an infinite variety of topics. They can also perform complex tasks, from writing code to planning a child’s birthday party.
But how does it all work? To answer that, we need to look under the hood of something called a large language model, the kind of AI that powers these systems.
Large language models, or LLMs, are relatively new on the AI scene. The first ones appeared only about five years ago, and they weren’t very good. But today they can compose emails, presentations and memos, and tutor you in a foreign language. Even more capabilities are sure to emerge in the coming months and years, as the technology improves and Silicon Valley strives to cash in.
I’ll walk you through building a large language model from scratch, keeping things simple and skipping a lot of the hard math. Let’s say we’re trying to build an LLM to help you answer your emails. We’ll call it MailBot.
Step 1: Set a goal
Every AI system needs a goal. Researchers call this an objective function. It can be simple, for example, “win as many chess games as possible,” or complicated, such as “predict the three-dimensional shapes of proteins, using only their amino acid sequences.”
Most large language models have the same basic objective function: Given a sequence of text, guess what comes next. We’ll give MailBot more specific goals later, but for now we’ll stick to that one.
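In code, “guess what comes next” boils down to assigning probabilities to possible next words. Here’s a minimal Python sketch of that idea, using simple word-pair counts over a made-up toy corpus instead of a real neural network:

```python
from collections import Counter, defaultdict

# A toy version of the objective: given a word, guess what comes next.
# Real LLMs use a giant neural network; here we just count word pairs.
corpus = "dear maria thank you for your email dear john thank you for the update"
tokens = corpus.split()

next_word_counts = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    next_word_counts[current][nxt] += 1

def guess_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = next_word_counts[word]
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())

print(guess_next("dear"))   # ('maria', 0.5) or ('john', 0.5)
print(guess_next("thank"))  # ('you', 1.0)
```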
Step 2: Collect lots of data
Next, we need to gather the training data that will teach MailBot to write. Ideally, we’ll put together a colossally large text repository, which typically means billions of pages pulled from the internet, such as blog posts, tweets, Wikipedia articles, and news stories.
To get started, we’ll use some free, publicly available datasets, such as the Common Crawl web data repository. But we’ll also want to add our own secret sauce, in the form of proprietary or specialized data. We might license some foreign-language text, so MailBot learns to compose emails in French or Spanish in addition to English. In general, the more data we have, and the more diverse the sources, the better our model will be.
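If you want to experiment with this step yourself, here’s one way it can look in practice: a sketch using the Hugging Face `datasets` library (assuming you’ve run `pip install datasets`) to stream the public “allenai/c4” dataset, a cleaned slice of Common Crawl:

```python
from datasets import load_dataset

# Stream C4, a cleaned subset of Common Crawl, rather than downloading
# the whole multi-terabyte corpus up front.
web_pages = load_dataset("allenai/c4", "en", split="train", streaming=True)

for i, page in enumerate(web_pages):
    print(page["text"][:80])  # first 80 characters of each document
    if i >= 2:
        break
```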
Before we can feed the data into our model, we need to break it into units called tokens, which can be words, phrases or even individual characters. Transforming text into bite-size chunks helps a model analyze it more easily.
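Here’s a bare-bones illustration of tokenization in Python. Real systems use subword schemes like byte-pair encoding; this sketch just splits text into words and punctuation marks to show the idea:

```python
import re

def tokenize(text):
    # Split lowercase text into words and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Dear Maria, thanks for your email!"))
# ['dear', 'maria', ',', 'thanks', 'for', 'your', 'email', '!']
```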
Step 3: Build your neural network
Once our data is tokenized, we need to assemble the “brain” of the AI, a type of system known as a neural network. This is a complex network of interconnected nodes (or “neurons”) that process and store information.
For MailBot, we’re going to want to use a relatively new type of neural network known as a transformer model. Transformer models can analyze many pieces of text at the same time, making them faster and more efficient. (Transformer models are the key to systems like ChatGPT, whose acronym stands for “Generative Pretrained Transformer.”)
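To make that concrete, here’s a toy Python sketch of self-attention, the mechanism that lets a transformer weigh every token against every other token in a single pass. Real models add learned “query,” “key” and “value” projections and stack many such layers; this strips all of that away:

```python
import numpy as np

def self_attention(X):
    """X: a (sequence_length, embedding_dim) matrix of token vectors."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)   # how relevant each token is to every other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X              # each output mixes information from all tokens

tokens = np.random.rand(3, 4)        # three tokens, each a 4-number vector
print(self_attention(tokens).shape)  # (3, 4): same shape, now context-aware
```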
Step 4: Train your neural network
The model will then analyze the data, token by token, identifying patterns and relationships. It may notice that “Dear” is often followed by a name, or that “Best regards” typically comes before your name. By identifying these patterns, the AI learns to construct messages that make sense.
The system also develops a sense of context. For example, it might learn that “bank” can refer to a financial institution or to the side of a river, depending on the nearby words.
As it learns these patterns, the transformer model draws a map: an enormously complex mathematical representation of human language. It keeps track of these relationships using numerical values known as parameters. Many of today’s best LLMs have hundreds of billions of parameters or more.
Training can take days or even weeks and will require a great deal of computing power. But once that’s done, you’re almost ready to start writing your emails.
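Here’s a heavily simplified sketch of that training loop, using the PyTorch library (assuming `pip install torch`) and random fake token IDs in place of real email text. Real LLM training does essentially this, over billions of tokens, on thousands of specialized chips:

```python
import torch
import torch.nn as nn

# A tiny stand-in for MailBot: given one token, score every possible next token.
vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake training pairs (current token ID, next token ID) in place of real text.
contexts = torch.randint(0, vocab_size, (512,))
targets = torch.randint(0, vocab_size, (512,))

for step in range(100):
    logits = model(contexts)         # the model's guesses for each next token
    loss = loss_fn(logits, targets)  # how wrong were the guesses?
    optimizer.zero_grad()
    loss.backward()                  # work out which way to nudge each parameter
    optimizer.step()                 # nudge all the parameters slightly
```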
Interestingly, the model can also develop other abilities. As LLMs learn to predict the next word in a sequence, over and over again, they may pick up unexpected skills, such as knowing how to code. AI researchers call these emergent behaviors, and they are sometimes still baffled by them.
Step 5: Adjust your model
Once a large language model is trained, it must be calibrated for a specific job. A chatbot used by a hospital might need to understand medical terms, for example.
To fine-tune MailBot, we could ask it to generate a bunch of emails, hire people to rate their accuracy, and then feed the ratings back into the model until it improves.
This is a rough approximation of the approach that was used with ChatGPT, which is known as reinforcement learning from human feedback.
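Here’s a drastically simplified sketch of that feedback loop. The functions `mailbot_generate` and `collect_human_rating` are hypothetical stand-ins for the real model and the hired raters, and full reinforcement learning from human feedback trains a separate reward model rather than merely filtering drafts:

```python
import random

def mailbot_generate(prompt):
    # Hypothetical stand-in: the real MailBot would write a fresh draft.
    drafts = ["Thanks for reaching out...", "Dear colleague...", "yo what up"]
    return random.choice(drafts)

def collect_human_rating(email):
    # Hypothetical stand-in: a person would score the draft, 1 (bad) to 5 (good).
    return 1 if email.startswith("yo") else 4

kept_for_tuning = []
for _ in range(100):
    draft = mailbot_generate("Reply to Maria's meeting request")
    if collect_human_rating(draft) >= 4:
        kept_for_tuning.append(draft)  # highly rated drafts guide the next round

print(f"{len(kept_for_tuning)} drafts kept for further tuning")
```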
Step 6: Release, carefully
Congratulations! Once MailBot has been trained and tuned, it’s ready to use. After you create some sort of user interface for it, like a Chrome extension that plugs into your email app, you can start generating emails.
But no matter how cool it seems, you’ll still want to keep an eye on your new assistant. As companies like Microsoft and Meta have learned the hard way, AI systems can be erratic and unpredictable, or even creepy and dangerous.
Tomorrow we’ll hear more about how things can go wrong in unexpected and sometimes disturbing ways.
Your homework
Let’s explore one of the most creative abilities of LLMs: the ability to combine disparate concepts and formats into something strange and new. For example, our colleagues at Well asked ChatGPT to “write a song featuring Taylor Swift’s vocals that uses themes from a Dr. Seuss book.”
For today’s assignment, try mixing and matching a format, style, and theme, such as, “Write a Snoop Dogg-style limerick about global warming.”
Don’t forget to share your creation as a comment.
Glossary
- Transformer model: A neural network architecture useful for understanding language, which doesn’t have to analyze words one at a time but can look at a whole sentence at once. A technique called self-attention allows the model to focus on the particular words that matter for understanding a sentence’s meaning.
- Parameters: Numerical values that define a large language model’s structure and behavior, like clues that help it guess what words come next. Modern systems like GPT-4 are thought to have hundreds of billions of parameters.
- Reinforcement learning: A technique that teaches an AI model to find the best result by trial and error, receiving rewards or punishments from an algorithm based on its results. This system can be enhanced by humans giving feedback on its performance.