Open source research on large language models (LLMs) is incredibly valuable, as it aims to democratize a powerful and influential technology. Although open source LLMs are now commonly used and widely studied, this area of research initially faced struggles that were difficult to overcome. Namely, open source LLMs performed poorly at first and were heavily criticized. Within this overview, we will study a line of research that changed this narrative by making high-performing, pre-trained LLMs available to all. Because pre-training a language model is so expensive, the models we will study here are especially impactful: once these high-performing base models were created and published, anyone could conduct research with them at marginal additional cost.
“The capabilities of the LLMs are remarkable considering the seemingly simple nature of the training methodology.” — from (14)
The current series. This overview is the second part of a three-part series on the history of open source LLMs. The first part of the series summarized initial attempts at creating open source LLMs. Here, we will study the most popular open source base models (i.e., language models that have been pre-trained but not fine-tuned or aligned) that are currently available. Next time, we will look at how these models can be fine-tuned or aligned to create a variety of useful applications.
In the first part of this series, we saw that early research on open source LLMs resulted in the proposal of several important base models, such as OPT and BLOOM. However, these models were widely considered to perform rather poorly compared to closed-source pre-trained models (e.g., GPT-3). How do we solve this? To answer that question, we first need to take a closer look at the LLM training process.
Training pipeline. LLMs are trained in several steps, as shown in the figure below. First, we pre-train the model…
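To make the pre-training step concrete, here is a minimal, hypothetical sketch (written in PyTorch for illustration; it is not from any of the models discussed here) of a single next-token prediction training step. The tiny model size, hyperparameters, and random token data are placeholders standing in for a real tokenized corpus.

```python
import torch
import torch.nn as nn

# Hypothetical toy example of causal language model pre-training:
# the model is trained to predict the next token at every position.

class TinyCausalLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask: each position may only attend to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        hidden = self.blocks(self.embed(tokens), mask=mask)
        return self.lm_head(hidden)

model = TinyCausalLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# One pre-training step on random token ids (a stand-in for real text data).
tokens = torch.randint(0, 1000, (8, 33))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # targets are inputs shifted by one
logits = model(inputs)                            # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"next-token prediction loss: {loss.item():.3f}")
```

In practice, this same simple objective is just repeated over trillions of tokens of raw text, which is exactly why pre-training is so expensive and why publicly released base models are so valuable.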