Editor's Image
Have you heard of ai/” rel=”noopener” target=”_blank”>Andrej Karpathy? He is a renowned computer scientist and artificial intelligence researcher known for his work on deep learning and neural networks. He played a key role in the development of ChatGPT at OpenAI and was previously Senior Director of ai at Tesla. Even before that, he designed and was the lead instructor for the first deep learning class. Stanford – CS 231n: Convolutional Neural Networks for Visual Recognition. The class became one of the largest at Stanford, growing from 150 enrolled in 2015 to 750 students in 2017. I highly recommend anyone interested in deep learning check this out on YouTube. I will not go into more details about him and we will focus our attention on one of his most popular talks on YouTube that he crossed 1.4 million views “Introduction to large language models.” This talk is a practical introduction to LLMs and is a must-see for anyone interested in LLMs.
I have provided a concise summary of this talk. If this piques your interest, I highly recommend checking out the slides and YouTube link that will be provided at the end of this article.
This talk provides a comprehensive introduction to LLMs, their capabilities, and the potential risks associated with their use. It has been divided into 3 main parts which are as follows:
Part 1: LLM
Slides by Andrej Karpathy
LLMs are trained on a large corpus of text to generate human-like responses. In this part, Andrej specifically looks at the Llama 2-70b model. It is one of the largest LLMs with 70 billion parameters. The model consists of two main components: the parameter file and the execution file. The parameter file is a large binary file that contains the model weights and biases. These weights and biases are essentially the “knowledge” that the model has learned during training. The run file is a piece of code used to load the parameter file and run the model. The model training process can be divided into the following two stages:
1. Pre-training
This involves collecting a large amount of text, around 10 terabytes, from the Internet and then using a GPU cluster to train the model with this data. The result of the training process is a base model which is Internet lossy compression. It is capable of generating coherent and relevant text but not answering questions directly.
2. Fine tuning
The pre-trained model is additionally trained on a high-quality data set to make it more useful. This results in a wizard model. Andrej also mentions a third stage of tuning, which involves the use of comparison labels. Instead of generating answers from scratch, the model is given multiple candidate answers and asked to choose the best one. This can be easier and more efficient than generating responses and can further improve model performance. This process is called reinforcement learning from human feedback (RLHF).
Part 2: Future of LLMs
Slides by Andrej Karpathy
While discussing the future of large language models and their capabilities, the following key points are discussed:
1. Scale law
Model performance is correlated with two variables: the number of parameters and the amount of training text. Larger models trained with more data tend to achieve better performance.
2. Use of tools
LLMs like ChatGPT can use tools like a browser, calculator, and Python libraries to accomplish tasks that would otherwise be challenging or impossible for the model alone.
3. System One and System Two Thinking in LLMs
Currently, LLMs predominantly employ system one thinking: fast, instinctive and pattern-based. However, there is interest in developing LLMs capable of engaging in system two thinking: slower, rational, and requiring conscious effort.
4. LLM SO
LLMs can be considered as the core process of an emerging operating system. They can read and generate text, have extensive knowledge on various topics, browse the Internet or consult local files, use existing software infrastructure, generate images and videos, listen and speak, and think for long periods using the system 2. The contextual window of an LLM is analogous to RAM in a computer, and the kernel process attempts to page relevant information in and out of its context window to perform tasks.
Part 3: LLM Security
Slides by Andrej Karpathy
Andrej highlights ongoing research efforts to address security challenges associated with LLMs. The following attacks are analyzed:
1. jailbreak
Attempts to bypass security measures on LLMs to extract harmful or inappropriate information. Examples include role-playing to trick the model and manipulate responses using optimized sequences of words or images.
2. Immediate injection
It involves injecting new instructions or prompts into an LLM to manipulate its responses. Attackers can hide instructions within images or web pages, leading to the inclusion of unrelated or harmful content in the model's responses.
3. Data poisoning/backdoor attack/sleeper agent attack
It involves training a large language model with malicious or manipulated data containing trigger phrases. When the model encounters the trigger phrase, it can be manipulated to perform undesirable actions or provide incorrect predictions.
You can watch the full video on YouTube by clicking below:
Slideshow: Click here
If you're new to LLMs and looking for resources to start your journey, this comprehensive list is a great place to start! It contains core and LLM-specific courses that will help you build a solid foundation. Additionally, if you are interested in a more structured learning experience, Maxime Labonne recently launched its LLM course with three different tracks to choose from depending on your needs and experience level. Here are the links to both resources for your convenience:
- A Comprehensive List of Resources for Mastering Large Language Models by Kanwal Mehreen
- Maxime Labonne's Large Language Model Course
Kanwal Mehreen is an aspiring software developer with a strong interest in data science and ai applications in medicine. Kanwal was selected as a Google Generation Scholar 2022 for the APAC region. Kanwal loves sharing technical knowledge by writing articles on trending topics and is passionate about improving the representation of women in the tech industry.