Image by author
In recent years, and especially since the emergence of ChatGPT just over 12 months ago, generative AI models for creating realistic synthetic text, images, video and audio have advanced rapidly. What started as humble research quickly turned into systems able to generate high-quality, human-like output across all of these media. Driven in particular by key innovations in neural networks and massive increases in computing power, more and more companies are offering free and/or paid access to these models, whose capabilities are growing at a remarkable rate.
However, generative AI isn't all rainbows and puppies. While it holds great promise for augmenting human creativity in a wide variety of applications, concerns remain about how to responsibly evaluate, test, and deploy these generative systems. There is particular unease about the spread of misinformation, along with concerns about bias, truthfulness, and the social impacts this technology introduces.
That said, the first thing we should do with any new technology is try to understand it, before either taking advantage of it or criticizing it. Getting that understanding started is what we have planned for this article. We intend to lay out some key generative AI terms and do our best to make them understandable at an intuitive level for beginners, in order to provide an elementary foundation and pave the way for deeper learning in the future. On that note, for each key term below you will find links to related material to start researching further as you wish.
Now let's get started.
Natural language processing
Natural language processing (NLP) is a subfield of AI that focuses on enabling machines to understand, interpret, and generate human language by programmatically providing these machines with the tools necessary to do so. NLP bridges the gap between human communication and computer understanding. NLP first employed rule-based methods, followed by “traditional” machine learning approaches, while the most advanced NLP today is based on a variety of neural network techniques.
Neural networks
Neural networks are computational machine learning models inspired by (but not replicas of) the human brain, used to learn from data. Neural networks consist of layers (many layers = deep learning) of artificial neurons that process and transmit individual pieces of data, transforming them as they pass through, and repeatedly update the weights associated with the processing neurons in an attempt to better “fit” the learned function to the data. Neural networks are essential to the learning and decision-making capabilities of today's AI. Without the deep learning revolution that began just over a decade ago, much of what we call AI would not have been possible.
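To make the idea of weights being updated to better “fit” the data concrete, here is a minimal sketch (plain NumPy, made-up sizes, a single training example) of a tiny two-layer network whose weights are nudged repeatedly to shrink its prediction error:

```python
import numpy as np

# A minimal sketch (not a production framework): one hidden layer,
# trained to nudge its weights toward a better fit on a single example.
rng = np.random.default_rng(0)

x = rng.normal(size=(3,))          # one input with 3 features
y_true = np.array([1.0])           # the target value we want to predict

W1 = rng.normal(size=(3, 4))       # weights of the hidden layer (parameters)
W2 = rng.normal(size=(4, 1))       # weights of the output layer (parameters)

learning_rate = 0.1
for step in range(100):
    # Forward pass: each layer transforms the data it receives
    h = np.tanh(x @ W1)            # hidden activations
    y_pred = h @ W2                # the network's prediction
    error = y_pred - y_true        # how far off we are

    # Backward pass: adjust the weights slightly to reduce the error ("better fit")
    grad_W2 = np.outer(h, error)
    grad_W1 = np.outer(x, (error @ W2.T) * (1 - h**2))
    W2 -= learning_rate * grad_W2
    W1 -= learning_rate * grad_W1

print(float(error))                # the error shrinks as the weights are updated
```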
Generative AI
Generative AI is a category of artificial intelligence, powered by neural networks, that focuses on the creation of new content. This content can take many forms, from text to images, audio, and more. It differs from “traditional” types of AI that focus on classifying or analyzing existing data; generative models instead have the ability to “imagine” and produce novel content based on their training data.
Content generation
Content generation is the actual process by which trained generative models produce synthetic text, images, video and audio, doing so with patterns learned from their training data and producing contextually relevant results in response to user input, or prompts. These prompts can themselves take any of the forms mentioned. For example, text could be used as a prompt to generate more text, or to generate an image based on the text description, or a piece of audio or video instead. Likewise, an image could be used as a prompt to generate another image, text, video, and so on. Multimodal prompting is also possible, where, for example, text and an image together could be used to generate audio.
Large language models
Large language models (LLMs) are specialized machine learning models designed to process and “understand” human language. LLMs are trained on large amounts of text data, allowing them to analyze and replicate complex linguistic structures, nuances, and contexts. Regardless of the exact LLM and the techniques used, the essence of these models is to learn and predict the next word or token (group of letters) that follows the current one, and so on. LLMs are essentially incredibly complex “next word guessers”, and improving next word guessing is a very hot research topic right now, as you have probably heard.
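As a toy illustration of “next word guessing”, the sketch below builds the crudest possible next-word predictor from word-pair counts over a tiny made-up corpus. Real LLMs learn these patterns with billions of neural-network parameters over vast corpora, but the underlying objective, predicting what comes next, is the same in spirit:

```python
from collections import Counter, defaultdict

# A toy "next word guesser": count which word tends to follow which in a
# tiny corpus, then predict the most likely continuation.
corpus = "the cat sat on the mat the cat ate the fish".split()

next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    # Return the word most frequently observed after `word`
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # 'cat' -- seen twice after 'the'
print(predict_next("cat"))   # 'sat' or 'ate' (tied; Counter keeps the first seen)
```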
Foundation models
Foundation models are AI systems designed with broad capabilities that can then be tailored for a variety of specific tasks. Foundation models provide a base for creating more specialized applications, such as fine-tuning a general language model for specific chatbots, assistants, or other generative functionality. Foundation models are not limited to language models, however; they also exist for generation tasks such as images and video. Examples of well-known foundation models include GPT, BERT, and Stable Diffusion.
Parameters
In this context, parameters are numerical values that define the structure, operational behavior, and learning and prediction capacity of a model. For example, the billions of parameters in OpenAI's GPT-4 influence its word prediction and dialogue creation capabilities. More technically, the connections between each neuron in a neural network have weights (mentioned above), each of these weights being a single parameter of the model. The more neurons → the more weights → the more parameters → the more capacity a (well-trained) network will have to learn and predict.
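As a rough illustration of how neuron counts translate into parameter counts, the sketch below tallies the connection weights (plus one bias per neuron) between the layers of a hypothetical fully connected network; the layer sizes are invented purely for the example:

```python
# Back-of-the-envelope parameter counting: every connection weight (and bias)
# between layers is one parameter. Layer sizes here are hypothetical.
layer_sizes = [512, 1024, 1024, 256]   # neurons per layer (made up for illustration)

total_params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total_params += n_in * n_out + n_out   # weights plus one bias per output neuron

print(f"{total_params:,} parameters")      # roughly 1.8 million for this toy network
```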
Word embeddings
Word embeddings are a technique in which words or phrases are converted into numerical vectors of a predetermined number of dimensions, in an attempt to capture their meaning and contextual relationships in a multidimensional space of a much smaller size than would be required to one-hot encode each word (or phrase) in a vocabulary. If you were to create a matrix for a 500,000-word vocabulary where each row represented a single word and every column in that row was set to “0” except for the single column representing the word in question, the matrix would be 500,000 x 500,000 rows by columns, and incredibly sparse. This would be a disaster for both storage and performance. By setting the columns to various fractional values between 0 and 1 and reducing the number of columns to, say, 300 (dimensions), we have a much more compact storage structure and inherently improve performance. As a side effect, by learning these embedding values with a neural network, similar terms end up “closer” to one another in the embedding space than dissimilar terms, giving us information about the relative meanings of words.
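The sketch below contrasts the two representations with a toy three-word vocabulary. The embedding values are invented for illustration (in practice they are learned by a neural network), but they show how similar words end up “closer” than dissimilar ones:

```python
import numpy as np

# One-hot vs. dense embeddings, with a toy vocabulary.
vocab = ["king", "queen", "apple"]

# One-hot: one huge, mostly-zero dimension per word in the vocabulary
one_hot = np.eye(len(vocab))        # 3 x 3 here; 500,000 x 500,000 for a large vocabulary

# Dense embeddings: a few hundred fractional dimensions per word
# (values below are made up; real embeddings are learned from data)
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.78, 0.70, 0.12]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar words sit "closer" to each other than dissimilar ones
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high, near 1.0
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # much lower
```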
Transformer models
Transformer models are AI architectures that process entire sentences simultaneously, which is crucial for capturing language context and long-range associations. They excel at detecting relationships between words and phrases, even when they are far apart in a text. For example, when “she” appears in a passage as a pronoun referring to an individual named much earlier, transformers can “remember” that relationship.
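For the curious, here is a minimal sketch of the scaled dot-product attention mechanism at the core of transformer models, using random toy values: every word's representation is updated with a weighted mix of all the other words at once, which is what lets distant words influence one another:

```python
import numpy as np

# Scaled dot-product attention on toy random data (5 "words", 8 dimensions each).
rng = np.random.default_rng(0)
seq_len, d_model = 5, 8

Q = rng.normal(size=(seq_len, d_model))      # queries
K = rng.normal(size=(seq_len, d_model))      # keys
V = rng.normal(size=(seq_len, d_model))      # values

scores = Q @ K.T / np.sqrt(d_model)          # how much each word attends to every other word
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax; rows sum to 1
output = weights @ V                         # each output row mixes information from all words

print(weights.shape)   # (5, 5): one attention weight for every pair of words
```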
Positional encoding
Positional encoding refers to a method in transformer models that helps preserve the sequential order of words. It is a crucial component for understanding context within a sentence and between sentences, since transformers otherwise process all the words in a sentence at once.
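A common way to do this, introduced in the original transformer paper, is sinusoidal positional encoding: each position gets a unique pattern of sine and cosine values that is added to the word embeddings, so the model can tell word order apart even though it processes all words simultaneously. A minimal sketch:

```python
import numpy as np

# Sinusoidal positional encoding: a unique sine/cosine pattern per position.
def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]                    # 0, 1, 2, ...
    dims = np.arange(d_model)[None, :]
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                      # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                      # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=10, d_model=16).shape)       # (10, 16)
```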
Reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) refers to a method of training LLMs. Like traditional reinforcement learning (RL), RLHF trains and uses a reward model, though in this case the reward model is built directly from human feedback. The reward model is then used as a reward function when training the LLM with an optimization algorithm. This explicitly keeps humans in the loop during model training, with the hope that human feedback can provide the essential, and perhaps otherwise unattainable, guidance needed for well-optimized LLMs.
Emergent behavior
Emergent behavior refers to unexpected abilities that large, complex language models display but that simpler models do not. These unexpected abilities can include skills such as coding, music composition, and fiction writing. They are not explicitly programmed into the models, but rather emerge from their complex architectures. The question of emergent abilities may go beyond these more common skills, however; for example, is theory of mind an emergent behavior?
Hallucinations
Hallucination is the term used when LLMs produce factually incorrect or illogical responses due to limitations in their data and architecture. Despite a model's advanced capabilities, these errors can still occur when queries have no basis in the model's training data, or when the training data itself contains incorrect or non-factual information.
Anthropomorphism
Anthropomorphism is the tendency to attribute human qualities to AI systems. It is important to note that, despite their ability to imitate human emotions or speech, and our instinct to think of models as “he” or “she” (or any other pronoun) instead of “it”, conversational AI systems do not have feelings or consciousness.
Bias
Bias is a complex term in AI research and can refer to several different things. In our context, bias refers to errors in AI outputs caused by skewed training data, leading to inaccurate, offensive or misleading predictions. Bias arises when algorithms prioritize irrelevant features of the data over meaningful patterns, or when meaningful patterns are missing from the data altogether.
Matthew Mayo (@mattmayo13) has a master's degree in computer science and a postgraduate diploma in data mining. As Editor-in-Chief of KDnuggets, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by the mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.