OpenAI has created a version of GPT-4, its latest text generation model, that can “remember” about 50 pages of content thanks to a greatly enlarged context window.
That might not sound significant. But that’s five times as much information as the standard GPT-4 can hold in its “memory” and eight times as much as GPT-3.
“The model can use long documents flexibly,” said Greg Brockman, co-founder and president of OpenAI, during a live demo this afternoon. “We want to see what kind of applications [this enables].”
When it comes to text-generating AI, the context window refers to the text the model considers before generating additional text. While models like GPT-4 “learn” to write by training on billions of examples of text, they can only consider a small fraction of that text at a time, determined primarily by the size of their context window.
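To make the idea concrete, here is a toy sketch of a fixed-size context window. This is not OpenAI’s actual tokenizer or truncation logic; the `fit_to_context` function and the use of whitespace word counts as a stand-in for tokens are illustrative assumptions.

```python
def fit_to_context(history, max_tokens):
    """Keep only the most recent messages that fit in the token budget.

    Uses whitespace word count as a crude stand-in for real tokenization.
    Walks the history newest-first, so the oldest messages are dropped
    once the budget is exhausted.
    """
    kept, used = [], 0
    for message in reversed(history):
        cost = len(message.split())
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "I live in Canada.",       # oldest message
    "I have two kids.",
    "Never book Wednesdays.",  # newest message
]
# With a budget of 8 "tokens", the oldest fact is the first to go.
print(fit_to_context(history, 8))
# → ['I have two kids.', 'Never book Wednesdays.']
```

A larger `max_tokens` is all that separates the 8K and 32K models conceptually: the window simply stretches further back before old turns fall off.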
Models with small context windows tend to “forget” the content of even very recent conversations, leading them to go off topic. After a few thousand words, they also forget their initial instructions, extrapolating their behavior from the latest information in their context window rather than from the original request.
Allen Pike, a former Apple software engineer, colorfully explains it this way:
“[The model] will forget everything you try to teach it. It will forget that you live in Canada. It will forget that you have kids. It will forget that you hate booking things on Wednesdays and please stop suggesting Wednesdays, dammit. If neither of you has mentioned its name in a while, it’ll forget that too. Talk to a [GPT-powered] character for a while and you can start to feel like you’re bonding with it, getting somewhere really cool. Sometimes it gets a little confused, but that happens to people too. But eventually, the fact that it has no medium-term memory becomes clear, and the illusion shatters.”
We have not yet been able to try the version of GPT-4 with the extended context window, gpt-4-32k. (OpenAI says it is processing requests for the high- and low-context GPT-4 models at “different rates based on capacity.”) But it’s not hard to imagine how conversations with it could be far more compelling than those with the previous-generation model.
With a larger “memory,” GPT-4 should be able to converse relatively coherently for hours, even days, rather than minutes. And perhaps more importantly, it should be less likely to go off the rails. As Pike notes, one of the reasons chatbots like Bing Chat can be prodded into misbehaving is that their initial instructions (be a helpful chatbot, respond respectfully, etc.) are quickly pushed out of their context windows by additional prompts and responses.
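One common mitigation for the problem Pike describes is to pin the system instruction so that only conversation turns are trimmed. This is a hypothetical sketch, not a description of how Bing Chat or GPT-4 actually manage their prompts; `build_prompt` and the word-count budgeting are illustrative assumptions.

```python
def build_prompt(system, turns, max_tokens):
    """Always keep the system instruction; trim the oldest turns first.

    Uses whitespace word count as a crude stand-in for real tokenization.
    The system message is charged against the budget up front, so it can
    never be pushed out by new conversation turns.
    """
    budget = max_tokens - len(system.split())
    kept, used = [], 0
    for turn in reversed(turns):
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system] + list(reversed(kept))

system = "Be a helpful, respectful chatbot."
turns = ["hello there friend", "what is up", "tell me a joke"]
# With a tight budget, old turns are dropped but the instruction survives.
print(build_prompt(system, turns, 10))
# → ['Be a helpful, respectful chatbot.', 'tell me a joke']
```

Without this kind of pinning, a naive first-in, first-out window eventually evicts the instruction itself, which is the failure mode the article describes.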
It may be a little more nuanced than that. But the context window clearly plays an important role in grounding these models. In time, we’ll see what kind of tangible difference it makes.