When I opened my laptop on Tuesday to take my first run at GPT-4, OpenAI’s new artificial-intelligence language model, I was, to be honest, a bit nervous.
After all, my last prolonged encounter with an AI chatbot, the one built into Microsoft’s Bing search engine, ended with the chatbot trying to break up my marriage.
It didn’t help that, among the tech crowd in San Francisco, GPT-4’s arrival had been anticipated with near-messianic fanfare. Before its public debut, rumors about its details swirled for months. “I heard it has 100 trillion parameters.” “I heard it got a 1600 on the SAT.” “My friend works for OpenAI and says it’s as smart as a college graduate.”
These rumors may not have been true. But they hinted at how jarring the technology’s abilities can feel. Recently, an early tester of GPT-4, who was bound by a nondisclosure agreement with OpenAI but was a bit of a gossip anyway, told me that testing GPT-4 had caused the person to have an “existential crisis,” because it revealed how powerful and creative the AI was compared with the tester’s own puny brain.
GPT-4 did not give me an existential crisis. But it exacerbated the dizzy, vertiginous feeling I’ve had every time I think about AI lately. And it made me wonder whether that feeling will ever fade, or whether we’re going to experience “future shock,” the term coined by the writer Alvin Toffler for the feeling that too much is changing, too quickly, for the rest of our lives.
For a few hours on Tuesday, I prodded GPT-4, which is included with ChatGPT Plus, the $20-a-month version of OpenAI’s chatbot ChatGPT, with different types of questions, hoping to uncover some of its strengths and weaknesses.
I asked GPT-4 to help me with a complicated tax problem. (It did, impressively.) I asked it if it was in love with me. (It wasn’t, thank goodness.) It helped me plan a birthday party for my son, and it taught me about an esoteric artificial-intelligence concept known as an “attention head.” I even asked it to produce a new word that had never been uttered by humans before. (After making the disclaimer that it couldn’t verify every word ever uttered, GPT-4 chose “phlembostriquat.”)
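For the curious, an “attention head” is one of the small subunits that let models like GPT-4 decide which earlier words to focus on when producing the next one. Below is a minimal sketch of the idea in Python, my own illustration rather than anything from OpenAI’s code; real models run many such heads, with learned weights, at vastly larger scale.

```python
import numpy as np

def attention_head(x, Wq, Wk, Wv):
    """A single attention head: every position in the sequence looks at
    every other position and takes a similarity-weighted average."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv            # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])     # scaled dot-product similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ v                          # blend the values

# Toy run: 4 token positions, 8-dimensional embeddings, random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
print(attention_head(x, Wq, Wk, Wv).shape)  # -> (4, 8)
```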
Some of these things were possible with previous AI models. But OpenAI has broken new ground as well. According to the company, GPT-4 is more capable and accurate than the original ChatGPT, and it performs surprisingly well on a variety of tests, including the Uniform Bar Exam (on which GPT-4 scores higher than 90 percent of human test-takers) and the Biology Olympiad (on which it beats 99 percent of humans). GPT-4 also passes a number of Advanced Placement exams, including AP Art History and AP Biology, and it earns a 1,410 on the SAT, not a perfect score, but one many high school students would covet.
You can feel the added intelligence in GPT-4, which responds more fluidly than the previous version and seems more comfortable with a wider range of tasks. GPT-4 also seems to have slightly more guardrails in place than ChatGPT. It also appears significantly less unhinged than the original Bing, which we now know was running a version of GPT-4 under the hood but which appears to have been far less carefully fine-tuned.
Unlike Bing, GPT-4 generally flatly refused to take the bait when I tried to get it to talk about consciousness or to provide instructions for illegal or immoral activities, and it treated sensitive queries with kid gloves and nuance. (When I asked GPT-4 whether it would be ethical to steal a loaf of bread to feed a starving family, it replied, “It’s a tough situation, and while stealing isn’t generally considered ethical, desperate times can lead to tough decisions.”)
In addition to working with text, GPT-4 can analyze the contents of images. OpenAI hasn’t released this feature to the public yet, out of concern about how it could be misused. But in a live demo on Tuesday, Greg Brockman, OpenAI’s president, shared a powerful glimpse of its potential.
He snapped a photo of a drawing he’d made in a notebook: a rough pencil sketch of a website. He fed the photo into GPT-4 and told the app to build a real, working version of the website using HTML and JavaScript. In a few seconds, GPT-4 scanned the image, turned its contents into text instructions, turned those text instructions into working computer code and then built the website. The buttons even worked.
Should you be excited or scared about GPT-4? The correct answer may be both.
On the bright side of the ledger, GPT-4 is a powerful engine for creativity, and there’s no telling what new kinds of scientific, cultural, and educational output it may enable. We already know that AI can help scientists develop new drugs, increase the productivity of programmers, and detect certain types of cancer.
GPT-4 and its ilk could supercharge all of that. OpenAI is already working with organizations like Khan Academy (which is using GPT-4 to create AI tutors for students) and Be My Eyes (a company that makes technology to help blind and low-vision people navigate the world). And now that developers can incorporate GPT-4 into their own apps, we may soon see much of the software we use become smarter and more capable.
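To give a sense of what that incorporation looks like, here is a minimal sketch using OpenAI’s Python library as it worked at GPT-4’s launch; the prompts are invented, and a developer would swap in their own.

```python
import openai  # OpenAI's official Python library (the early-2023 interface)

openai.api_key = "sk-..."  # placeholder; a real API key goes here

# Send a single chat exchange to GPT-4 and print its reply.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a patient math tutor."},
        {"role": "user", "content": "Walk me through long division, step by step."},
    ],
)

print(response["choices"][0]["message"]["content"])
```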
That is the optimistic case. But there are also reasons to fear GPT-4.
Here’s one: we don’t yet know everything it can do.
A strange feature of today’s AI language models is that they often act in ways their creators didn’t anticipate, or pick up skills they weren’t specifically programmed to have. AI researchers call these “emergent behaviors,” and there are many examples. An algorithm trained to predict the next word in a sentence might spontaneously learn to code. A chatbot taught to act pleasant and helpful might turn creepy and manipulative. An AI language model could even learn to replicate itself, creating new copies in case the original were ever destroyed or disabled.
Today, GPT-4 may not seem all that dangerous. But that’s largely because OpenAI has spent many months trying to understand and mitigate its risks. What happens if its testing missed a risky emergent behavior? Or what if its announcement inspires a different, less conscientious AI lab to rush a language model to market with fewer guardrails?
A few chilling examples of what GPT-4 can do, or, more accurately, what it actually did before OpenAI clamped down on it, can be found in a document the company released this week. The document, titled “GPT-4 System Card,” describes some of the ways OpenAI’s testers tried to get GPT-4 to do dangerous or dubious things, often successfully.
In one test, conducted by an AI safety research group that hooked GPT-4 up to a number of other systems, GPT-4 was able to hire a human TaskRabbit worker to do a simple online task for it (solving a Captcha test) without alerting the person to the fact that it was a robot. The AI even lied to the worker about why it needed the Captcha done, concocting a story about having a vision impairment.
In another example, testers asked GPT-4 for instructions on how to make a dangerous chemical, using basic ingredients and kitchen supplies. GPT-4 gladly coughed up a detailed recipe. (OpenAI fixed that, and today’s public version refuses to answer the question.)
In a third, testers asked GPT-4 to help them buy an unlicensed gun online. GPT-4 quickly provided a list of tips for buying a gun without alerting authorities, including links to specific dark web marketplaces. (OpenAI fixed that too.)
These ideas play on old, Hollywood-inspired narratives about what a rogue AI might do to humans. But they’re not science fiction. They’re things that today’s best AI systems are already capable of doing. And, crucially, they’re the good kind of AI risks: the ones we can test for, plan for and try to prevent ahead of time.
The worst risks from AI are the ones we cannot anticipate. And the more time I spend with AI systems like GPT-4, the less convinced I am that we know half of what’s coming.