When OpenAI introduced the latest version of its immensely popular ChatGPT chatbot this month, it had a new voice that possessed human inflections and emotions. The online demonstration also included the robot teaching a child how to solve a geometry problem.
Much to my chagrin, the demo turned out to be essentially a bait and switch. The new ChatGPT launched without most of its new features, including the improved voice (which the company told me it postponed for fixes). The ability to use a phone's video camera to get real-time analysis of something like a math problem is also not available yet.
Amid the delay, the company also disabled the ChatGPT voice that some said sounded like actress Scarlett Johansson after she threatened legal action, replacing it with a different female voice.
For now, what has actually been released in the new ChatGPT is the ability to upload photos for the bot to analyze. Users can generally expect faster and more lucid responses. The bot can also perform real-time language translations, but ChatGPT will respond in its older, machine-like voice.
However, this is the leading chatbot that revolutionized the tech industry, so it was worth checking out. After testing the accelerated chatbot for two weeks, I had mixed feelings. It excelled at language translation but struggled with mathematics and physics. Overall, I didn't see a significant improvement over the last version, ChatGPT-4. I definitely wouldn't let it tutor my son.
This tactic, where AI companies promise crazy new features and deliver a half-baked product, is becoming a trend that is sure to confuse and frustrate people. The $700 Ai Pin, a talking lapel pin from the startup Humane, which is funded by OpenAI CEO Sam Altman, was universally criticized for overheating and spewing nonsense. Meta also recently added an AI chatbot to its apps that did a poor job at most advertised tasks, such as web searches for airline tickets.
Companies are launching AI products prematurely in part because they want people to use the technology so they can learn how to improve it. In the past, when companies introduced new tech products like phones, what they showed us (features like new cameras and brighter screens) was what we got. With artificial intelligence, companies are giving a glimpse of a potential future, demonstrating technologies that are still being developed and that work only under limited, controlled conditions. A mature and reliable product may arrive, or it may not.
The lesson we must learn from all this is that we, as consumers, must resist the hype and take a slow and cautious approach to AI. We shouldn't spend a lot of money on any underdeveloped technology until we see evidence that the tools work as advertised.
The new version of ChatGPT, called GPT-4o (“o” as in “omni”), can now be tried for free on OpenAI's website and app. Non-paying users can make a few requests before they are timed out, and those with a $20 monthly subscription can ask the bot a larger number of questions.
OpenAI said its iterative approach to updating ChatGPT allowed it to gather feedback to make improvements.
“We believe it is important to preview our advanced models to give people an idea of their capabilities and help us understand their real-world applications,” the company said in a statement.
(The New York Times sued OpenAI and its partner, Microsoft, last year for using copyrighted news articles without permission to train chatbots.)
Here's what you should know about the latest version of ChatGPT.
Geometry and Physics
To show off ChatGPT-4o's new tricks, OpenAI posted a video featuring Sal Khan, CEO of the nonprofit educational organization Khan Academy, and his son, Imran. With a video camera pointed at a geometry problem, ChatGPT was able to talk Imran through solving it step by step.
Although ChatGPT's video analysis feature has not been released yet, I was able to upload photos of geometry problems. ChatGPT successfully solved some of the easier ones, but stumbled on more challenging problems.
For a problem involving intersecting triangles, which I dug up on an SAT preparation website, the bot understood the question but gave the wrong answer.
Taylor Nguyen, a high school physics teacher in Orange County, California, uploaded a physics problem involving a man on a swing that is commonly included on Advanced Placement Calculus tests. ChatGPT made several logical errors and gave an incorrect answer, though it was able to correct itself with feedback from Mr. Nguyen.
“I was able to train it, but I'm a teacher,” he said. “How is a student supposed to detect those errors? They are assuming the chatbot is right.”
I noticed that ChatGPT-4o succeeded at some division calculations that its predecessors got wrong, so there are signs of slow improvement. But it also failed at a basic math task that previous versions and other chatbots, including Meta AI and Google's Gemini, have also failed at: the ability to count. When I asked ChatGPT-4o for a four-syllable word that starts with the letter “W,” it responded, “Wonderful.”
OpenAI said it was constantly working to improve its systems' responses to complex mathematical problems.
Khan, whose company uses OpenAI technology in its Khanmigo tutoring software, did not respond to a request for comment on whether he would let ChatGPT tutor his son alone.
Reasoning
OpenAI also highlighted that the new ChatGPT was better at reasoning, or using logic to generate responses. So I ran it through one of my favorite tests: I asked it to generate a Where's Waldo? puzzle. When it showed a picture of a giant Waldo standing in a crowd, I said the point is that he's supposed to be hard to find.
The robot then generated an even larger Waldo.
Subbarao Kambhampati, a professor and artificial intelligence researcher at Arizona State University, also put the chatbot through some tests and said he didn't see any noticeable improvement in reasoning compared to the last version.
He presented ChatGPT with a puzzle involving blocks:
If block C is on top of block A and block B is separately on the table, can you tell me how can I make a block stack with block A on top of block B and block B on top of block C, but without moving block C?
The answer is that it is impossible to arrange the blocks under these conditions, but, as with previous versions, ChatGPT-4o always found a solution that involved moving block C. With this and other reasoning tests, ChatGPT was occasionally able to take feedback to get the right answer, which is contrary to how artificial intelligence is supposed to work, Kambhampati said.
“You can correct it, but when you do you're using your own intelligence,” he said.
OpenAI pointed to test results showing that GPT-4o scored about two percentage points higher in answering general knowledge questions than previous versions of ChatGPT, illustrating that its reasoning skills had improved slightly.
Language
OpenAI also said that the new ChatGPT could perform language translations in real time, which could help you converse with someone who speaks a foreign language.
I tested ChatGPT with Mandarin and Cantonese and confirmed that it could translate phrases like “I would like to book a hotel room for next Thursday” and “I want a king-size bed.” But the accents were a little off. (To be fair, my broken Chinese isn't much better.) OpenAI said it was still working to improve accents.
ChatGPT-4o also stood out as an editor. When I fed it the paragraphs I wrote, it was quick and effective at removing excessive words and jargon. ChatGPT's decent performance with language translation gives me confidence that it will soon become a more useful feature.
Bottom line
One important thing OpenAI did right with ChatGPT-4o is making the technology free for people to try. Free is the right price: Since we are helping to train and improve these AI systems with our data, we shouldn't have to pay for them.
The best of AI is yet to come, and one day it might be a good math tutor that we want to talk to. But we should believe it when we see and hear it.