It's December: the world is slowing down and snow is falling in some corners. But OpenAI? They are just getting started. In true holiday spirit, Sam Altman and his team are kicking off a 12-day giveaway series, and the first one is a big one: OpenAI o1, their most capable model yet. For months, GPT-4 has been the go-to LLM for everything, but now o1 is here to change things. What does it bring to the table? In this blog, we will pit OpenAI's o1 and GPT-4o against each other on a set of tasks and see which model emerges as the winner. Let's get started.
OpenAI o1: What's new?
OpenAI's latest o1 model is a refined version of the o1-preview model that was released in September 2024. It is designed to tackle more complex tasks with greater precision and speed.
- Compared to its predecessor, o1-preview, o1 demonstrates a remarkable ability to think more concisely about simpler problems. Its thinking time scales with the difficulty of the query.
- According to OpenAI, o1 significantly outperforms o1-preview in mathematical reasoning and coding-related tasks.
- o1 has multimodal capabilities, meaning it can work with text, images, and audio, while o1-preview was limited to text.
How to access o1?
o1 is available on the ChatGPT Plus and ChatGPT Pro plans. It is not available on the free plan. While the ChatGPT Pro plan allows unlimited chats with o1, the Plus plan only allows a limited number of chats with o1. To access o1:
- Head to ChatGPT and log in to your Pro/Plus account.
- At the top left of the screen, open the model selector and choose the model you want to work with.
o1 vs GPT-4o: The Showdown
Even with o1-preview making noise in recent months, GPT-4o has remained the top choice for technical and non-technical ChatGPT users. Launched in May 2024, GPT-4o is a refined multimodal model celebrated for its precision, speed, and versatility.
It seamlessly processes text, images, and audio with human-like response times and next-generation accuracy. Excelling in complex reasoning and nuanced understanding, it has an impressive score of 88.7% on the MMLU benchmark, setting a high standard for multimodal AI.
Now o1 is stealing the spotlight with its exceptional performance in math, coding, and complex problem solving. Coming out on top is a bold claim, but does o1 really surpass GPT-4o as the ultimate model?
To find out, we'll put both to the test with five challenging tasks. Here are the 5 tasks:
- Understand the problem and design a flowchart.
- Image analysis with science.
- Image analysis with mathematics.
- Solve a Sudoku.
- Image generation.
Let's see which LLM emerges as the undisputed champion!
Challenge 1: Understand the problem and design a flowchart
Prompt: “I need a simple flowchart and a detailed explanation of the tools and technologies needed to implement a sentiment analysis system.
The system must search for stock-related news using a news API, analyze the sentiment (positive, negative or neutral), and deliver a 140-character summary and sentiment to clients.”
Result:
With GPT-4o, we got a conceptual description of the flowchart along with a rough image meant to represent it. Although the text description lays out the steps accurately, the diagram is full of misspellings and its flow of events is confusing.
With o1, we got a simple but clean flowchart without spelling errors. The text description then explained each part of the flowchart in detail and suggested additional tools and technologies we could use for the task. Finally, we got a concise summary that briefly explains each step: a complete answer from start to finish!
Verdict: For this task, o1 knocked it out of the park.
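To make the pipeline in the prompt concrete, here is a minimal Python sketch of its three steps: take in news text, label the sentiment, and trim the summary to 140 characters. This is not the code either model produced. The keyword lists, the `Alert` class, and all function names are hypothetical, the keyword scorer is only a stand-in for a real sentiment model, and the news API call is left out, so the articles are passed in as plain strings.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Toy keyword lists (assumption): a real system would use a trained sentiment model.
POSITIVE = {"beats", "surges", "gains", "record", "upgrade", "profit"}
NEGATIVE = {"misses", "falls", "plunges", "loss", "downgrade", "lawsuit"}


@dataclass
class Alert:
    summary: str
    sentiment: str


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting keyword hits."""
    words = {w.strip(".,!?:;").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(text: str, limit: int = 140) -> str:
    """Collapse whitespace and truncate to at most `limit` characters."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def build_alerts(articles: Iterable[str]) -> List[Alert]:
    """Turn raw article text into (summary, sentiment) alerts for clients."""
    return [Alert(summarize(a), classify_sentiment(a)) for a in articles]


if __name__ == "__main__":
    # Hypothetical headlines standing in for the news API response.
    sample = [
        "Acme beats estimates as quarterly profit surges",
        "Globex falls after analyst downgrade",
        "Initech holds annual shareholder meeting",
    ]
    for alert in build_alerts(sample):
        print(alert.sentiment, "|", alert.summary)
```

Truncation is the crudest possible "summary"; in practice that step would be handed to a summarization model, with the 140-character cap applied afterwards as a safety net.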
Challenge 2: Image Analysis with Science
Prompt: “Calculate the output of this circuit diagram.”
Result:
GPT-4o correctly identifies the circuit diagram and some components in the image, including the input and output voltage. However, it cannot read the graph within the image to get the voltage values. Instead, in its response, it asks us for those values so it can perform further calculations.
o1 takes a couple of seconds to analyze the image. It correctly identifies all components and also reads the value of each component in the image. The model describes the operation performed within the circuit. It then calculates the key circuit parameters, taking even small load factors into account, and reports them. A masterstroke from o1! Not only did it understand the task, it also read all the graph values within the image to calculate the output values, correctly and concisely!
Verdict: Clearly, o1 is a master of physics!
Challenge 3: Image Analysis with Mathematics
Prompt: “What is the probability of winning for each team in this game?”
Result:
Generated by GPT-4o
Generated by o1
GPT-4o identified the game correctly but misread the format in which it was being played. It correctly read other details in the image, such as the score and the wickets taken by the bowler. Overall, however, its analysis was not detailed and it did not give us a winning probability for either team.
o1 understood the task and did a great job analyzing the image. It correctly identified the game and the format, as well as details about the team lineup and even the tea break. Finally, it did a fantastic job of calculating each team's probability of winning and gave excellent reasons to support its answer.
Verdict: o1 does the job and does it well!
Challenge 4: Solve a Sudoku
Prompt: “Solve the following Sudoku and give the final solution in the form of an image.”
Result:
Generated by o1
GPT-4o generates the response as a Matplotlib plot instantly. The response was quick but incorrect.
o1, on the other hand, takes some time to think about the solution. It carefully places numbers in the blank cells and then tries several iterations, explaining the placements and identifying the error in each of its attempts, but in the end, the final result it generates is still not the correct solution. Its response was slow and well thought out, but wrong!
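Checking a finished grid is much easier than solving one, so wrong answers like the two above can be caught mechanically. Here is a minimal Python sketch of such a check: every row, column, and 3x3 box must contain the digits 1 to 9 exactly once, and the answer must not change any cell the puzzle already filled in. The function names are our own, and the sample grid is generated by a formula purely for illustration; it is not the puzzle from this challenge.

```python
from typing import List

Grid = List[List[int]]
DIGITS = set(range(1, 10))


def is_valid_solution(grid: Grid) -> bool:
    """Return True if every row, column, and 3x3 box holds the digits 1-9 once."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(unit == DIGITS for unit in rows + cols + boxes)


def respects_givens(puzzle: Grid, solution: Grid) -> bool:
    """Return True if the solution keeps every pre-filled cell (0 marks a blank)."""
    return all(
        puzzle[r][c] in (0, solution[r][c])
        for r in range(9)
        for c in range(9)
    )


def pattern_grid() -> Grid:
    """Build a known-valid filled grid from a shifting pattern (illustration only)."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]


if __name__ == "__main__":
    solved = pattern_grid()
    print(is_valid_solution(solved))

    # Swapping two cells in a row keeps the row valid but breaks two columns.
    broken = pattern_grid()
    broken[0][0], broken[0][1] = broken[0][1], broken[0][0]
    print(is_valid_solution(broken))
```

A model's answer would need to pass both checks: `is_valid_solution` on the grid it returns, and `respects_givens` against the original puzzle.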
Verdict: So for this task, both GPT-4o and o1 failed to give the correct solution, which was:
Challenge 5: Image Generation
Prompt: “Create an image of a dog running near the seashore”
Result:
GPT-4o quickly generates an image of a happy dog bounding along the seashore. It does the task exactly as asked, quickly and efficiently. Oh, and what a cute dog!
o1 currently cannot generate images. Instead, it simply gives us a detailed prompt that we can feed into an AI image generator. It looks like it's not linked to DALL·E yet!
Verdict: For this challenge, GPT-4o wins unopposed.
Conclusion
Without a doubt, o1 outshines GPT-4o in most cases. With its enhanced reasoning and logical thinking capabilities, it excels at understanding complex queries and generating more relevant and accurate responses. It is faster than o1-preview and noticeably more concise in its responses.
But is it perfect? Is it AGI? Certainly not. Like any model, o1 has its limitations. It may generate incorrect answers and may require multiple iterations to reach the desired result.
That said, o1 is an extraordinary tool for researchers, scientists, designers, and even students. Its exceptional problem-solving skills, keen attention to detail, and advanced voice capabilities make it a powerful asset. Whether tackling complex tasks or assisting with creative workflows, o1 has immense potential to improve productivity and innovation.
Frequently asked questions
A. o1 is the latest version of the o1-preview model released by OpenAI. This model excels in advanced reasoning, logical thinking, mathematics, and coding-related tasks.
A. ChatGPT Pro is the latest OpenAI plan. It includes unlimited use of the latest OpenAI models such as o1 pro, o1, GPT-4o, GPT-4o mini, and more. This plan will include enhanced features and capabilities to improve the speed and efficiency of these models.
A. o1 is better than GPT-4o for tasks such as advanced reasoning, mathematics, PhD-level science, and coding. GPT-4o is ideal for everyday tasks that involve generating text and images.
A. Yes, you can use o1 on the ChatGPT Plus plan, but usage is limited on this plan.
A. Yes, o1 is a multimodal LLM. It can process text, image, and audio files.