OpenAI recently launched o1 and o1 pro during its "12 Days of OpenAI" live updates, offering unlimited access via a $200 ChatGPT Pro subscription. With so much speculation surrounding their capabilities, I asked myself: is this premium subscription worth the investment? To answer this, I pitted the two AI models against each other in six challenging tasks. This article explores their strengths, weaknesses, and overall performance. By the end, you will have a clear idea of whether the $200 ChatGPT Pro subscription is the right choice for you.
Challenge 1: Solve a zebra problem
As a first test of o1 vs o1 pro, I chose a zebra problem from the difficult level of this website. Let's see which model solves it!
Prompt:
Solve this zebra problem:
o1 Answer:
Entering this answer as the solution shows that it is incorrect:
o1 pro Answer:
Entering this answer as the solution shows that it is correct:
Observation:
o1 pro took much longer than o1 to respond. However, o1 failed to solve the problem, while o1 pro solved it.
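For readers unfamiliar with this kind of puzzle, here is a minimal sketch of how a zebra-style problem can be solved and how a proposed answer can be checked by brute force. The puzzle used in this test is not reproduced here, so the three houses, the attributes, and the clues below are hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Hypothetical mini puzzle: three houses in a row, each with one color and one pet.
COLORS = ("red", "green", "blue")
PETS = ("cat", "dog", "zebra")


def satisfies(colors, pets) -> bool:
    """Check one candidate assignment (indexed by house position) against every clue."""
    return (
        colors.index("red") + 1 == colors.index("green")  # red is directly left of green
        and pets.index("dog") == colors.index("blue")     # the dog lives in the blue house
        and pets.index("cat") == 0                        # the cat lives in the first house
    )


def solve():
    """Try every ordering of colors and pets and keep the ones that fit all clues."""
    return [
        (colors, pets)
        for colors in permutations(COLORS)
        for pets in permutations(PETS)
        if satisfies(colors, pets)
    ]


solutions = solve()
for colors, pets in solutions:
    print(list(zip(colors, pets)))

# Checking a proposed (wrong) answer works the same way as checking a model's reply.
proposed_ok = satisfies(("blue", "red", "green"), ("dog", "cat", "zebra"))
print("proposed answer accepted:", proposed_ok)
```

Real zebra puzzles have more houses and attributes, so the search space grows quickly, but the principle is the same: an answer is correct only if it satisfies every clue at once.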
Verdict:
o1 | o1 pro
Challenge 2: Find the difference
Prompt:
Find 3 differences in the two images:
o1 Answer:
- Item in hand: In the picture on the left, she is holding a spoon. In the picture on the right, she is holding an ice cream cone.
- Hair accessory: In the picture on the left, her hair is straight. In the image on the right, she is wearing a small pink bow in her hair.
- Background object: In the picture on the left, there is a green sewing machine on the shelf behind her. In the image on the right, the sewing machine is replaced by a simple green box (or frame).
Observation:
Only the second difference (the hair accessory) is correct; the other two are incorrect.
o1 pro Answer:
- Girl's Hair Accessory: In the image on the right, she has a pink bow in her hair, which is not present in the image on the left.
- Item in her hand: In the image on the left, she is holding a spoon. In the image on the right, she is holding an ice cream cone.
- The color of the middle ball in the ice cream bowl: In the left image, the middle ball is pink, while in the right image, that ball has turned white.
Observation:
Only the first difference is correct; the other two are incorrect.
Neither o1 nor o1 pro answered correctly. However, o1 generated its response faster.
Verdict:
o1 | o1 pro
Challenge 3: Solve a math problem
In this challenge, I give the models an indefinite integral problem. Let's see which one can solve it!
Prompt:
Solve this math problem:
Response from o1 and o1 pro (same):
Observation:
Both models provided the correct answer, but o1 was much faster than o1 pro at finding the solution.
Verdict:
o1 | o1 pro
Challenge 4: Analyze the article
Prompt:
Read the article: https://www.analyticsvidhya.com/blog/2024/07/building-agent-rag-systems-with-langgraph/ to understand the process of creating a vector database for Wikipedia data. Summarize key steps concisely.
o1 Answer:
o1 pro Answer:
Observation:
The “o1 pro answer” is closer to the actual implementation in the article. Here's why:
The article provides a much more detailed, step-by-step implementation, including:
- Using specific libraries like LangChain and OpenAI embeddings
- Loading Wikipedia data from a specific file
- Using Chroma as a vector database
- Implementation of advanced RAG components such as:
  - Query reformulation
  - Document relevance grading
  - Web search integration
- A complex LangGraph workflow
o1 pro's answer captures more nuance by mentioning:
- Specific embedding models (sentence transformers)
- Vector Database Options
- Metadata storage
- Evidence Recovery
- Integration with a RAG pipeline
In contrast, the initial “o1 answer” is more generic and lacks the technical depth demonstrated in the article. So o1 pro's answer is significantly closer to the actual implementation of the article.
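To make the "vector database" idea concrete, here is a minimal sketch of the embed, store, and query cycle. It is not the article's implementation: the article is described above as using LangChain, OpenAI embeddings, and Chroma, while this sketch swaps in a toy word-count embedding and an in-memory list so that it runs without any external service. The names `embed`, `cosine`, and `ToyVectorStore`, as well as the sample sentences, are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalized word counts stored as a sparse vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: count / norm for word, count in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """For normalized vectors, the dot product equals the cosine similarity."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical helper)."""

    def __init__(self) -> None:
        self._rows: List[Tuple[Dict[str, float], str, dict]] = []

    def add(self, text: str, metadata: dict) -> None:
        # Store the embedding next to the raw text and its metadata.
        self._rows.append((embed(text), text, metadata))

    def query(self, question: str, k: int = 1) -> List[Tuple[str, dict]]:
        # Rank every stored chunk by similarity to the question, best first.
        q = embed(question)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


# Made-up sample data standing in for Wikipedia passages.
store = ToyVectorStore()
store.add("Paris is the capital of France", {"title": "Paris"})
store.add("The Moon orbits the Earth", {"title": "Moon"})
store.add("Python is a programming language", {"title": "Python"})

top_text, top_meta = store.query("capital of France")[0]
print(top_meta["title"], "->", top_text)
```

A production setup would replace `embed` with a learned embedding model and the list with a persistent index, but the flow of embedding documents, storing them with metadata, and retrieving the nearest ones for a query stays the same.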
Verdict:
o1 | o1 pro
Challenge 5: Image Creation
Prompt:
Create an image of a cat.
o1 Answer:
o1 pro Answer:
Observation:
Both o1 and o1 pro failed to generate images, which indicates that neither version of o1 supports image generation. However, when I gave the same prompt to GPT-4o, I received this response:
Therefore, it is safe to say that GPT-4o surpasses both o1 and o1 pro at image generation.
Verdict:
o1 | o1 pro
Challenge 6: Create a logical flowchart
Prompt:
Create a complete flowchart illustrating the reflection pattern in agentic AI.
o1 Answer:
o1 pro Answer:
Both models provided incomplete flowcharts, so I decided to update my prompt. Here is the updated prompt:
New prompt:
These are the steps involved in the reflection pattern:
1. Generate initial output
2. Self-assessment/criticism of the result
3. Identify errors, gaps or areas for improvement
4. Develop suggestions for improvement
5. Review/refine the result
6. Repeat steps 2 to 5 until you achieve a satisfactory result
Create a new flowchart now
o1 Answer:
o1 pro Answer:
Observation:
Although the content of both answers is the same, o1 definitely wins by providing a real flowchart, while o1 pro only provided the correct content.
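The reflection steps listed in the updated prompt can also be sketched as a loop in code. This is a minimal illustration under stated assumptions, not how either model works internally: `reflect`, `ReflectionResult`, and the three toy functions are hypothetical names, and the toy functions stand in for what would normally be model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReflectionResult:
    output: str                 # final, refined output
    rounds: int                 # number of refinement rounds that were applied
    history: List[str] = field(default_factory=list)


def reflect(
    generate: Callable[[str], str],
    critique: Callable[[str], List[str]],
    refine: Callable[[str, List[str]], str],
    task: str,
    max_rounds: int = 3,
) -> ReflectionResult:
    """Generate, critique, and refine until the critique finds no issues."""
    output = generate(task)              # step 1: initial output
    history = [output]
    for round_no in range(1, max_rounds + 1):
        issues = critique(output)        # steps 2-3: self-assess and list problems
        if not issues:                   # satisfactory result: stop iterating
            return ReflectionResult(output, round_no - 1, history)
        output = refine(output, issues)  # steps 4-5: apply the suggested fixes
        history.append(output)
    return ReflectionResult(output, max_rounds, history)


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_generate(task: str) -> str:
    return "draft"


def toy_critique(output: str) -> List[str]:
    if "intro" not in output:
        return ["add intro"]
    if "summary" not in output:
        return ["add summary"]
    return []


def toy_refine(output: str, issues: List[str]) -> str:
    for issue in issues:
        output += " + " + issue.replace("add ", "")
    return output


result = reflect(toy_generate, toy_critique, toy_refine, "write a report")
print(result.output, "| rounds:", result.rounds)
```

The `max_rounds` cap is a design choice added here so the loop always terminates, even if the critique never comes back empty.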
Verdict:
o1 | o1 pro
Results table: o1 vs o1 pro
Challenge | Verdict |
---|---|
Zebra problem | o1 pro was successful, but slower |
Find differences | Both models performed poorly |
Math problem | Both solved it correctly; o1 was faster |
Analyzing the article | o1 pro provided more depth |
Image creation | Neither could generate images (GPT-4o could) |
Create a logical flowchart | o1 won by creating a real flowchart |
o1 pro seems to have a slight advantage in terms of depth and accuracy in problem solving, particularly in complex tasks like solving the zebra problem and analyzing technical articles. However, o1 tends to be faster and works well on simpler tasks. The verdict seems to be that o1 pro is marginally better, especially for more complex or technical challenges that require deeper understanding.
Final note
While o1 pro shows promise in solving complex problems, it is important to consider your specific needs and budget. For basic to intermediate tasks, GPT-4o or other more affordable alternatives may be sufficient. If solving complex problems is a priority and you're willing to invest, o1 pro could be a valuable tool.
However, as OpenAI continually refines these models, it would be prudent to wait for further updates before making a final decision. OpenAI will likely add more benefits to the $200 ChatGPT Pro plan in the future.
What do you think about this? Let me know in the comments section below.
Stay tuned to Analytics Vidhya blog for more awesome updates!