This year has been full of interesting releases in the GenAI space, from Claude Sonnet 3.5 to OpenAI's o1, Meta's Llama 3.3, and many more. With so much going on, it is hard to pick a model and be sure there isn't a better alternative. The recent release of DeepSeek V3 has shaken the GenAI world with its impressive capabilities, and it is said to be the best open-source model available today. So I decided to test it against Claude Sonnet 3.5. In this article, I give both models the same prompts to see which one provides the better response. So, let's start our battle between DeepSeek V3 and Claude Sonnet 3.5!
Task 1: Solve a puzzle
Prompt: You are in a hurry to get to work. You pour yourself a cup of black coffee, but it's too hot. You intend to add a set amount of cold milk, but you know that even after that, the coffee will need to cool for a few minutes before you can drink it.
In which case will the coffee be cooler:
1) Add milk immediately and then wait a few minutes before drinking.
2) Wait a few minutes and then add the milk just before drinking.
DeepSeek V3:
Claude Sonnet 3.5:
Observation:
DeepSeek V3 suggests adding the cold milk immediately, arguing that the coffee's temperature drops at once and then continues to cool. By contrast, Claude Sonnet 3.5 says it is best to let the black coffee cool first, because a hotter liquid loses heat faster, and then add the milk just before drinking for an additional drop in temperature. Most physics-based explanations favor Claude Sonnet 3.5. While the coffee stays hotter, the greater temperature difference with the surroundings makes it lose heat faster. Adding the milk at the end then provides a final cooling step, resulting in a lower overall temperature when drinking.
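The reasoning above can be checked with a small simulation. This is a minimal sketch, assuming Newton's law of cooling with the same cooling constant before and after the milk is added, equal specific heats for coffee and milk, and milk that is colder than the room. All numbers (temperatures, volumes, the constant `k_per_min`) and all function names are illustrative assumptions, not values from the prompt or from either model's answer.

```python
import math

def cool(temp_c, ambient_c, k_per_min, minutes):
    """Newton's law of cooling: exponential decay toward ambient temperature."""
    return ambient_c + (temp_c - ambient_c) * math.exp(-k_per_min * minutes)

def mix(coffee_c, coffee_ml, milk_c, milk_ml):
    """Volume-weighted average temperature (assumes equal specific heats)."""
    return (coffee_c * coffee_ml + milk_c * milk_ml) / (coffee_ml + milk_ml)

# Hypothetical values chosen only for illustration.
COFFEE_C, MILK_C, AMBIENT_C = 90.0, 5.0, 20.0
COFFEE_ML, MILK_ML = 200.0, 50.0
K_PER_MIN, WAIT_MIN = 0.05, 5.0

# Option 1: add milk immediately, then wait.
early = cool(mix(COFFEE_C, COFFEE_ML, MILK_C, MILK_ML),
             AMBIENT_C, K_PER_MIN, WAIT_MIN)

# Option 2: wait, then add milk just before drinking.
late = mix(cool(COFFEE_C, AMBIENT_C, K_PER_MIN, WAIT_MIN),
           COFFEE_ML, MILK_C, MILK_ML)

print(f"milk first: {early:.2f} C, milk last: {late:.2f} C")
```

Under these assumptions the gap between the two options is small and comes entirely from the milk being colder than the room; with milk at room temperature this simple model predicts no difference at all.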
Verdict:
DeepSeek V3 ❌ | Claude Sonnet 3.5 ✅
Task 2: Create a flowchart
Prompt: Create a logical flowchart for a multi-agent design pattern. It involves the following steps:
1. Break down the task: Define the overall goal and break it down into smaller subtasks, identifying how they depend on each other.
2. Assign roles: Assign each subtask to the most appropriate agent, outlining responsibilities and deliverables.
3. Plan communication: Decide whether agents communicate directly or through a supervisor, using consistent formats.
4. Run: Run the agents in parallel or in sequence, tracking their results to ensure they are complete and accurate.
5. Validate and integrate: Combine all agent results into a final response, checking for errors or inconsistencies.
6. Refine: Collect feedback to improve roles or processes and update instructions or workflows as necessary.
DeepSeek V3 Output:
+-----------------------------------------+
| Start |
+-----------------------------------------+
|
v
+-----------------------------------------+
| 1. Break the Task |
| - Define the overall goal |
| - Split the goal into smaller subtasks |
| - Identify dependencies between subtasks|
+-----------------------------------------+
|
v
+-----------------------------------------+
| 2. Assign Roles |
| - Assign each subtask to the best-suited|
| agent |
| - Outline responsibilities & deliverables|
+-----------------------------------------+
|
v
+-----------------------------------------+
| 3. Plan Communication |
| - Decide if agents communicate directly |
| or through a supervisor |
| - Use consistent communication formats |
+-----------------------------------------+
|
v
+-----------------------------------------+
| 4. Execute |
| - Run agents in parallel or sequence |
| - Track outputs for completeness & |
| accuracy |
+-----------------------------------------+
|
v
+-----------------------------------------+
| 5. Validate & Integrate |
| - Combine all agent results into one |
| final answer |
| - Check for errors or inconsistencies |
+-----------------------------------------+
|
v
+-----------------------------------------+
| 6. Refine |
| - Gather feedback to improve roles or |
| processes |
| - Update instructions or workflows as |
| needed |
+-----------------------------------------+
|
v
+-----------------------------------------+
| End |
+-----------------------------------------+
Claude Sonnet 3.5 Output:
Observation:
Claude Sonnet 3.5 is better because it shows decision points and feedback loops. Multi-agent systems often need to recheck and refine steps. If you just need a simple list, DeepSeek V3 might be enough. For real workflows that require iterative improvements and branching, Claude Sonnet 3.5 provides a clearer and more complete structure.
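To make the difference between a linear list and a flow with decision points concrete, here is a minimal sketch of a validate-and-retry loop. It is not taken from either model's output; the `Agent` type, `run_with_feedback`, the toy `researcher` and `writer` agents, and the validity check are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent signature: (subtask, attempt number) -> output text.
Agent = Callable[[str, int], str]

def run_with_feedback(
    subtasks: List[str],
    agents: Dict[str, Agent],
    is_valid: Callable[[str], bool],
    max_rounds: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Run each subtask, validate its output, and retry the failures."""
    results: Dict[str, str] = {}
    pending = list(subtasks)
    for attempt in range(1, max_rounds + 1):
        retry: List[str] = []
        for task in pending:
            output = agents[task](task, attempt)
            if is_valid(output):      # decision point: accept the result
                results[task] = output
            else:                     # feedback loop: send it back for another round
                retry.append(task)
        pending = retry
        if not pending:
            break
    return results, pending           # pending holds anything still unresolved

# Toy agents: the writer fails on its first attempt and succeeds on the second.
def researcher(task: str, attempt: int) -> str:
    return f"notes for {task}"

def writer(task: str, attempt: int) -> str:
    return "" if attempt == 1 else f"draft for {task}"

results, unresolved = run_with_feedback(
    ["research", "write"],
    {"research": researcher, "write": writer},
    is_valid=lambda out: bool(out.strip()),
)
print(results, unresolved)
```

A purely sequential chart corresponds to running the inner loop once and never checking `is_valid`; the branch and the retry list are what the decision points and feedback arrows in a flowchart represent.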
Verdict:
DeepSeek V3 ❌ | Claude Sonnet 3.5 ✅
Task 3: Find grammatical errors
Prompt: Analyze the content below and list the number of grammatical errors in the text.
Despite the ephemeral nature of existence, many remain stubborn in their illusions. Undoubtedly, ephemeral changes in cosmic nuances underscore the labyrinthine complexity that faces even the most erudite minds. Until we unite our collective energies to overcome these onerous complexities, humanity will continue to be enthralled by perplexities beyond measure. However, are we really predestined to remain trapped in existential conflagrations? As an anthropomorphic species, perhaps our greatest folly is arrogant complacency, which manifests itself when clarity calls to us. The onus lies on introspection, as recognition of intangible limitations fosters liberation. The precipice of knowledge urges us to move forward, forging an uncertain but indispensable path. Pay attention.
DeepSeek V3 Output:
Here is a list of errors in the paragraph:
- “existence” – Incorrect spelling. Correction: “existence”.
- “underscores” – Subject-verb agreement error. Correction: “underscore”.
- “Until” – Incorrect spelling. Correction: “Until”.
- “predestined” – Incorrect use of words. Correction: “predestined”.
- “anthropomorphic” – Incorrect spelling. Correction: “anthropomorphic”.
- “complacency” – Incorrect use of words. Correction: “complacency”.
- “precipices” – Incorrect spelling. Correction: “precipice”.
Error list:
(“existence”, “underscores”, “until”, “predestined”, “anthromorphic”, “complacency”, “prestice”)
Claude Sonnet 3.5 Output:
Below are all the errors found in the paragraph, listed in order of appearance:
- “existence” – Incorrect spelling (should be “existence”)
- “underscores” – Subject-verb agreement error (should be “underscore” since “changes” is plural)
- “Until” – Incorrect spelling (should be “Until”)
- “predestined” – Incorrect form of the word (should be “predestined”)
- “existential conflagrations” – Missing question mark at the end of the sentence
- “anthropomorphic” – Incorrect spelling (should be “anthropomorphic”)
- “precipice” – Incorrect spelling (should be “precipice”)
- Stylistically, the period after “Pay attention” is unnecessary since it is an order/imperative.
The paragraph contains eight different errors, consisting mainly of spelling errors, subject-verb agreement problems, and punctuation errors.
Observation:
Claude Sonnet 3.5's answer is better because it detects more problems. Not only does it point out spelling and subject-verb errors, as DeepSeek V3 does, it also points out a missing question mark and suggests a stylistic improvement for “Pay Attention.” That broader coverage makes for a more comprehensive analysis overall.
Verdict:
DeepSeek V3 ❌ | Claude Sonnet 3.5 ✅
Task 4: Calculate the probability of winning
Prompt: What is the probability of winning for each team in this game?
DeepSeek V3 Output:
Claude Sonnet 3.5 Output:
Observation:
Claude Sonnet 3.5 provides a more accurate and contextually consistent reading of the scoreboard and the chase requirements, leading to a more realistic probability estimate. It takes into account the remaining overs and the required run rate. DeepSeek V3 misinterprets the match situation, making its conclusion less reliable. As a consequence, Claude Sonnet 3.5's answer is better because it aligns with the correct match details and standard cricket chasing logic.
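For readers unfamiliar with cricket, the required run rate mentioned above is simple arithmetic: the runs still needed divided by the overs remaining, where one over is six balls. The sketch below uses hypothetical numbers, not the scoreboard from the prompt, and it is not how either model computed its answer. A win probability depends on much more than this single figure.

```python
def required_run_rate(target: int, current_score: int, balls_remaining: int) -> float:
    """Runs per over the chasing side still needs (one over = 6 balls)."""
    if balls_remaining <= 0:
        raise ValueError("no balls remaining")
    runs_needed = max(target - current_score, 0)
    return runs_needed / (balls_remaining / 6)

# Hypothetical chase: score needed to win is 180, current score is 120,
# and 36 balls (6 overs) remain.
rate = required_run_rate(target=180, current_score=120, balls_remaining=36)
print(f"{rate:.2f} runs per over")
```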
Verdict:
DeepSeek V3 ❌ | Claude Sonnet 3.5 ✅
Task 5: Physics problem
Prompt: A weightless string connects two identical metal balls. Half of the string is over the edge of a table. You release both balls at the same time. What will happen first: ball 1 will fly off the table or ball 2 will hit its side?
DeepSeek V3 Output:
Understanding the setup
- Two identical metal balls are joined by a rope.
- The string is placed so that one ball is on the table (Ball 1) and the other is hanging off the table (Ball 2).
- Both balls are released at the same time.
Forces acting on the balls.
- Ball 1 (on the table):
- Pulled horizontally towards the edge of the table by the rope.
- Ball 2 (hanging off the table):
- Pulled down by gravity and slightly up by the rope.
What happens first?
- Gravity (pulling Ball 2 down) is a stronger force than the horizontal pull on Ball 1.
- Therefore, Ball 2 will reach the side of the table before Ball 1 flies off the table.
Final Answer: Ball 2 hitting the side of the table will happen first.
Claude Sonnet 3.5 Output:
Initial setup
- Two identical metal balls linked by a weightless rope
- The midpoint of the rope is over the edge of the table.
- Ball 1 is on the table, ball 2 is hanging from the edge
When the balls are released
- Ball 2 begins to fall.
- Ball 1 begins to move towards the edge.
- The rope couples their motion.
The physics
- The downward pull of ball 2 is divided equally across the string.
- Both balls move with half the acceleration of gravity (g/2)
- Both balls travel the same distance (half the length of the string)
The result
- Since both balls:
- Travel the same distance
- Move with the same acceleration.
- Start at the same time
- They will arrive at their destinations at exactly the same time.
Final Answer: Ball 1 will reach the edge of the table at exactly the same moment Ball 2 hits the side of the table.
Observation:
Claude Sonnet 3.5 is more correct. Under ideal conditions (no friction), each ball travels the same horizontal distance with the same horizontal acceleration, so ball 1 goes flying exactly when ball 2 hits the side. DeepSeek V3 incorrectly prioritizes the downward force of gravity on ball 2, but the key factor is the horizontal force of the string, which is symmetrical for both balls. Therefore, in a purely physical scenario, they reach the edge/side of the table simultaneously.
Verdict:
DeepSeek V3 ❌ | Claude Sonnet 3.5 ✅
Final result: DeepSeek V3 vs Claude Sonnet 3.5
Task | Winner |
---|---|
Task 1: Solve a puzzle | Claude Sonnet 3.5 |
Task 2: Create a flowchart | Claude Sonnet 3.5 |
Task 3: Find grammatical errors | Claude Sonnet 3.5 |
Task 4: Calculate the probability of winning | Claude Sonnet 3.5 |
Task 5: Physics problem | Claude Sonnet 3.5 |
Final note
The tasks in this article offer an idea of the capabilities of DeepSeek V3 versus Claude 3.5 Sonnet, but they are only a small part of what these models can do. Avoid judging them based solely on these results. Instead, explore and use each model based on your specific needs and requirements.
Have you tried DeepSeek V3 or Claude 3.5 Sonnet? Share your experiences and views in the comments below!