Only a few months ago, Deepseek shook the AI world with its V3, R1 and Janus models. Now Elon Musk's Grok 3 has joined the party! Billed as the most intelligent AI on Earth, Grok 3 carries high expectations. According to the official benchmarks shared by the xAI team at the launch event, it looks like a monster of a model, surpassing all its competitors across various benchmarks. But is it really good enough for the everyday tasks that you and I do? Is it worth spending $40 on an X Premium+ account to access the model? You will know the answers by the end of this article. Let's start the contest, testing Grok 3 vs Deepseek R1 on five diverse tasks!
Task 1: Python code generation
Prompt: “Write a Python function to generate a random maze using depth-first search and visualize it using Matplotlib.”
Grok 3:
import random
import numpy as np
import matplotlib.pyplot as plt

def generate_maze(width, height):
    """
    Generate a random maze using Depth-First Search and visualize it with Matplotlib.

    Parameters:
    width (int): Width of the maze (number of columns)
    height (int): Height of the maze (number of rows)
    """
    # Initialize the maze grid with walls (1) and paths (0)
    maze = np.ones((height * 2 + 1, width * 2 + 1), dtype=int)
    maze[1::2, 1::2] = 0  # Set paths (odd rows/columns) to 0

    def carve_path(x, y):
        """
        Recursively carve paths in the maze using DFS.
        """
        directions = [(0, 2), (2, 0), (-2, 0), (0, -2)]  # Down, Right, Up, Left
        random.shuffle(directions)
        for dx, dy in directions:
            new_x, new_y = x + dx, y + dy
            if (0 <= new_x < height * 2 + 1 and
                    0 <= new_y < width * 2 + 1 and
                    maze[new_x, new_y] == 1):
                maze[new_x - dx // 2, new_y - dy // 2] = 0  # Remove wall
                maze[new_x, new_y] = 0  # Carve path
                carve_path(new_x, new_y)

    # Start carving from the top-left path (1,1)
    carve_path(1, 1)

    # Ensure entrance and exit are open (top-left and bottom-right paths)
    maze[0, 1] = 0  # Entrance
    maze[-1, -2] = 0  # Exit

    # Visualize the maze
    plt.figure(figsize=(width // 2, height // 2))
    plt.imshow(maze, cmap='binary', interpolation='nearest')
    plt.title("Random Maze Generated with DFS")
    plt.axis('off')
    plt.show()
    return maze

# Example usage
if __name__ == "__main__":
    # Generate a 10x10 maze (actual grid will be 21x21 due to walls)
    maze = generate_maze(10, 10)
Deepseek R1:
import random
import matplotlib.pyplot as plt

def generate_maze(width, height):
    # Initialize the grid with walls (1)
    grid = [[1 for _ in range(width)] for _ in range(height)]
    # Starting point
    start_x, start_y = 1, 1
    grid[start_y][start_x] = 0
    # Stack for DFS
    stack = [(start_x, start_y)]
    # Directions: up, right, down, left
    directions = [(0, -2), (2, 0), (0, 2), (-2, 0)]
    while stack:
        x, y = stack[-1]
        random.shuffle(directions)
        for dx, dy in directions:
            nx, ny = x + dx, y + dy
            if 0 < nx < width - 1 and 0 < ny < height - 1 and grid[ny][nx] == 1:
                grid[ny][nx] = 0
                grid[y + dy // 2][x + dx // 2] = 0
                stack.append((nx, ny))
                break
        else:
            stack.pop()
    return grid

def plot_maze(grid):
    height = len(grid)
    width = len(grid[0])
    fig, ax = plt.subplots(figsize=(width, height))
    ax.set_xticks([])
    ax.set_yticks([])
    for y in range(height):
        for x in range(width):
            if grid[y][x] == 1:
                ax.add_patch(plt.Rectangle((x, height - y - 1), 1, 1, color="black"))
    ax.set_xlim(0, width)
    ax.set_ylim(0, height)
    ax.set_aspect('equal')
    plt.gca().invert_yaxis()
    plt.show()

# Parameters
maze_width = 21  # Should be odd
maze_height = 21  # Should be odd

# Generate and plot the maze
maze = generate_maze(maze_width, maze_height)
plot_maze(maze)
Output:

Observation:
Deepseek R1's output is a clear, well-structured maze that looks professionally generated. The sharp, consistent walls form a properly enclosed maze that is challenging to navigate yet visually clean. It follows sound maze-generation principles, which makes it both functional and aesthetically pleasing.
In contrast, Grok 3's output looks heavily pixelated, with paths that are hard to distinguish because of the thick black-and-white grid structure. The overall design feels more chaotic, with paths that are less well defined than in Deepseek R1's maze. Although it uses depth-first search (DFS) generation, its visual clarity and usability are significantly lower, which makes it a less effective maze representation.
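One possible explanation for the grid-like picture, assuming the Grok 3 listing above is an accurate copy of what the model produced: it opens every cell before carving, so the check that a neighbouring cell is still a wall never succeeds and no passages get carved. The sketch below is a minimal illustration of a DFS generator that avoids this by starting from a fully walled grid and treating "still a wall" as "not yet visited". The names `generate_maze`, `reachable_cells` and the `seed` parameter are chosen here for illustration and are not taken from either model's answer; plotting is left out so the example runs with the standard library only.

```python
import random
from collections import deque


def generate_maze(width, height, seed=None):
    """Return a (2*height+1) x (2*width+1) grid where 1 = wall, 0 = open."""
    rng = random.Random(seed)
    rows, cols = 2 * height + 1, 2 * width + 1
    grid = [[1] * cols for _ in range(rows)]  # start fully walled

    # Iterative DFS: a cell counts as visited once it has been opened.
    grid[1][1] = 0
    stack = [(1, 1)]
    while stack:
        r, c = stack[-1]
        # Unvisited cells two steps away, plus the wall square in between.
        neighbours = [
            (r + dr, c + dc, r + dr // 2, c + dc // 2)
            for dr, dc in ((0, 2), (2, 0), (0, -2), (-2, 0))
            if 0 < r + dr < rows and 0 < c + dc < cols
            and grid[r + dr][c + dc] == 1
        ]
        if not neighbours:
            stack.pop()  # dead end: backtrack
            continue
        nr, nc, wr, wc = rng.choice(neighbours)
        grid[wr][wc] = 0  # remove the wall between the two cells
        grid[nr][nc] = 0  # open the new cell
        stack.append((nr, nc))
    return grid


def reachable_cells(grid, start=(1, 1)):
    """Count open squares reachable from start with a 4-neighbour BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


if __name__ == "__main__":
    # Print a small maze as text: '#' for walls, ' ' for open squares.
    for row in generate_maze(8, 5, seed=1):
        print("".join("#" if cell else " " for cell in row))
```

A quick sanity check for any generator of this kind is that every open square is reachable from the start, and that the number of open squares equals 2 * width * height - 1 (one square per cell plus one per carved passage), which is what a loop-free DFS maze should give.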
Verdict:
Grok 3 ❌ | Deepseek R1 ✅
Task 2: Web search
Since Deepseek R1 does not support web search, I used Deepseek's Search option for this task.
Prompt: “What are the latest advances in nuclear fusion technology from 2025? Format of output: list of advances + source.”
Observation:
Between Deepseek R1 and Grok 3, Deepseek R1 is the better model for research-heavy queries such as advances in nuclear fusion technology. Its greatest advantage is that it provides direct, clickable links for each claim, ensuring transparency and credibility. It also covers a broader range of advances, including fusion fuel technology, private sector investments, regulatory changes and government initiatives, which makes its response more complete. In addition, its sources come from authoritative platforms such as ITER, UKAEA and US sources, which significantly improves reliability.
Grok 3, on the other hand, offers better explanatory depth but falls short on credibility because it does not include direct source links. Instead, it refers vaguely to media outlets such as BBC News or MIT News without proper citations. An important weakness is its reliance on posts from X (Twitter), which often contain unverified claims. Although Grok 3 acknowledges these limitations, the lack of concrete sources makes it less reliable for factual research.
Verdict:
Grok 3 ❌ | Deepseek R1 ✅
Task 3: Basic animation using HTML
Prompt: “Create HTML + CSS code for a red ball that revolves inside a square.”
Grok 3:
Rotating Red Ball in Square
Deepseek R1:
Output:
Observation:
Clearly, Grok 3 struggled to interpret the prompt and generate the correct answer. Deepseek R1 took between 8 and 10 minutes to respond, but its output is more accurate and precise.
Verdict:
Grok 3 ❌ | Deepseek R1 ✅
Task 4: Image analysis
Prompt: “Analyze this chess board position. Suggest the best move for the current player (White) to checkmate Black, and explain the reasoning.”

Grok 3:

Deepseek R1:

Observation:
Both Grok 3 and Deepseek R1 got it wrong. Grok 3 suggested e4-e5, which neither delivers checkmate nor threatens the king. Deepseek R1 suggested that1#, but this move is impossible in the given position, which shows that it misread the board. The correct move was Qf7#, where the queen delivers checkmate by trapping the black king. Grok 3 failed to recognize an immediate checkmate, while Deepseek R1 assumed an incorrect board setup instead of analyzing the actual position.
Verdict:
Grok 3 ❌ | Deepseek R1 ❌
Task 5: Logical Reasoning
Prompt: “Solve this zebra puzzle. Give me the output in a table.”

Grok 3:

Placing the generated response into the puzzle:

Deepseek R1:

Placing the generated response into the puzzle:

Observation:
Deepseek R1 again took longer to answer, but it gave the correct answer. Grok 3 could not understand the image and gave an incorrect output.
Verdict:
Grok 3 ❌ | Deepseek R1 ✅
Grok 3 vs Deepseek R1: Result
Task | Winner
Python code generation | Deepseek R1
Web search | Deepseek R1
Basic animation (HTML + CSS) | Deepseek R1
Image analysis (chessboard checkmate) | Both failed
Logical reasoning (zebra puzzle) | Deepseek R1
Final note
Elon Musk's Grok 3 was hyped as a game changer in AI, claiming to be the smartest model on Earth. In real-world tests, however, it could not live up to expectations. Across multiple tasks, Grok 3 struggled with accuracy, logical reasoning and complex problem solving, often producing incorrect or poorly structured responses. Meanwhile, Deepseek R1 consistently outperformed it, delivering more accurate, structured and verifiable responses in key areas such as code generation, web search and logical reasoning.
Despite the bold marketing claims, Grok 3 still has a long way to go before it can compete with the best AI models. The fact that it failed basic reasoning tasks suggests that xAI needs significant improvements in its training approach. However, given Musk's history of iteration and rapid improvement, it will be interesting to see whether future updates can close this gap. Will Grok 3 evolve into the AI powerhouse it claims to be, or will it remain an overhyped experiment? Time will tell.
Stay tuned to the Analytics Vidhya blog for regular updates on Grok 3!