| Model | Arena-Hard | AlpacaEval 2.0 |
|---|---|---|
| DeepSeek-V2.5-0905 | 76.2 | 50.5 |
| Qwen2.5-72B-Instruct | 81.2 | 49.1 |
| LLaMA-3.1 405B | 69.3 | 40.5 |
| GPT-4o-0513 | 80.4 | 51.1 |
| Claude-Sonnet-3.5-1022 | 85.2 | 52.0 |
| DeepSeek-V3 | 85.5 | 70.0 |
- Arena-Hard Performance:
- DeepSeek-V3 ranks highest with 85.5, narrowly surpassing Claude-Sonnet-3.5 (85.2) and significantly outperforming DeepSeek-V2.5 (76.2).
- This shows its exceptional ability to generate well-rounded, context-aware responses in difficult scenarios.
- AlpacaEval 2.0 Performance:
- DeepSeek-V3 leads with 70.0, far ahead of Claude-Sonnet-3.5 (52.0), the second-best performer.
- This demonstrates significant improvements in user preference and overall quality of open-ended outputs, showcasing better alignment with user expectations.
- Comparison with Competitors:
- Qwen2.5 (Arena-Hard: 81.2, AlpacaEval: 49.1):
- Performs reasonably well on Arena-Hard but falls behind significantly in user preference, indicating weaker alignment with user-friendly response styles.
- GPT-4o-0513 (Arena-Hard: 80.4, AlpacaEval: 51.1):
- Competitive on both metrics but doesn’t match the user-centered quality of DeepSeek-V3.
- LLaMA-3.1 (Arena-Hard: 69.3, AlpacaEval: 40.5):
- Scores lower on both benchmarks, highlighting weaker open-ended generation capabilities.
- DeepSeek-V2.5 (Arena-Hard: 76.2, AlpacaEval: 50.5):
- The leap from V2.5 to V3 is substantial, indicating major upgrades in response coherence and user preference alignment.
You can also refer to the DeepSeek-V3 GitHub README to understand the evaluation better: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/README.md
Aider Polyglot Benchmark Results
Here are the Aider Polyglot Benchmark Results, which evaluate models on their ability to complete coding tasks correctly. The evaluation is divided into two output formats (illustrated just after this list):
- Diff-like format (shaded bars): Tasks where outputs resemble code diffs or small updates.
- Whole format (solid bars): Tasks requiring the generation of an entire response.
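To make the two formats concrete, here is an illustrative example of the same small task, renaming a Python function, answered both ways. The snippets are hypothetical, not actual benchmark output, and aider's exact edit formats vary:

```python
# Diff-like format: the model emits only the changed lines, e.g. a
# unified-diff style hunk:
#
#   -def add(a, b):
#   +def sum_values(a, b):
#
# Whole format: the model re-emits the complete file after the edit:

def sum_values(a, b):
    """Return the sum of two values."""
    return a + b
```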
Key Observations
- Top Performers:
- o1-2024-11-12 leads the benchmark with nearly 65% accuracy in the whole format, showing exceptional performance across tasks.
- DeepSeek Chat V3 Preview and Claude-3.5 Sonnet-2024-1022 follow closely, with scores in the range of 40–50%, demonstrating solid task completion in both formats.
- Mid-Performers:
- Gemini-exp-1206 and Claude-3.5 Haiku-2024-1022 score moderately in both formats, highlighting balanced but average performance.
- DeepSeek Chat V2.5 and Flash-2.0 sit in the lower mid-range, showing weaker task resolution abilities compared to the leading models.
- Lower Performers:
- Yi-Lightning, Qwen2.5-Coder 32B-Instruct, and GPT-4o-mini 2024-07-18 have the lowest scores, with accuracies of roughly 10–15% or below. This indicates significant limitations in handling both diff-like and whole-format tasks.
- Format Comparison:
- Models generally perform slightly better in the Whole format than the Diff-like format, implying that full-response generation is handled better than smaller, incremental changes.
- The shaded bars (diff-like format) are consistently lower than their whole-format counterparts, indicating a consistent gap in this specific capability.
- DeepSeek Chat V3 Preview’s Position:
- Ranks among the top three performers.
- Scores around 50% in the whole format and slightly lower in the diff-like format.
- This shows strong capabilities in handling complete task generation but leaves room for improvement in diff-like tasks.
Insights:
- The benchmark highlights the diverse strengths and weaknesses of the evaluated models.
- Models like o1-2024-11-12 show dominance across both task formats, whereas others like DeepSeek Chat V3 Preview excel primarily in full-task generation.
- Lower performers indicate a need for optimization in both nuanced and broader task-handling capabilities.
This ultimately reflects the versatility and specialized strengths of different AI systems in completing benchmark tasks.
DeepSeek V3’s Chat Website & API Platform
- You can interact with DeepSeek-V3 through the official chat website: https://chat.deepseek.com.
- Additionally, DeepSeek offers an OpenAI-Compatible API on the DeepSeek platform: https://platform.deepseek.com.
API usage is paid, with the cost depending on the number of input and output tokens; a minimal example of calling it follows.
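Because the API is OpenAI-compatible, you can call it with the standard openai Python client by pointing base_url at DeepSeek. A minimal sketch, assuming your key is stored in the DEEPSEEK_API_KEY environment variable and that the deepseek-chat model name serves V3 (per DeepSeek's platform docs at the time of writing):

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at DeepSeek's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # key from the DeepSeek platform
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # serves DeepSeek-V3 on the platform
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize DeepSeek-V3 in one sentence."},
    ],
)

print(response.choices[0].message.content)
```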
How to Run DeepSeek V3?
If you prefer not to use the chat UI and want to work with the model directly, there's an alternative for you: all of DeepSeek-V3's weights are released on Hugging Face, where you can access the SafeTensor files.
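As a sketch of one way to fetch them, the huggingface_hub library can download the whole repository; the repo id matches the official model page, while the destination path is just an example. Note that the full checkpoint is several hundred gigabytes:

```python
from huggingface_hub import snapshot_download

# Download every SafeTensor shard of the official checkpoint.
# Warning: the full repository weighs in at several hundred GB.
local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3",
    local_dir="/path/to/DeepSeek-V3",  # example destination
)
print("Weights saved to:", local_dir)
```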
Model Size and Hardware Requirements:
Firstly, the model is massive, with 671 billion parameters, making it challenging to run on standard consumer-grade hardware. If your hardware isn't powerful enough, it's recommended to use the DeepSeek platform for direct access, or to wait for a Hugging Face Space if one becomes available.
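A rough back-of-the-envelope calculation shows why consumer hardware struggles: the weights alone occupy hundreds of gigabytes, before any activation or KV-cache overhead (the figures below are approximate):

```python
# Approximate memory needed just to hold DeepSeek-V3's weights
# (runtime overhead such as activations and KV cache is extra).
params = 671e9

fp8_bytes = params * 1    # FP8: 1 byte per parameter
bf16_bytes = params * 2   # BF16: 2 bytes per parameter

print(f"FP8 weights : ~{fp8_bytes / 1e9:,.0f} GB")   # ~671 GB
print(f"BF16 weights: ~{bf16_bytes / 1e9:,.0f} GB")  # ~1,342 GB
```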
How to Run Locally?
If you have sufficient hardware, you can run the model locally using the DeepSeek-Infer Demo, SGLang, LMDeploy, TensorRT-LLM, or vLLM, with support for AMD GPUs and Huawei Ascend NPUs as well.
You can also convert the model to a quantized or lower-precision version to reduce memory requirements, which is particularly helpful for lower-end systems. Here's how you can convert the released FP8 weights to BF16 using the conversion script shipped in the repo:

```bash
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
```
Setup Process with DeepSeek-Infer Demo
Hugging Face’s transformers library does not directly support the model yet. To set it up, you’ll need to:
Clone the DeepSeek-AI GitHub repository:

```bash
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
```

Install the required dependencies:

```bash
cd DeepSeek-V3/inference
pip install -r requirements.txt
```

Download the Hugging Face checkpoints and run the model locally (a sketch of these commands follows).
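As a sketch of that last step: the repository's README documents a checkpoint-conversion step followed by a torchrun launch. The flags below mirror the README at the time of writing and may change, so treat this as an illustration and defer to the repo:

```bash
# Convert the Hugging Face checkpoint into the demo's sharded format
# (paths, expert count, and parallelism are the README's example values).
python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 \
    --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16

# Launch an interactive chat session (multi-node in the README's example).
torchrun --nnodes 2 --nproc-per-node 8 generate.py \
    --ckpt-path /path/to/DeepSeek-V3-Demo \
    --config configs/config_671B.json \
    --interactive --temperature 0.7 --max-new-tokens 200
```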
Refer to the model page on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V3
Recommendation:
- If you have powerful hardware, try running the model locally to fully explore its potential.
- Otherwise, use the DeepSeek.com chat UI or their API platform for seamless access.
LLM DeepSeek Plugin
You can also use the llm-deepseek plugin (for the llm command-line tool) to access the model:
```
# Step 1: Install the llm package
!pip install llm

# Step 2: Install the DeepSeek plugin
!llm install llm-deepseek

# Step 3: Set the DeepSeek API key
!llm keys set deepseek
# Paste your API key when prompted

# Step 4: Run the DeepSeek Chat model
!llm -m deepseek-chat 'five great names for a pet ocelot'

# Step 5: Run the DeepSeek Coder model
!llm -m deepseek-coder 'how to reverse a linked list in python'
```
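The plugin also works from Python through llm's library API; a minimal sketch, assuming llm and llm-deepseek are installed and the key was set as above:

```python
import llm

# The llm-deepseek plugin registers the "deepseek-chat" model.
model = llm.get_model("deepseek-chat")

response = model.prompt("five great names for a pet ocelot")
print(response.text())
```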
DeepSeek V3 Experimentations
On the DeepSeek platform, you can search the web and engage in deep reasoning to solve complex problems, similar to OpenAI's approach.
Here are some prompts and outputs:
First Experiment
Prompt: Generate an SVG of a kid and a football
Output:
Here’s a simple SVG representation of a kid playing with a football. You can copy and paste this code into an HTML file or an SVG viewer to see the image:

```html
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <rect x="90" y="90" width="20" height="40" fill="#4CAF50"/>
</svg>
```

Explanation:
- The kid is represented with simple shapes: a circle for the head, a rectangle for the body, and lines for the arms and legs.
- The football is an ellipse with two white lines to mimic the classic football design.
- You can adjust the colours, sizes, and positions to customize the image further.

Let me know if you'd like to enhance or modify this!
Second Experiment
Prompt: Create the classic snake game
Output:
Creating the classic Snake game involves using HTML, CSS, and JavaScript. Below is a simple implementation:

HTML & CSS (index.html)

```html
<!DOCTYPE html>
<html>
<head>
  <title>Snake Game</title>
  <style>
    body {
      display: flex;
      justify-content: center;
      align-items: center;
      height: 100vh;
      margin: 0;
      background-color: #000;
    }
    canvas {
      border: 1px solid #fff;
    }
  </style>
</head>
<body>
  <canvas id="gameCanvas" width="400" height="400"></canvas>
  <script src="snake.js"></script>
</body>
</html>
```
JavaScript (snake.js)
```javascript
const canvas = document.getElementById("gameCanvas");
const ctx = canvas.getContext("2d");

const gridSize = 20;
const tileCount = canvas.width / gridSize;

let snake = [{ x: 10, y: 10 }];
let direction = { x: 0, y: 0 };
let food = { x: Math.floor(Math.random() * tileCount), y: Math.floor(Math.random() * tileCount) };
let score = 0;

function gameLoop() {
    update();
    draw();
    setTimeout(gameLoop, 100);
}

function update() {
    // Don't move until the player presses an arrow key.
    if (direction.x === 0 && direction.y === 0) return;

    const head = { x: snake[0].x + direction.x, y: snake[0].y + direction.y };

    // Check for collision with walls or itself
    if (head.x < 0 || head.x >= tileCount || head.y < 0 || head.y >= tileCount ||
        snake.some(segment => segment.x === head.x && segment.y === head.y)) {
        resetGame();
        return;
    }

    snake.unshift(head);

    // Check if snake eats the food
    if (head.x === food.x && head.y === food.y) {
        score++;
        food = { x: Math.floor(Math.random() * tileCount), y: Math.floor(Math.random() * tileCount) };
    } else {
        snake.pop();
    }
}

function draw() {
    // Clear the canvas
    ctx.fillStyle = "#000";
    ctx.fillRect(0, 0, canvas.width, canvas.height);

    // Draw the snake
    ctx.fillStyle = "#0F0";
    snake.forEach(segment => ctx.fillRect(segment.x * gridSize, segment.y * gridSize, gridSize, gridSize));

    // Draw the food
    ctx.fillStyle = "#F00";
    ctx.fillRect(food.x * gridSize, food.y * gridSize, gridSize, gridSize);

    // Draw the score
    ctx.fillStyle = "#FFF";
    ctx.font = "20px Arial";
    ctx.fillText("Score: " + score, 10, 30);
}

function resetGame() {
    snake = [{ x: 10, y: 10 }];
    direction = { x: 0, y: 0 };
    score = 0;
    food = { x: Math.floor(Math.random() * tileCount), y: Math.floor(Math.random() * tileCount) };
}

window.addEventListener("keydown", e => {
    switch (e.key) {
        case "ArrowUp":
            if (direction.y === 0) direction = { x: 0, y: -1 };
            break;
        case "ArrowDown":
            if (direction.y === 0) direction = { x: 0, y: 1 };
            break;
        case "ArrowLeft":
            if (direction.x === 0) direction = { x: -1, y: 0 };
            break;
        case "ArrowRight":
            if (direction.x === 0) direction = { x: 1, y: 0 };
            break;
    }
});

gameLoop();
```
How It Works:
- Canvas Setup: The game is drawn on a `<canvas>` element.
- Snake Movement: The snake is represented as an array of segments. The head moves based on the direction, and the tail follows.
- Food: A random position is generated for the food. When the snake eats the food, it grows, and the score increases.
- Collision Detection: The game checks if the snake collides with the walls or itself, and resets if it does.
- Controls: Arrow keys are used to change the snake's direction.
How to Run:
Save the HTML and JavaScript code into files (index.html and snake.js).
Open the index.html file in a browser.
Use the arrow keys to control the snake.
Let me know if you'd like to add more features or improve the game!
Third Experiment
Prompt: How to clone the git repository?
Output
First, the model thinks for six seconds and then produces the output:
Cloning a Git repository involves creating a copy of an existing repository from a remote location (such as GitHub, GitLab, or Bitbucket) to your local machine. This copy includes all the project files, commit history, and branches. Here are the steps to clone a Git repository:

1. Install Git

Ensure that Git is installed on your machine. If it's not installed, download and install it from the official Git website.

2. Open Terminal or Command Prompt

Open your terminal (Linux/Mac) or command prompt (Windows).

3. Navigate to the Desired Directory

Use the cd command to navigate to the directory where you want to clone the repository.

```bash
cd /path/to/directory
```

4. Clone the Repository

Use the git clone command followed by the URL of the repository.

```bash
git clone https://github.com/username/repository.git
```

Replace https://github.com/username/repository.git with the actual URL of the repository you want to clone.

5. Enter the Cloned Directory

After cloning, navigate into the repository directory.

```bash
cd repository
```

6. Verify the Clone

You can verify that the repository was cloned correctly by checking the remote URL.

```bash
git remote -v
```

This should display the URL of the remote repository.

Example

```bash
# Navigate to the desired directory
cd ~/Documents

# Clone the repository
git clone https://github.com/username/repository.git

# Enter the repository directory
cd repository

# Verify the remote URL
git remote -v
```

Optional: Cloning Using SSH

If you have set up SSH keys, you can clone the repository using an SSH URL:

```bash
git clone git@github.com:username/repository.git
```

Optional: Shallow Clone

If you only need a part of the history (e.g., the latest commit), you can perform a shallow clone:

```bash
git clone --depth 1 https://github.com/username/repository.git
```

This is useful for large repositories where you don't need the full history.

Conclusion

You have now successfully cloned a Git repository to your local machine. You can start working on the project, make changes, and commit them as needed.
All the outputs are up to the mark, and the output speed was quite impressive. Additionally, DeepSeek allows you to ask reasoning questions to the model, making it a versatile and efficient tool for complex problem-solving and in-depth analysis.
Also Read: DeepSeek V3 vs GPT-4o: Can Open-Source AI Compete with GPT-4o’s Power?
Conclusion
DeepSeek V3 stands as a monumental achievement in the evolution of large-scale AI models, combining unprecedented scale with unmatched efficiency. With its innovative architecture, cost-effective training, and 671 billion parameters, DeepSeek V3 redefines what’s possible in the AI space. The model’s ability to excel in diverse benchmarks, outperforming both open-source and closed-source competitors, highlights its extraordinary capabilities.
Not only does DeepSeek V3 deliver state-of-the-art performance in tasks like coding, reasoning, and mathematical problem-solving, but it also democratizes access to cutting-edge AI with its open-source availability. Developers, researchers, and businesses alike can leverage its immense power, supported by a permissive license that fosters innovation and collaboration.
By achieving exceptional results with a training cost of just $5.5 million, DeepSeek V3 proves that scalability and efficiency can coexist, setting a new standard for the future of AI development. This release marks a significant leap forward, not just for DeepSeek, but for the entire AI community, paving the way for breakthroughs in machine learning, natural language processing, and beyond.