DeepSeek V3: The $5.5M-Trained Model Beats GPT-4o & Llama 3.1
| Model | Arena-Hard | AlpacaEval 2.0 |
|---|---|---|
| DeepSeek-V2.5-0905 | 76.2 | 50.5 |
| Qwen2.5-72B-Instruct | 81.2 | 49.1 |
| LLaMA-3.1 405B | 69.3 | 40.5 |
| GPT-4o-0513 | 80.4 | 51.1 |
| Claude-Sonnet-3.5-1022 | 85.2 | 52.0 |
| DeepSeek-V3 | 85.5 | 70.0 |

Arena-Hard Performance: DeepSeek-V3 ranks highest with 85.5, narrowly surpassing Claude-Sonnet-3.5 (85.2) and significantly outperforming DeepSeek-V2.5 (76.2). This ...