The AI arms race continues apace: Anthropic is launching its newest model, called Claude 3.5 Sonnet, which it says can match or better OpenAI's GPT-4o or Google's Gemini across a wide variety of tasks. The new model is available now to Claude users on the web and iOS, and Anthropic is also making it available to developers.
Claude 3.5 Sonnet will ultimately be the middle model in the lineup: Anthropic uses the name Haiku for its smallest model, Sonnet for the conventional middle option, and Opus for its highest-end model. (The names are weird, but every AI company seems to be naming things in its own special, weird way, so we'll let it slide.) But the company says 3.5 Sonnet outperforms 3 Opus, and its benchmarks show it does so by a pretty wide margin. The new model is also apparently twice as fast as the previous one, which could matter even more.
AI model benchmarks should always be taken with a grain of salt: there are a lot of them, it's easy to pick the ones that make you look good, and the models and products change so quickly that no one seems to hold a lead for long. That said, Claude 3.5 Sonnet looks impressive: it beat GPT-4o, Gemini 1.5 Pro, and Meta's Llama 3 400B in seven of nine overall benchmarks and four of five vision benchmarks. Again, don't read too much into that, but it looks like Anthropic has built a legitimate competitor in this space.
What does all that really amount to? Anthropic says that Claude 3.5 Sonnet will be much better at writing and translating code, handling multi-step workflows, interpreting charts and graphs, and transcribing text from images. This new and improved Claude also apparently understands humor better and can write in a much more human way.
Along with the new model, Anthropic is also introducing a new feature called Artifacts. With Artifacts, you'll be able to see and interact with the results of your Claude requests: if you ask the model to design something for you, it can now show you what it looks like and let you edit it directly in the app. If Claude writes you an email, you can edit it in the Claude app instead of having to copy it into a text editor. It's a small feature, but a smart one: these AI tools need to become more than just chatbots, and features like Artifacts simply give the app more things to do.
Artifacts actually seems to be a signal of the long-term vision for Claude. Anthropic has long said it focuses primarily on enterprises (even as it hires consumer tech people like Instagram co-founder Mike Krieger), and it said in its press release announcing Claude 3.5 Sonnet that it plans to turn Claude into a tool for companies to “securely centralize” their knowledge, documents, and work in progress in a shared space. That sounds more like Notion or Slack than ChatGPT, with Anthropic's models at the center of the entire system.
For now, however, the model is the big news. And the pace of improvement here is incredible: Anthropic released Claude 3 Opus in March, proudly saying it was as good as GPT-4 and Gemini 1.0, before OpenAI and Google released better versions of their models. Now, Anthropic has taken its next step, and it surely won't be long before its competitors do too. Claude is not talked about as much as Gemini or ChatGPT, but it is very much in the running.