Claude 3.5 Sonnet — Overview
Anthropic has released Claude 3.5 Sonnet, marking a significant milestone for the AI safety-focused company. The new model outperforms GPT-4o, Gemini 1.5 Pro, and Meta's Llama 3 on key benchmarks while remaining available at the same price as Claude 3 Sonnet.
Benchmark Performance
Claude 3.5 Sonnet sets new industry records on several coding and reasoning benchmarks. On SWE-bench Verified, a test of real-world software engineering tasks, Claude 3.5 Sonnet achieves 49% — significantly higher than any previously published model. On the MMLU benchmark it scores 88.7%, matching GPT-4o's 88.7% and ahead of Gemini 1.5 Pro at 85.9%.
Artifacts Feature
Alongside the model release, Anthropic introduced Artifacts — a new UI feature in Claude.ai that allows users to generate, preview, and iterate on code, documents, and other content in a dedicated side panel. This makes Claude significantly more useful for developers and content creators who want to see their output rendered in real time.
Coding Capabilities
Claude 3.5 Sonnet demonstrates particularly strong coding performance: it writes, edits, and executes code with greater accuracy and fewer errors than previous models. Anthropic reports that in internal testing, Claude 3.5 Sonnet resolves 64% of coding issues on the first attempt, compared to 38% for Claude 3 Opus.
Pricing and Access
Claude 3.5 Sonnet is available free on Claude.ai with usage limits and via the Anthropic API at $3 per million input tokens and $15 per million output tokens — the same pricing as Claude 3 Sonnet. Claude Pro subscribers get priority access with higher rate limits.
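At these rates, API cost is linear in token counts, so it is easy to budget ahead of a call. A minimal sketch of a cost estimator using the published $3/$15 per-million-token prices (the function and constant names are illustrative, not part of Anthropic's SDK):

```python
# Rates from Anthropic's published Claude 3.5 Sonnet pricing.
INPUT_RATE_PER_MTOK = 3.00    # USD per 1M input tokens
OUTPUT_RATE_PER_MTOK = 15.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one API call at the stated rates."""
    return (input_tokens * INPUT_RATE_PER_MTOK
            + output_tokens * OUTPUT_RATE_PER_MTOK) / 1_000_000

# Example: 10,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.2f}")  # prints $0.06
```

Note that output tokens cost five times as much as input tokens, so long generations dominate the bill even when prompts are large.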
Safety and Alignment
Anthropic emphasizes that Claude 3.5 Sonnet maintains the company’s commitment to AI safety. The model underwent extensive red-teaming and evaluation before release and sits at ASL-2 on Anthropic’s safety scale, meaning it does not exhibit behaviors requiring ASL-3 safeguards.