Claude 3.7 Sonnet: Anthropic's Best Model Gets Even Better

Extended Thinking

Claude 3.7 Sonnet, released March 4, 2025, introduced extended thinking — a mode where the model reasons step by step through a problem before producing its final answer, with the thinking process visible. Unlike standard responses generated in seconds, extended thinking allows Claude to work for up to 60 seconds (or longer with higher token budgets), producing higher-quality outputs for complex tasks.

Benchmark Results

With extended thinking: 96.2% on MATH-500, 62.3% on SWE-bench Verified (resolving nearly two-thirds of real GitHub bugs — a new state-of-the-art), and 80.0% on GPQA Diamond (graduate-level scientific reasoning). On TAU-bench agentic task execution, 3.7 Sonnet scored 81.2%, establishing a new standard for agentic AI performance.

Agentic Improvements

Beyond extended thinking, 3.7 Sonnet showed major improvements in multi-step agentic task completion. The model was less likely to abandon tasks mid-way, better at recovering from tool call failures, and more consistent following complex multi-part instructions across long conversations. Developers noted a dramatic reduction in the need for complex retry logic.

Pricing and Claude Code

Priced at $3.00 per million input tokens and $15.00 per million output tokens. Extended thinking tokens are charged at the output rate. Anthropic simultaneously announced 3.7 Sonnet would power Claude Code as the default model, combining extended thinking with its improved SWE-bench performance for complex codebase tasks requiring large context reasoning.

What This Means for Indian Businesses

Claude 3.7 Sonnet's extended thinking mode is transformative for Indian businesses needing deep analytical work: contract review, market analysis, financial modelling interpretation, and multi-step business planning. At Claude Pro pricing (approximately Rs 1,700/month equivalent), you get a model that can spend 30 minutes reasoning through a complex legal contract and surface risks a junior associate might miss.