DeepSeek V3's Architecture

DeepSeek V3, released January 13, 2026, is a 671B-parameter Mixture-of-Experts (MoE) model that activates 37B parameters per token during inference, providing the capability of a 671B model at roughly the inference cost of a 37B dense model. The architecture builds on the Multi-head Latent Attention (MLA) mechanism DeepSeek introduced in V2, combined with new training-efficiency improvements that reduced compute requirements by 30% versus V2 at equivalent model quality.
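The "671B total, 37B active" arithmetic falls out of how MoE routing works: each token is sent to only a few experts, so only a fraction of the expert parameters are touched per token. A minimal sketch of top-k expert routing, using small hypothetical dimensions (not V3's real configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only -- not DeepSeek V3's actual configuration.
n_experts = 16        # routed experts in the MoE layer (hypothetical)
top_k = 2             # experts activated per token (hypothetical)
d_model, d_ff = 64, 256

# Each expert is a small feed-forward network (two weight matrices).
experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02)
           for _ in range(n_experts)]
gate = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = x @ gate                        # affinity of the token with each expert
    top = np.argsort(scores)[-top_k:]        # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)   # expert FFN with ReLU
    return out, top

token = rng.standard_normal(d_model)
y, chosen = moe_forward(token)

# Only top_k / n_experts of the expert parameters were used for this token.
active_fraction = top_k / n_experts
```

The same ratio is what lets a 671B-parameter model run with only 37B parameters' worth of compute per token: the remaining experts sit idle for that token (though all weights must still be held in memory).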

Performance Against Commercial Models

On major benchmarks, DeepSeek V3 achieved 88.5% on MMLU (vs GPT-4o's 88.7%), 89.2% on HumanEval coding (vs GPT-4o's 90.2%), and 74.4% on MATH (vs GPT-4o's 74.6%). These near-parity results against OpenAI's best non-reasoning model, from an open-weight model trained at a fraction of the cost, continued the trend DeepSeek R1 had established twelve months earlier.

Training Efficiency

DeepSeek published its training methodology alongside the model weights, reporting that V3 was trained for approximately $5.5 million in compute, a remarkably low figure for a model matching frontier commercial performance. The methodology included an auxiliary-loss-free load-balancing strategy for MoE routing and a multi-token prediction objective that improved training efficiency. Both innovations are likely to influence training approaches across the industry.
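The load-balancing idea can be illustrated without an auxiliary loss term: each expert carries a bias that is added to the routing scores only when selecting experts, and the bias is nudged down for overloaded experts and up for underloaded ones. A toy sketch under hypothetical sizes and update rates (the multi-token prediction objective is not shown, and in a real model the gate weights would still use the raw, unbiased scores):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes and update speed -- illustrative only.
n_experts, top_k, batch = 8, 2, 32
gamma = 0.01                        # bias adjustment rate per step (hypothetical)
bias = np.zeros(n_experts)          # per-expert bias, used ONLY for selection
recent_load = np.zeros(n_experts)   # accumulated load over the final 100 steps

for step in range(300):
    scores = rng.standard_normal((batch, n_experts))   # fake router logits
    scores[:, 0] += 1.0             # expert 0 is systematically favoured
    chosen = np.argsort(scores + bias, axis=1)[:, -top_k:]   # bias-adjusted top-k
    load = np.bincount(chosen.ravel(), minlength=n_experts)
    target = batch * top_k / n_experts
    bias -= gamma * np.sign(load - target)   # push down overloaded experts
    if step >= 200:
        recent_load += load

# bias[0] settles near -1, cancelling expert 0's built-in advantage,
# so the accumulated recent loads end up roughly uniform.
```

Because balance is achieved by adjusting selection biases rather than adding a penalty to the loss, the gradient signal stays focused on the language-modelling objective, which is the appeal of the auxiliary-loss-free approach.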

Community Response

Within 48 hours of release, V3 was available on Ollama, LM Studio, and all major inference providers. Groq reported V3 inference at 2,600 tokens/second on its LPU hardware, the fastest publicly recorded speed for a model of this capability class. The combination of open weights, high benchmark performance, and fast inference positioned V3 as the default choice for organisations wanting to self-host a frontier-class model.

What This Means for Indian Businesses

DeepSeek V3's full open-weight release is directly actionable for Indian organisations. Thanks to its MoE efficiency, the 671B-parameter model can be quantised and deployed on a 4-GPU server for approximately Rs 50,000-80,000 in hardware cost: a one-time expenditure that removes per-token fees, leaving inference capacity limited only by the hardware's throughput. For Indian companies processing high volumes of text analysis, code review, or document processing, the economics are compelling compared with per-token API costs at scale.
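The self-hosting case reduces to a break-even calculation: one-time hardware cost divided by the per-token API price gives the volume at which self-hosting pays for itself. A rough sketch using the hardware figure quoted above and hypothetical API prices and workloads (it ignores power, cooling, and staffing costs, which lengthen the real break-even):

```python
# Break-even for self-hosting vs per-token API usage.
# API price and monthly volume below are hypothetical assumptions, not quotes.
hardware_cost_inr = 80_000                  # upper end of the range quoted above
api_price_per_million_tokens_inr = 20.0     # assumed blended API price (hypothetical)
monthly_tokens_millions = 500.0             # assumed workload (hypothetical)

break_even_millions = hardware_cost_inr / api_price_per_million_tokens_inr
months_to_break_even = break_even_millions / monthly_tokens_millions

print(f"Break-even at {break_even_millions:,.0f}M tokens "
      f"(~{months_to_break_even:.1f} months at the assumed volume)")
```

Under these assumed numbers, self-hosting pays for itself after 4,000 million tokens, about eight months at the assumed volume; organisations with lower volumes may find the API cheaper once operating costs are included.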