Chinese AI lab DeepSeek has returned with its most ambitious release yet. On April 24, 2026, the company unveiled DeepSeek-V4-Pro and DeepSeek-V4-Flash — two preview models that together constitute the first release in its hotly anticipated V4 series. Both models are open-sourced under the permissive MIT license, continuing DeepSeek's pattern of releasing powerful models that the broader research community can freely use and build upon.

The headline figure is scale. DeepSeek-V4-Pro is a Mixture-of-Experts (MoE) model with 1.6 trillion total parameters, of which 49 billion are active at any given inference step. This makes it the largest open-weights model ever released, surpassing Moonshot AI's Kimi K2.6 (1.1T) and GLM-5.1 (754B), and more than doubling the size of DeepSeek's own V3.2 (685B). The smaller sibling, DeepSeek-V4-Flash, has 284 billion total parameters with 13 billion active — a more practical size for researchers and developers who want to run the model locally.
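For a sense of how sparse these MoE configurations are, here is a quick back-of-the-envelope calculation using only the parameter counts quoted above (a sketch based on the article's figures, not on the technical paper):

```python
# MoE sparsity math using only the parameter counts reported above.
models = {
    "DeepSeek-V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "DeepSeek-V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    frac = p["active"] / p["total"]
    print(f"{name}: {frac:.1%} of parameters active per token")

# DeepSeek-V4-Pro: 3.1% of parameters active per token
# DeepSeek-V4-Flash: 4.6% of parameters active per token
```

Each forward pass through V4-Pro thus touches roughly 3% of the network, which is why a 1.6-trillion-parameter model can carry roughly the per-token compute of a dense model in the tens of billions of parameters.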

A Million Tokens for Everyone

Both V4 models support a 1 million token context window — a capability that was, until recently, the exclusive preserve of Google's Gemini and OpenAI's GPT-5 series. For developers building applications that need to process entire codebases, lengthy legal documents, or large research corpora in a single pass, this context length is transformative. DeepSeek's efficiency innovations mean that even at the full 1M-token context, the computational cost per token is dramatically lower than that of competing models.

The efficiency story is perhaps the most remarkable aspect of V4. According to the technical paper released alongside the models, DeepSeek-V4-Pro in a 1M-token context scenario uses only 27% of the single-token FLOPs of DeepSeek-V3.2, and just 10% of the KV cache size. DeepSeek-V4-Flash pushes this further, achieving 10% of the FLOPs and 7% of the KV cache of V3.2 in the same scenario. These are not incremental improvements — they represent a fundamental rethinking of how large models handle long contexts.
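To see what a 90%-plus KV-cache reduction means at this scale, consider a rough estimate for a conventional attention cache at 1M tokens. The layer count, head count, and head dimension below are illustrative assumptions, not the actual V4 or V3.2 configuration; only the 10% and 7% ratios come from the paper as quoted above:

```python
# Rough KV-cache size estimate for a 1M-token context window.
# Architecture numbers are illustrative assumptions, NOT DeepSeek's
# actual configuration; only the 10%/7% ratios are from the article.
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, dtype_bytes=2):
    # 2x for keys and values; bf16/fp16 = 2 bytes per element
    return 2 * tokens * layers * kv_heads * head_dim * dtype_bytes

baseline = kv_cache_bytes(tokens=1_000_000, layers=60,
                          kv_heads=8, head_dim=128)
print(f"Hypothetical baseline cache: {baseline / 2**30:.0f} GiB")
print(f"At 10% (V4-Pro ratio):       {0.10 * baseline / 2**30:.0f} GiB")
print(f"At 7% (V4-Flash ratio):      {0.07 * baseline / 2**30:.0f} GiB")

# Hypothetical baseline cache: 229 GiB
# At 10% (V4-Pro ratio):       23 GiB
# At 7% (V4-Flash ratio):      16 GiB
```

At sizes like these, the cache rather than the weights dominates long-context serving memory, which is why the paper's ratios translate directly into cheaper 1M-token inference.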

[Chart: DeepSeek V4 vs Frontier Model Pricing ($/M tokens, input); bar chart comparing input prices for GPT-5.4 Nano, DeepSeek V4 Pro, Gemini 3.1 Pro, GPT-5.4, Claude Sonnet 4.6, and GPT-5.5]

DeepSeek V4 Flash is the cheapest small model on the market at $0.14/M input tokens. V4 Pro at $1.74/M is the cheapest large frontier model, undercutting GPT-5.4 by 30% and Claude Sonnet 4.6 by 42%.
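The caption's percentages let us back out the implied competitor prices; this is simple arithmetic on the figures above, not independently reported pricing:

```python
# Implied competitor input prices, derived purely from the caption:
# V4 Pro ($1.74/M) undercuts GPT-5.4 by 30% and Claude Sonnet 4.6 by 42%.
v4_pro_input = 1.74
for name, discount in [("GPT-5.4", 0.30), ("Claude Sonnet 4.6", 0.42)]:
    implied = v4_pro_input / (1 - discount)
    print(f"{name}: implied input price ${implied:.2f}/M tokens")

# GPT-5.4: implied input price $2.49/M tokens
# Claude Sonnet 4.6: implied input price $3.00/M tokens
```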

Benchmark Performance: Almost at the Frontier

DeepSeek's self-reported benchmarks show V4-Pro to be competitive with GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks, though the paper acknowledges that performance "falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months." For most practical applications this gap is unlikely to matter — and the price differential more than compensates.

The models are available via API through DeepSeek's own platform and through OpenRouter, with Flash priced at $0.14/M input and $0.28/M output, and Pro at $1.74/M input and $3.48/M output. For context, GPT-5.5 costs $5/M input and $30/M output. DeepSeek V4 Pro delivers comparable performance at roughly one-third the input cost and one-ninth the output cost.
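Since DeepSeek's platform and OpenRouter both expose OpenAI-compatible endpoints, trying the models takes only a few lines of code. The sketch below assumes the V4 models follow that existing convention; the model identifier "deepseek-v4-flash" is a guess at the naming, so check the provider's model list for the exact string:

```python
# Minimal sketch of calling DeepSeek-V4 through an OpenAI-compatible
# endpoint. The base URL follows DeepSeek's existing API convention;
# the model identifier "deepseek-v4-flash" is an assumption.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # or https://openrouter.ai/api/v1
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # assumed identifier; verify before use
    messages=[{"role": "user", "content": "Summarize this document."}],
)
print(response.choices[0].message.content)
```

At Flash's $0.14/M input pricing, even a request that fills the entire 1M-token window costs on the order of fourteen cents in input tokens.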

The geopolitical dimension of this release should not be overlooked. DeepSeek has developed V4 in an environment of US export controls on advanced semiconductors to China. The models are reportedly optimized to run efficiently on Huawei's Ascend chips — an alternative to NVIDIA's restricted H100/H200 GPUs. If DeepSeek can continue to match Western frontier models while operating under these constraints, it raises fundamental questions about the effectiveness of export controls as a tool of AI competition policy.