DeepSeek V4 Is Here—Its Pro Version Costs 98% Less Than GPT 5.5 Pro

In short

DeepSeek launched its brand-new V4-Pro design with 1.6 trillion specifications.
It costs $1.74/$ 3.48 per million input/output tokens, approximately 1/20th the rate of Claude Opus 4.7 and 98% less than GPT 5.5 Pro.
DeepSeek qualified V4 partially on Huawei Ascend chips, preventing U.S. export limitations, and states that when 950 brand-new supernodes come online later on in 2026, the Pro design’s already-low rate will drop even more.

DeepSeek is back, and it appeared a couple of hours after OpenAI dropped GPT-5.5. Coincidence? Perhaps. However if you’re a Chinese AI laboratory that the U.S. federal government has actually been attempting to decrease with chip export prohibits for the previous 3 years, your sense of timing gets quite sharp.

The Hangzhou-based laboratory launched sneak peek variations of DeepSeek-V4-Pro and DeepSeek-V4-Flash today, both open-weight, both with one million token context windows. That implies you can essentially deal with a context approximately the size of the Lord of the Rings Trilogy before the design collapses. Both are likewise priced well listed below anything equivalent in the West, and both are totally free for those efficient in running in your area.

DeepSeek’s last significant interruption– R1 in January 2025– cleaned $600 billion from Nvidia’s market cap in a single day as financier questioned whether American business actually required such substantial financial investments to produce outcomes that a little chinese laboratory accomplished with a portion of the expense. V4 is a various type of relocation: quieter, more technical, and more concentrated on performance for anybody really developing with AI.

2 designs, really various tasks

Of the 2 brand-new designs, DeepSeek’s V4-Pro is the huge one, with 1.6 trillion overall specifications. To put that in viewpoint, specifications are the internal “settings” or “brain cells” that a design utilizes to keep understanding and acknowledge patterns– the more specifications a design has, the more intricate details it can in theory hold. That makes it the greatest open-source design in the LLM market to date. The size might sound outrageous up until you discover it just triggers 49 billion of them per reasoning pass.

This is the Mixture-of-Experts technique DeepSeek has actually improved considering that V3: The complete design sits there, however just the pertinent piece of it gets up for any provided demand. More understanding, exact same calculate expense.

” DeepSeek-V4-Pro-Max, the optimum thinking effort mode of DeepSeek-V4-Pro, substantially advances the understanding abilities of open-source designs, strongly developing itself as the very best open-source design offered today,” Deepseek composed in the design’s main card on Huggingface. “It accomplishes top-tier efficiency in coding criteria and substantially bridges the space with leading closed-source designs on thinking and agentic jobs.”

V4-Flash is the useful one: 284 billion overall specifications, 13 billion active. It’s developed to be much faster, more affordable, and according to DeepSeek’s own criteria, “accomplishes equivalent thinking efficiency to the Pro variation when provided a bigger thinking spending plan.”

Both support one million tokens of context. That’s approximately 750,000 words– approximately the whole “Lord of the Rings” trilogy plus modification. Which’s as a basic function, not a premium tier.

Deepseek’s (not so) secret sauce: Making attention not dreadful at scale

Here’s the technical part for geeks or those thinking about the magic powering the design. Deepseek does not conceal its tricks, and whatever is offered free of charge– the complete paper is offered on Github.

Basic AI attention– the system that lets a design comprehend relationships in between words– has a ruthless scaling issue. Each time you double the context length, the calculate expense approximately quadruples. So running a design on a million tokens isn’t simply two times as pricey as 500,000 tokens. It’s 4 times as pricey. This is why long context has actually traditionally been a checkbox laboratories include and after that quietly throttle behind rate limitations.

DeepSeek developed 2 brand-new attention types to navigate this. The very first, Compressed Sporadic Attention, operates in 2 actions. It initially compresses groups of tokens– state, every 4 tokens– into a single entry. Then, rather of addressing all of those compressed entries, it utilizes a “Lightning Indexer” to select just the most pertinent outcomes for any provided inquiry. Your design goes from addressing a million tokens to addressing a much smaller sized set of the most crucial portions, type of like a curator who does not check out every book however understands precisely which rack to inspect.

The 2nd, Greatly Compressed Attention, is more aggressive. It collapses every 128 tokens into a single entry– no sporadic choice, simply harsh compression. You lose fine-grained information, however you get a very inexpensive international view. The 2 attention types run in rotating layers, so the design gets both the information and the introduction.

The outcome, from the technical paper: At one million tokens, V4-Pro utilizes 27% of the calculate its predecessor (V3.2) required. KV cache– the memory the design requires to track context– drops to simply 10% of V3.2. V4-Flash presses that more: 10% of calculate, 7% of memory.

And this wound up with Deepseek having the ability to use a more affordable rate per token than its rivals, while supplying equivalent outcomes. To put that in dollar terms: GPT-5.5 released the other day at $5 input and $30 output per million tokens with GPT-5.5 Pro priced at $30 per million input tokens and $180 per million output tokens.

Deepseek V4-Pro is $1.74 input and $3.48 output. V4-Flash is $0.14 input and $0.28 output. Cline CEO Saoud Rizwan mentioned that if Uber had actually utilized DeepSeek rather of Claude, its 2026 AI spending plan– supposedly enough for 4 months of use– would have lasted 7 years.

deepseek v4 is now the most inexpensive sota design offered at 1/20th the expense of opus 4.7.

for viewpoint, if uber utilized deepseek rather of claude their 2026 ai spending plan would have lasted 7 years rather of just 4 months. pic.twitter.com/i9rJZzvRBV

— Saoud Rizwan (@sdrzn) April 24, 2026

The criteria

DeepSeek does something uncommon in its technical report: It releases the spaces. The majority of design releases cherry-pick the criteria where they win. DeepSeek ran the complete contrast versus GPT-5.4 and Gemini-3.1- Pro, discovered that V4-Pro’s thinking drags those designs by about 3 to 6 months, and printed it anyhow.

Where V4-Pro-Max really wins: Codeforces, competitive programs criteria, ranked like human chess. V4-Pro scored 3,206, positioning it around 23rd amongst real human contest individuals. On Pinnacle Shortlist, a curated set of tough mathematics and STEM issues, it scored a pass rate and struck 90.2% versus Opus 4.6’s 85.9% and GPT-5.4’s 78.1%. On SWE-Verified, which determines whether a design can fix genuine GitHub concerns pulled from real open-source repositories, it scored 80.6%– matching Claude Opus 4.6.

Where it tracks: multitasking benchmark MMLU-Pro (Gemini-3.1- Pro at 91.0% vs V4-Pro at 87.5%), specialist understanding criteria GPQA Diamond (Gemini 94.3 vs V4-Pro 90.1), and Mankind’s Last Test, a graduate-level criteria where Gemini-3.1- Pro’s 44.4% still beats V4-Pro’s 37.7%.

On long context particularly, V4-Pro leads open-source designs and beats Gemini-3.1- Pro on the CorpusQA criteria (a test imitating genuine file analysis at one million tokens), however loses to Claude Opus 4.6 on MRCR– a test determining how well a design recovers particular needles buried deep in a long haystack.

Developed to run representatives, not simply address concerns

The agentic things is where this release gets fascinating for designers really delivering items.

V4-Pro can run in Claude Code, OpenCode, and other AI coding tools. According to DeepSeek’s internal study of 85 designers who utilized V4-Pro as their main coding representative, 52% stated it was prepared to be their default design, 39% favored yes, and less than 9% stated no. Internal workers stated it outshines Claude Sonnet and approaches Claude Opus 4.5 on agentic coding jobs.

Synthetic Analysis, which runs independent examinations of AI designs on real-world jobs, ranked V4-Pro initially amongst all open-weight designs on GDPval-AA– a benchmark screening financially important understanding work throughout financing, legal, and research study jobs, scored through Elo. V4-Pro-Max scored 1,554 Elo, ahead of GLM-5.1 (1,535) and MiniMax’s M2.7 (1,514). For recommendation, Claude Opus 4.6 ratings 1,619 on the exact same criteria– still ahead, however the space is closing.

DeepSeek V4 Pro is the # 1 open weights design on GDPval-AA, our agentic real-world work jobs assessment@deepseek_ai has actually launched V4 Pro (1.6 T overall/ 49B active) and V4 Flash (284B overall/ 13B active). V4 is DeepSeek’s very first brand-new size considering that V3, with all intermediate designs … pic.twitter.com/2kJWVrKQjF

— Synthetic Analysis (@ArtificialAnlys) April 24, 2026

Deepseek’s V4 likewise presents something called “interleaved thinking.” In previous designs, if you were running a representative that made numerous tool calls– state, it browsed the web, then ran some code, then browsed once again– the design’s thinking context got flushed in between rounds. Each brand-new action, the design needed to reconstruct its psychological design from scratch. V4 maintains the complete chain of believed throughout tool calls, so a 20-step representative workflow does not experience amnesia midway through. This matters more than it sounds for anybody running complex automated pipelines.

Deepseek and the U.S.-China AI war

The U.S. has actually been limiting high-end Nvidia chip exports to China considering that 2022. The mentioned objective was to slow Chinese AI advancement, however the chip restriction didn’t stop DeepSeek and rather made them develop a more effective architecture and develop out domestic hardware supply.

DeepSeek didn’t launch V4 in a vacuum– the AI area has actually been flush with activity since late: Anthropic delivered Claude Opus 4.7 on April 16– a design Decrypt checked and discovered strong on coding and thinking, with especially high token use. The day before that, Anthropic was likewise resting on Claude Mythos, a cybersecurity design it states it can’t launch openly due to the fact that it’s too proficient at self-governing network attacks.

Xiaomi dropped MiMo V2.5 Pro on April 22, going complete multimodal– image, audio, video. Expenses $1 input and $3 output per million tokens. It matches Opus 4.6 on a lot of coding criteria. 3 months back, no one was speaking about Xiaomi as a frontier AI business. Now it’s delivering competitive designs much faster than a lot of Western laboratories.

OpenAI’s GPT-5.5 landed the other day with expenses surging as much as $180 per million tokens of output in the Pro variation. It beats V4-Pro on Terminal Bench 2.0 (82.7% vs 70.0%), which evaluates intricate command-line representative workflows. However it costs substantially more than V4-Pro for comparable jobs. That exact same day Tencent launched Hy3, another modern design concentrated on performance.

What this implies for you

So with numerous brand-new designs offered, the concern designers are really asking: When is the premium worth it?

For business, the mathematics might have altered. A design that leads open-source criteria at $1.74 per million input tokens implies massive file processing, legal evaluation, or code generation pipelines that were pricey 6 months back are now more affordable. The one-million-token context implies you can feed whole codebases or regulative filings in a single demand rather of chunking them throughout numerous calls.

Besides, its open-source nature implies it can not just be run for totally free on regional hardware, however it can be personalized and enhanced based upon the business’s requirements and utilize cases.

For designers and solo home builders, V4-Flash is the one to enjoy. At $0.14 input and $0.28 output, it’s more affordable than designs that were thought about spending plan choices a year back– and it manages most jobs the Pro variation manages. DeepSeek’s existing deepseek-chat and deepseek-reasoner endpoints currently path to V4-Flash in non-thinking and believing modes respectively, so if you’re on the API, you’re currently utilizing it.

The designs are text-only in the meantime. DeepSeek stated it’s dealing with multimodal abilities, which implies other huge laboratories from Xiaomi to OpenAI still have that edge. Both designs are MIT accredited and offered on Hugging Face today. The old deepseek-chat and deepseek-reasoner endpoints retire on July 24, 2026.

Daily Debrief Newsletter

Start every day with the leading newspaper article today, plus initial functions, a podcast, videos and more.

Source

Sungrow Hosts GRES 2026, Showcasing Value-Driven Innovation Across Full Energy Scenarios

Sungrow Hosts GRES 2026, Showcasing Value-Driven Innovation Across Full Energy Scenarios

New Jersey American Water Proudly Recognizes American Water Charitable Foundation 2026 Water and Environm

Greenland Energy Announces Pricing of $70 Million Public Offering – Greenland Energy (NASDAQ:GLND)

Bitcoin Rally From February Lows Driven by Regular Strategy Buys

Bitcoin Rally From February Lows Driven by Regular Strategy Buys

Bitcoin Drops Under $76K As Investors Weigh Regulatory, AI Risk

RedStone Launches Settlement Layer to Address RWA Liquidity Gap in DeFi Lending

‘Fintechs are a force for social good,’ says QED’s Nigel Morris

‘Fintechs are a force for social good,’ says QED’s Nigel Morris

AI Agent Deletes Startup’s Database in 9 Seconds, Founder Says

Google DeepMind Veteran Raises $1.1 Billion to Build AI That Isn’t Trained With Human Data

Defense stocks have floundered since the Iran war began. Here’s why

Defense stocks have floundered since the Iran war began. Here’s why

This name powering AI reports earnings after the bell. Watch these levels, according to the charts

Josh Brown likes next-generation aviation stock as long-shot play on potential electric aircraft boom

AI Agent Deletes Startup’s Database in 9 Seconds, Founder Says

Google DeepMind Veteran Raises $1.1 Billion to Build AI That Isn’t Trained With Human Data

Dead Internet? A Third of New Websites Are AI-Generated, Says Stanford

OpenClaw Insider Builds the Enterprise Safety Layer the Project Never Shipped

Google Signs AI Deal With Pentagon for Classified Work as Employees Object

Elon Musk attorney claims OpenAI, Sam Altman ‘stole a charity’ as high-stakes legal fight begins

‘Fintechs are a force for social good,’ says QED’s Nigel Morris

AI Agent Deletes Startup’s Database in 9 Seconds, Founder Says

Bitcoin Drops Under $76K As Investors Weigh Regulatory, AI Risk

Google DeepMind Veteran Raises $1.1 Billion to Build AI That Isn’t Trained With Human Data

Dead Internet? A Third of New Websites Are AI-Generated, Says Stanford

Defense stocks have floundered since the Iran war began. Here’s why

OpenClaw Insider Builds the Enterprise Safety Layer the Project Never Shipped

This name powering AI reports earnings after the bell. Watch these levels, according to the charts

RedStone Launches Settlement Layer to Address RWA Liquidity Gap in DeFi Lending

Popular News

Metaplanet Raises $50M in Zero-Interest Bonds to Buy Bitcoin

ETH Buy Pressure Hits $5.5B As Price Nears Key Breakout

XRP Eyes 30% Gains as Exchange Outflows Hit 35M Tokens in a Day

DeepSeek V4 Is Here—Its Pro Version Costs 98% Less Than GPT 5.5 Pro

In short

2 designs, really various tasks

Deepseek’s (not so) secret sauce: Making attention not dreadful at scale

The criteria

Developed to run representatives, not simply address concerns

Deepseek and the U.S.-China AI war

What this implies for you

Daily Debrief Newsletter

Related Articles

Subscribe to Updates