In brief
- Hy3 preview is a 295-billion-parameter Mixture-of-Experts model with just 21 billion active parameters, making it cheaper to run than most competitors of comparable ability.
- On SWE-bench Verified, a coding benchmark that tests real GitHub bug fixes, it jumped from 53% (Hy2) to 74.4%, a 40% relative improvement over the previous generation.
- The model is already live across Tencent’s app ecosystem, including Yuanbao, QQ, and Tencent Docs, with API access on Tencent Cloud starting at approximately $0.18 per million input tokens.
Tencent quietly dropped its most capable AI model yet on Thursday, and the benchmark numbers are hard to ignore. Hy3 preview, the company’s first model after a full infrastructure rebuild, went open-source today across GitHub, Hugging Face, and ModelScope.
It’s also available on Tencent Cloud’s official site, under a paid plan.
Hy3 packs 295 billion total parameters (a measure of a model’s potential breadth of knowledge) but only 21 billion are active at any given time. That’s the appeal of a Mixture-of-Experts architecture: the model routes each query to a specialized subset of its “expert” sub-networks instead of running everything at once. Less compute, lower cost, roughly comparable output quality. It also supports up to 256,000 tokens of context, which is enough to swallow a full-length book in a single prompt.
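To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is an illustration only, not Tencent’s implementation: the expert count, the value of k, the toy experts, and the random router scores are all assumptions (in a real model the scores come from a learned layer applied to each token). Only the 21-billion and 295-billion figures come from the article.

```python
import math
import random

def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}

def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and blend their outputs by gate weight."""
    gates = route_top_k(router_scores, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Hypothetical toy setup: 8 experts, 2 active per input (NOT Hy3's real configuration).
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Stand-in for a learned router: random scores with a fixed seed.
rng = random.Random(0)
router_scores = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

gates = route_top_k(router_scores, TOP_K)
output = moe_forward(1.0, experts, router_scores, TOP_K)

# Share of parameters active per token, using the article's figures.
active_fraction = 21 / 295
print(f"active experts: {sorted(gates)}, active share: {active_fraction:.1%}")
```

Only the chosen experts are evaluated, which is where the savings come from: by the article’s figures, roughly 7% of the model’s parameters do work on any given token.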
The model was built to balance three things Tencent says it has stopped trading off against each other: capability breadth, honest evaluation, and cost-efficiency. Its previous flagship, Hy2, had more than 400 billion parameters. Tencent explicitly walked that back, arguing that 295 billion is the sweet spot where reasoning fully matures but adding more parameters stops paying off.
That doesn’t mean the model is worse. Models with better training and fewer parameters quite often outperform larger generalist ones.
On coding, the improvement is substantial. SWE-bench Verified is a benchmark that tests whether a model can actually fix real bugs from GitHub repositories: not toy problems, but production code. Hy2 scored 53.0%. Hy3 preview scores 74.4%. That’s a 40% relative jump in one generation, landing it within range of Claude Opus 4.6 (80.8%), GLM-5 (77.8%), and Kimi-K2.5 (76.8%). Terminal-Bench 2.0, which measures autonomous task execution in a real command-line environment, went from 23.2% to 54.4%, also a big leap.
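The “40%” figure is a relative gain, not a gain in percentage points, and the two are easy to confuse. A quick sketch of the arithmetic, using only the scores quoted above (the helper name `improvement` is ours, for illustration):

```python
def improvement(old, new):
    """Return (percentage-point gain, relative gain in percent)."""
    absolute = new - old
    relative = (new - old) / old * 100
    return absolute, relative

# (Hy2 score, Hy3 preview score) as quoted in the article.
scores = {
    "SWE-bench Verified": (53.0, 74.4),
    "Terminal-Bench 2.0": (23.2, 54.4),
}

results = {name: improvement(old, new) for name, (old, new) in scores.items()}

for name, (absolute, relative) in results.items():
    print(f"{name}: +{absolute:.1f} points, +{relative:.1f}% relative")
```

On SWE-bench Verified that works out to about 21.4 points, or roughly 40% relative to Hy2’s score. By the same calculation, the Terminal-Bench 2.0 score more than doubled.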
The model, however, can be a very interesting option for people building with agents. Agents work from a complex set of instructions that includes memories, skills, and tool calls. They often miss something, which can ruin a workflow or produce bad results. That’s why agentic capabilities are becoming more and more important for AI developers as this area becomes the most hyped thing in the industry. It’s also why the model was immediately offered on Openclaw.
Search and browsing agents, where models must retrieve, filter, and synthesize information from the open web without human guidance, also improved sharply. On BrowseComp, a benchmark tracking complex web research tasks, Hy3 preview reached 67.1% (up from Hy2’s 28.7%). On WideSearch, it hit 70.2%, outperforming GLM-5 and Kimi-K2.5 but trailing Claude Opus 4.6’s 77.2%.
In reasoning, the model topped every Chinese competitor on Tsinghua University’s math PhD qualifying exam (Spring 2026), scoring 88.4 averaged over three runs (avg@3). That’s a real-world exam, not a curated dataset: the kind of evaluation Tencent says it’s prioritizing to avoid benchmark gaming. The model also scored 87.8 on CHSBO 2025 (China’s national high school biology olympiad), the highest among Chinese models in that category.
Hy3 preview began training in late January 2026 and launched Thursday: under three months from cold start to open-source release. That is unusually fast for a frontier-class model. Tencent attributes it to a February infrastructure overhaul led by Yao Shunyu, its chief AI scientist, who pushed a full rebuild of the pretraining and reinforcement learning stack.
This is a very different approach from what Chinese AI labs were doing a year ago, when DeepSeek’s R1 shocked the industry with its cost-efficiency.
Hy3 still trails OpenAI and Google DeepMind’s flagships, but on size-to-performance ratio, Hy3 preview is hard to dismiss: the agent benchmark composite shows it in the “optimal zone” with ~295 billion parameters, ahead of DeepSeek-V3.2 (600 billion+) and matching Kimi-K2.5 (over 1 trillion parameters) at a fraction of the compute cost.
Hunyuan models have already been deployed across Yuanbao, CodeBuddy, WorkBuddy, QQ, and Tencent Docs. On CodeBuddy and WorkBuddy, first-token latency dropped 54%, end-to-end generation time fell 47%, and the model successfully ran agent workflows as long as 495 steps. Tencent Cloud is offering API access at around $0.18 per million input tokens and $0.59 per million output tokens, with personal Token Plan packages starting at around $4.10 per month.
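For a rough sense of what those rates mean per request, here is a back-of-the-envelope sketch. It assumes the approximate prices quoted above apply flat per token; the function name and the example request size are hypothetical, and actual Tencent Cloud billing may differ.

```python
# Approximate prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.18
OUTPUT_PRICE_PER_M = 0.59

def estimate_cost(input_tokens, output_tokens):
    """Estimate the cost of one request, assuming flat per-token pricing."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost

# Hypothetical request: a full 256,000-token context plus a 4,000-token reply.
cost = estimate_cost(256_000, 4_000)
print(f"estimated cost: ${cost:.4f}")
```

Under those assumptions, filling the entire context window once and getting a few thousand tokens back would cost about five cents.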
