In brief
- AI engineer Kyle Hessling combined two of Jackrong’s Claude Opus 4.6 and GLM-5.1 distilled fine-tunes into a single “frankenmerge.”
- A post-merge “healing fine-tune” was needed to repair garbled code output caused by the layer boundary between the two independently trained models.
- The model overthinks on some tasks, but it’s a solvable problem.
Thought Qwopus was cool because it merged Qwen and Opus? Well, Kyle Hessling, an AI engineer with plenty of know-how and free time, took that recipe and threw GLM, one of the best reasoning models out there, into the mix. The result is an 18-billion-parameter frankenmerge that fits on a cheap GPU and outperforms Alibaba’s latest 35B model.
For those who don’t know, parameters are the numerical values baked into a neural network during training, like dials the network can adjust: the more of them, the more knowledge and complexity the model can handle, and the more memory it needs to run.
Hessling, an AI infrastructure engineer, stacked two of Jackrong’s Qwen 3.5 fine-tunes on top of each other: layers 0 through 31 from Qwopus 3.5-9B-v3.5, which distills Claude Opus 4.6’s reasoning style into Qwen as a base model, and layers 32 through 63 from Qwen3.5-9B-GLM5.1-Distill-v1, trained on reasoning data from z.AI’s GLM-5.1 teacher model on top of the same Qwen base.
The hypothesis: give the model Opus-style structured planning in the first half of the reasoning and GLM’s problem-decomposition scaffolding in the second: 64 layers total, in one model.
The technique is called a passthrough frankenmerge: no blending, no averaging of weights, just raw layer stacking. Hessling had to write his own merge script from scratch because existing tools don’t support Qwen 3.5’s hybrid linear/full attention architecture. The resulting model passed 40 out of 44 capability tests, beating Alibaba’s Qwen 3.6-35B-A3B MoE, which needs 22 GB of VRAM, while running on just 9.2 GB in Q4_K_M quantization.
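Hessling’s actual script isn’t reproduced here, but the passthrough idea is simple enough to sketch. Below is a minimal, hypothetical version: keep donor A’s weights as-is (its 32 layers plus embeddings and norms), then renumber donor B’s transformer layers to sit after them. The `model.layers.N.` key layout is an assumption borrowed from common Hugging Face checkpoint conventions.

```python
import re

# Hypothetical sketch of a passthrough frankenmerge: stack donor B's
# transformer layers after donor A's, with no blending or averaging.
# Assumes both donors are 32-layer models using "model.layers.N." keys.
def passthrough_merge(state_a, state_b, offset=32):
    """Return a merged state dict: donor A's layers 0..offset-1 keep their
    numbers (along with embeddings, norms, and the LM head), and donor B's
    layers 0..N are renumbered to offset..offset+N."""
    layer_re = re.compile(r"model\.layers\.(\d+)\.")
    merged = dict(state_a)  # donor A's layers and all non-layer weights
    for key, tensor in state_b.items():
        m = layer_re.match(key)
        if m is not None:
            new_idx = int(m.group(1)) + offset
            new_key = layer_re.sub(f"model.layers.{new_idx}.", key, count=1)
            merged[new_key] = tensor  # donor B layer k becomes layer k+offset
    return merged
```

A real script would also have to handle Qwen 3.5’s hybrid linear/full attention layout, which is exactly why off-the-shelf merge tools fell short here.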
An NVIDIA RTX 3060 handles it fine … in theory.
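The memory figure checks out as back-of-the-envelope arithmetic (our calculation, not from Hessling): dividing the reported footprint by the parameter count gives an average bit-width in the 4-bit range, which is what a Q4_K_M quantization delivers and why a 12 GB card like the RTX 3060 can fit it.

```python
# Rough sanity check on the reported numbers (illustrative only).
params = 18e9        # 18-billion-parameter merge
footprint_gb = 9.2   # reported Q4_K_M size in VRAM
bits_per_weight = footprint_gb * 1e9 * 8 / params
print(round(bits_per_weight, 1))  # ~4.1 bits per weight, i.e. ~4-bit quantization
```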
Hessling explains that making this model wasn’t easy. The raw merge used to spit out garbled code. Even so, the test models he released went somewhat viral among enthusiasts.
Hessling’s final fix was a “healing fine-tune”: essentially a QLoRA (a small adapter embedded into the model, like an appendix, that heavily conditions the final output) targeting all attention and projection layers.
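The article doesn’t give the details of that healing run, but the mechanism is easy to illustrate. A LoRA adapter leaves each targeted weight matrix frozen and trains a small low-rank update on top of it; QLoRA does the same over a 4-bit-quantized base, which is what makes such a pass feasible on modest hardware. The sketch below shows the core math with NumPy; the dimensions, rank, and scaling are hypothetical.

```python
import numpy as np

# LoRA in a nutshell: instead of updating a frozen weight W (d_out x d_in),
# train two small matrices B (d_out x r) and A (r x d_in) and apply
# W_eff = W + (alpha / r) * B @ A.  All sizes here are illustrative.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))    # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # small random init
B = np.zeros((d_out, r))                   # zero init, so the adapter starts as a no-op

W_eff = W + (alpha / r) * B @ A
assert np.allclose(W_eff, W)  # before training, output is unchanged
```

Only `A` and `B` get gradient updates during the healing steps, so the adapter stays tiny relative to the 18B base it conditions.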
We tried it, and even though the idea of having Qwen, Claude Opus, and GLM 5.1 running locally on our potato is beyond appealing, in practice we found that the model is so good at reasoning through things that it ends up overthinking.
We tested it on an M1 MacBook running an MLX-quantized version (a build optimized to run on Macs). When prompted to generate our usual test game, the reasoning chain ran so long it hit the token limit and gave us a nice long stretch of reasoning with no working result in a zero-shot interaction. That’s a daily-use blocker for anyone hoping to run this locally on consumer hardware for any serious application.
We went a bit softer and things were still rough. A simple “write a Snake game” prompt took over 40 minutes of reasoning … lots of it.
You can see the results in our GitHub repository.
This is a known tension in the Qwopus lineage: Jackrong’s v2 fine-tunes were designed to address Qwen 3.5’s tendency toward repetitive internal loops and to “think more economically.” Stacking 64 layers of two reasoning distills appears to amplify that behavior on certain prompts.
That’s a solvable problem, and the open-source community will likely fix it. What matters here is the broader pattern: a pseudonymous developer releases specialized fine-tunes with full training guides, another enthusiast stacks them with a custom script, runs 1,000 healing steps, and lands a model that outperforms a 35-billion-parameter release from one of the world’s biggest AI labs. The whole thing fits in a small file.
This is what makes open source worth watching: not just the big labs releasing weights, but the layer-by-layer solutions, the expertise happening below the radar. The gap between a weekend project and a frontier deployment gets narrower the more developers join the community.
Jackrong has since mirrored Hessling’s repository, and the model had gathered over 3,000 downloads within its first two weeks of availability.
