If you've been following the local AI scene, you probably know Qwopus, the open-source model that tried to distill Claude Opus 4.6's reasoning into Alibaba's Qwen so you could run something resembling Opus on your own hardware for free. It worked surprisingly well. The obvious catch: Qwen is a Chinese model, and not everyone is comfortable with that.
Jackrong, the same pseudonymous developer behind that project, heard the feedback. His answer is Gemopus, a new family of Claude Opus-style fine-tunes built entirely on Google's open-source Gemma 4. All-American DNA, same idea: frontier-level reasoning, running locally on hardware you already own.
The family comes in two flavors. Gemopus-4-26B-A4B is the heavier option: a Mixture of Experts model with 26 billion total parameters that only activates around 4 billion during inference, which means it punches well above its weight on constrained hardware.
Parameters are what determine an AI's ability to learn, reason, and store information. Having 26 billion total parameters gives the model a substantial breadth of knowledge, but by only "waking up" the roughly 4 billion parameters relevant to your particular prompt, it delivers the output quality of a much larger AI while remaining lightweight enough to run smoothly on everyday hardware.
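For the curious, here is a minimal, purely illustrative Python sketch of how that sparse activation works: a toy Mixture of Experts layer routes each token to only its top few experts, so the rest of the layer's parameters sit idle on that forward pass. The sizes and routing scheme are made up for demonstration and are not Gemopus's actual architecture.

```python
import numpy as np

# Toy Mixture of Experts layer: many experts exist, but each token only
# "wakes up" the top-k of them, so most parameters stay idle per forward pass.
# Purely illustrative; Gemopus-4-26B-A4B's real architecture is far larger.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2

# Each expert is just a dense layer here; together they hold most of the weights.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # picks which experts fire

def moe_forward(token_vec: np.ndarray) -> np.ndarray:
    logits = token_vec @ router                      # score every expert
    chosen = np.argsort(logits)[-top_k:]             # keep only the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                         # softmax over the chosen few
    # Only the chosen experts' weights are touched; the other 14 stay dormant.
    return sum(w * (token_vec @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
out = moe_forward(token)
active = top_k * d_model * d_model + d_model * n_experts
total = n_experts * d_model * d_model + d_model * n_experts
print(f"params touched this token: {active:,} of {total:,} ({active/total:.0%})")
```

The real 26B model does the same thing at vastly larger scale, which is why it can hold 26 billion parameters of knowledge while paying roughly the per-token compute cost of a 4-billion-parameter model.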
The other is Gemopus-4-E4B, a 4-billion-parameter edge model engineered to run comfortably on a modern iPhone or a thin-and-light MacBook, no GPU required.
The choice of base model matters here. Google's Gemma 4, released on April 2, is built directly from the same research and technology as Gemini 3; the company said as much explicitly at launch. That means Gemopus carries something no Qwen-based fine-tune can claim: the DNA of Google's own state-of-the-art closed model under the hood, wrapped in Anthropic's thinking style on top. The best of both worlds, essentially.
What makes Gemopus different from the wave of other Gemma fine-tunes flooding Hugging Face right now is the methodology behind it. Jackrong deliberately chose not to force Claude's chain-of-thought reasoning traces into Gemma's weights, a shortcut most rival releases take.
His argument, backed by recent research, is that stuffing a student model with a teacher's surface-level reasoning text doesn't actually transfer genuine reasoning ability. It teaches imitation, not reasoning. "There is no need for excessive imitation or superstitious replication of the Claude-style chain of thought," the model card reads. Instead, he focused on answer quality, structural clarity, and conversational naturalness, fixing Gemma's stiff Wikipedia tone and its tendency to lecture you about things you didn't ask.
AI infrastructure engineer Kyle Hessling ran independent benchmarks and published the results directly on the model card. His verdict on the 26B version was decidedly favorable. "Happy to have benched this one pretty hard and it is an excellent finetune of an already incredible model," he wrote on X. "It rocks at one-shot requests over long contexts, and runs very fast thanks to the MoE (Mixture of Experts) architecture."
The smaller E4B version passed all 14 core competency tests (instruction following, coding, math, multi-step reasoning, translation, safety, caching) and cleared all 12 long-context tests at 30K and 60K tokens. On needle-in-a-haystack retrieval, it passed 13 out of 13 probes, including a stretch test at one million tokens with YaRN 8× RoPE scaling.
The 26B extends natively to 131K context and all the way out to 524K with YaRN, which Hessling also stress-tested: "It also crushed my simple needle-in-the-haystack tests all the way out to an extended context of 524k!"
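Needle-in-a-haystack probes like these are straightforward to reproduce at home: bury a unique fact at some depth inside a long filler document, ask the model to retrieve it, and check the output. Here is a minimal sketch; the ask_model callable is a stand-in for whatever local inference call you use, and the filler text, passphrase, and sentence count are placeholders rather than anything from Hessling's actual test harness.

```python
def build_haystack_prompt(needle: str, filler_sentence: str,
                          n_sentences: int, depth: float) -> str:
    """Bury `needle` at roughly `depth` (0.0 = start, 1.0 = end) of a filler document."""
    sentences = [filler_sentence] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    context = " ".join(sentences)
    return (
        f"{context}\n\n"
        "Question: What is the secret passphrase mentioned in the document above? "
        "Answer with the passphrase only."
    )

def run_probe(ask_model, depth: float) -> bool:
    # `ask_model(prompt) -> str` stands in for your local inference call
    # (LM Studio's API, llama.cpp, MLX, etc.).
    needle = "The secret passphrase is 'violet-harbor-42'."
    prompt = build_haystack_prompt(
        needle,
        filler_sentence="The quick brown fox jumps over the lazy dog.",
        n_sentences=5000,   # scale this up to push toward 30K/60K-token contexts
        depth=depth,
    )
    return "violet-harbor-42" in ask_model(prompt)

if __name__ == "__main__":
    # Dummy model that "cheats" so the script runs standalone; swap in a real call.
    fake_model = lambda p: "violet-harbor-42"
    results = [run_probe(fake_model, depth=d) for d in (0.1, 0.5, 0.9)]
    print(f"passed {sum(results)}/{len(results)} probes")
```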
On edge hardware, the E4B is genuinely fast. Jackrong reports 45 to 60 tokens per second on an iPhone 17 Pro Max, and 90 to 120 tokens per second on a MacBook Air M3/M4 via MLX. The 26B's MoE architecture means it offloads gracefully on unified memory systems or GPUs with under 10GB of VRAM, and Hessling called it his daily-driver recommendation for VRAM-starved setups.
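If you want to sanity-check throughput numbers like those on your own Apple silicon machine, a rough timing harness along these lines will do it, assuming you have the mlx-lm package installed and an MLX-converted copy of the model on disk. The model path below is a placeholder, not a confirmed Hugging Face ID, and the exact generate arguments may vary slightly between mlx-lm versions.

```python
# pip install mlx-lm   (Apple silicon only)
import time
from mlx_lm import load, generate

# Placeholder path: substitute whatever MLX conversion of Gemopus you actually have.
MODEL_PATH = "path/to/gemopus-4-e4b-mlx"

model, tokenizer = load(MODEL_PATH)

prompt = "Explain the difference between a dense and a Mixture of Experts transformer."
max_tokens = 256

start = time.time()
text = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
elapsed = time.time() - start

# Rough throughput estimate: tokens generated divided by wall-clock time.
n_generated = len(tokenizer.encode(text))
print(f"{n_generated} tokens in {elapsed:.1f}s -> {n_generated / elapsed:.1f} tok/s")
```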
Both models ship in GGUF format, which means you can drop them straight into LM Studio or llama.cpp with no setup. The full training code and a step-by-step fine-tuning guide are on Jackrong's GitHub: the same pipeline he used for Qwopus, same Unsloth and LoRA setup, reproducible on Colab.
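For a sense of what that kind of pipeline involves, here is a heavily abridged Unsloth LoRA setup of the sort such a guide would walk through; the base-model ID, adapter rank, and target modules below are illustrative assumptions, not Jackrong's actual hyperparameters.

```python
# pip install unsloth   (runs on a single Colab GPU)
from unsloth import FastLanguageModel

# Illustrative base-model ID; check Jackrong's guide for the exact one he uses.
BASE_MODEL = "google/gemma-4-e4b-it"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=BASE_MODEL,
    max_seq_length=4096,
    load_in_4bit=True,        # 4-bit quantization keeps this within Colab VRAM
)

# Attach low-rank adapters (LoRA) instead of updating the full weights.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # adapter rank: illustrative, not the project's setting
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
)

# From here, a standard TRL SFTTrainer run over a curated answer-quality
# dataset would produce the fine-tuned adapter weights.
```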
Gemopus is not without its rough edges. Tool calling remains broken across the entire Gemma 4 series in llama.cpp and LM Studio (call failures, format mismatches, loops), so if your workflow depends on agents using external tools, this is not your model yet. Jackrong himself calls it "an engineering exploration reference rather than a fully production-ready solution," and recommends his own Qwopus 3.5 series for anyone who needs something more stable for real work.
And because Jackrong deliberately avoided aggressive Claude-style chain-of-thought distillation, don't expect it to feel as deeply Opus-brained as Qwopus; that was a conscious tradeoff for stability, not an oversight.
"Yeah the approach on this one was stability first," Hessling explained on X. "It is my understanding that the Gemma models tend to become unstable if you force too many Claude thinking traces into them; you can see this when testing various other Opus Gemma fine-tunes on Hugging Face."
For those who want to go deeper into Gemma fine-tuning for reasoning specifically, there is also a separate community project worth watching: Ornstein by pseudonymous developer DJLougen, which takes the same 26B Gemma 4 base and focuses specifically on improving its reasoning chains without relying on the logic or style of any particular third-party model.
One honest caveat: Gemma's training dynamics are messier than Qwen's for fine-tuners, with wider loss fluctuations and more hyperparameter sensitivity. Jackrong says so himself. If you need a more battle-tested local model for production workflows, his Qwopus 3.5 series remains more robustly validated. But if you want an American model with Opus-style polish, Gemopus is currently your best available option. A denser 31B Gemopus version is also in the pipeline, with Hessling teasing it as "a banger for sure."
If you want to try running local models on your own hardware, check out our guide on how to get started with local AI.
