Moonshot AI, the Chinese AI lab behind the Kimi assistant, has open-sourced Kimi K2.6, a natively multimodal agentic model that pushes the boundaries of what an AI system can do when left to run autonomously on hard software engineering problems. The release targets practical deployment scenarios: long-running coding agents, front-end generation from natural language, massively parallel agent swarms coordinating hundreds of specialized sub-agents concurrently, and a new open ecosystem where humans and agents from any device collaborate on the same task. The model is available now on Kimi.com, the Kimi App, the API, and the Kimi Code CLI. Weights are published on Hugging Face under a Modified MIT License.
What Kind of Model Is This, Technically?
Kimi K2.6 is a Mixture-of-Experts (MoE) model, an architecture that has become increasingly dominant at frontier scale. Instead of activating all of a model's parameters for every token it processes, an MoE model routes each token to a small subset of specialized 'experts.' This makes it possible to build a very large model while keeping inference compute tractable.
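The routing step can be sketched in a few lines. This is a toy illustration of top-k expert selection, not K2.6's actual router: the scoring here is random, whereas a real router is a learned linear layer, and all dimensions are made up for the example.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(hidden, num_experts=384, top_k=8, seed=0):
    """Toy top-k gating: score every expert, keep the k best,
    and renormalize their weights so they sum to 1."""
    rng = random.Random(seed)
    # Stand-in for the router logits (a learned projection in practice).
    logits = [sum(h * rng.uniform(-1, 1) for h in hidden)
              for _ in range(num_experts)]
    probs = softmax(logits)
    top = sorted(range(num_experts), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}  # expert index -> mixing weight

weights = route_token([0.1, -0.4, 0.9, 0.3])
print(len(weights))                      # 8 experts chosen for this token
print(round(sum(weights.values()), 6))   # 1.0 after renormalization
```

In K2.6's configuration, the token's output would be the weighted combination of these 8 routed experts plus the 1 shared expert that is always active; only those experts' parameters (32B of the 1T total) do work for that token.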
Kimi K2.6 has 1 trillion total parameters, but only 32 billion are activated per token. It has 384 experts in total, with 8 selected per token, plus 1 shared expert that is always active. The model has 61 layers (including one dense layer), uses an attention hidden dimension of 7,168, a MoE hidden dimension of 2,048 per expert, and 64 attention heads.
Beyond text, K2.6 is a natively multimodal model, meaning vision is baked in architecturally, not bolted on. It uses a MoonViT vision encoder with 400M parameters and supports image and video input natively. Other architectural details: it uses Multi-head Latent Attention (MLA) as its attention mechanism, SwiGLU as the activation function, a vocabulary size of 160K tokens, and a context length of 256K tokens.
For deployment, K2.6 is recommended to run on vLLM, SGLang, or KTransformers. It shares the same architecture as Kimi K2.5, so existing deployment configurations can be reused directly. The required transformers version is >=4.57.1.
The Long-Horizon Coding Headline Numbers
The metric that will likely get the most attention from dev teams is SWE-Bench Pro, a benchmark testing whether a model can resolve real-world GitHub issues in professional software repositories.
Kimi K2.6 scores 58.6 on SWE-Bench Pro, compared to 57.7 for GPT-5.4 (xhigh), 53.4 for Claude Opus 4.6 (max effort), 54.2 for Gemini 3.1 Pro (thinking high), and 50.7 for Kimi K2.5. On SWE-Bench Verified it scores 80.2, sitting within a tight band of top-tier models.
On Terminal-Bench 2.0 using the Terminus-2 agent framework, K2.6 achieves 66.7, compared to 65.4 for both GPT-5.4 and Claude Opus 4.6, and 68.5 for Gemini 3.1 Pro. On LiveCodeBench (v6), it scores 89.6 vs. Claude Opus 4.6's 88.8.
Perhaps the most striking number for agentic workloads is Humanity's Last Exam (HLE-Full) with tools: K2.6 scores 54.0, leading every model in the comparison, including GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4). HLE is widely considered one of the hardest knowledge benchmarks, and the with-tools variant specifically tests how well a model can leverage external resources autonomously. Internally, Moonshot evaluates long-horizon coding gains using their Kimi Code Bench, an internal benchmark covering diverse, challenging end-to-end tasks across languages and domains, where K2.6 shows significant improvements over K2.5.
What 13 Hours of Autonomous Coding Actually Looks Like
Two engineering case studies in the release document what 'long-horizon coding' means in practice.
In the first, Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, then implemented and optimized model inference in Zig, a highly niche programming language, demonstrating exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, K2.6 improved throughput from roughly 15 to roughly 193 tokens/sec, ultimately achieving speeds roughly 20% faster than LM Studio.
In the second, Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code. Acting as an expert systems architect, K2.6 analyzed CPU and allocation flame graphs to pinpoint hidden bottlenecks and reconfigured the core thread topology from 4ME+2RE to 2ME+1RE, extracting a 185% median throughput jump (from 0.43 to 1.24 MT/s) and a 133% performance throughput gain (from 1.23 to 2.86 MT/s).
Agent Swarms: Scaling Horizontally, Not Just Vertically
One of K2.6's most architecturally interesting capabilities is its Agent Swarm, an approach to parallelizing complex tasks across many specialized sub-agents rather than relying on a single, deeper reasoning chain.
The architecture scales horizontally to 300 sub-agents executing across 4,000 coordinated steps concurrently, a substantial expansion from K2.5's 100 sub-agents and 1,500 steps. The swarm dynamically decomposes tasks into heterogeneous subtasks, combining broad web search with deep research, large-scale document analysis with long-form writing, and multi-format content generation in parallel, then delivers consolidated outputs including documents, websites, slides, and spreadsheets within a single autonomous run. The swarm also introduces a concrete Skills capability: it can convert any high-quality PDF, spreadsheet, slide, or Word document into a reusable Skill. K2.6 captures and maintains the document's structural and stylistic DNA, allowing it to reproduce the same quality and format in future tasks. Think of it as teaching the swarm by example rather than by prompt.
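The fan-out pattern described above can be sketched with standard concurrency primitives. Everything here is hypothetical: the handler names are invented stand-ins for model-backed sub-agents, and Moonshot's actual swarm scheduler is not public.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agent handlers; in a real swarm each would invoke a
# model-backed agent (web search, deep research, document analysis, ...).
def web_search(task):       return f"search-results({task})"
def deep_research(task):    return f"report({task})"
def doc_analysis(task):     return f"analysis({task})"
def slide_generation(task): return f"slides({task})"

def run_swarm(task, handlers):
    """Fan one task out to heterogeneous sub-agents in parallel,
    then consolidate their outputs into a single deliverable bundle."""
    with ThreadPoolExecutor(max_workers=len(handlers)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in handlers.items()}
        return {name: f.result() for name, f in futures.items()}

handlers = {
    "search": web_search,
    "research": deep_research,
    "analysis": doc_analysis,
    "slides": slide_generation,
}
bundle = run_swarm("market scan for LA retail stores", handlers)
print(sorted(bundle))  # ['analysis', 'research', 'search', 'slides']
```

The point of the sketch is the shape of the workflow: one task, many heterogeneous workers running concurrently, one consolidated output bundle at the end of the run.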
Concrete demonstrations include: a 100-sub-agent run that matched a single uploaded CV against 100 relevant roles in California and delivered 100 fully customized resumes; another that identified 30 retail stores in Los Angeles without websites from Google Maps and generated landing pages for each; and one that turned an astrophysics paper into a reusable academic skill and then produced a 40-page, 7,000-word research paper alongside a structured dataset with 20,000+ entries and 14 astronomy-grade charts.
On the BrowseComp benchmark in Agent Swarm mode, K2.6 scores 86.3 compared to 78.4 for Kimi K2.5. On DeepSearchQA (F1 score), K2.6 scores 92.5 against 78.6 for GPT-5.4.
Bring Your Own Agents: Claw Teams
Beyond Moonshot's own swarm infrastructure, K2.6 introduces Claw Teams as a research preview, a new feature that opens the agent swarm architecture to an external, heterogeneous ecosystem.
The key design principle: multiple agents and humans operate as genuine collaborators in a shared operational space. Users can onboard agents from any device, running any model, each carrying their own specialized toolkits, skills, and persistent memory contexts, whether deployed on local laptops, mobile devices, or cloud instances. At the center of this swarm, K2.6 serves as an adaptive coordinator: it dynamically matches tasks to agents based on their specific skill profiles and available tools, detects when an agent encounters failure or stalls, automatically reassigns the task or regenerates subtasks, and manages the full lifecycle of deliverables from initiation through validation to completion.
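The coordinator role described here (skill matching, stall detection, reassignment) can be sketched under heavy simplification. The agent names, skill tags, and retry policy below are all invented for illustration; they are not Claw Teams' actual interfaces.

```python
import random

class Agent:
    """Hypothetical external agent with a skill profile and a flaky run()."""
    def __init__(self, name, skills, fail_rate=0.0, seed=0):
        self.name, self.skills = name, set(skills)
        self.fail_rate = fail_rate
        self._rng = random.Random(seed)

    def run(self, task):
        if self._rng.random() < self.fail_rate:
            raise RuntimeError(f"{self.name} stalled on {task['id']}")
        return f"{task['id']} done by {self.name}"

def coordinate(tasks, agents, max_attempts=3):
    """Match each task to a skill-compatible agent; on a detected
    failure, reassign to the next compatible agent."""
    results = {}
    for task in tasks:
        capable = [a for a in agents if task["skill"] in a.skills]
        for attempt in range(max_attempts):
            agent = capable[attempt % len(capable)]
            try:
                results[task["id"]] = agent.run(task)
                break
            except RuntimeError:
                continue  # stall detected -> reassign to the next agent
        else:
            results[task["id"]] = "unassigned"
    return results

agents = [
    Agent("laptop-coder", {"code"}, fail_rate=1.0),  # always stalls
    Agent("cloud-coder", {"code"}),
    Agent("phone-writer", {"write"}),
]
tasks = [{"id": "t1", "skill": "code"}, {"id": "t2", "skill": "write"}]
results = coordinate(tasks, agents)
print(results)  # {'t1': 't1 done by cloud-coder', 't2': 't2 done by phone-writer'}
```

The interesting design choice in the real system is that the coordinator is itself a frontier model, so "matching tasks to skill profiles" and "regenerating subtasks" can be semantic decisions rather than the static lookup shown here.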
Moonshot has been using Claw Teams internally to run their own content production and launch campaigns, with specialized agents including Demo Makers, Benchmark Makers, Social Media Agents, and Video Makers working in parallel, with K2.6 coordinating the process. For devs interested in multi-agent orchestration architectures, this is worth looking into: it represents a shift from 'AI does tasks for you' to 'AI coordinates a team of heterogeneous agents, some of which you built, on your behalf.'
Proactive Agents: 5 Days of Autonomous Operation
K2.6 demonstrates strong performance in persistent, proactive agents such as OpenClaw and Hermes, which operate across multiple applications with continuous, 24/7 execution. These workflows require AI to proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.
Moonshot's own RL infrastructure team used a K2.6-backed agent that operated autonomously for 5 days, managing monitoring, incident response, and system operations, demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution.
Performance in this regime is measured by an internal Claw Bench, an evaluation suite spanning five domains: Coding Tasks, IM Ecosystem Integration, Information Research & Analysis, Scheduled Task Management, and Memory Usage. Across all five, K2.6 significantly outperforms K2.5 in task completion rates and tool invocation accuracy, notably in workflows requiring sustained autonomous operation without human oversight.
Two Operational Modes: Thinking and Instant
For devs integrating via API, K2.6 exposes two inference modes that matter for latency/quality tradeoffs:
Thinking mode activates full chain-of-thought reasoning: the model reasons through a problem before producing a final answer. This is recommended for complex coding and agentic tasks, with a recommended temperature of 1.0. There is also a preserve thinking mode, which retains full reasoning content across multi-turn interactions and enhances performance in coding agent scenarios. It is disabled by default, but worth enabling when building agents that need to maintain coherent reasoning state across many steps.
Instant mode disables extended reasoning for lower-latency responses. To use Instant mode via the official API, pass {"thinking": {"type": "disabled"}} in extra_body. For vLLM or SGLang deployments, pass {"chat_template_kwargs": {"thinking": False}} instead, with a recommended temperature of 0.6 and top-p of 0.95.
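Put together, the two modes map to request shapes like the following. Only the mode toggles and sampling values come from the parameters quoted above; the model id and the OpenAI-compatible payload layout are assumptions, so check Moonshot's API reference before copying them.

```python
# Request payloads for K2.6's two modes. The mode switches and sampling
# parameters follow the text above; "kimi-k2.6" is a hypothetical model id.

messages = [{"role": "user", "content": "Refactor this function for clarity."}]

# Thinking mode (official API): full chain-of-thought, temperature 1.0.
thinking_request = {
    "model": "kimi-k2.6",
    "messages": messages,
    "temperature": 1.0,
}

# Instant mode (official API): reasoning disabled via extra_body.
instant_request = {
    "model": "kimi-k2.6",
    "messages": messages,
    "temperature": 0.6,
    "top_p": 0.95,
    "extra_body": {"thinking": {"type": "disabled"}},
}

# Instant mode on a self-hosted vLLM/SGLang deployment instead switches
# the chat template's thinking flag off.
vllm_instant_request = {
    "model": "kimi-k2.6",
    "messages": messages,
    "temperature": 0.6,
    "top_p": 0.95,
    "chat_template_kwargs": {"thinking": False},
}

print(instant_request["extra_body"]["thinking"]["type"])  # disabled
```

Note that with the official openai-python client, extra_body is a keyword argument on chat.completions.create rather than a literal payload key; the dicts above just show which fields end up in each request.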
Key Takeaways
- Kimi K2.6 is a 1-trillion-parameter, natively multimodal MoE model with only 32B parameters activated per token, released fully open-source under a Modified MIT License.
- K2.6 leads all frontier models on HLE-Full with tools (54.0), outperforming GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4) on one of AI's hardest agentic benchmarks.
- In real-world tests, K2.6 autonomously overhauled an 8-year-old financial matching engine over 13 hours, delivering a 185% median throughput jump and a 133% performance throughput gain.
- The Agent Swarm architecture scales to 300 sub-agents executing 4,000 coordinated steps concurrently, and can convert any PDF, spreadsheet, or slide into a reusable Skill that preserves structural and stylistic DNA.
- Claw Teams, launched as a research preview, lets humans and agents from any device running any model collaborate in a shared swarm, with K2.6 serving as an adaptive coordinator that dynamically assigns tasks, detects failures, and manages full delivery lifecycles.
Check out the Model Weights, API Access, and Technical details.

