
Kimi K2.5: 1T Open-Source Model with Agent Swarms

Moonshot AI has released Kimi K2.5 — and the model is genuinely impressive. An open-source model with 1 trillion parameters that ranks #1 among open-source models on Artificial Analysis's intelligence ranking, with unique capabilities that set it apart from everything else.

Here's what makes it special.

The Numbers

| Metric | Value |
|---|---|
| Total Parameters | 1 Trillion |
| Active Parameters | 32B (MoE) |
| Context Window | 256K Tokens |
| Intelligence Rank | #1 / 60 on Artificial Analysis |
| Speed | 113.9 tok/s (#4) |
| API Cost | $0.60/M input, $3.00/M output |

K2.5 achieves a score of 47 on the Artificial Analysis Intelligence Index — well above the average of 24 — making it the highest-ranked open-source model on the leaderboard.

Video-to-Website Generation

This is the headline feature. You record a screen video of a UI interaction — animations, transitions, hover effects — and K2.5 generates the complete code.

No detailed description needed. Just: "Clone this website with all UX designs."

The model analyzes the video, extracts interaction logic and visual styles, and outputs functional HTML/CSS/JS including animations. In tests it even added details that surpassed the original.

This works because K2.5 is natively multimodal — trained from scratch on 15 trillion mixed visual and text tokens, not a text model with a vision module bolted on.

Agent Swarm: 100 Parallel Agents

Agent Swarm Architecture

K2.5 can spawn up to 100 sub-agents in parallel, coordinating up to 1,500 tool calls without human intervention. No predefined workflows — the orchestrator dynamically creates specialized agents based on the task.

Ask it to research a topic, and it might spawn:

  • InferenceStackResearcher
  • QuantizationHardwareResearcher
  • CostControlResearcher
  • FactChecker

Each agent uses tools independently — searching, browsing, analyzing — then results are merged at the orchestrator.

Result: 4.5× faster execution compared to single-agent approaches.

This was trained with PARL (Parallel-Agent Reinforcement Learning), which specifically teaches the model to avoid "Serial Collapse" — the tendency of multi-agent systems to fall back to sequential execution.
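The fan-out/merge pattern described above can be sketched in a few lines. This is an illustrative sketch only, not Moonshot's implementation: the agent behavior is stubbed out, and in the real system each worker would be a model instance with its own tool access.

```python
import asyncio

# Hypothetical sub-agent; in K2.5 this would be a model-spawned worker
# making its own tool calls (search, browse, analyze).
async def run_agent(name: str, task: str) -> dict:
    await asyncio.sleep(0)  # stand-in for tool calls / model inference
    return {"agent": name, "findings": f"{name} results for {task!r}"}

async def orchestrate(task: str, agent_names: list[str]) -> list[dict]:
    # Spawn all agents concurrently (the "swarm"), then merge results
    # back at the orchestrator.
    return list(await asyncio.gather(*(run_agent(n, task) for n in agent_names)))

agents = ["InferenceStackResearcher", "QuantizationHardwareResearcher",
          "CostControlResearcher", "FactChecker"]
merged = asyncio.run(orchestrate("local deployment of K2.5", agents))
```

The speedup comes from `asyncio.gather` running all workers concurrently instead of awaiting them one by one — exactly the "serial collapse" PARL is said to train against.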

Four Operating Modes

Moonshot offers K2.5 in four modes on Kimi.com and the Kimi app:

  • K2.5 Instant: fast, lightweight responses
  • K2.5 Thinking: extended reasoning with chain-of-thought
  • K2.5 Agent: single agent with preconfigured tools (search, code interpreter, web browsing)
  • K2.5 Agent Swarm (Beta): the full parallel orchestration system

Agent Swarm is currently in beta with free credits for top-tier paying users. The separation makes sense — you pick the complexity level that fits the task instead of paying for orchestration overhead on simple questions.

Kimi Code: Open-Source Coding Assistant

Alongside K2.5, Moonshot released Kimi Code — an open-source coding assistant that runs in the terminal and integrates with VSCode, Cursor, and Zed.

What makes it interesting: Kimi Code accepts images and videos as input, leveraging K2.5's native multimodal capabilities. It also automatically detects existing skills and MCPs and migrates them into the workspace.

The standout demo feature is autonomous visual debugging. K2.5 generates UI code, visually inspects its own output, looks up documentation, and iterates — all without human intervention. In one example it translated the aesthetics of Matisse's La Danse into a fully designed webpage — start to finish.

Pricing: $15–$200/month depending on usage tier, cached input tokens at $0.10/M.

Office Productivity

K2.5 Agent can handle extensive office work end-to-end — documents, spreadsheets, PDFs, and presentations generated directly through conversation.

On Moonshot's internal benchmarks, K2.5 shows a 59.3% improvement on the AI Office Benchmark and 24.3% on the General Agent Benchmark vs. K2 Thinking. Specific capabilities:

  • Insert annotations in Word documents
  • Build financial models with pivot tables
  • Write LaTeX equations in PDFs
  • Generate 10,000-word articles or 100-page documents

Tasks that used to take hours or days are now done in minutes. These are internal benchmarks, so treat them with appropriate skepticism — but the direction is clear: K2.5 is positioned as a knowledge worker, not just a chatbot.

Context Stability

K2.5 achieves 69.4% on LongBench-V2 with 128K context, outperforming GPT-5.2 (54.5%) and Gemini 3 Pro (68.2%).

The 256K-token context window handles complex long-horizon tasks with stable tool use across 200–300 sequential calls. When the context fills up, K2.5 fades out earlier tool outputs to stay within the limit.
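Moonshot hasn't published K2.5's exact eviction policy, but the general "fade out older tool outputs" technique can be sketched like this — the message shape and the 4-chars-per-token heuristic are assumptions for illustration:

```python
def fade_tool_outputs(messages, token_budget,
                      count_tokens=lambda m: len(m["content"]) // 4):
    """Replace the oldest tool outputs with stubs until the history fits.

    `messages` is an OpenAI-style list of {"role", "content"} dicts; the
    default token counter is a rough 4-chars-per-token stand-in for a
    real tokenizer.
    """
    messages = list(messages)
    while sum(count_tokens(m) for m in messages) > token_budget:
        # Find the oldest not-yet-faded tool output.
        idx = next((i for i, m in enumerate(messages)
                    if m["role"] == "tool" and m["content"] != "[faded]"), None)
        if idx is None:
            break  # nothing left to fade; budget cannot be met this way
        messages[idx] = {"role": "tool", "content": "[faded]"}
    return messages
```

Fading oldest-first keeps the most recent tool results — usually the ones the next step actually depends on — at full fidelity.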

K2.5 also performs well on Fiction.LiveBench — a benchmark that tests real narrative comprehension, not just simple retrieval. Unlike 'Needle in a Haystack' tests, it evaluates theory of mind, event chronology, and implicit inferences across long stories.

| Model | 0 | 1k | 4k | 8k | 16k | 32k | 60k | 120k | 192k |
|---|---|---|---|---|---|---|---|---|---|
| gpt-5.2 | 100 | 100 | 100 | 97.2 | 100 | 97.2 | 97.2 | 100 | 96.9 |
| kimi-k2.5 | 100 | 100 | 100 | 88.9 | 86.1 | 88.9 | 89.8 | 78.1 | 87.5 |
| gemini-3-pro | 100 | 100 | 100 | 97.2 | 96.6 | 94.4 | 100 | 96.9 | 96.9 |
| claude-opus-4-5 | 87.5 | 100 | 94.4 | 97.2 | 91.7 | 94.4 | 97.2 | 93.8 | 80.0 |

K2.5 holds solid scores across context lengths — 87.5% at 192K tokens, though with a dip to 78.1% at 120K. This matters for agentic tasks where coherent understanding across long sessions is critical.

Cost Efficiency

Cost vs Performance

K2.5 is dramatically cheaper than the competition at comparable performance:

  • 5.1× savings on SWE-Verified vs. GPT-5.2
  • 21.1× savings on BrowseComp
  • 10.1× savings on HLE Benchmark

At $0.60 per million input tokens it is approximately 9× cheaper than Claude Opus 4.5.
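The per-request arithmetic is easy to check yourself. A minimal cost model, using the K2.5 prices quoted above (the token counts are made up for illustration):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    # Prices are quoted in dollars per million tokens.
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# K2.5 pricing from the table above: $0.60/M input, $3.00/M output.
k25 = request_cost(50_000, 5_000, 0.60, 3.00)  # a 50K-in / 5K-out agentic call
print(f"${k25:.4f}")  # $0.0450
```

At these rates even a long agentic session with hundreds of tool calls stays in the single-digit-dollar range, which is where the benchmark savings multipliers come from.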

The Catch: VRAM Requirements

Here's the reality check. Although the MoE architecture only activates 32B parameters per token, the entire 1T model must remain in memory for token routing.

| Quantization | Required VRAM |
|---|---|
| FP16 | ~2 TB |
| Q8 | ~1.09 TB |
| Q4_K_M | ~621 GB |
| 2-bit | ~374 GB |
| 1.58-bit | ~240 GB |

Even the most aggressive quantization requires 240 GB+ RAM for local deployment. That means either enterprise hardware (4× H100) or massive system RAM with CPU offloading.
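The table's numbers follow roughly from parameter count times bits per weight. A back-of-the-envelope estimator — note this ignores KV cache, activations, and quantization-format overhead (mixed-precision layers, scales), which is why real files run somewhat larger than the raw formula; the 4.85 effective bits for Q4_K_M is an assumption:

```python
def approx_model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-storage footprint in decimal GB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

ONE_TRILLION = 1e12
for label, bits in [("FP16", 16), ("Q8", 8), ("Q4_K_M", 4.85), ("2-bit", 2)]:
    print(f"{label:>7}: ~{approx_model_size_gb(ONE_TRILLION, bits):.0f} GB")
```

FP16 lands exactly on the 2 TB figure (1T params × 2 bytes); the quantized estimates come in under the table's values by the overhead margin.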

For most people the API at $0.60/M tokens is the practical choice.

When to Use K2.5

K2.5 shines at:

  • Agentic automation and multi-step workflows
  • Vision-to-code tasks (screenshots, videos)
  • Web browsing and research (74.9 on BrowseComp)
  • Cost-sensitive production deployments

Look elsewhere for:

  • Pure code quality (Claude Opus 4.5 still leads on SWE-Bench)
  • Maximum reasoning capability (GPT-5.2 has slight advantages)
  • Local operation without enterprise hardware

Conclusion

Kimi K2.5 marks a genuine step forward for open-source models. The combination of native multimodality, agent orchestration, and competitive benchmark scores — at a fraction of proprietary prices — makes it a serious option for agentic workflows.

Just don't expect to run it on your laptop.

