Moonshot AI has released Kimi K2.5 — and the model is genuinely impressive. An open-source model with 1 trillion parameters that ranks #1 among open-source models on Artificial Analysis's intelligence ranking, with unique capabilities that set it apart from everything else.
Here's what makes it special.
The Numbers
| Metric | Value |
|---|---|
| Total Parameters | 1 Trillion |
| Active Parameters | 32B (MoE) |
| Context Window | 256K Tokens |
| Intelligence Rank | #1 / 60 on Artificial Analysis |
| Speed | 113.9 tok/s (#4) |
| API Cost | $0.60/M Input, $3.00/M Output |
K2.5 achieves a score of 47 on the Artificial Analysis Intelligence Index — well above the average of 24 — making it the highest-ranked open-source model on this leaderboard.
Video-to-Website Generation
This is the headline feature. You record a screen video of a UI interaction — animations, transitions, hover effects — and K2.5 generates the complete code.
No detailed description needed. Just: "Clone this website with all UX designs."
The model analyzes the video, extracts interaction logic and visual styles, and outputs functional HTML/CSS/JS including animations. In tests it even added details that surpassed the original.
This works because K2.5 is natively multimodal — trained from scratch on 15 trillion mixed visual and text tokens, not a text model with a vision module bolted on.
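As a rough sketch of what such a request could look like — assuming an OpenAI-compatible chat endpoint and passing extracted video frames as image parts (the model ID `kimi-k2.5`, the frame-based upload, and the endpoint details are all assumptions, not confirmed API specifics):

```python
import base64

def encode_frame(path: str) -> str:
    """Base64-encode one frame extracted from the screen recording."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

def build_clone_request(frame_paths, prompt="Clone this website with all UX designs."):
    """Assemble an OpenAI-style multimodal message: one text part plus
    one image part per video frame (e.g. frames extracted via ffmpeg)."""
    content = [{"type": "text", "text": prompt}]
    for path in frame_paths:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{encode_frame(path)}"},
        })
    return [{"role": "user", "content": content}]

# Sending it would then look roughly like (requires the `openai` package
# and a Moonshot API key; base URL and model ID are assumptions):
#   client = OpenAI(api_key=..., base_url="https://api.moonshot.ai/v1")
#   resp = client.chat.completions.create(model="kimi-k2.5",
#                                         messages=build_clone_request(frames))
```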
Agent Swarm: 100 Parallel Agents

K2.5 can spawn up to 100 sub-agents in parallel, coordinating up to 1,500 tool calls without human intervention. No predefined workflows — the orchestrator dynamically creates specialized agents based on the task.
Ask it to research a topic, and it might spawn:
- InferenceStackResearcher
- QuantizationHardwareResearcher
- CostControlResearcher
- FactChecker
Each agent uses tools independently — searching, browsing, analyzing — then results are merged at the orchestrator.
Result: 4.5× faster execution compared to single-agent approaches.
This was trained with PARL (Parallel-Agent Reinforcement Learning), which specifically teaches the model to avoid "Serial Collapse" — the tendency of multi-agent systems to fall back to sequential execution.
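The fan-out/merge pattern itself is easy to sketch. Below is a minimal `asyncio` illustration with stubbed sub-agents standing in for real tool-using workers — all names and behavior here are illustrative, not Moonshot's actual orchestrator:

```python
import asyncio

async def run_agent(name: str, task: str) -> str:
    """Stub sub-agent: a real one would search, browse, and call tools."""
    await asyncio.sleep(0.01)  # simulate independent tool use
    return f"{name}: findings on {task!r}"

async def orchestrate(task: str) -> str:
    # The orchestrator picks specialists dynamically based on the task.
    specialists = ["InferenceStackResearcher", "QuantizationHardwareResearcher",
                   "CostControlResearcher", "FactChecker"]
    # Fan out: all agents run concurrently rather than serially --
    # exactly the "Serial Collapse" failure mode PARL trains against.
    results = await asyncio.gather(*(run_agent(n, task) for n in specialists))
    # Merge: partial results are combined at the orchestrator.
    return "\n".join(results)

report = asyncio.run(orchestrate("local K2.5 deployment"))
print(report)
```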
Four Operating Modes
Moonshot offers K2.5 in four modes on Kimi.com and the Kimi app:
- K2.5 Instant — fast, lightweight responses
- K2.5 Thinking — extended reasoning with chain-of-thought
- K2.5 Agent — single agent with preconfigured tools (search, code interpreter, web browsing)
- K2.5 Agent Swarm (Beta) — the full parallel orchestration system
Agent Swarm is currently in beta with free credits for top-tier paying users. The separation makes sense — you pick the complexity level that fits the task instead of paying for orchestration overhead on simple questions.
Kimi Code: Open-Source Coding Assistant
Alongside K2.5, Moonshot released Kimi Code — an open-source coding assistant that runs in the terminal and integrates with VSCode, Cursor, and Zed.
What makes it interesting: Kimi Code accepts images and videos as input, leveraging K2.5's native multimodal capabilities. It also automatically detects existing skills and MCPs and migrates them into the workspace.
The standout demo feature is autonomous visual debugging. K2.5 generates UI code, visually inspects its own output, looks up documentation, and iterates — all without human intervention. In one example it translated the aesthetics of Matisse's La Danse into a fully designed webpage — start to finish.
Pricing: $15–$200/month depending on usage tier, cached input tokens at $0.10/M.
Office Productivity
K2.5 Agent can handle extensive office work end-to-end — documents, spreadsheets, PDFs, and presentations generated directly through conversation.
On Moonshot's internal benchmarks, K2.5 shows a 59.3% improvement on the AI Office Benchmark and 24.3% on the General Agent Benchmark vs. K2 Thinking. Specific capabilities:
- Insert annotations in Word documents
- Build financial models with pivot tables
- Write LaTeX equations in PDFs
- Generate 10,000-word articles or 100-page documents
Tasks that used to take hours or days are now done in minutes. These are internal benchmarks, so treat them with appropriate skepticism — but the direction is clear: K2.5 is positioned as a knowledge worker, not just a chatbot.
Context Stability
K2.5 achieves 69.4% on LongBench-V2 with 128K context, outperforming GPT-5.2 (54.5%) and Gemini 3 Pro (68.2%).
The 256K-token context window handles complex long-horizon tasks with stable tool use across 200–300 sequential calls. When the context fills up, K2.5 fades out earlier tool outputs to stay within the limit.
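A simple version of that fading strategy, assuming OpenAI-style message dicts and a crude chars/4 token estimate (the actual policy K2.5 uses is not public — this sketch just elides the oldest tool outputs first):

```python
def fade_tool_outputs(messages, budget_tokens,
                      count_tokens=lambda m: len(m["content"]) // 4):
    """Keep estimated total tokens under budget by replacing the oldest
    tool outputs with short placeholders before touching anything else."""
    total = sum(count_tokens(m) for m in messages)
    faded = [dict(m) for m in messages]      # don't mutate the originals
    for m in faded:                          # oldest messages first
        if total <= budget_tokens:
            break
        if m["role"] == "tool":
            total -= count_tokens(m)
            m["content"] = "[tool output elided to fit context]"
            total += count_tokens(m)
    return faded
```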
K2.5 also performs well on Fiction.LiveBench — a benchmark that tests real narrative comprehension, not just simple retrieval. Unlike 'Needle in a Haystack' tests, it evaluates theory of mind, event chronology, and implicit inferences across long stories.
| Model | 0 | 1k | 4k | 8k | 16k | 32k | 60k | 120k | 192k |
|---|---|---|---|---|---|---|---|---|---|
| gpt-5.2 | 100 | 100 | 100 | 97.2 | 100 | 97.2 | 97.2 | 100 | 96.9 |
| kimi-k2.5 | 100 | 100 | 100 | 88.9 | 86.1 | 88.9 | 89.8 | 78.1 | 87.5 |
| gemini-3-pro | 100 | 100 | 100 | 97.2 | 96.6 | 94.4 | 100 | 96.9 | 96.9 |
| claude-opus-4-5 | 87.5 | 100 | 94.4 | 97.2 | 91.7 | 94.4 | 97.2 | 93.8 | 80.0 |
K2.5 stays above 78% at every context length and scores 87.5% at 192K tokens — behind GPT-5.2 and Gemini 3 Pro, but solid for an open-source model. This matters for agentic tasks where coherent understanding across long sessions is critical.
Cost Efficiency

K2.5 is dramatically cheaper than the competition at comparable performance:
- 5.1× savings on SWE-Verified vs. GPT-5.2
- 21.1× savings on BrowseComp
- 10.1× savings on HLE Benchmark
At $0.60 per million input tokens it is approximately 9× cheaper than Claude Opus 4.5.
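Per-task cost follows directly from those list prices. A back-of-the-envelope comparison using the K2.5 prices from the table above (the $5/$25 per million Opus 4.5 prices are my assumption for illustration):

```python
def task_cost(in_tokens: int, out_tokens: int,
              in_price: float, out_price: float) -> float:
    """USD cost of one request at per-million-token prices."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# An agentic task consuming 200K input tokens and producing 20K output tokens:
k25 = task_cost(200_000, 20_000, 0.60, 3.00)    # K2.5 list prices
opus = task_cost(200_000, 20_000, 5.00, 25.00)  # assumed Opus 4.5 prices
print(f"K2.5: ${k25:.2f}  Opus: ${opus:.2f}  ratio: {opus / k25:.1f}x")
# -> K2.5: $0.18  Opus: $1.50  ratio: 8.3x
```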
The Catch: VRAM Requirements
Here's the reality check. Although the MoE architecture only activates 32B parameters per token, the entire 1T model must remain in memory for token routing.
| Quantization | Required VRAM |
|---|---|
| FP16 | ~2 TB |
| Q8 | ~1.09 TB |
| Q4_K_M | ~621 GB |
| 2-bit | ~374 GB |
| 1.58-bit | ~240 GB |
Even the most aggressive quantization needs 240 GB+ of memory for local deployment. That means either enterprise hardware (e.g. 4× H100) or massive system RAM with CPU offloading.
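The table's figures follow from parameters × bits per weight. A rough estimator — note that GGUF-style quants carry metadata, so the effective bits per weight I plug in below are estimates that run above the nominal bit width:

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB (decimal). Real deployments add overhead
    for the KV cache, activations, and runtime buffers on top of this."""
    return params * bits_per_weight / 8 / 1e9

# Effective bits/weight are my rough estimates for each quant level.
for name, bpw in [("FP16", 16), ("Q8", 8), ("Q4_K_M", 4.9),
                  ("2-bit", 3.0), ("1.58-bit", 1.9)]:
    print(f"{name:9s} ~{weight_memory_gb(1e12, bpw):,.0f} GB")
```

Plugging in 1T parameters reproduces the table's order of magnitude: 16 bits/weight gives ~2,000 GB, matching the ~2 TB FP16 row.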
For most people the API at $0.60/M tokens is the practical choice.
When to Use K2.5
K2.5 shines at:
- Agentic automation and multi-step workflows
- Vision-to-code tasks (screenshots, videos)
- Web browsing and research (74.9 on BrowseComp)
- Cost-sensitive production deployments
Look elsewhere for:
- Pure code quality (Claude Opus 4.5 still leads on SWE-Bench)
- Maximum reasoning capability (GPT-5.2 has slight advantages)
- Local operation without enterprise hardware
Conclusion
Kimi K2.5 marks a genuine step forward for open-source models. The combination of native multimodality, agent orchestration, and competitive benchmark scores — at a fraction of proprietary prices — makes it a serious option for agentic workflows.
Just don't expect to run it on your laptop.