Complete Guide to MiniMax's SOTA Breakthrough: Free for One Week in Kilo's CLI, VS Code Extension, and More, Rivaling Claude Opus 4.6, Beating Gemini 3 Pro, at a Fraction of the Cost
MiniMax M2.5 represents a quantum leap for the Chinese AI lab: 80.2% on SWE-Bench Verified (human-validated real-world GitHub issues), matching Claude Opus 4.6's standard performance; wins over Gemini 3 Pro on SWE-Bench Pro (55.4% vs. 43.3%) and Multi-SWE-Bench (51.3% vs. 50.3%); 100 tokens per second throughput (3× faster than Opus); and only 10B activated parameters, the smallest of any Tier-1 model.
The Free Access: Completely free for one week in Kilo Code (CLI, VS Code extension, IDE integrations) with no credits required, making SOTA performance accessible without the "frontier tax."
The Architecture: A "total overhaul of the M2.1 architecture," engineered specifically for the Agent-Verse and agentic workflows and optimized for "thinking efficiency" through planning, at $0.3/M input and $0.06/M blended cost with cache (the best price of any SOTA model).
The Competitive Position: MiniMax moves "into the big leagues as a truly SOTA lab," blurring the "OSS vs. Proprietary distinction" with 51.3% Multi-SWE-Bench and 76.3% BrowseComp, rivaling the best frontier models while maintaining cost efficiency.
The Popularity: M2.1 is already the "most popular open-weight model on Kilo to date," and M2.5 is poised to "rule every category" on Kilo's leaderboard.
The Self-Hosting Advantage: 10B-activated-parameter efficiency enables self-hosting without "massive clusters," an "unparalleled advantage for developers."
Access Points: Kilo Code via kilo.ai/landing/minimax-m25, CLI installation, the VS Code marketplace extension, and all IDE integrations, joining 1.5M+ developers using the Kilo platform.
Part I: The Breakthrough Performance
SWE-Bench Verified: 80.2% Matching Opus 4.6
The Benchmark: Human-validated subset of real-world GitHub issues testing production-level bug solving
MiniMax M2.5: 80.2% accuracy
Claude Opus 4.6: “Just below 80%” on standard trials
The Significance: “M2.5 sits comfortably at 80.2% out-of-the-box”
Anthropic's Prompt Modification: Opus 4.6 reaches 81.42% with specific prompt adjustment
Real-World Testing: “Fits with what we're seeing with Opus in the wild”
Competitive Assessment: “Formidable powerhouse rivaling best in industry”
SWE-Bench Pro: Dominating Gemini 3 Pro
MiniMax M2.5: 55.4%
Gemini 3 Pro: 43.3%
Performance Gap: +12.1 percentage points
What It Tests: “Increased difficulty and realism of advanced software engineering tasks”
Proof: M2.5 "can handle" the most rigorous engineering challenges
Multi-SWE-Bench: Complex Multi-Step Superiority
MiniMax M2.5: 51.3%
Gemini 3 Pro: 50.3%
What It Measures: “Complex, multi-step software suites”
Capability Demonstrated: “Superior autonomous execution in long-horizon tasks”
Implication: Better at sustained reasoning and multi-phase problem-solving
BrowseComp: Agentic Search Excellence
MiniMax M2.5: 76.3%
What It Tests: Information retrieval and research capabilities
Agentic Strength: “Catching up to and often surpassing major models like GPT-5.2 and Gemini 3 Pro in coding and research”
The Complete Picture
Benchmark Summary:
- SWE-Bench Verified: 80.2% (matches Opus 4.6)
- SWE-Bench Pro: 55.4% (beats Gemini 3 Pro by 12.1 points)
- Multi-SWE-Bench: 51.3% (edges Gemini 3 Pro)
- BrowseComp: 76.3% (strong research capability)
Overall Assessment: “Not just incremental update; total overhaul of M2.1 architecture”
Part II: Speed and Efficiency Revolution
Lightning Fast Throughput: 100 TPS
Speed: 100 tokens per second
Versus Opus: 3× faster in early testing
What It Means: Dramatically faster response times
User Experience: Near-instant feedback for complex queries
Agentic Advantage: Rapid iteration in autonomous workflows
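What 100 TPS means in wall-clock terms can be sketched with simple arithmetic. The 100 TPS figure is from the announcement; the ~33 TPS Opus figure is only implied by the "3× faster" claim, and the token counts below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope latency at a given decode throughput.
# 100 TPS is the reported M2.5 figure; the Opus rate is merely
# implied by "3x faster" and the token counts are assumptions.

def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a completion, ignoring prompt processing."""
    return output_tokens / tokens_per_second

M25_TPS = 100.0          # MiniMax M2.5 (reported)
OPUS_TPS = M25_TPS / 3   # implied by the "3x faster than Opus" claim

for label, tokens in [("short answer", 300), ("large diff", 3_000)]:
    m25 = generation_seconds(tokens, M25_TPS)
    opus = generation_seconds(tokens, OPUS_TPS)
    print(f"{label}: {tokens} tokens -> M2.5 {m25:.0f}s vs. Opus ~{opus:.0f}s")
```

At these assumed sizes, a 3,000-token diff streams in about 30 seconds instead of roughly 90, which is what makes rapid agentic iteration feel interactive.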
Thinking Efficiency Optimization
Training Focus: “Trained to optimize actions and output through planning”
Token Efficiency: More efficient than previous generations
Cost Impact: Lower token consumption for same quality
Planning Integration: Built-in planning reduces wasted tokens
Result: Better performance with fewer resources
The 10B Parameter Advantage
Activated Parameters: Only 10 billion
Significance: “Smallest Tier-1 model in existence”
Comparison: Other Tier-1 models require “massive clusters”
Self-Hosting Benefit: “Unparalleled advantage for developers who want to self-host”
Deployment Flexibility: Feasible on consumer-grade hardware
Cost Savings: Lower infrastructure requirements
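The self-hosting claim can be sanity-checked with rough weight-memory arithmetic. One caveat the sketch makes explicit: 10B is the *activated* parameter count, and for a mixture-of-experts model the full weight footprint depends on total parameters, which the announcement does not state. The numbers below are illustrative only:

```python
# Rough VRAM needed just for model weights at common precisions.
# Caveat: 10B is the *activated* count; an MoE model's resident
# footprint depends on total parameters (not stated here). Ignores
# KV cache and runtime overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, precision: str) -> float:
    """Gigabytes of memory for the weights alone at a given precision."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9

for prec in ("fp16", "int8", "int4"):
    print(f"10B params @ {prec}: ~{weight_gb(10, prec):.0f} GB of weights")
```

At fp16, 10B parameters is ~20 GB of weights, which is why a per-token compute budget this small fits single-node or even workstation-class hardware rather than a cluster.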
Always-On Efficiency
Input Pricing: $0.3/M tokens
Blended Cost with Cache: $0.06/M tokens
Competitive Assessment: “Best price of any SOTA model for always-on agents”
Use Case: Continuous monitoring, real-time assistance, persistent agents
Economic Impact: Enables 24/7 agent deployment affordably
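The always-on economics follow directly from the blended rate. The $0.06/M figure is from the announcement; the tokens-per-day volume below is an assumption chosen for illustration:

```python
# Monthly cost of a continuously running agent at the blended cache
# rate. $0.06/M is the article's figure; 5M tokens/day is an assumed
# workload, not a measurement.

BLENDED_USD_PER_M = 0.06

def monthly_cost(tokens_per_day: float,
                 usd_per_million: float = BLENDED_USD_PER_M,
                 days: int = 30) -> float:
    """Total spend for `days` days at a steady daily token volume."""
    return tokens_per_day * days * usd_per_million / 1e6

# An agent churning through 5M tokens a day, around the clock:
print(f"${monthly_cost(5_000_000):.2f}/month")  # → $9.00/month
```

Single-digit dollars per month for a 5M-token/day agent is the kind of math that makes 24/7 deployment plausible rather than a line item.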
Part III: Free Access Through Kilo Code
The One-Week Free Promotion
Duration: One week from launch (Feb 12, 2026)
Scope: “Completely free for all Kilo users”
No Restrictions: “No credits required—just pure, unadulterated SOTA power”
Access Method: Select MiniMax M2.5 from model dropdown
Philosophy: "Give every developer world's most powerful tools without 'frontier tax'"
Scale: “Biggest leap yet” for Kilo Code
Platform Integration
Kilo CLI: Command-line interface access
VS Code Extension: marketplace.visualstudio.com/items?itemName=kilocode.Kilo-Code
IDE Support: All major IDEs integrated
Cloud Access: Kilo Cloud platform
Slack Integration: Kilo for Slack (previously made M2.1 free)
User Base: Join 1.5M+ developers
Installation Process
Step 1: Visit kilo.ai/landing/minimax-m25
Step 2: Click “Install Kilo Code”
Step 3: Choose installation method (CLI, VS Code, etc.)
Step 4: Select MiniMax M2.5 from dropdown
Step 5: Start building immediately
Simplicity: No complex setup, instant access
Part IV: The MiniMax Model Family
MiniMax M2.5 (New Release)
Status: Free in Kilo (one week promotion)
Performance: SOTA on multiple benchmarks
Capabilities:
- Exceptional reasoning
- Superior coding
- Fast inference (100 TPS)
- Open weights coming
Best For: Production coding, complex problem-solving, agentic workflows
MiniMax M2.1 (High Performer)
Pricing: $0.27/M input tokens
Status: “Most popular open-weight model on Kilo to date”
Performance: “Competitive performance on practical coding benchmarks”
Reliability: “Reliable for production use cases”
Cost Position: “Fraction of frontier model costs”
Best For: Cost-effective production deployment
MiniMax M1-80k (Long Context)
Pricing: $0.80/M input tokens
Context Window: 80,000 tokens
Reasoning: “Advanced chain-of-thought reasoning”
Specialty: “Excellent for complex multi-step tasks”
Capabilities:
- Deep reasoning
- Complex task handling
- Extended context understanding
Best For: Multi-step analysis, large codebases, comprehensive reasoning
Part V: MiniMax Company Context
The Organization
Founded: 2022
Location: China (leading Chinese AI company)
Backing: Major investors including Alibaba, Tencent, HongShan
User Base: Over 200 million users globally
Specialization: Large language models and multi-modal AI technology
Open-Weight Philosophy
Commitment: “Making advanced AI accessible to developers worldwide”
Previous Releases: M2 series, M1 models
Recognition: “Competitive performance on coding and reasoning benchmarks”
Future Plans: “MiniMax typically releases open weights for their models”
M2.5 Timeline: “Expect M2.5 weights on HuggingFace soon”
Community Benefit: Open weights enable research, fine-tuning, self-hosting
Part VI: Agentic Engineering Capabilities
Designed for the Agent-Verse
Core Design: “Engineered from ground up for Agent-Verse”
Primary Role: “Primary workhorse for future workspace”
Kilo's Excitement: “Particularly excited about agentic capabilities for planning and executing large-scale dev projects”
Optimization: “Specifically designed for agentic workflows”
Planning and Execution
Planning Capability: Built-in task decomposition and strategy
Execution Quality: “Superior autonomous execution in long-horizon tasks”
Multi-Step Mastery: Handles complex sequential workflows
Error Recovery: Robust handling of obstacles in multi-phase tasks
Adaptability: Adjusts approach based on intermediate results
Kilo Code Modes for Every Workflow Step
Ask Mode: “Knowledgeable technical assistant focused on answering questions without changing codebase”
Architect Mode: System design and architecture planning
Code Mode: Active code generation and modification
Debug Mode: Issue identification and resolution
Orchestrator Mode: Multi-agent coordination and workflow management
Custom Mode: User-defined specialized behaviors
M2.5 Performance: Expected to “rule every category” on Kilo leaderboard
Part VII: Competitive Landscape
The “OSS vs Proprietary” Blur
Previous Distinction: Clear gap between open-source and proprietary performance
M2.5 Impact: “OSS vs Proprietary distinction is blurring”
SOTA Lab Status: MiniMax now “truly SOTA lab with truly SOTA model”
Market Position: Competing directly with GPT-5.2, Claude Opus 4.6, Gemini 3 Pro
Kilo Code vs. Other Tools
GitHub Copilot: Different pricing model, limited model selection
Cursor: Proprietary editor lock-in
Windsurf: Alternative agentic coding tool
Cline: Autonomous coding agent
Amp Code (Sourcegraph/Cody): Shutting down VS Code extension
Kilo Advantage: Model flexibility, platform agnostic, competitive pricing
The Leaderboard Dominance
Current Status: “M2.1 has been top model on Kilo's leaderboard for every mode except Architect and Orchestrator”
M2.5 Potential: “Will M2.5 push MiniMax over edge to rule every category?”
User Validation: Popularity driven by actual developer preference
Performance Proof: Real-world results on Kilo platform
Part VIII: Practical Applications
Production Bug Solving
Use Case: Real-world GitHub issue resolution
Evidence: 80.2% SWE-Bench Verified
Workflow:
- Analyze bug report
- Navigate codebase
- Identify root cause
- Implement fix
- Validate solution
Speed: 100 TPS enables rapid iteration
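The five-step workflow above has a natural pipeline shape. The sketch below is not Kilo's or MiniMax's actual agent code; every function is a hypothetical stand-in showing how the stages compose:

```python
# Schematic of the analyze -> navigate -> diagnose -> fix -> validate
# loop. All functions are hypothetical placeholders for illustration,
# not a real agent implementation.

from dataclasses import dataclass

@dataclass
class BugReport:
    issue: str

def analyze(report: BugReport) -> str:      # 1. analyze the bug report
    return f"hypothesis for: {report.issue}"

def locate(hypothesis: str) -> str:         # 2-3. navigate code, find root cause
    return "src/module.py:42"               # placeholder location

def patch(location: str) -> str:            # 4. implement the fix
    return f"diff touching {location}"

def validate(diff: str) -> bool:            # 5. run tests against the fix
    return "diff" in diff                   # stand-in for a real test run

def solve(report: BugReport) -> bool:
    """Run the full pipeline and report whether the fix validated."""
    return validate(patch(locate(analyze(report))))

print(solve(BugReport("login crashes on empty password")))  # → True
```

A real agent would loop back to earlier stages when validation fails; the point here is only the sequential structure that SWE-Bench-style tasks exercise.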
Large-Scale Development Projects
Planning: Task decomposition and sequencing
Execution: Multi-file code generation
Integration: Component coordination
Testing: Validation and debugging
Documentation: Comprehensive commenting
Complex Multi-Step Tasks
Research: Information gathering via BrowseComp capability
Analysis: Deep reasoning through chain-of-thought
Synthesis: Combining multiple sources
Implementation: Translation to working code
Optimization: Iterative refinement
Always-On Agent Deployment
Cost-Effective: $0.06/M blended cost with cache
Use Cases:
- Code review automation
- Continuous testing
- Real-time documentation
- Issue triage
- Pull request assistance
Economic Viability: "Best price of any SOTA model for always-on agents"
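For a concrete use case from the list above, the per-task cost of automated PR review can be estimated. The $0.06/M blended rate is from the announcement; the ~60k tokens per review (diff, surrounding context, and generated comments) is an assumption:

```python
# Per-review cost for an automated PR reviewer at the blended rate.
# $0.06/M is the article's figure; 60k tokens per review is an
# assumed budget, not a measurement.

BLENDED_USD_PER_M = 0.06
TOKENS_PER_REVIEW = 60_000  # assumed: diff + context + comments

def cost_per_review(tokens: int = TOKENS_PER_REVIEW,
                    usd_per_million: float = BLENDED_USD_PER_M) -> float:
    return tokens * usd_per_million / 1e6

print(f"~${cost_per_review():.4f} per PR review")  # → ~$0.0036 per PR review
```

Under these assumptions a review costs well under a cent, so reviewing every pull request in a busy repository stays in the rounding error of an infrastructure budget.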
Part IX: Why This Matters
Democratizing SOTA Performance
Philosophy: "Give every developer world's most powerful tools without 'frontier tax'"
Free Access: No cost barrier for one week
Post-Promotion: Still most affordable SOTA option
Impact: Levels playing field for individual developers, startups, students
The Self-Hosting Future
10B Parameter Efficiency: Makes self-hosting practical
Infrastructure Savings: No massive cluster requirements
Data Sovereignty: Keep models on own infrastructure
Customization: Fine-tune for specific domains
Independence: Not dependent on API availability
Moving the Industry Forward
Quote: “Best way to move industry forward is to put best models in hands of every developer”
Competition Effect: Pressure on proprietary labs to improve
Innovation Acceleration: More developers with SOTA tools create faster progress
Knowledge Sharing: Open weights enable community research
Conclusion: The New SOTA Standard
What M2.5 Achieves
Performance: 80.2% SWE-Bench Verified matching Opus 4.6
Speed: 100 TPS, 3× faster than Opus
Efficiency: 10B parameters, smallest Tier-1 model
Cost: $0.06/M blended, best SOTA pricing
Versatility: Excellence across coding, reasoning, research
Accessibility: Free in Kilo for one week, open weights coming
Why It's Revolutionary
Blurs OSS/Proprietary Gap: Open-weight model matching closed-source performance
Enables Self-Hosting: First Tier-1 model practical for individual deployment
Affordable Always-On Agents: Economic viability of continuous AI assistance
Democratized SOTA: No frontier tax for world-class performance
How to Get Started
Immediate Action: Visit kilo.ai/landing/minimax-m25
Install: Choose CLI, VS Code extension, or IDE integration
Select: Pick MiniMax M2.5 from dropdown
Build: Start coding with SOTA assistance immediately
Free Window: One week, no credits required
Post-Promotion: Still most affordable SOTA option available
Try M2.5 Free:
- Landing Page: kilo.ai/landing/minimax-m25
- CLI: Install via Kilo Code CLI
- VS Code: marketplace.visualstudio.com (search “Kilo Code”)
- Docs: kilo.ai/docs
- Blog: blog.kilo.ai
Join: 1.5M+ developers using Kilo Code
The Bottom Line: MiniMax M2.5 achieves breakthrough SOTA performance: 80.2% SWE-Bench Verified (matching Claude Opus 4.6), 55.4% SWE-Bench Pro (beating Gemini 3 Pro by 12.1 points), 51.3% Multi-SWE-Bench, and 76.3% BrowseComp. It runs at 100 tokens/second (3× faster than Opus) with only 10B activated parameters (the smallest Tier-1 model, enabling self-hosting) and $0.06/M blended cost (the best SOTA pricing). It is completely free for one week in Kilo Code (CLI, VS Code, all IDEs) with no credits required, blurring the "OSS vs. Proprietary distinction" as MiniMax moves "into the big leagues as a truly SOTA lab." Engineered for the Agent-Verse with planning optimization, it is expected to "rule every category" on the Kilo leaderboard, following M2.1, the "most popular open-weight model" on the platform, while providing the "world's most powerful tools without frontier tax." Open weights are coming to HuggingFace soon. Access it at kilo.ai/landing/minimax-m25 and join 1.5M+ developers experiencing production-grade AI assistance, democratized.
The future of coding assistance just got faster, cheaper, and accessible to everyone. One week free. No frontier tax. Pure SOTA power.





