MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - vis
- Creator: Xiaomi
- Lifecycle: Active
- Context: 262.1K
- Max output: 65.5K
- Released: Mar 18, 2026
- Status: unknown
- Input: $0.40 / 1M tokens
- Output: $2.00 / 1M tokens
- Cached read: $0.08 / 1M tokens
- Cached write: not listed
- Batch discount: not listed
- Source: OpenRouter
- Verified: Apr 5, 2026 (High)
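As a rough illustration of how the listed per-token prices combine, the sketch below estimates the cost of a single request. The function name `estimate_cost` and the billing rule (cached-read tokens are a subset of input tokens and are charged at the cached-read rate instead of the input rate) are assumptions made for this example, not something the listing states; cached-write pricing is left out because no value is listed.

```python
# Prices in USD per 1M tokens, copied from the listing.
INPUT_PER_M = 0.40
OUTPUT_PER_M = 2.00
CACHED_READ_PER_M = 0.08


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_read_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_read_tokens is the part of input_tokens served
    from the prompt cache, billed at the cached-read rate instead of
    the regular input rate.
    """
    if min(input_tokens, output_tokens, cached_read_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_read_tokens > input_tokens:
        raise ValueError("cached_read_tokens cannot exceed input_tokens")

    fresh_input = input_tokens - cached_read_tokens
    total = (fresh_input * INPUT_PER_M
             + cached_read_tokens * CACHED_READ_PER_M
             + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # 100K input tokens (half of them cached) and 10K output tokens.
    cost = estimate_cost(100_000, 10_000, cached_read_tokens=50_000)
    print(f"${cost:.4f}")
```

Under these assumptions, a request with 100K input tokens (50K of them cached) and 10K output tokens comes to about $0.044. Actual billing may differ, for example if the provider counts cached tokens separately from input tokens.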
Capabilities
- Modalities: text, audio, image, video → text
- Capabilities: reasoning, audioInput, imageInput, promptCaching, functionCalling, structuredOutputs