Mistral releases Small 4 — 119B MoE model unifying reasoning, vision, and coding in one Apache 2.0 release
Mistral released Mistral Small 4 on March 16. The model consolidates three previously separate Mistral flagship products — Magistral (reasoning), Pixtral (vision/multimodal), and Devstral (agentic coding) — into a single deployable model. Architecture: 128 experts with 4 active per token, 119B total parameters (6B active per token), 256k context window, native text and image inputs. Reasoning depth is configurable via a `reasoning_effort` parameter — `none` for fast responses, `high` for deep step-by-step reasoning. Released under Apache 2.0. Performance vs. Mistral Small 3: 40% faster end-to-end, 3x more throughput. Benchmarks show it matching or surpassing GPT-OSS 120B on LCR, LiveCodeBench, and AIME 2025 with 20–75% shorter output length. Available via Mistral API, Hugging Face, NVIDIA NIM, vLLM, llama.cpp, and SGLang. Mistral also announced it is joining the NVIDIA Nemotron Coalition as a founding member.
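The `reasoning_effort` toggle means one endpoint can serve both quick replies and deep step-by-step reasoning. A minimal sketch of what a request payload might look like, assuming an OpenAI-style chat-completions schema and a hypothetical model ID `mistral-small-4` (the `none`/`high` effort values come from the announcement; everything else here is an assumption):

```python
import json

def build_request(prompt: str, effort: str = "none") -> dict:
    """Build a chat-completions payload. The "none"/"high" values for
    reasoning_effort are from the release notes; the model ID and the
    exact request schema are assumptions for illustration."""
    if effort not in ("none", "high"):
        raise ValueError("reasoning_effort must be 'none' or 'high'")
    return {
        "model": "mistral-small-4",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "none" = fast, "high" = deep reasoning
    }

# Same endpoint, different depth: a quick lookup vs. a hard problem.
fast = build_request("Capital of France?", effort="none")
deep = build_request("Prove there are infinitely many primes.", effort="high")
print(json.dumps(deep, indent=2))
```

The point of the single parameter is that routing logic lives in the request, not in the infrastructure: no second model name, no second endpoint.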
The multi-model problem is real enterprise friction. Organizations currently deploying Mistral need separate endpoints and routing logic for reasoning tasks, vision inputs, and coding-agent workflows — three different models, three integration surfaces, three billing lines. Small 4 collapses all three into one model with a parameter toggle. That's not a convenience feature; it's an infrastructure simplification that directly reduces deployment cost and operational complexity.