Google launches Gemini 3.1 Pro — 77.1% on ARC-AGI-2, 80.6% on SWE-Bench Verified
Google launched Gemini 3.1 Pro in preview on February 19. The model scores 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's score on the same benchmark, and 80.6% on SWE-Bench Verified, a benchmark that measures how well a model completes software engineering tasks autonomously. Gemini 3.1 Pro processes text, audio, images, video, and entire code repositories, and is available to developers through the Gemini API.
ARC-AGI-2 is one of the most respected capability benchmarks because it was specifically designed to resist pattern memorization: the set it uses for evaluation was held out from training data. Since 77.1% is more than double the previous score, Gemini 3 Pro must have scored below 40%. A jump of that size in a single model generation is a substantial capability leap, not incremental polish.