Google launches Gemini 3.1 Pro — 77.1% on ARC-AGI-2, 80.6% on SWE-Bench Verified

Source

Blog post

Google DeepMind

What Happened

Google launched Gemini 3.1 Pro in preview on February 19. The model scores 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's score on the same benchmark, and 80.6% on SWE-Bench Verified, a benchmark that measures how well a model completes software engineering tasks autonomously. Gemini 3.1 Pro accepts text, audio, images, video, and entire code repositories as input, and it is available to developers via the Gemini API.

Why It Matters

ARC-AGI-2 is one of the most respected capability benchmarks because it was designed to resist pattern memorization: its evaluation set is held out from training data. A jump from below 40% to 77.1% in a single model generation is a large capability leap, not incremental polish.
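
The "more than double" and "below 40%" figures can be cross-checked with simple arithmetic. The sketch below uses only the 77.1% score quoted above; the function and variable names are illustrative, and the exact Gemini 3 Pro score is not stated here, so the code derives only an upper bound on it.

```python
# Figure quoted in the article; nothing here is independently measured.
NEW_SCORE = 77.1  # Gemini 3.1 Pro on ARC-AGI-2, in percent


def max_prior_score(new_score: float, factor: float = 2.0) -> float:
    """Upper bound on the earlier score if new_score is more than `factor` times it."""
    return new_score / factor


# "More than double" means the earlier score must sit strictly below this bound.
bound = max_prior_score(NEW_SCORE)

# The gain in percentage points is therefore strictly larger than this value.
min_gain = NEW_SCORE - bound

print(f"Prior score must be below {bound:.2f}%")
print(f"Gain is more than {min_gain:.2f} percentage points")
```

The bound works out to 38.55%, which is consistent with the "below 40%" figure: the model gained more than 38.55 percentage points in one generation.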