14:00 UTC · Google

Google launches Gemini 3.1 Pro — 77.1% on ARC-AGI-2, 80.6% on SWE-Bench Verified

Source: Google DeepMind blog post

What Happened

Google launched Gemini 3.1 Pro in preview on February 19, 2026. The model scores 77.1% on ARC-AGI-2, more than double the Gemini 3 Pro score on the same benchmark, and 80.6% on SWE-Bench Verified, a benchmark that measures autonomous completion of software engineering tasks. Gemini 3.1 Pro processes text, audio, images, video, and entire code repositories, and is available to developers via the Gemini API.

Why It Matters

ARC-AGI-2 is one of the most respected capability benchmarks because it was specifically designed to resist pattern memorization: its evaluation set is held out from training data. A jump from below 40% (implied by the more-than-doubling) to 77.1% in a single model generation is a large capability leap, not incremental polish.
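
The story gives the new ARC-AGI-2 score and says it is more than double the previous one, but it does not state the previous score. A minimal sketch of the arithmetic behind the implied bound follows; the helper names `implied_prior_ceiling` and `min_point_gain` are hypothetical, and the only inputs are the figures quoted above.

```python
# Figure quoted in the story: Gemini 3.1 Pro on ARC-AGI-2, in percent.
NEW_ARC_AGI_2 = 77.1


def implied_prior_ceiling(new_score: float, multiple: float = 2.0) -> float:
    """Upper bound on the prior score if the new score is more than
    `multiple` times the prior score (hypothetical helper)."""
    return new_score / multiple


def min_point_gain(new_score: float, multiple: float = 2.0) -> float:
    """Lower bound on the gain in percentage points (hypothetical helper)."""
    return new_score - implied_prior_ceiling(new_score, multiple)


if __name__ == "__main__":
    ceiling = implied_prior_ceiling(NEW_ARC_AGI_2)
    gain = min_point_gain(NEW_ARC_AGI_2)
    print(f"Prior ARC-AGI-2 score must be below {ceiling:.2f}%")
    print(f"Gain is more than {gain:.2f} percentage points")
```

This only bounds the earlier score; the exact Gemini 3 Pro figure would have to come from the source blog post.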

NWSRM · AI FEED · 2026-02-19