AI Landscape Snapshot — Week 21
Google I/O 2026 launches Gemini 3.5 Flash and Spark agent, plus model leaderboard updates and framework releases for AI practitioners.
Google I/O 2026: Gemini Goes Agentic
The biggest event of the week was Google I/O 2026 (May 19-20), which centered almost entirely on Gemini and the shift toward agentic AI.
Gemini 3.5 Flash is the headline model release — the first in the 3.5 series, built for agentic workflows. Google reports it outperforms Gemini 3.1 Pro on coding and agentic benchmarks: 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, and 83.6% on MCP Atlas. It runs 4x faster in output tokens/second than other frontier models. It's available now via the Gemini API, Google AI Studio, and Antigravity 2.0. Gemini 3.5 Pro is in testing and expected next month.
Gemini Omni is a new model series that merges Gemini's reasoning with generative capabilities from Nano Banana (images) and Veo (video) in a single pipeline. It accepts text, image, audio, and video input and outputs video. It's live now for AI Plus, Pro, and Ultra subscribers.
Google Antigravity 2.0 upgrades Google's agent-first development platform. A single API call now provisions a sandboxed Linux environment where an agent can reason, execute code, and browse the web. Developers can define agents using markdown files (AGENTS.md, SKILL.md) instead of writing complex orchestration code.
Gemini Spark is a 24/7 personal AI agent that runs in the cloud, proactively working on tasks. It launches next week for US AI Ultra subscribers, with MCP support for third-party apps (Canva, Instacart, OpenTable) coming this summer.
Google also overhauled its consumer pricing: AI Ultra dropped from $250 to $100/month, daily prompt limits are replaced by compute-based usage, and subscribers get access to Spark and expanded Gemini features. Google's capex guidance for 2026 is approximately $180-190 billion.
Current Model Leaderboard
Based on the Artificial Analysis Intelligence Index v4.0:
| Model | Intelligence Index | Context | API Pricing (Input/Output per M) |
|---|---|---|---|
| GPT-5.5 (xhigh) | 60 | 1M | $5 / $30 |
| Claude Opus 4.7 (max) | — | 1M | $5 / $25 |
| Gemini 3.1 Pro Preview | — | TBD | — |
| Kimi K2.6 (top open-weight) | 54 | 262K | varies by provider |
| MiMo-V2.5-Pro (open) | 54 | TBD | — |
GPT-5.5 leads overall intelligence. Claude Opus 4.7 remains the coding and agentic workflow favorite among many practitioners. Gemini 3.5 Flash hasn't been scored on the Index yet but outperforms 3.1 Pro on Google's own agentic benchmarks. On the open-source side, Kimi K2.6 (1T MoE, 32B active parameters) and MiMo-V2.5-Pro are tied at 54.
For speed: Mercury 2 leads at 899 tokens/second. For context: Llama 4 Scout supports 10M tokens. For cost: Qwen3.5 0.8B at $0.01 per million tokens.
Anthropic: Security Tools, Claude Code Updates, and Gates Partnership
Claude Opus 4.7 (released April 16) continues as Anthropic's flagship GA model with a 1M context window, 128K max output tokens, and high-resolution vision support up to 2576px. Claude Code's fast mode now defaults to Opus 4.7, and new claude agents flags allow configuring background sessions with custom settings.
Claude Security launched in public beta for Enterprise customers — it scans codebases for vulnerabilities and generates proposed fixes using Opus 4.7. In its first three weeks, it was used to patch over 2,100 vulnerabilities.
On May 14, Anthropic announced a $200 million partnership with the Gates Foundation — a four-year commitment covering global health (vaccine research, disease modeling), education (K-12 tutoring in US, India, sub-Saharan Africa), and agriculture. The partnership includes funding for open-source African language datasets.
Framework and Tooling Updates
Vercel AI SDK 6 shipped as a major release introducing the v3 Language Model Specification. Key additions: composable agents as a first-class primitive, tool execution approval workflows, full MCP support (OAuth, resources, prompts, elicitation), DevTools for debugging, and image editing/reranking capabilities. Migration from v5 is straightforward — run npx @ai-sdk/codemod v6.
Cursor released Composer 2.5, built on the Kimi K2.5 base model. The coding tool ecosystem continues to consolidate around a few key players: Claude Code, GitHub Copilot (now defaulting to Opus 4.7), Cursor, and OpenAI Codex.
Adobe, Canva, and CapCut announced Gemini integrations to let users access their editing tools directly within the Gemini app.
Regulatory and Industry Signals
The Trump administration delayed signing a new AI executive order. Microsoft and xAI have reportedly agreed to give US regulators early access to their models before public release — a notable shift toward pre-release government testing.
The AI Security Institute tested GPT-5.5 and reported a 71.4% average pass rate on expert-level cyber tasks, calling it potentially "the strongest model we have tested" on that measure.
An IBM report found that 76% of surveyed organizations now have a Chief AI Officer, up from 26% in 2025. Google confirmed Gemini will power a more personalized version of Siri under a ~$1B/year licensing deal with Apple.
What This Means for Practitioners
Model selection is a multi-axis decision. GPT-5.5 leads on raw intelligence benchmarks. Claude Opus 4.7 is the coding and long-running-agent specialist. Gemini 3.5 Flash is now the strongest agent-first model from Google and may offer the best speed-to-intelligence ratio for agentic work. For open-source deployments, Kimi K2.6 competes with closed-source models on coding tasks.
Agent frameworks are maturing fast. Google Antigravity 2.0, Vercel AI SDK 6, and the expanding MCP ecosystem all point toward a world where building agents is becoming infrastructure rather than experimentation. If you haven't explored MCP integrations yet, now is the time — it's becoming the standard for tool interoperability.
Assess your skills. The pace of change makes it worth regularly checking where you stand. The AISA assessment maps your current AI capabilities against what the market demands. Whether you're a developer building with these tools or a leader evaluating adoption, the AI skills rubric provides a structured framework for tracking your growth as the landscape shifts week by week.

Ozan Dagdeviren
Founder of AISA — the AI skills assessment platform used by professionals worldwide to measure, certify, and develop their AI fluency. More about AISA
Ready to try the free AI skills assessment yourself?