AI Landscape Snapshot — Week 23
Weekly AI roundup: Claude Opus 4.8 leads benchmarks, GitHub Copilot goes token-based, Anthropic files S-1, US AI Act draft released.
Model Leaderboard: Claude Opus 4.8 Takes the Crown
The Artificial Analysis Intelligence Index v4.0 now ranks Claude Opus 4.8 as the #1 model with a score of 61, followed by GPT-5.5 at 60 (xhigh effort) and 59 (high effort). Gemini 3.1 Pro Preview and Claude Opus 4.7 share fifth place at 57.
Opus 4.8 launched May 28 with a 1M-token context window (now default, no beta header needed), 128K max output, and a new effort control system with settings from low through max. For Claude Code users, a new "ultracode" setting auto-triggers Dynamic Workflows, a research preview feature that lets one agent plan, fan out into parallel subagents, and merge results in a single session. Pricing holds at $5/$25 per million input/output tokens, with Fast mode at $10/$50 delivering 2.5x speed at half the previous Fast mode cost.
GPT-5.5, released April 23, sits at $5/$30 per million tokens with a 1.05M context window. Its Pro variant costs $30/$180 for harder reasoning tasks. OpenAI claims a 52.5% reduction in hallucinated claims versus GPT-5.4.
Gemini 3.1 Pro Preview remains the value leader at $2/$12 per million tokens with configurable thinking levels (Low/Medium/High). Google also launched Gemini 3.5 Flash on May 19 at $1.50/$9 — beating 3.1 Pro on coding at roughly 25% lower cost.
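To make the price gaps concrete, here is a minimal sketch that turns the per-million-token prices quoted above into a per-request cost. It assumes each price pair is input/output, and the 20K-in / 2K-out request size is a hypothetical workload, not a figure from any vendor. The dictionary keys are informal labels rather than real API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens, as quoted in this roundup."""
    input_per_m: float
    output_per_m: float


# Informal labels (not API model IDs) mapped to the prices quoted above.
PRICES = {
    "claude-opus-4.8": Price(5.00, 25.00),
    "claude-opus-4.8-fast": Price(10.00, 50.00),
    "gpt-5.5": Price(5.00, 30.00),
    "gpt-5.5-pro": Price(30.00, 180.00),
    "gemini-3.1-pro-preview": Price(2.00, 12.00),
    "gemini-3.5-flash": Price(1.50, 9.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at list price (no caching or discounts)."""
    p = PRICES[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


# Hypothetical request size, chosen only for illustration.
INPUT_TOKENS = 20_000
OUTPUT_TOKENS = 2_000

# Print models from cheapest to most expensive for this request shape.
for name in sorted(PRICES, key=lambda m: request_cost(m, INPUT_TOKENS, OUTPUT_TOKENS)):
    print(f"{name:24s} ${request_cost(name, INPUT_TOKENS, OUTPUT_TOKENS):.4f}")
```

The ranking shifts with the input/output mix, so it is worth rerunning with token counts measured from your own traffic.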
Open-Weight Models: Three-Way Race
The open-weight tier has become genuinely competitive with the proprietary frontier. Kimi K2.6 from Moonshot AI leads with an AA Index score of 54; it is a 1-trillion-parameter MoE with 32B active parameters, a 256K context window, and support for 300 parallel sub-agents. DeepSeek V4 Pro follows at 52 with a 1M context window and pricing of $0.435/$0.87 per million tokens (made permanent May 22). At those list prices it is roughly 11x cheaper than Opus on input and 29x cheaper on output, for similar coding quality. GLM-5.1 rounds out the top three with a clean MIT license and a strong self-hosting story.
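The size of the DeepSeek discount depends on whether a workload is input-heavy or output-heavy. The sketch below works through that arithmetic from the list prices quoted in this roundup, assuming each pair is input/output. The 90% input share is a hypothetical mix chosen for illustration.

```python
# USD per million tokens as (input, output), taken from the prices quoted in this roundup.
OPUS_4_8 = (5.00, 25.00)
DEEPSEEK_V4_PRO = (0.435, 0.87)


def price_ratios(expensive, cheap):
    """Return (input_ratio, output_ratio) for two (input, output) price pairs."""
    return expensive[0] / cheap[0], expensive[1] / cheap[1]


def blended_ratio(expensive, cheap, input_share):
    """Cost ratio when `input_share` of all tokens are input tokens."""
    def cost(price):
        return input_share * price[0] + (1 - input_share) * price[1]
    return cost(expensive) / cost(cheap)


in_ratio, out_ratio = price_ratios(OPUS_4_8, DEEPSEEK_V4_PRO)
# Hypothetical mix: 90% of tokens are input (typical of long-context coding prompts).
blended = blended_ratio(OPUS_4_8, DEEPSEEK_V4_PRO, input_share=0.9)

print(f"input: {in_ratio:.1f}x  output: {out_ratio:.1f}x  blended: {blended:.1f}x")
```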
GitHub Copilot's Billing Shakeup
The biggest practitioner-facing change this week: GitHub Copilot switched to token-based billing on June 1. Flat subscriptions now convert to equivalent AI Credits ($10/month = $10 in credits for Pro). Code completions and Next Edit suggestions remain free and unlimited, but chat, agentic workflows, and code review now consume credits at published API rates per model.
The fallback to cheaper models when credits run out is gone. Developer backlash has been substantial — some heavy users project 10-50x cost increases for agentic sessions. GitHub is offering promotional credits through August to cushion the transition. This shift forces teams to think about model routing and token efficiency in their coding workflows.
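A rough way to see why agentic users are alarmed is to ask how many sessions a month of credits covers. The sketch below assumes credits are drawn down at the Opus 4.8 list price quoted earlier in this roundup ($5/$25 per million input/output tokens) and that this model is selectable in Copilot. Both session sizes are hypothetical, and caching or promotional credits are ignored.

```python
def sessions_per_budget(credit_usd, input_tokens, output_tokens,
                        input_per_m, output_per_m):
    """Return (cost per session in USD, whole sessions that fit in the budget)."""
    session_cost = (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000
    return session_cost, int(credit_usd // session_cost)


PRO_CREDITS_USD = 10.0          # $10/month of credits for Pro, per the article
OPUS_IN, OPUS_OUT = 5.0, 25.0   # assumed per-model rate (Opus 4.8 list price)

# Hypothetical session sizes: a short chat turn vs. a long agentic run
# that repeatedly re-reads a large amount of repository context.
chat_cost, chats = sessions_per_budget(PRO_CREDITS_USD, 8_000, 1_000, OPUS_IN, OPUS_OUT)
agent_cost, agents = sessions_per_budget(PRO_CREDITS_USD, 1_500_000, 60_000, OPUS_IN, OPUS_OUT)

print(f"chat:  ${chat_cost:.3f} per session, {chats} sessions per month")
print(f"agent: ${agent_cost:.2f} per session, {agents} sessions per month")
```

Under these assumptions the same $10 covers well over a hundred chat turns but only a single long agentic run, which is the gap that model routing and token efficiency are meant to close.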
Anthropic's IPO Path and Safety Warnings
Anthropic filed a draft S-1 with the SEC on June 1, setting up an IPO at a $965B valuation after its $65B Series H. Revenue run-rate hit ~$47B in May 2026. In a notable disclosure, Anthropic revealed that over 80% of code merged into its production codebase in May was authored by Claude.
Alongside the business news, Anthropic issued a public call for coordinated safety measures, warning that AI systems may soon be capable of self-improvement without human oversight. The company argued current safety evaluation frameworks were designed for models that improve between training runs, not during deployment.
OpenAI: Domain-Specific Models and EU Expansion
OpenAI released an updated GPT-Rosalind on June 4, purpose-built for life sciences. It uses 31% fewer tokens than GPT-5.5 on genomics tasks while achieving higher accuracy. Separately, OpenAI granted the EU access to GPT-5.5-Cyber, a cybersecurity variant — a strategic move as OpenAI builds European government relationships ahead of its own IPO.
Policy: Great American AI Act Draft
On June 4, Representatives Obernolte (R-CA) and Trahan (D-MA) released a 269-page discussion draft of the Great American AI Act of 2026. Key provisions include a three-year federal preemption of state laws specifically regulating AI model development (not deployment or use), and $100M per year for the Center for AI Standards and Innovation. States retain the ability to regulate AI use in employment, healthcare, and consumer protection.
Framework and Tooling Updates
Vercel AI SDK 6 shipped with native agent primitives, tool execution approval, full MCP support (including OAuth authentication, resources, prompts, and elicitation), and DevTools. The v3 Language Model Specification powers new capabilities. Migration from v5 is straightforward: npx @ai-sdk/codemod v6.
Microsoft Project Polaris, announced at Build 2026, is Microsoft's first in-house coding model. It will replace GPT-4 Turbo as the default model inside GitHub Copilot in August 2026.
The MCP ecosystem continues to mature, with all major agent frameworks (LangGraph, CrewAI, OpenAI Agents SDK, Vercel AI SDK, Mastra, PydanticAI) now supporting MCP clients. Standardizing tools behind MCP servers is emerging as the default recommendation for production agent architectures — it makes the framework layer genuinely replaceable.
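The reason MCP makes the framework layer replaceable is that tools are discovered and invoked through a small set of JSON-RPC 2.0 messages rather than framework-specific decorators. The sketch below is a simplified in-process illustration of the tools/list and tools/call message shapes, not a compliant MCP server: it omits the transport, the initialize handshake, and capability negotiation. The word_count tool is a hypothetical example.

```python
import json

# Hypothetical tool registry: name -> (description, JSON Schema for arguments, handler).
TOOLS = {
    "word_count": (
        "Count whitespace-separated words in a string.",
        {"type": "object",
         "properties": {"text": {"type": "string"}},
         "required": ["text"]},
        lambda args: str(len(args["text"].split())),
    ),
}


def _error(request_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request using MCP's tool method names."""
    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": desc, "inputSchema": schema}
            for name, (desc, schema, _) in TOOLS.items()
        ]}
    elif method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            return _error(request_id, -32602, f"Unknown tool: {name}")
        handler = TOOLS[name][2]
        text = handler(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, "Method not found")

    return {"jsonrpc": "2.0", "id": request_id, "result": result}


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "word_count",
                   "arguments": {"text": "tools behind a stable protocol"}}}
print(json.dumps(handle(call), indent=2))
```

Any client that speaks these two methods can use the tool, so swapping the agent framework does not require rewriting the tool itself. For production use, an official MCP SDK is the better starting point than a hand-rolled dispatcher like this one.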
What This Means for Practitioners
Model selection is a multi-axis decision. Opus 4.8 leads on quality, Gemini 3.1 Pro leads on value, and DeepSeek V4 Pro leads on open-weight cost efficiency. The right choice depends on your workload, data residency needs, and budget.
Token economics now matter everywhere. With GitHub Copilot going metered and model pricing becoming more granular (effort levels, fast modes, long-context surcharges), practitioners need to understand and manage token consumption. Take the AISA assessment to benchmark where your team stands on these skills.
Build tools behind MCP. Whether you're using LangGraph, CrewAI, or Vercel AI SDK, wrapping tools in MCP servers is the insurance policy against framework churn. Review the AI skills rubric to see how tool integration fits into your development maturity.
The agentic era has a price tag. The GitHub Copilot billing change is a preview of what's coming across all AI tooling. Teams building agentic workflows need cost monitoring, model routing, and effort-level tuning as core competencies. If you're an AI developer, these are skills to build now, not later.
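As a starting point for model routing, the sketch below picks the cheapest model that clears a minimum quality bar for a given request size. It uses the AA Index scores and list prices quoted in this roundup, assuming each price pair is input/output. Using a single benchmark index as the quality signal is a deliberate simplification, and the model labels and thresholds are hypothetical; a real router would also weigh latency, data residency, and task type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # informal label, not an API model ID
    aa_index: int        # Artificial Analysis Intelligence Index score from this roundup
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """List-price cost in USD for one request."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# Only models with both a score and a price quoted in this roundup.
CATALOG = [
    Model("claude-opus-4.8", 61, 5.00, 25.00),
    Model("gpt-5.5", 60, 5.00, 30.00),
    Model("gemini-3.1-pro-preview", 57, 2.00, 12.00),
    Model("deepseek-v4-pro", 52, 0.435, 0.87),
]


def route(min_score: int, input_tokens: int, output_tokens: int,
          catalog=CATALOG) -> Model:
    """Return the cheapest model whose index score is at least `min_score`."""
    eligible = [m for m in catalog if m.aa_index >= min_score]
    if not eligible:
        raise ValueError(f"no model meets min_score={min_score}")
    return min(eligible, key=lambda m: m.cost(input_tokens, output_tokens))


# Hypothetical quality bars for three kinds of task.
for label, bar in [("boilerplate edit", 50), ("feature work", 55), ("hard refactor", 60)]:
    choice = route(bar, input_tokens=20_000, output_tokens=2_000)
    print(f"{label:16s} -> {choice.name} (${choice.cost(20_000, 2_000):.4f})")
```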

Ozan Dagdeviren
Founder of AISA, the AI skills assessment platform used by professionals worldwide to measure, certify, and develop their AI fluency.