What Happened in AI in May 2026

The phrase "agentic AI" has been bouncing around conference slides for two years. May 2026 is the month it stopped being a talking point and became shipping infrastructure. Every major lab launched something designed to run in the background, unsupervised, for hours or days. The infrastructure race to support that is now clearly three-way. And the money got genuinely strange.

Google I/O: from chatbot to agent

Google framed the whole keynote around a single transition: "from chatbot to agent." That's a reasonable way to read what they shipped.

Gemini 3.5 Flash reaching GA is the part of the story that got the most coverage: frontier-grade performance, 4x faster than comparable models, $1.50/$9 per 1M input/output tokens. Worth knowing, but it's a model release. The more structurally interesting announcement was Gemini Spark: a 24/7 personal AI agent running on dedicated cloud VMs, integrated with Gmail, Calendar, and Docs via MCP, and running recurring tasks in the background without you initiating them. That's a different product category from a chat interface.

The rest of I/O filled in the stack: Gemini Omni Flash for any-to-any multimodal generation (video first), 8th-gen TPUs at 2x performance-per-watt, and Gemini Antigravity as their unified agent-first dev platform. The platform naming is still a little much, but the hardware investment is real — you don't build 8th-gen TPUs without a clear picture of what workloads you're scaling.

Opus 4.8 and Dynamic Workflows

Anthropic shipped Opus 4.8 just 41 days after 4.7, continuing a pace that would have seemed impossible two years ago. Agentic coding benchmarks moved from 64.3% to 69.2%. Fast mode is now 3x cheaper.

The more substantive addition is Dynamic Workflows, currently in research preview: Opus can coordinate swarms of subagents and handle codebase-scale migrations from kickoff to merge. That's not a new benchmark score; it's a different operational mode. You hand it a migration spec, it figures out what to parallelize, and it reports back. Whether the research preview holds up at real scale is a different question, but the direction is unambiguous. Separately, Anthropic split out Agent SDK billing under the Pro ($20/mo) and Max ($100–200/mo) tiers, which suggests agent workloads are a distinct enough usage pattern to price separately. A "Mythos" model was teased and not explained.
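For readers who haven't built one, the fan-out shape described above looks roughly like this. It is a minimal sketch of the general pattern, not Anthropic's implementation: run_subagent, run_migration, and the task strings are all hypothetical, and the subagent call is stubbed out so the example runs on its own.

```python
import asyncio


async def run_subagent(task: str) -> str:
    # Hypothetical stand-in for a real subagent call (model request,
    # tool use, etc.); here it only yields control and reports back.
    await asyncio.sleep(0)
    return f"done: {task}"


async def run_migration(tasks: list[str], max_parallel: int = 4) -> list[str]:
    # Cap concurrency so a large migration doesn't launch every subagent at once.
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with sem:
            return await run_subagent(task)

    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*(bounded(t) for t in tasks))


def migrate(tasks: list[str]) -> list[str]:
    # Synchronous entry point for callers that aren't already async.
    return asyncio.run(run_migration(tasks))
```

The hard part of the real product is the planning step (deciding what is safe to parallelize), which this sketch skips entirely by taking the task list as given.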

The $65B round

Anthropic raised $65B in a Series H led by Altimeter, Dragoneer, Greenoaks, and Sequoia, taking the post-money valuation to $965B. That's the number. The same week saw Cognition raise $1B at a $26B valuation and OpenRouter close a $113M Series B. There's an obvious reading here (capital is concentrating around agentic infrastructure), but at these valuations you're also just watching a momentum trade. Which of these companies is actually profitable remains unclear.

OpenAI goes systems integrator

On May 12, OpenAI launched "The Deployment Company," a $4B+ subsidiary that embeds Forward Deployed Engineers inside client organizations and then handles integration work that previously would have gone to an SI. They also acquired Tomoro, a ~150-engineer London firm. Clients include Fidelity, Virgin Atlantic, Tesco, and the NBA.

This is a significant strategic move that doesn't get enough attention. OpenAI is no longer just a model provider competing on benchmark scores — they're building the professional services layer that sits between the API and actual enterprise deployment. That changes the competitive dynamics for every other lab, and it changes what "winning enterprise AI" means. Selling Opus to a Fortune 500 just got harder if OpenAI is inside the building.

Earlier in the month, GPT-5.5 Instant became the default: 52.5% fewer hallucinated claims than 5.3 on high-stakes prompts, 30% shorter responses, and — worth noting — the first Instant-class model rated "High capability" in cybersecurity and bio/chem preparedness evaluations. The capability frontier and the safety evaluation frontier are moving together now, which is the right structure even if the absolute numbers remain contested.

DeepSeek and the price floor

DeepSeek made their V4-Pro price cut permanent on May 22: $0.435/$0.87 per 1M input/output tokens, MIT license, 1.6T parameters with 49B active, 1M-token context. That's 8–10x cheaper than Opus 4.7 for tasks where the performance is comparable. It's the strongest open-weight model available, and the open license means you can run it on your own infrastructure.

The permanent price cut is the meaningful signal. It's not a promotional rate; it's a new floor. Every lab that charges $15/1M output tokens now has to justify that premium to customers who can see the gap.
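To make the gap concrete, here is a small sketch of the per-token arithmetic. The workload size (10M input tokens, 2M output tokens) is made up for illustration, and reading the DeepSeek price pair as input/output is an assumption; the $15 figure is the output price quoted above.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD, given prices quoted per 1M tokens."""
    return ((input_tokens / 1e6) * in_price_per_m
            + (output_tokens / 1e6) * out_price_per_m)


# Hypothetical workload, chosen only to make the arithmetic visible.
INPUT_TOKENS = 10_000_000
OUTPUT_TOKENS = 2_000_000

# DeepSeek V4-Pro at $0.435 in / $0.87 out (assumed ordering).
deepseek_total = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.435, 0.87)

# Output side only: the one place where both prices are given in the text.
deepseek_output_only = cost_usd(0, OUTPUT_TOKENS, 0.0, 0.87)
premium_output_only = cost_usd(0, OUTPUT_TOKENS, 0.0, 15.0)
```

On the output side alone, the same 2M tokens come to $1.74 at DeepSeek's rate and $30 at $15 per 1M. The blended ratio for a real workload depends on the input price and the input/output mix.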

Two things from ICLR worth keeping

A Salesforce paper, "LLMs Get Lost In Multi-Turn Conversation," ran 200K+ simulated conversations across top-tier models and found a ~39% performance drop compared to single-turn, driven primarily by reliability collapse rather than aptitude loss. The model doesn't forget how to reason mid-conversation; it loses track of constraints and instructions. That's directly actionable for anyone building multi-turn agents: the problem is attention drift over long context, not a capability ceiling.
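The aptitude/reliability split is easy to measure on your own agent runs. The sketch below assumes percentile-based definitions (aptitude as the 90th-percentile score across repeated runs of the same task, unreliability as the gap between the 90th and 10th percentiles), which is my reading of the paper's framing rather than a quote from it. The function names and sample scores are illustrative.

```python
from statistics import quantiles


def aptitude(scores: list[float]) -> float:
    # Assumed definition: 90th-percentile score over repeated runs.
    # quantiles(n=10) returns the 9 decile cut points; index 8 is the 90th.
    return quantiles(scores, n=10, method="inclusive")[8]


def unreliability(scores: list[float]) -> float:
    # Assumed definition: spread between the 90th and 10th percentiles.
    deciles = quantiles(scores, n=10, method="inclusive")
    return deciles[8] - deciles[0]


# Two made-up score distributions: one steady, one erratic.
steady = [70.0, 70.0, 70.0, 70.0, 70.0]
erratic = [float(s) for s in range(101)]  # 0, 1, ..., 100
```

A steady agent can have lower aptitude than an erratic one and still be the better choice for long runs, because the erratic agent's bad runs are the ones that break a 24-hour job.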

The other notable result, "Transformers are Inherently Succinct," is more theoretical: it compares the representational power of transformers with that of RNNs and shows that the emptiness and equivalence problems for transformers are EXPSPACE-complete. It's foundational for understanding what transformers can and can't express, which matters more now that alternatives are starting to ship.

The alternative architecture question

Speaking of which: SubQ 1M-Preview is the first commercial subquadratic LLM — a genuine departure from transformer architecture. Native 12M-token context, approximately one-fifth the cost of frontier models on long-context tasks, and 52x faster attention at scale. The company raised $29M in seed funding.

One model on a $29M seed does not end the transformer era. But the transformer monoculture has been so total for so long that any credible architectural alternative is worth tracking. SubQ is the most credible early-stage challenge to the stack so far.

Infrastructure

AWS shipped Bedrock AgentCore Runtime to GA: stateful agent execution using Firecracker-style microVMs, isolated and durable, designed to support persistent agent runs on top of otherwise stateless MCP infrastructure. That's the third pillar of what is now clearly a platform race: Google Antigravity, AWS AgentCore, and Azure Agent Framework (which hit 1.0 in April and saw wide adoption through May).

The convergence is interesting. Three hyperscalers, each building the same thing — durable stateful execution for long-running agents — using slightly different architectural approaches. The outcome of that race will determine what the agent deployment stack looks like in a year.
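Stripped of the microVMs and the branding, "durable stateful execution" comes down to checkpointing progress so a crashed run resumes instead of restarting. Here is a minimal sketch of that idea using a local JSON file. It does not reflect how any of the three platforms implement it; run_durable and the checkpoint format are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def run_durable(steps: list[Callable[[], str]], checkpoint: Path) -> list[str]:
    """Run steps in order, persisting progress after each one.

    If the checkpoint file already exists, resume from the first
    step that has not completed instead of starting over.
    """
    state = {"next_step": 0, "results": []}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())

    for i in range(state["next_step"], len(steps)):
        state["results"].append(steps[i]())
        state["next_step"] = i + 1
        # Write after every step so a crash loses at most the step in flight.
        checkpoint.write_text(json.dumps(state))

    return state["results"]
```

A step that crashes partway through is re-run from the top on resume, so real steps need to be idempotent or carry their own finer-grained checkpoints. That requirement is where most of the engineering difficulty in these runtimes lives.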

At the edges

A few other things that don't fit neatly: Alibaba's Qwen 3.7-Max-Preview completed a 35-hour autonomous tool-calling run with 1000+ tool calls, which is a meaningful durability demonstration. Cursor Composer 2.5 GA claimed 79.8% on SWE-Bench Multilingual and rough parity with Opus 4.7. xAI shipped an early beta of Grok Build CLI with parallel subagents and worktree support. Ollama 0.24 added Claude Desktop support and reworked the MLX sampler.

The through-line for May: the agent abstraction has stabilized enough that infrastructure is catching up to it. Labs are shipping background agents, clouds are building stateful runtimes to run them, and the pricing pressure from open-weight models is compressing margins at the top. That combination suggests the competitive differentiation in H2 2026 won't be "can your model code"; it'll be "how reliably does your agent run for 24 hours without losing the thread." The Salesforce paper on multi-turn reliability might end up being the most important research from the month.