Addy Osmani on Claude Code Agent Teams (Swarms)

AI Developer Tools

A practical take on Addy Osmani's viral post about Claude Code agent teams, why focus beats context, and when swarms help.

Tags: LinkedIn content, viral posts, content strategy, multi-agent systems, Claude Code, AI developer tools, software engineering, LLM orchestration, social media marketing

Addy Osmani recently shared something that caught my attention: "Claude Code now supports agent teams (swarms)!" He followed with the key idea: "Instead of a single agent working through tasks sequentially, a lead agent can delegate to multiple teammates that work in parallel." That framing is simple, but it points at a deeper shift in how we should use LLMs for real software work.

In this post, I want to expand on what Addy is getting at: agent teams are not just a shiny feature. They are a concrete way to apply the same principle that makes human engineering teams effective: specialization through focused context.

What "agent teams" actually changes

Addy described a lead agent delegating to multiple teammates, "each a full Claude Code instance with its own context window, an inbox for inter-agent messaging, and a shared task list with dependency tracking." Read that again and notice what is being productized:

  • Parallelism: multiple agents can work at the same time, not turn-by-turn.
  • Separation of concerns: each agent has its own context window.
  • Coordination primitives: messaging plus a shared task list and dependencies.

This matters because many of the failures people attribute to "LLMs being unreliable" are actually workflow failures. We stuff too many concerns into one session, ask for too much in one go, and then act surprised when the output gets inconsistent.

"LLMs perform worse as context expands. Agent teams formalize the same principle human teams use - specialization through focus." - Addy Osmani

That quote is the center of gravity. If you only take one thing away, take this: a bigger context window is not the same as better thinking.

Why big context can make reasoning worse

Addy pointed out the uncomfortable truth: as you keep expanding context, performance can drop. In practice, that shows up as:

  • Goal drift: the model starts optimizing for the last few messages instead of the core objective.
  • Conflicting instructions: old decisions linger, new decisions override them, and you get a blended answer.
  • Attention dilution: critical constraints get buried under logs, discussion, and unrelated notes.

Humans handle this by splitting work. Your security reviewer does not sit through a two-hour micro-optimization meeting and then write a threat model. They get a focused brief.

Agent teams attempt to turn that human pattern into a default AI workflow: smaller, cleaner contexts with clearer ownership.

What works well with swarms (and why)

Addy called out several sweet spots: "competing debug hypotheses, parallel code review with different lenses, and cross-layer feature work spanning frontend, backend, and tests." Each one maps to a pattern where parallel thinking beats sequential thinking.

1) Competing debug hypotheses

Debugging is rarely linear. The fastest engineers keep multiple plausible causes alive, then kill them with evidence.

With a single agent, you often get one dominant narrative: it picks a hypothesis early and explains everything through that lens. A swarm lets you do something closer to good engineering:

  • Agent A: forms a hypothesis from recent changes and stack traces.
  • Agent B: audits the diff for edge cases and silent failures.
  • Agent C: searches for known library issues or version mismatches.
  • Lead agent: consolidates the findings and asks for one targeted experiment to disambiguate.

Even if one agent goes off track, you get "graceful degradation" because other agents can still produce useful paths forward.
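The degradation pattern can be sketched in plain Python (this is a toy analogy, not Claude Code's API; the "agent" functions and their findings are invented for illustration). Each worker pursues one hypothesis in parallel, and one dead end does not block the others:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "agents": each investigates one candidate cause and
# returns evidence, or raises if its line of inquiry dead-ends.
def check_recent_changes():
    return "regression introduced in commit abc123"

def audit_diff_for_edge_cases():
    raise RuntimeError("no suspicious edge cases found")

def search_known_library_issues():
    return "version mismatch: lib 2.x breaks the old call signature"

hypotheses = [check_recent_changes, audit_diff_for_edge_cases,
              search_known_library_issues]

# Run every hypothesis in parallel and keep whatever survives.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(h) for h in hypotheses]
    leads = []
    for f in futures:
        try:
            leads.append(f.result())
        except RuntimeError:
            pass  # one dead hypothesis does not sink the others

print(leads)  # the "lead agent" consolidates the surviving leads
```

The consolidation step at the end is where the lead agent (or you) earns its keep: two surviving leads still need one targeted experiment to disambiguate.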

2) Parallel code review with different lenses

Most teams already do this informally: one reviewer focuses on correctness, another on security, another on maintainability.

Agent teams make this explicit. You can assign:

  • A security-focused reviewer that only sees the diff and threat model prompt.
  • A performance reviewer that only sees hotspots and perf budgets.
  • A style and DX reviewer that checks readability, naming, and tests.

The key is Addy's point: "A security reviewer doesn't need performance optimization notes in its context, and a testing agent doesn't need the three-hour planning discussion." Clean inputs produce cleaner judgments.
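Context slicing is the whole trick here. A minimal sketch of the idea (the dictionary keys, role names, and build_prompt helper are all hypothetical, not part of Claude Code): each reviewer's prompt is assembled from only the slices its lens needs, so the planning discussion never reaches the security reviewer at all.

```python
# Hypothetical context store: each reviewer gets only the inputs
# its lens needs, never the whole conversation.
diff = "...unified diff of the PR..."
context = {
    "threat_model": "auth flows, input validation assumptions",
    "perf_budgets": "p95 latency budget on the hot path",
    "planning_notes": "three hours of design discussion",  # deliberately unused
}

roles = {
    "security": ["threat_model"],
    "performance": ["perf_budgets"],
    "style_dx": [],  # sees only the diff itself
}

def build_prompt(role, keys):
    slices = "\n".join(context[k] for k in keys)
    return f"Role: {role} reviewer.\n{slices}\nReview this diff:\n{diff}"

prompts = {role: build_prompt(role, keys) for role, keys in roles.items()}
```

Whether you wire this up yourself or let the lead agent do it, the design choice is the same: inclusion is opt-in per role, so irrelevant context has to be explicitly requested rather than explicitly excluded.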

3) Cross-layer feature work (frontend + backend + tests)

Full-stack features are coordination problems. They are also the kind of work that tempts you to open one giant chat, paste every file, and hope the model just "handles it."

A better pattern is:

  • Backend agent: implements endpoints per spec and writes migration notes.
  • Frontend agent: updates UI flows, handles error states, integrates the API.
  • Testing agent: adds unit and integration tests, validates edge cases.
  • Lead agent: keeps a dependency list (API contract first, UI second, tests alongside) and merges decisions.

This division aligns with how we already ship software. The swarm just compresses the cycle time.
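The dependency list above is just a small graph, and the ordering constraint it encodes can be made concrete (the task names are illustrative; this is not how Claude Code represents its shared task list internally):

```python
from graphlib import TopologicalSorter

# A toy shared task list with dependencies: the API contract gates both
# implementation tracks, and the integration tests depend on both.
tasks = {
    "api_contract": set(),
    "backend_endpoints": {"api_contract"},
    "frontend_ui": {"api_contract"},
    "integration_tests": {"backend_endpoints", "frontend_ui"},
}

order = list(TopologicalSorter(tasks).static_order())
print(order)
```

Any valid order puts the contract first and the integration tests last, while backend and frontend sit at the same depth, which is exactly what makes them safe to run in parallel.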

The real cost: coordination and tokens

Addy also gave the warning people need to hear: "Agent teams add coordination overhead and use significantly more tokens than a single session." In other words, swarms are not a free lunch.

You pay in two ways:

  1. Coordination overhead
    Someone (the lead agent, and you) must write clearer tasks, reconcile conflicting outputs, and keep a shared plan consistent.

  2. Token spend
    Multiple agents mean multiple context windows, multiple outputs, and often duplicated reading of the same spec.

That is why Addy's guidance on scoping is so important: "They work best when teammates can operate independently on well-scoped tasks - 'implement these five API endpoints per this spec' beats 'build me an app.'" The more ambiguous the task, the more the swarm multiplies confusion.

A practical playbook for trying agent teams

Addy shared the mechanical step: enable it with the "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS" flag in settings.json, then describe the team you want in natural language. But the bigger question is how to adopt it without turning your workflow into chaos.
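For the settings.json step, one plausible shape is to set the flag through the env map that Claude Code's settings file supports for environment variables (verify the exact key and value against the official docs, since the feature is experimental):

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```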

Here is a simple progression that mirrors Addy's advice to "Start with research and review tasks, then scale to cross-layer features and larger refactors."

Phase 1: Research and review only

Low risk, high value.

  • Create a "Research" agent that summarizes docs, release notes, or a new library.
  • Create a "Reviewer" agent that audits a PR diff for correctness and missing tests.
  • Keep the lead agent focused on synthesizing and deciding.

Success metric: you save time without merging code automatically.

Phase 2: Bounded implementation tasks

Pick tasks with crisp inputs and clear outputs.

  • "Implement these endpoints exactly per OpenAPI spec."
  • "Refactor this module to remove duplication, no behavior changes, keep public API."
  • "Add tests for these scenarios, keep existing fixtures."

Success metric: you can review and land changes with fewer back-and-forth cycles.

Phase 3: Cross-layer work with dependency tracking

Now the shared task list and dependencies start to matter.

  • Define the contract first.
  • Run backend and frontend in parallel only after the contract is stable.
  • Have a testing agent validate assumptions and edge cases continuously.

Success metric: fewer integration surprises at the end.

Tips to reduce swarm chaos

Agent teams work best when you actively design for focus.

  • Give each agent a role statement and a hard boundary ("Do not propose UI changes" or "Do not change business logic").
  • Provide a single source of truth (spec, acceptance criteria, constraints) and avoid pasting everything everywhere.
  • Ask agents to return structured outputs (checklists, diffs, risk lists) so the lead can compare apples to apples.
  • Decide how conflicts are resolved (lead agent chooses, or you choose) and keep that explicit.
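The structured-output and conflict-resolution tips combine naturally. A sketch of what "apples to apples" can look like, assuming a made-up ReviewResult shape (Claude Code does not mandate any particular schema): every agent returns the same fields, so the lead's merge rule becomes a one-liner instead of prose interpretation.

```python
from dataclasses import dataclass, field

# Hypothetical structured review result: when every agent returns the
# same shape, the lead compares findings instead of parsing prose.
@dataclass
class ReviewResult:
    role: str
    risks: list = field(default_factory=list)
    approved: bool = False

results = [
    ReviewResult("security", risks=["token logged in plaintext"]),
    ReviewResult("performance", risks=[], approved=True),
    ReviewResult("style_dx", risks=["missing test for error path"]),
]

# Explicit conflict rule: any open risk blocks the merge.
blocking = [r for r in results if r.risks]
decision = "merge" if not blocking else "hold"
print(decision, [r.role for r in blocking])
```

The point is not the dataclass; it is that the resolution rule ("any open risk blocks") is written down once, instead of being renegotiated every time two agents disagree.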

Why this is more than a feature announcement

Addy's post is a reminder that better AI outcomes often come from better decomposition, not from bigger prompts. Agent teams turn a set of "multi-agent orchestration" ideas into something you can actually use: parallel work, focused contexts, and explicit coordination.

When the work is well-scoped, this approach can improve reasoning per domain, give you independent quality checks, and fail more safely when one thread goes wrong. When the work is vague, it can burn tokens and time.

As Addy put it: let the problem guide the tooling.

This blog post expands on a viral LinkedIn post by Addy Osmani, Director, Google Cloud AI. View the original LinkedIn post →