Suhrab Khan Calls Out the AI Agent Team Mirage
A practical response to Suhrab Khan on why AI agent teams expose engineering flaws, plus a workable playbook for real speed.
Suhrab Khan recently shared something that caught my attention: "AI agent teams don’t fix your engineering org. They expose it. Fast." He followed it with a familiar scenario: a founder saying, "We’ll just run 10 agents and ship twice as fast." And then, as Suhrab puts it, "the repo hit them back."
That short exchange nails the current moment. Multi-agent coding setups can feel like a cheat code. You spin up parallel agents, watch them open files and produce diffs, and it looks like the future arrived early. But Suhrab’s point is sharper: agent teams do not create organizational maturity. They amplify whatever is already there, especially the parts you have been able to ignore because humans compensate with context.
In this post, I want to expand on what Suhrab is really warning about, and turn his three-step setup into a practical operating model you can apply to your repo.
AI agent teams are leverage, not a rescue plan
Suhrab is not saying agent teams are useless. He even cites a serious example: Claude Code agent teams can be "real leverage" when work splits cleanly across a codebase. Anthropic reportedly ran many parallel agents to build a C compiler that could compile the Linux kernel. That is not a toy demo.
The difference is not raw intelligence. It is structure.
When tasks are decomposable, interfaces are stable, and acceptance criteria are explicit, parallel agents can move like a well-drilled team. When those things are missing, adding agents is like adding more people to a project with no map. You get motion, not progress.
"It also only works when the work graph is real, the interfaces exist, and you stop pretending ‘we’ll figure it out’ is a spec." - Suhrab Khan
That last line is the key: agents punish ambiguity. Humans can negotiate meaning mid-flight. Agents will happily implement different interpretations at the same time.
Where teams faceplant: cross-layer changes
Suhrab calls out the most common failure mode: cross-layer changes. A small contract tweak in the frontend ripples into backend endpoints, validation, tests, migrations, and sometimes permissions or rate limits. In a single-threaded human workflow, one engineer often holds the whole mental model and keeps things coherent.
In a multi-agent workflow, you can easily get this:
- Agent A updates a frontend contract and mocks data to make the UI pass.
- Agent B "fixes" the API in a way that is backwards-incompatible.
- Agent C updates tests, but asserts the old behavior because it read stale docs.
- Agent D adjusts a migration script without coordinating rollout constraints.
Each agent is locally confident. Globally, you get what Suhrab describes as "merge-conflict karaoke" plus a bug hunt that feels endless.
This is not because the agents are bad. It is because the system lacks shared truth.
The real prerequisite: an explicit work graph
Suhrab mentions "the work graph". Translated into practical terms: can you draw the dependency chain of the change?
For example, a cross-layer feature might require:
- API contract update (OpenAPI or typed schema)
- Backend handler implementation
- Authz rules
- Frontend client update
- Integration tests
- Migration and backfill
- Observability (logs, metrics, alerts)
If you cannot list those artifacts and define their boundaries, you do not have a work graph. You have a wish.
Agents thrive when you hand them a bounded subgraph with a clear interface and a way to verify correctness.
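The work graph above can be made literal. Here is a minimal sketch, using Python's standard-library `graphlib` and the task names from the example (the graph itself is illustrative, not a prescribed structure): dependencies determine which tasks can run as parallel "waves" of agent work.

```python
from graphlib import TopologicalSorter

# Illustrative work graph for the cross-layer feature above:
# each task maps to the set of tasks it depends on.
work_graph = {
    "api_contract": set(),
    "backend_handler": {"api_contract"},
    "authz_rules": {"backend_handler"},
    "frontend_client": {"api_contract"},
    "migration_backfill": {"backend_handler"},
    "integration_tests": {"backend_handler", "frontend_client"},
    "observability": {"backend_handler"},
}

ts = TopologicalSorter(work_graph)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # tasks whose dependencies are all done
    waves.append(ready)
    ts.done(*ready)

for i, wave in enumerate(waves, 1):
    print(f"Wave {i}: {', '.join(wave)}")
```

If you cannot write this dict down for your change, you do not have a work graph yet, and parallel agents will invent their own.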
A practical playbook based on Suhrab’s three steps
Suhrab lays out a setup that is closer to an engineering operating system than a prompt. Here is how I would implement it in a real repo.
Step 1: Scaffold the system with a Lead agent
Suhrab’s first move is organizational: "A Lead agent owns the plan, boundaries, and acceptance tests." Subagents own components like frontend, backend, tests, migrations, and security review.
In practice, the Lead agent should produce a short set of artifacts before anyone edits code:
- A plan with a numbered checklist of deliverables
- File-level ownership (which agent touches what)
- A contract source of truth (schema, types, or spec)
- Acceptance tests or at least acceptance criteria written as verifiable statements
- A commit protocol (more on this below)
Then each specialist agent gets a tight mission and a required reading list. Suhrab emphasizes this: "the artifacts they must read before touching code." That typically includes:
- Relevant architecture docs
- Existing contract definitions
- Test conventions
- Build and run commands
- Previous PRs that touched the same subsystem
The goal is simple: prevent parallel hallucination.
What “tight mission” looks like
Instead of "Update backend for new contract," give the backend agent:
- "Update endpoint X to accept field Y, keep field Z backwards compatible, update validation, and add two integration tests. Do not change any frontend files. Output a patch and a brief rationale."
Boundaries are a feature, not a constraint.
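A tight mission can also be expressed as data, so boundary violations are detectable instead of discovered at merge time. A minimal sketch, with all names and paths hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mission:
    """A bounded task for one specialist agent (names are illustrative)."""
    goal: str
    allowed_paths: tuple[str, ...]     # the agent may only edit files under these
    required_reading: tuple[str, ...]  # artifacts to load before touching code

    def violates_boundary(self, touched_files: list[str]) -> list[str]:
        # Return any files outside this agent's ownership boundary.
        return [f for f in touched_files if not f.startswith(self.allowed_paths)]

backend = Mission(
    goal="Update endpoint X to accept field Y; keep field Z backwards compatible.",
    allowed_paths=("api/", "tests/integration/"),
    required_reading=("docs/architecture.md", "openapi.yaml"),
)
print(backend.violates_boundary(["api/x.py", "web/client.ts"]))
```

When the backend agent's patch touches `web/client.ts`, that is a boundary violation the Lead can reject mechanically, before anyone debates the diff.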
Step 2: Run parallel review on purpose
This is the part most teams miss. They run agents in parallel for implementation, then do review serially. Suhrab flips it: run review in parallel by design.
He suggests dedicated reviewers:
- One agent hunts vulnerabilities
- One agent checks style and consistency
- One agent validates build and test commands on a clean clone
This mirrors what strong teams do with humans, but agents make it cheap to do every time.
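The fan-out is cheap to orchestrate. Here is a minimal sketch of running the three reviewer roles concurrently; the reviewer functions are stubs standing in for real agent calls, and the heuristics inside them are purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub reviewers standing in for agent calls; each takes a diff and
# returns a list of findings. In practice these would be LLM calls.
def security_review(diff: str) -> list[str]:
    return ["possible SQL injection"] if "execute(f" in diff else []

def style_review(diff: str) -> list[str]:
    return ["line over 100 chars"] if any(len(l) > 100 for l in diff.splitlines()) else []

def build_review(diff: str) -> list[str]:
    # A real reviewer would apply the diff to a clean clone and run the build.
    return []

def parallel_review(diff: str) -> dict[str, list[str]]:
    reviewers = {"security": security_review, "style": style_review, "build": build_review}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in reviewers.items()}
        return {name: f.result() for name, f in futures.items()}

findings = parallel_review('cursor.execute(f"SELECT * FROM users WHERE id={uid}")')
print(findings["security"])
```

Because each reviewer returns named findings, the Lead gets a structured report per patch rather than a pile of opinions.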
"Agents argue in public in the thread. The Lead resolves, then commits." - Suhrab Khan
Two important details are embedded there:
- Reviews must be visible. If agents are silently revising their own patches, you lose accountability and create conflicting fixes.
- The Lead agent resolves disputes and controls the final commit. Without a single merge authority, you will optimize for output volume instead of correctness.
A simple commit protocol that prevents chaos
Adopt rules like:
- Only the Lead agent commits to main (or to the integration branch).
- Subagents propose changes as patch sets or PRs, never direct commits.
- Every change must reference an acceptance test or acceptance criterion.
- No cross-boundary edits unless the Lead approves a boundary update.
This sounds heavy until you compare it to hours lost in conflict resolution.
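Those rules are simple enough to encode as a pre-merge check. A sketch under my own assumptions (the `AC-` tagging convention, the agent names, and the ownership map are all invented for illustration):

```python
def check_commit(author: str, branch: str, message: str,
                 touched: list[str], ownership: dict[str, list[str]]) -> list[str]:
    """Return protocol violations for a proposed commit (rules from the post)."""
    violations = []
    # Rule 1: only the Lead commits to main or the integration branch.
    if branch in ("main", "integration") and author != "lead":
        violations.append("only the Lead agent commits to main/integration")
    # Rule 3: every change references an acceptance criterion (tagged AC-<n> here).
    if "AC-" not in message:
        violations.append("commit must reference an acceptance criterion")
    # Rule 4: no cross-boundary edits without Lead approval.
    allowed = tuple(ownership.get(author, []))
    for f in touched:
        if author != "lead" and not f.startswith(allowed):
            violations.append(f"cross-boundary edit: {f}")
    return violations

ownership = {"backend": ["api/"], "frontend": ["web/"]}
print(check_commit("backend", "main", "fix AC-3", ["api/x.py"], ownership))
```

Run it as a CI gate or a Lead-agent step and "merge-conflict karaoke" becomes a rejected patch with a named reason.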
Step 3: Use “computer use” for the boring glue
Suhrab’s third step is a reminder that not all leverage is in writing code. "Computer use" is ideal for repetitive workflows: portals, forms, multi-app data entry, and end-to-end transactions tied to a ticket.
The principle: let agents do the rote steps that humans hate, and keep humans for judgment calls.
Examples where this works well:
- Reproducing a bug by following a QA script across tools
- Creating and linking tickets, changelog entries, and release notes
- Running a staging verification checklist and capturing evidence
- Validating permissions and roles in an admin UI
This is also where you can reduce integration risk: agents can execute the full checklist consistently.
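The "capture evidence" part is worth making explicit. A minimal sketch of a checklist runner, where each step is a stub standing in for a real "computer use" action against a portal or admin UI (all step names and evidence strings are hypothetical):

```python
import time

# Hypothetical staging verification checklist; each step is a callable
# that returns a piece of evidence. A computer-use agent would execute
# these against real UIs instead of stubs.
CHECKLIST = {
    "login works": lambda: "screenshot:login_ok.png",
    "admin role sees settings": lambda: "screenshot:settings_visible.png",
    "rate limit returns 429": lambda: "log:429_observed",
}

def run_checklist(steps: dict) -> dict:
    """Execute every step and record its evidence with a timestamp."""
    evidence = {}
    for name, step in steps.items():
        evidence[name] = {"result": step(), "at": time.time()}
    return evidence

report = run_checklist(CHECKLIST)
print(len(report), "steps verified")
```

The value is consistency: the agent runs the whole list every time, and the evidence dict becomes an auditable artifact attached to the ticket.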
The three non-negotiables Suhrab lists
Suhrab summarizes the operating model into three checkboxes:
"1) Map ownership and interfaces. 2) Write acceptance tests first. 3) Enforce a commit protocol."
If you do only one thing, do acceptance tests. They create a shared definition of done that every agent can optimize against.
If you do two things, add ownership mapping. It reduces overlap and accidental cross-layer edits.
If you do all three, you will feel the difference immediately: fewer conflicts, fewer regressions, and cleaner parallelism.
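"Acceptance tests first" can be as lightweight as writing the criteria as executable tests against a stub before any agent implements the real thing. A sketch using the contract change from earlier (the handler and field names are illustrative):

```python
# Hypothetical acceptance tests for the contract change discussed above,
# written before implementation so every agent optimizes against them.
def handle_request(payload: dict) -> tuple[int, dict]:
    """Stub endpoint standing in for the real backend handler."""
    if "y" not in payload:
        return 400, {"error": "field y is required"}
    # Field z stays optional for backwards compatibility.
    return 200, {"y": payload["y"], "z": payload.get("z")}

def test_accepts_new_field_y():
    status, body = handle_request({"y": 1})
    assert status == 200 and body["y"] == 1

def test_field_z_remains_optional():
    _, body = handle_request({"y": 1})
    assert body["z"] is None

def test_missing_y_is_rejected():
    status, _ = handle_request({})
    assert status == 400

for t in (test_accepts_new_field_y, test_field_z_remains_optional, test_missing_y_is_rejected):
    t()
print("all acceptance tests pass")
```

Once these exist, "done" is no longer negotiable mid-flight, which is exactly the ambiguity agents punish.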
Picking the right workflow for a 48-hour ship window
Suhrab ends with a practical challenge: choose one workflow you keep punting, then hand it to a Lead agent plus three specialists.
If I had to choose, here is a rule of thumb:
- Legacy migration: great if the boundaries are clear and you can verify with data checks.
- Parallel code review: great as a first adoption step because it improves quality without changing architecture.
- Cross-layer feature: only if you already have contracts and tests, otherwise it will expose every gap at once.
That is the point. Agent teams do not hide the cracks. They light them up.
This blog post expands on a viral LinkedIn post by Suhrab Khan.