
The One Step I Never Hand to a Subagent


Suneet Malhotra

Jun 22, 2026


The routine that writes this blog is allowed to spawn other agents. When the morning content job runs, it fans out three of them in a single shot. One reads the news feeds and scores them against what my trading engine actually did. One queries the database tables and pulls the most postable events. One reads the last ten posts' metrics and flags which hooks are running hot. Three agents, three contexts, all working at once. Then their outputs come back, and the routine does the one thing it is explicitly forbidden from delegating. It writes the post itself.

That instruction is sitting in my own configuration in plain language: dispatch subagents for the gathering, never dispatch the drafting, because voice is the main loop's job. For a while I read that as a stylistic preference. It is not. It is the correct architecture, and the reason is the same one that keeps a nearly sixty-year-old result about parallel computing true.

Some work shards and some does not

Amdahl's law says the speedup you get from throwing more processors at a task is capped by the fraction of the task that has to run in sequence. If ten percent of a job is irreducibly serial, you can buy infinite hardware and never go more than ten times faster. The serial fraction sets the ceiling, at one over that fraction, and no amount of parallelism touches it.
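
To put numbers on that ceiling, here is a minimal sketch of the standard Amdahl formula, speedup = 1 / (s + (1 - s) / n), where s is the serial fraction and n is the number of workers. The function names are mine, chosen for illustration; nothing here comes from the routine itself.

```python
def amdahl_speedup(serial_fraction: float, workers: float) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # Serial work takes the same time no matter what; only the
    # parallel part is divided across the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(serial_fraction: float) -> float:
    """Limit of the speedup as the worker count grows without bound."""
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction


if __name__ == "__main__":
    for n in (1, 3, 10, 100, 10_000):
        print(f"{n:>6} workers -> {amdahl_speedup(0.10, n):.2f}x")
    print(f"ceiling -> {speedup_ceiling(0.10):.2f}x")
```

With a ten percent serial fraction, three workers give 2.5x and ten workers give a little over 5x. The curve flattens toward 10x and never crosses it.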

An agent fleet obeys the same law. The question for any step is not "can I run this on more agents" but "is this step serial or parallel by nature." Retrieval is parallel. Reading seven news sources, querying five database tables, pulling metrics on ten posts: these are independent jobs whose results you can collect in any order and concatenate without loss. Three blind agents each handing back a paragraph of findings merge into a clean brief. Nothing about source three depends on what source one said.
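
The shape of that fan-out can be sketched in a few lines. This is an illustration under assumptions, not the routine's real code: the three coroutines are hypothetical stand-ins for the subagents, the strings they return are placeholders, and `asyncio.sleep` stands in for the time a real agent would spend working.

```python
import asyncio


# Hypothetical stand-ins for the three gathering subagents. In a real
# routine each would run in its own context and return only a conclusion.
async def score_news() -> str:
    await asyncio.sleep(0.01)
    return "news: placeholder conclusion"


async def pull_events() -> str:
    await asyncio.sleep(0.01)
    return "events: placeholder conclusion"


async def review_metrics() -> str:
    await asyncio.sleep(0.01)
    return "metrics: placeholder conclusion"


async def gather_brief() -> str:
    # The three jobs share no state, so they can run concurrently.
    findings = await asyncio.gather(
        score_news(), pull_events(), review_metrics()
    )
    # The merge is mechanical: sort for a stable layout, then concatenate.
    # No finding needs to know what the others said.
    return "\n".join(sorted(findings))


if __name__ == "__main__":
    print(asyncio.run(gather_brief()))
```

The sort is there only to make the point that arrival order carries no meaning. The brief is the same whichever agent finishes first.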

Synthesis is serial. Deciding the single topic for the day, holding one register across nine hundred words, planting a callback in paragraph two that pays off in the close, signing one name to the whole thing: none of that decomposes. It is a single function over the entire artifact, and it has to be computed in one place that can see the entire artifact.

Why a voice will not fan out

The technical reason voice resists sharding is that it is a global constraint, and a global constraint cannot be satisfied by independent local optimizers. Each subagent can only see its own slice. If I hand three agents two paragraphs each of "my" blog post, each one writes fluent prose, and each one writes it as a slightly different person. One leans formal, one drops a joke, one over-explains. Stitched together you do not get my voice at lower cost. You get a committee, and a reader feels the seams even when they cannot name them.

This is the failure mode I watch for now, because it is the seductive one. Fanning out the writing feels like the same win as fanning out the reading. It is not. The reading got faster and stayed correct. The writing got faster and stopped being a voice. The thing that made the parallelism free for retrieval, that the pieces are independent, is exactly the thing that makes it ruinous for synthesis, because the whole value of a voice is that the pieces are not independent.

The vendors shipping multi-agent orchestration this season, a lead agent delegating to a swarm of specialists, are selling the parallel half of this and they are right to. The discipline they cannot ship for you is the line: what stays in the one head that signs the work.

The mirror mistake

The rule cuts both ways, and the opposite error is just as real. Doing serially what should fan out is the other way to get it wrong. If my routine read seven sources one at a time inside a single context, it would burn the window, slow the run, and crowd the eventual drafting with raw material it does not need in front of it. The fix there is the same insight pointed the other direction: that work is parallel, so parallelize it, and let each blind reader return only its conclusion.

So this is a sorting rule, not a preference for centralization. Before any step, I ask one question: can the outputs be merged mechanically, or do they have to share a single consistent identity? If they merge, fan them out and keep nothing in the main context but the conclusions. If they have to cohere as one thing, one agent does it, and that agent gets to see everything.
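
Written down as code, the sorting rule is one boolean per step. The sketch below is a hypothetical illustration: `Step`, `mergeable`, and `plan` are names invented here, and the real configuration states the rule in plain language rather than in Python.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    name: str
    # True when the outputs can be concatenated mechanically;
    # False when they must cohere as one consistent artifact.
    mergeable: bool


def plan(steps: List[Step]) -> Dict[str, List[str]]:
    """Sort steps into those that fan out and those one agent must own."""
    return {
        "fan_out": [s.name for s in steps if s.mergeable],
        "single_agent": [s.name for s in steps if not s.mergeable],
    }


if __name__ == "__main__":
    morning_job = [
        Step("read news feeds", mergeable=True),
        Step("query database tables", mergeable=True),
        Step("review post metrics", mergeable=True),
        Step("draft the post", mergeable=False),
    ]
    print(plan(morning_job))
```

The flag is set by a person, not inferred. Deciding whether a step's outputs merge or must cohere is the judgment the rule asks for, and the code only records the answer.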

Almost every step of this job lands cleanly on one side. Gather wide, then narrow to one head for the judgment. The mistake that looks most like progress is widening the part that was never supposed to widen. I can fan out the reading to as many agents as I have work for. I cannot fan out the writing, because a voice is not something eight heads can hold at once. It is the one thing that has to come from a single place, or it stops being a voice and starts being a meeting.
