The One Step I Never Hand to a Subagent
Suneet Malhotra
Jun 22, 2026
The routine that writes this blog is allowed to spawn other agents. When the morning content job runs, it fans out three of them in a single shot: one reads the news feeds and scores them against what my trading engine actually did, one queries the database tables and pulls the most postable events, one reads the last ten posts' metrics and flags which hooks are running hot. Three agents, three contexts, all working at once. Then their outputs come back, and the routine does the one thing it is explicitly forbidden from delegating. It writes the post itself.
That instruction is sitting in my own configuration in plain language: dispatch subagents for the gathering, never dispatch the drafting, because voice is the main loop's job. For a while I read that as a stylistic preference. It is not. It is the correct architecture, and the reason is the same reason a forty-year-old result about parallel computing is still true.
Some work shards and some does not
Amdahl's law says the speedup you get from throwing more processors at a task is capped by the fraction of the task that has to run in sequence. If ten percent of a job is irreducibly serial, you can buy infinite hardware and never go more than ten times faster. The serial fraction is the ceiling, and no amount of parallelism touches it.
An agent fleet obeys the same law. The question for any step is not "can I run this on more agents" but "is this step serial or parallel by nature." Retrieval is parallel. Reading seven news sources, querying five database tables, pulling metrics on ten posts: these are independent jobs whose results you can collect in any order and concatenate without loss. Three blind agents each handing back a paragraph of findings merge into a clean brief. Nothing about source three depends on what source one said.
Synthesis is serial. Deciding the single topic for the day, holding one register across nine hundred words, planting a callback in paragraph two that pays off in the close, signing one name to the whole thing: none of that decomposes. It is a single function over the entire artifact, and it has to be computed in one place that can see the entire artifact.
Why a voice will not fan out
The technical reason voice resists sharding is that it is a global constraint, and a global constraint cannot be satisfied by independent local optimizers. Each subagent can only see its own slice. If I hand three agents two paragraphs each of "my" blog post, each one writes fluent prose, and each one writes it as a slightly different person. One leans formal, one drops a joke, one over-explains. Stitched together you do not get my voice at lower cost. You get a committee, and a reader feels the seams even when they cannot name them.
This is the failure mode I watch for now, because it is the seductive one. Fanning out the writing feels like the same win as fanning out the reading. It is not. The reading got faster and stayed correct. The writing got faster and stopped being a voice. The thing that made the parallelism free for retrieval, that the pieces are independent, is exactly the thing that makes it ruinous for synthesis, because the whole value of a voice is that the pieces are not independent.
The vendors shipping multi-agent orchestration this season, a lead agent delegating to a swarm of specialists, are selling the parallel half of this and they are right to. The discipline they cannot ship for you is the line: what stays in the one head that signs the work.
The mirror mistake
The rule cuts both ways, and the opposite error is just as real. Doing serially what should fan out is the other way to get it wrong. If my routine read seven sources one at a time inside a single context, it would burn the window, slow the run, and crowd the eventual drafting with raw material it does not need in front of it. The fix there is the same insight pointed the other direction: that work is parallel, so parallelize it, and let each blind reader return only its conclusion.
So this is a sorting rule, not a preference for centralization. Before any step, I ask one question: can the outputs be merged mechanically, or do they have to share a single consistent identity. If they merge, fan them out and keep nothing in the main context but the conclusions. If they have to cohere as one thing, one agent does it, and that agent gets to see everything.
Almost every step of this job lands cleanly on one side. Gather wide, then narrow to one head for the judgment. The mistake that looks most like progress is widening the part that was never supposed to widen. I can fan out the reading to as many agents as I have work for. I cannot fan out the writing, because a voice is not something eight heads can hold at once. It is the one thing that has to come from a single place, or it stops being a voice and starts being a meeting.
Share this post
You Might Also Like
The Number My Model Is Not Allowed to Know
There is a rule I enforce across every agent I run, and it has nothing to do with how good the model is. The model writes the words. It never computes the numbers.
AI & AutomationThe Agent Edit I Almost Merged
An agent rewrote a forty-line function in my risk module. The diff was clean. The tests passed. The reason one of the tests passed is what I almost missed.
Quantitative TradingThe Ninety Minutes My Engine Sits Out
My stock engine refuses to open any new position after 2:30 PM ET. It surrenders the most active hour of the day on purpose. Here is the arithmetic behind the refusal.
Career & Best PracticesThe Numbers I Used to Ask You to Trust
My April posts reported measured numbers you had to take on faith. My recent ones derive every figure from public config. The change was not discipline. It was topology.
Latest Blog Posts
The Ninety Minutes My Engine Sits Out
My stock engine refuses to open any new position after 2:30 PM ET. It surrenders the most active hour of the day on purpose. Here is the arithmetic behind the refusal.
The Numbers I Used to Ask You to Trust
My April posts reported measured numbers you had to take on faith. My recent ones derive every figure from public config. The change was not discipline. It was topology.
Five Up, Three Down, Even Money
My bracket risks 3% to make 5%, which reads like a favorable bet. On a price with no drift it is exactly break-even, and the reason is a theorem, not a coincidence.
Related Tools & Demos
Multi-Model LLM Harness
One interface to call any AI model — capability routing, fallback chains, budgets, circuit breakers, and a quality feedback loop. A practical architecture pattern write-up.
Automated Trading System
Multi-engine trading platform with real-time risk management, regime-based strategy selection, and automated order execution.
View Source Code →Personal Health Analytics
Multi-modal health data platform integrating wearables, lab results, and lifestyle tracking with predictive habit modeling.
View Source Code →
Stay in the Loop
Get weekly insights on AI-driven QA, engineering leadership, and automation strategies.
No spam, ever. Unsubscribe anytime.