The Number My Model Is Not Allowed to Know
Suneet Malhotra
Jun 17, 2026
There is a rule I enforce across every agent I run, and it has nothing to do with how good the model is. The model writes the words. It never computes the numbers.
That sounds like a stylistic preference. It is not. It is the single line that decides which parts of my stack are safe to leave unsupervised and which parts would quietly poison themselves if I did. The model that drafts this post, the model that sizes nothing in my trading engine, the model that picks today's topic but reports zero performance figures of its own: they all live on the same side of one boundary, and the boundary is not about capability.
Two kinds of work
Every value an agent produces falls into one of two bins. It is either checkable or it is fuzzy.
Checkable means there is an oracle. A win rate is a count divided by a count; either it matches the database or it does not. A position size is one percent of equity divided by the stop distance; either the arithmetic holds or it is wrong. A circuit breaker either trips at three thousand dollars of loss or it has a bug. For all of these there exists, somewhere, a deterministic computation that produces the one right answer, and a test that can confirm it.
Fuzzy means there is no oracle. Is this sentence on-voice. Is this the right topic for a Wednesday. Does this paragraph land or drag. There is no function I can write that returns the correct answer, because there is no single correct answer. Judgment is the whole job.
The model is extraordinary at the fuzzy bin and merely competent at the checkable one. But competence is not the deciding factor. The deciding factor is the question I actually care about: if it gets this wrong, will I find out?
The asymmetry of a wrong number
Here is what took me too long to internalize. A wrong number from my code and a wrong number from my model fail in opposite directions.
When my Python sizes a trade incorrectly, something breaks loudly. The order math throws, a risk gate rejects it, the position reconciles against the broker and the mismatch surfaces, or a unit test goes red before it ever ships. Deterministic code that computes a number wrong tends to announce itself, because the wrongness is mechanical and mechanical wrongness leaves evidence.
When my model computes a number wrong, none of that happens. It produces a figure that is plausible, formatted correctly, surrounded by a paragraph of confident context, and carrying my byline. There is no exception. There is no red test. The number reads exactly like a right number, because the model is not bad at being wrong, it is good at being fluent. A fabricated fifty-eight percent win rate and a real one look identical in the output. The only place the truth still exists is the data the model was supposed to read, and nothing forces the output to match it.
So the asymmetry is this. A wrong number from my code throws an exception. A wrong number from the model gets a paragraph of context and my name on it. One of those I will catch today. The other I might quote for a month.
Where I draw the line
In the trading engine, every number lives in code. The one percent risk per trade, the three percent stop, the three thousand dollar daily halt, the ten position cap, the rule that no directional options trade fires below a bias score of sixty-five: all of it is plain Python, in files I can read and test, with not a single value the model is permitted to invent at runtime. The model is nowhere near the money. It cannot size, cannot gate, cannot decide. That is deliberate.
In the writing agent, the same line holds in the other room. The model drafts the prose, which is the fuzzy work it should own. But it does not compute its own performance figures. Every number a post cites comes from the database, pulled by deterministic code and handed to the model as a fact to phrase, not a quantity to derive. The model's job is to phrase the number. It is never to know it.
The line cuts both ways
The mirror mistake is just as expensive: putting fuzzy work in code. I have watched people try to enforce tone with regexes, gate quality with keyword lists, template their way around the part of the task that is irreducibly a judgment call. That fails in the opposite direction. It is rigid where the work needs to be soft, and it breaks on the first input that does not fit the template. There is no oracle for voice, so do not pretend a string match is one.
So the boundary is not a hierarchy with code on top. It is a sorting rule. Checkable work goes where wrongness is loud. Fuzzy work goes where there was never going to be a right answer to check against.
The test
The question I ask before I let any agent emit a value is not whether the model can produce it. It usually can. The question is whether an oracle exists. If one does, the number must come from the code that owns the oracle, and the model gets it as an input. If one does not, the model owns it and I supervise by reading, because reading is the only oracle there is.
Determinism, in the end, is not about what the model is capable of. It is about whether I will notice when it is wrong. Put the model where I cannot write the test, and keep the test's work in code.
Share this post
You Might Also Like
The Agent Edit I Almost Merged
An agent rewrote a forty-line function in my risk module. The diff was clean. The tests passed. The reason one of the tests passed is what I almost missed.
AI & AutomationThe Step in My Routine That Is Not Safe to Run Twice
A cron does not retry on failure. I do. And a blind rerun of a routine that already half-finished is the most reliable way I know to get a duplicate into production.
Quantitative TradingWhat a Fifteen-Minute Bar Forgets
Every indicator my engine trusts is computed on fifteen-minute bars. A bar is a summary of those minutes, and the summary throws away the one thing that moved the price: the path.
OpenClaw TutorialsThe Check That Passes Until the Day It Does Not
Every day my engine reconciles its own record of open positions against the broker's. Almost every day the two lists match. I do not run the check for those days.
Latest Blog Posts
What a Fifteen-Minute Bar Forgets
Every indicator my engine trusts is computed on fifteen-minute bars. A bar is a summary of those minutes, and the summary throws away the one thing that moved the price: the path.
The Check That Passes Until the Day It Does Not
Every day my engine reconciles its own record of open positions against the broker's. Almost every day the two lists match. I do not run the check for those days.
The Hour My Scheduler Loses Twice a Year
I reason about my agents in Pacific time. The machine runs them in UTC. I bridged the two with a fixed offset, and a fixed offset is wrong for half the year.
Related Tools & Demos
Automated Trading System
Multi-engine trading platform with real-time risk management, regime-based strategy selection, and automated order execution.
View Source Code →Personal Health Analytics
Multi-modal health data platform integrating wearables, lab results, and lifestyle tracking with predictive habit modeling.
View Source Code →AI Content Engine
Automated content pipeline with multi-platform distribution, engagement optimization, and editorial quality gates.
View Source Code →
Stay in the Loop
Get weekly insights on AI-driven QA, engineering leadership, and automation strategies.
No spam, ever. Unsubscribe anytime.