AI & Automation5 min read

The Number My Model Is Not Allowed to Know

S

Suneet Malhotra

Jun 17, 2026

1 views
The Number My Model Is Not Allowed to Know - AI & Automation blog post
🔧Python🔧LLM Agents🔧Risk Management

There is a rule I enforce across every agent I run, and it has nothing to do with how good the model is. The model writes the words. It never computes the numbers.

That sounds like a stylistic preference. It is not. It is the single line that decides which parts of my stack are safe to leave unsupervised and which parts would quietly poison themselves if I did. The model that drafts this post, the model that sizes nothing in my trading engine, the model that picks today's topic but reports zero performance figures of its own: they all live on the same side of one boundary, and the boundary is not about capability.

Two kinds of work

Every value an agent produces falls into one of two bins. It is either checkable or it is fuzzy.

Checkable means there is an oracle. A win rate is a count divided by a count; either it matches the database or it does not. A position size is one percent of equity divided by the stop distance; either the arithmetic holds or it is wrong. A circuit breaker either trips at three thousand dollars of loss or it has a bug. For all of these there exists, somewhere, a deterministic computation that produces the one right answer, and a test that can confirm it.

Fuzzy means there is no oracle. Is this sentence on-voice. Is this the right topic for a Wednesday. Does this paragraph land or drag. There is no function I can write that returns the correct answer, because there is no single correct answer. Judgment is the whole job.

The model is extraordinary at the fuzzy bin and merely competent at the checkable one. But competence is not the deciding factor. The deciding factor is the question I actually care about: if it gets this wrong, will I find out?

The asymmetry of a wrong number

Here is what took me too long to internalize. A wrong number from my code and a wrong number from my model fail in opposite directions.

When my Python sizes a trade incorrectly, something breaks loudly. The order math throws, a risk gate rejects it, the position reconciles against the broker and the mismatch surfaces, or a unit test goes red before it ever ships. Deterministic code that computes a number wrong tends to announce itself, because the wrongness is mechanical and mechanical wrongness leaves evidence.

When my model computes a number wrong, none of that happens. It produces a figure that is plausible, formatted correctly, surrounded by a paragraph of confident context, and carrying my byline. There is no exception. There is no red test. The number reads exactly like a right number, because the model is not bad at being wrong, it is good at being fluent. A fabricated fifty-eight percent win rate and a real one look identical in the output. The only place the truth still exists is the data the model was supposed to read, and nothing forces the output to match it.

So the asymmetry is this. A wrong number from my code throws an exception. A wrong number from the model gets a paragraph of context and my name on it. One of those I will catch today. The other I might quote for a month.

Where I draw the line

In the trading engine, every number lives in code. The one percent risk per trade, the three percent stop, the three thousand dollar daily halt, the ten position cap, the rule that no directional options trade fires below a bias score of sixty-five: all of it is plain Python, in files I can read and test, with not a single value the model is permitted to invent at runtime. The model is nowhere near the money. It cannot size, cannot gate, cannot decide. That is deliberate.

In the writing agent, the same line holds in the other room. The model drafts the prose, which is the fuzzy work it should own. But it does not compute its own performance figures. Every number a post cites comes from the database, pulled by deterministic code and handed to the model as a fact to phrase, not a quantity to derive. The model's job is to phrase the number. It is never to know it.

The line cuts both ways

The mirror mistake is just as expensive: putting fuzzy work in code. I have watched people try to enforce tone with regexes, gate quality with keyword lists, template their way around the part of the task that is irreducibly a judgment call. That fails in the opposite direction. It is rigid where the work needs to be soft, and it breaks on the first input that does not fit the template. There is no oracle for voice, so do not pretend a string match is one.

So the boundary is not a hierarchy with code on top. It is a sorting rule. Checkable work goes where wrongness is loud. Fuzzy work goes where there was never going to be a right answer to check against.

The test

The question I ask before I let any agent emit a value is not whether the model can produce it. It usually can. The question is whether an oracle exists. If one does, the number must come from the code that owns the oracle, and the model gets it as an input. If one does not, the model owns it and I supervise by reading, because reading is the only oracle there is.

Determinism, in the end, is not about what the model is capable of. It is about whether I will notice when it is wrong. Put the model where I cannot write the test, and keep the test's work in code.

Share this post

You Might Also Like

Stay in the Loop

Get weekly insights on AI-driven QA, engineering leadership, and automation strategies.

No spam, ever. Unsubscribe anytime.