The Threshold I Almost Lowered

Suneet Malhotra

May 10, 2026

The bias-score gate on my options engine has been refusing trades on roughly 47 of every 60 sessions. The threshold is 65 on a 0-to-100 directional conviction scale. Below that score, no trade. Above it, the engine is allowed to size and enter.

For most of the past month I have been quietly considering lowering it to 60.

This Sunday morning I sat with the question one more time and decided not to. The exercise of writing down why is more useful to me than the decision itself, so I am writing it here.

The case for lowering it

The engine is, by my own metrics, undertrading. Forty-seven zero-trade days out of sixty is a long time for a system to sit on its hands. That would be a large opportunity cost even if every trade it did take had been a winner, and they have not all been. Win rate through 25 closed trades is 56 percent and median R is 0.34, which is acceptable but not the kind of edge you want to leave on the bench by being too selective.

Lowering the threshold from 65 to 60 is also numerically small. On the bias score scale, 60 is barely a different category. The intuition says this is a small change with a large operational effect: more trades, similar quality.

That intuition is exactly the shape of the mistake.

The math I did instead

I went back to the engine's signal log for the last 60 sessions and counted the days where the bias score landed in the 60-to-65 band. There were 14 of them. The directional outcome of those days, computed as where the underlying closed two days after the score fired, was 7 winners and 7 losers. Mean directional move on those days was -0.18 R. Median was -0.05 R.
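A minimal sketch of that tally, assuming each log row is a (bias score, outcome in R) pair and that a "winner" is any positive outcome. The row layout, the function name `summarize_band`, and the toy rows are illustrative assumptions, not the engine's actual log format or data.

```python
from statistics import mean, median

def summarize_band(records, low=60.0, high=65.0):
    """Tally outcomes for sessions whose score fell in [low, high).

    records: iterable of (bias_score, outcome_r) pairs, where outcome_r is the
    directional result in R two sessions after the score fired (assumed layout).
    """
    band = [r for score, r in records if low <= score < high]
    if not band:
        return None
    wins = sum(1 for r in band if r > 0)  # a flat (0 R) outcome counts as a loser here
    return {
        "days": len(band),
        "winners": wins,
        "losers": len(band) - wins,
        "mean_r": mean(band),
        "median_r": median(band),
    }

# Made-up rows purely to show the call shape; not the engine's real log.
toy_log = [(58.0, 0.4), (61.5, 0.5), (63.0, -1.0), (64.9, 0.2), (65.0, 0.9), (72.0, -0.3)]
summary = summarize_band(toy_log)
```

Reporting the mean and the median together is deliberate: on a sample this small, one large loser can drag the mean well away from the typical day.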

A 7-7 split on a 14-day sample tells me almost nothing on its own. The 95 percent confidence interval around a 50 percent win rate on 14 trials is roughly 23 to 77 percent. Anything I conclude from that band would be an exercise in confirmation, not measurement.
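For readers who want to check the interval, here is a small standard-library sketch of the exact (Clopper-Pearson) binomial interval, which is one common choice and lands close to the figure quoted above. The helper names are illustrative; other interval methods, such as Wilson, give a somewhat narrower range.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_interval(wins, n, alpha=0.05):
    """Clopper-Pearson interval, found by bisection on the binomial tails."""
    def solve(f, target):
        # f is decreasing in p on [0, 1]; find the p where f(p) equals target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= wins) equals alpha / 2.
    lower = 0.0 if wins == 0 else solve(
        lambda p: binom_cdf(wins - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= wins) equals alpha / 2.
    upper = 1.0 if wins == n else solve(
        lambda p: binom_cdf(wins, n, p), alpha / 2)
    return lower, upper

lower, upper = exact_interval(7, 14)
print(f"95% interval for 7 of 14: {lower:.1%} to {upper:.1%}")
```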

But it tells me one specific thing. The band between 60 and 65 is not visibly biased toward winners. If it were, even on this sample, I would expect to see a 9-5 or 10-4 lean. I see noise.

The honest read is: I do not have evidence the 60-to-65 band is profitable. I have a desire for the engine to trade more, and that is a different thing.

The general shape of this mistake

The temptation to lower a threshold to increase activity is one of the durable failure modes in systematic trading, and it is durable because it always feels rigorous. There is always a band of marginal cases. There is always a story for why those cases might be worth taking. There is rarely enough sample to know.

The way I would catch this in someone else's work is to ask two questions. First, what is the prior reason this threshold sits where it does: was it set on theory, on a backtest, or on a hunch? Second, what is the smallest amount of new evidence that would justify moving it?

For 65, the answer to the first is documented in the engine config. It is the score at which the directional model's historical hit rate crosses the breakeven point for the cheapest strategy class the engine deploys. Below 65, the realized hit rate is below breakeven on the original calibration sample. Whether 65 is a precisely tuned cliff or a defensible round number is something I have not fully re-validated, and that is itself an argument against changing it: I would be moving a parameter I do not fully understand, in the direction my impatience prefers.

For the second, the answer is: a 200-day live sample showing the 60-to-65 band winning at 60 percent or better, with a stable mean R. I do not have 200 days. I have 60, of which only 14 are in the candidate band.
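A rough projection of what that review point might contain, assuming band days keep arriving at the observed 14-in-60 rate (an assumption, since the rate itself rests on a short sample). The sketch uses the Wilson score interval as one reasonable way to size the uncertainty around a 60 percent win rate; the constant names are illustrative.

```python
from math import sqrt

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for an observed win rate p_hat on n trials."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

BAND_DAYS_SEEN = 14     # days in the 60-to-65 band so far
SESSIONS_SEEN = 60      # sessions logged so far
REVIEW_SESSIONS = 200   # the stated review point

# Assumption: band days continue at the observed rate.
expected_band_days = round(REVIEW_SESSIONS * BAND_DAYS_SEEN / SESSIONS_SEEN)

low, high = wilson_interval(0.60, expected_band_days)
print(f"about {expected_band_days} band days; "
      f"a 60% win rate would carry an interval of {low:.0%} to {high:.0%}")
```

Under that assumption, 200 sessions yields a count of band observations well below 200, so it may be worth tracking the band-day count alongside the session count when the review arrives.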

The change would be premature. The reason it would be premature has nothing to do with whether 60 is the right number. It has to do with whether I have measured anything yet.

The Sunday rule

The whole point of leaving a parameter alone for a defined sample size is that the parameter is a hypothesis, and the sample is the test. Re-tuning mid-test produces a different system, not a better-tuned version of the same one. Every time I lower a threshold halfway through a sample, I have to mentally restart the count, because the data before and after the change are no longer the same experiment.

The harder discipline is not picking the right number. It is letting the experiment finish.

I am keeping the threshold at 65. I will revisit it at 200 sessions, which is roughly the same sample-size threshold I argued for last week when I stopped quoting Sharpe. If the 60-to-65 band has shown a consistent edge by then, I will lower it. If it has not, I will not.
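One way to make that rule mechanical rather than a matter of willpower is to have the config refuse changes until the sample is complete. The sketch below is hypothetical: `FrozenThreshold` and its fields are invented for illustration and are not part of the engine described here.

```python
from dataclasses import dataclass

@dataclass
class FrozenThreshold:
    """A parameter that may only change once its test sample is complete."""
    value: float
    review_after: int            # sessions required before any change
    sessions_observed: int = 0

    def record_session(self) -> None:
        self.sessions_observed += 1

    def propose(self, new_value: float) -> bool:
        """Apply new_value only at or after the review point."""
        if self.sessions_observed < self.review_after:
            return False
        self.value = new_value
        self.sessions_observed = 0   # a changed parameter is a new experiment
        return True

gate = FrozenThreshold(value=65.0, review_after=200, sessions_observed=60)
accepted = gate.propose(60.0)   # rejected: only 60 of 200 sessions observed
```

Resetting the count on an accepted change mirrors the point above: data gathered before and after the change belong to different experiments.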

The pull request stays unmerged. Sunday is for reading the diff, not committing it.
