OpenClaw Tutorials · 4 min read

The Sharpe Number I Stopped Quoting

Suneet Malhotra

May 09, 2026

For the first three months of running the systematic options engine on a paper account, I quoted a Sharpe ratio in my Telegram daily summary. Some days it was 1.4. Some days it was 0.6. Once, briefly, it was 2.8. I have stopped quoting it, and the reason is the kind of methodological mistake I would have caught immediately in someone else's work. I made it in mine because I wanted the headline.

The number that swung too much

The engine has been live on paper for about 60 trading sessions. I computed a rolling 30-day Sharpe across those sessions, walking the window forward day by day. The values ranged from 0.4 to 2.8. Same engine, same parameters, same universe. Different 30-day window.

That is not a measurement. That is a function of which slice of sample noise I happened to draw the box around.

There is a textbook reason for this. The Sharpe ratio is the mean of daily returns, in excess of the risk-free rate, divided by their standard deviation, then annualized. On 30 sessions, both the numerator and the denominator are small-sample estimators. The mean is dominated by the two or three biggest days. The standard deviation is dominated by the same two or three biggest days. When you divide one by the other, the ratio is volatile in a way that has nothing to do with the underlying strategy.

There is also a structural reason that hits a paper-trading engine harder than a continuously live book. On 47 of those 60 sessions the engine made no trade at all. The daily return on those days is approximately zero. The standard deviation across mostly-zero days with a handful of non-zero days is not a stable number. It bounces with the inclusion or exclusion of a single big day.
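To make the window effect concrete, here is a minimal Python sketch of a rolling annualized Sharpe on a sparse return series. It is not the engine's code. The 252-day annualization factor, the zero risk-free rate, and the synthetic return generator (its mean, volatility, and seed) are all assumptions chosen for illustration; only the 60-session length, the 30-session window, and the 13 active days come from the text above.

```python
import math
import random
import statistics

TRADING_DAYS_PER_YEAR = 252  # common annualization convention (assumption)


def annualized_sharpe(daily_returns, periods_per_year=TRADING_DAYS_PER_YEAR):
    """Mean over sample standard deviation of daily returns, annualized.

    The risk-free rate is taken as zero for simplicity (assumption).
    Returns NaN when the window has fewer than two points or no variance,
    for example a window made only of zero-return days.
    """
    if len(daily_returns) < 2:
        return float("nan")
    sd = statistics.stdev(daily_returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(daily_returns) / sd * math.sqrt(periods_per_year)


def rolling_sharpe(daily_returns, window=30):
    """Sharpe for every full window, walking forward one session at a time."""
    return [
        annualized_sharpe(daily_returns[i - window:i])
        for i in range(window, len(daily_returns) + 1)
    ]


def synthetic_sparse_returns(n_sessions=60, n_active=13, seed=0):
    """Synthetic data only: zero on idle sessions, a random return on active ones."""
    rng = random.Random(seed)
    returns = [0.0] * n_sessions
    for day in rng.sample(range(n_sessions), n_active):
        returns[day] = rng.gauss(0.002, 0.01)  # hypothetical mean and volatility
    return returns


if __name__ == "__main__":
    returns = synthetic_sparse_returns()
    values = [v for v in rolling_sharpe(returns) if not math.isnan(v)]
    print(f"{len(values)} windows, min {min(values):.2f}, max {max(values):.2f}")
```

Changing only the seed moves the minimum and maximum around, which is the same effect as drawing the box around a different slice of noise.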

What I reported instead

I switched the headline metric to three things, none of which require a square root.

First, win rate on closed trades. Through 25 closed trades, that has been 56 percent. It is also a noisy estimator on a small sample, but it is at least an estimator of a thing I can describe in plain English.

Second, median R-multiple. R is the realized P&L on a trade divided by the trade's defined-risk maximum loss at entry. Median R has been 0.34. Mean R has been 0.41. I report both, because the gap between them tells me whether wins are coming from a couple of big trades or a steadier middle.
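The first two metrics are simple enough to sketch in a few lines of Python. The `ClosedTrade` record and its field names are hypothetical, a win is assumed to mean strictly positive realized P&L, and the four trades in the usage example are made up rather than taken from the engine.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ClosedTrade:
    realized_pnl: float  # realized P&L on the trade, in account currency
    max_loss: float      # defined-risk maximum loss at entry, as a positive number


def r_multiple(trade):
    """Realized P&L divided by the maximum loss defined at entry."""
    if trade.max_loss <= 0:
        raise ValueError("max_loss must be positive for a defined-risk trade")
    return trade.realized_pnl / trade.max_loss


def summarize(trades):
    """Win rate plus median and mean R over a list of closed trades."""
    if not trades:
        raise ValueError("need at least one closed trade")
    rs = [r_multiple(t) for t in trades]
    wins = sum(1 for t in trades if t.realized_pnl > 0)  # assumption: win means P&L > 0
    return {
        "n": len(trades),
        "win_rate": wins / len(trades),
        "median_r": statistics.median(rs),
        "mean_r": statistics.mean(rs),
    }


if __name__ == "__main__":
    # Illustrative trades only, not the engine's record.
    trades = [
        ClosedTrade(50.0, 100.0),
        ClosedTrade(-100.0, 100.0),
        ClosedTrade(25.0, 100.0),
        ClosedTrade(200.0, 100.0),
    ]
    print(summarize(trades))
```

In the illustrative list, the single 2R winner pulls the mean above the median, which is the gap the paragraph above describes.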

Third, the actual drawdown shape. Two specific facts: the worst single-day P&L in the period, and the longest streak of consecutive losing days. Both are real numbers a reader can hold. Neither pretends to be an annualized risk-adjusted return.
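Both drawdown facts can be read straight off a list of daily P&L values. The sketch below is one possible implementation, with hypothetical function names. One choice the text does not settle is whether a flat no-trade day ends a losing streak, so that is exposed as a parameter and defaults to ending it.

```python
def worst_day(daily_pnl):
    """Most negative single-day P&L in the period."""
    return min(daily_pnl)


def longest_losing_streak(daily_pnl, flat_breaks_streak=True):
    """Longest run of consecutive losing days.

    flat_breaks_streak is an assumption to make explicit: with True, a
    zero-P&L day (for example a no-trade session) ends the current streak;
    with False, flat days are skipped and the streak carries over them.
    """
    longest = current = 0
    for pnl in daily_pnl:
        if pnl < 0:
            current += 1
            longest = max(longest, current)
        elif pnl > 0 or flat_breaks_streak:
            current = 0
        # otherwise: flat day, streak unchanged
    return longest


if __name__ == "__main__":
    # Illustrative numbers only.
    pnl = [10.0, -5.0, -3.0, 0.0, -7.0, -2.0, -1.0, 4.0]
    print(worst_day(pnl))
    print(longest_losing_streak(pnl))
    print(longest_losing_streak(pnl, flat_breaks_streak=False))
```

On a series that is mostly flat days, the two settings can give quite different streak lengths, so whichever one is reported should be stated alongside the number.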

The general principle

Sharpe is a population statistic that practitioners often estimate from a small sample and then quote as if the estimate were the population value. The original Sharpe formulation assumed a stable mean and variance over the measurement window and a return distribution close to normal. Sixty sessions of options trades, with 47 zero days and a long tail on the rest, satisfy neither assumption.

This is not a problem unique to me. It is the reason credible track records tend to report Sharpe with explicit period boundaries, annualized over multi-year samples, and avoid quoting it on the most recent quarter. The discipline is to measure the metric over enough independent observations that the estimator is stable. Two hundred and fifty sessions is the rough threshold I have seen suggested in practitioner literature, and it lines up with my own re-sampling experiments on this engine. Below that, the rolling-window swings are large enough to flip the qualitative read.
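One way to see how the estimator settles as the sample grows is a small simulation. The sketch below is not the re-sampling experiment mentioned above; it draws synthetic sparse return series from one fixed, hypothetical process (trade probability, mean, and volatility are all assumptions) and reports the 5th to 95th percentile band of the annualized Sharpe estimate at a given sample size.

```python
import math
import random
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over sample standard deviation, annualized; NaN if no variance."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def sharpe_spread(sample_size, n_draws=500, p_trade=13 / 60, seed=0):
    """5th and 95th percentile of the Sharpe estimate across simulated samples.

    Every sample comes from the same synthetic process, so any spread in the
    estimates is sampling noise rather than a change in the strategy.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_draws):
        sample = [
            rng.gauss(0.002, 0.01) if rng.random() < p_trade else 0.0
            for _ in range(sample_size)
        ]
        estimate = annualized_sharpe(sample)
        if not math.isnan(estimate):
            estimates.append(estimate)
    cuts = statistics.quantiles(estimates, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]


if __name__ == "__main__":
    for n in (30, 60, 250):
        low, high = sharpe_spread(n)
        print(f"{n:>3} sessions: 5%-95% band {low:.2f} to {high:.2f}")
```

With the process held fixed, the band at 30 sessions should come out much wider than the band at 250, which is the pattern the paragraph above describes. The exact widths depend on the assumed parameters and should not be read as a property of the engine.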

The point is not that Sharpe is a bad metric. It is a good metric. The point is that quoting it on a sample where the estimator has not yet stabilized is the same shape of error as quoting a daily P&L change as if it were a monthly return. The math is fine. The frame is wrong.

What it would take to put Sharpe back

I will start quoting Sharpe again when three things are true. The sample is at least 250 sessions on a stable parameter set. The return distribution is no longer dominated by zero days, which means the engine is taking trades on most sessions, not most weeks. And the rolling 30-day Sharpe stays inside a one-point band across the trailing year.
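The three conditions can be written down as a single check. This is a sketch under stated assumptions, not the engine's logic: the function name is hypothetical, "most sessions" is read as more than half of the sessions having a non-zero return, "trailing year" is read as the last 250 sessions, and the one-point band is read as the maximum minus the minimum of the rolling Sharpe being at most 1.0. The caller is assumed to pass returns from a single stable parameter set.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over sample standard deviation, annualized; NaN if no variance."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(returns) / sd * math.sqrt(periods_per_year)


def sharpe_is_quotable(daily_returns, min_sessions=250, window=30,
                       max_band=1.0, min_active_share=0.5):
    """Return True only if all three conditions hold on the trailing sample."""
    # Condition 1: enough sessions on the current parameter set.
    if len(daily_returns) < min_sessions:
        return False
    trailing = daily_returns[-min_sessions:]

    # Condition 2: zero days no longer dominate (a non-zero return is used
    # as a proxy for "the engine traded that session").
    active_share = sum(1 for r in trailing if r != 0) / len(trailing)
    if active_share <= min_active_share:
        return False

    # Condition 3: rolling Sharpe stays inside the band over the trailing year.
    rolling = [
        annualized_sharpe(trailing[i - window:i])
        for i in range(window, len(trailing) + 1)
    ]
    if any(math.isnan(v) for v in rolling):
        return False
    return max(rolling) - min(rolling) <= max_band


if __name__ == "__main__":
    sparse = ([0.0] * 4 + [0.01]) * 50  # 250 sessions, only one in five active
    print(sharpe_is_quotable(sparse))
```

The sparse series in the usage example fails on the second condition no matter what its rolling Sharpe looks like, which mirrors where the engine stands today.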

Until then, the headline is win rate, median R, and the actual drawdown number. They are smaller, plainer, and more honest. The cost of quoting them instead of Sharpe is some legibility for readers who expect the standard metric. I am willing to pay it.

The cost of quoting Sharpe was implying I had measured something I had not.
