Misleading Metrics Like Accuracy Are Not Worth Your Time


Do not invest in a model tuned for accuracy: metrics like probability weighted accuracy are more trustworthy.

In trading, accuracy is a misleading metric. For a single trading strategy, accuracy is the number of days the strategy makes money divided by the number of days it held a position. This misses at least one point that is central to trading applications: the size of the profits and losses.
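To make that concrete, here is a toy example with invented numbers: a strategy that calls the direction correctly on 9 of 10 days scores 90% accuracy, yet one large loss wipes out everything it made.

```python
# Hypothetical daily P&L: right on 9 days, badly wrong on 1.
daily_pnl = [1] * 9 + [-20]

accuracy = sum(1 for p in daily_pnl if p > 0) / len(daily_pnl)  # 0.9
total_pnl = sum(daily_pnl)                                      # -11
```

Ninety percent accurate, and still a net loser.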



In his recent books, Marcos Lopez de Prado teaches key principles when using metrics to evaluate a trading strategy. The points below are summaries of principles pulled from his work and my own thoughts on the matter.

(The code chunks are pulled from his work too, but I adjusted them only slightly to make them make sense to me. Hopefully my minor adjustments help you too.)

Better Metrics

For machine learning training purposes, we simply need a metric that mirrors how we want to evaluate a strategy’s performance.

Sizing your bets incorrectly will get you into more trouble than incorrectly guessing the direction of price moves. Yes, if you make a bet and are wrong, then you will lose money, but the goal is to lose small amounts when you are wrong and to make large amounts when you are right. Savvy?

Log Loss

Compared to accuracy, a better metric is log loss. Log loss “rewards” correct predictions made with high confidence and “punishes” incorrect predictions made with high confidence. That is, a good log loss score is one where correct long and short bets carry high predicted probabilities and incorrect bets carry low predicted probabilities. My complaint with log loss is that it is difficult to interpret.

Since log loss is difficult to interpret, you should check out this Stack Exchange question and the answer given by Fed Zee, which builds intuition by comparing log loss scores against accuracy and highlights some of the metric’s subtleties.
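The comparison is easy to reproduce. In this sketch (probabilities invented), two models make identical directional calls, so they have identical accuracy, but one is highly confident on its single wrong call and log loss punishes it much harder:

```python
from sklearn.metrics import log_loss

y_true = [1, 1, 0, 0]
# Same directional calls (same 75% accuracy), different confidence
# on the one wrong call (the last label).
p_confident = [0.9, 0.9, 0.1, 0.9]   # wrong, and confidently so
p_hedged = [0.9, 0.9, 0.1, 0.55]     # wrong, but tentative

loss_confident = log_loss(y_true, p_confident)
loss_hedged = log_loss(y_true, p_hedged)
# loss_confident > loss_hedged: the confident mistake costs more.
```

Accuracy cannot tell these two models apart; log loss can.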

A lower log loss score is better, but the most convenient way to use log loss is to negate it; then higher values are better, like most other score metrics (e.g., accuracy, recall, F1, etc.). scikit-learn’s implementation is sufficient.

from sklearn.metrics import log_loss

# clf is a fitted classifier; X_test, y_test, w_test are the test
# features, labels, and sample weights.
probabilities = clf.predict_proba(X_test)
neg_log_loss = -log_loss(y_test, probabilities,
                         sample_weight=w_test, labels=clf.classes_)

Weighted Accuracy

To retain the interpretability of accuracy and to extend its functionality, you can use weighted accuracy. Weighted accuracy will compute accuracy but give higher or lower weights based on your input.

One way to make this metric valuable is to pass in returns as weights. Correct predictions that would have made more money receive higher weight; incorrect predictions that would have lost more money also receive higher weight. Correct and incorrect predictions that would have produced small profits or losses receive lower weight. This emulates how we want to evaluate strategies.

import numpy as np

def weighted_accuracy(yn, wght=None, normalize=False):
    """Weighted accuracy (normalize=True), or weighted sum.

    :param yn: indicator array, yn in {0, 1}; yn = 1 when the
        prediction was correct, yn = 0 otherwise
    :param wght: sample weights (e.g., absolute returns)
    :param normalize: if True, return the weighted average
    """
    if normalize:
        return np.average(yn, weights=wght)
    elif wght is not None:
        return np.dot(yn, wght)
    return yn.sum()

Probability Weighted Accuracy

You can probably guess, by its name, what this metric measures. Yes! You are correct. Probability weighted accuracy, introduced by Marcos Lopez de Prado in Machine Learning for Asset Managers, uses predicted probabilities to weight accuracy. It is similar in spirit to log loss but much more interpretable.

Probability weighted accuracy punishes bad predictions made with high confidence more severely than accuracy, but less severely than log-loss. — Marcos Lopez de Prado

import numpy as np

def probability_weighted_accuracy(yn, pn, K):
    """PWA punishes bad predictions made with high confidence more
    severely than accuracy, but less severely than log loss.

    :param yn: indicator array, yn in {0, 1}; yn = 1 when the
        prediction was correct, yn = 0 otherwise
    :param pn: array of shape (n, K); pn[n, k] is the probability
        associated with prediction n of label k
    :param K: number of classes
    """
    excess = pn.max(axis=1) - 1.0 / K  # confidence above a uniform guess
    return np.sum(yn * excess) / np.sum(excess)
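A quick sanity check of the formula with invented probabilities, computed inline: four bets, three correct, but the single miss is the most confident bet, so PWA comes out below plain accuracy.

```python
import numpy as np

K = 2  # number of classes (e.g., long / short)
# Predicted class probabilities for four bets (hypothetical).
pn = np.array([[0.90, 0.10],
               [0.60, 0.40],
               [0.10, 0.90],
               [0.95, 0.05]])
yn = np.array([1, 1, 1, 0])  # the most confident bet was the wrong one

excess = pn.max(axis=1) - 1.0 / K  # confidence above a uniform guess
pwa = np.sum(yn * excess) / np.sum(excess)
accuracy = yn.mean()
# pwa is roughly 0.667, below accuracy = 0.75: the confident miss
# drags PWA down, and the result still reads like an accuracy.
```

Unlike log loss, the output stays on a 0-to-1 scale, which is what makes it interpretable.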


Use trustworthy evaluation metrics, like log loss, weighted accuracy, or probability weighted accuracy, when you tune machine learning models in the time series and finance realm. Do not trust accuracy; you may be misled.


[1] M. Lopez de Prado, Machine Learning for Asset Managers (2020), Cambridge Elements




Misleading Metrics Like Accuracy Are Not Worth Your Time was originally published in The Capital on Medium.

This article is strictly for informational purposes only. It is not a direct offer or solicitation of an offer to buy or sell, or a recommendation or endorsement of any products, services, or companies. CryptosOnline.com does not provide investment, tax, legal, business or accounting advice. Neither the company nor the author is responsible, directly or indirectly, for any loss or damage caused or alleged to be caused by, or in connection with, the use of or reliance on any content, goods, services or opinions mentioned in this article.
