How good are your beliefs?

This article is the introduction to a series on scoring rules, a type of loss function defined on probability distributions or functionals of them.

We are constantly making predictions - large and small - every moment of our day.

In Superforecasting: The Art and Science of Prediction, political scientist Philip Tetlock revealed that, based on decades of research, the average “expert” was roughly as accurate as a “dart-throwing chimpanzee” at forecasting global events. But he also discovered a small group of people with genuine forecasting skill, and the key to that skill was that they were good at being wrong: they learned from their mistakes and course-corrected.

You can try a simple form of this kind of prediction, and test your forecasting abilities, here.

Brier Scores

Let’s take a look at one way we can validate our beliefs: the Brier score, a number between 0 (best) and 1 (worst) that measures the accuracy of probabilistic forecasts of binary outcomes. To use the Brier score to learn from our mistakes and sharpen our decisions, we need to do two things:

  1. Make a probabilistic forecast for a binary outcome.
  2. Calculate the Brier score for your forecast.

A probabilistic forecast is a number between 0 and 1 expressing how likely you think the outcome is, and a binary outcome is something that either will or will not happen. For example, if you forecast an 80% chance of rain tomorrow, that is a probabilistic forecast of a binary outcome: 0.8 is a number between 0 and 1, and it will either rain tomorrow or it won’t.

Actually calculating the Brier score is quite simple: it’s the mean squared error between the actual outcomes and our forecasts, so 0 is the best possible score and 1 is the worst.

brier_score = sum((actual_outcomes - forecasts)^2) / number_of_forecasts
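
If you prefer code to formulas, here is a minimal sketch of the same calculation in Python (the brier_score function and its plain-list inputs are illustrative choices of mine, not part of any particular library):

def brier_score(forecasts, actual_outcomes):
    # Mean squared error between forecast probabilities (each between 0 and 1)
    # and actual binary outcomes (each 0 or 1). Lower is better.
    squared_errors = [(outcome - forecast) ** 2
                      for forecast, outcome in zip(forecasts, actual_outcomes)]
    return sum(squared_errors) / len(squared_errors)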

For example, suppose I expect a 40% chance (forecast_1 = 0.4) of snow tomorrow and a 70% chance (forecast_2 = 0.7) of rain the day after, and the weather turns out to be sunny tomorrow (actual_outcome_1 = 0) and rainy the day after (actual_outcome_2 = 1). Then my Brier score would be:

brier_score = ((actual_outcome_1 - forecast_1)^2 + (actual_outcome_2 - forecast_2)^2) / 2
            = ((0 - 0.4)^2 + (1 - 0.7)^2) / 2
            = 0.125
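
Using the brier_score sketch above, the same numbers come out as:

# 40% chance of snow tomorrow, 70% chance of rain the day after;
# the weather turns out sunny tomorrow (0) and rainy the day after (1).
print(brier_score([0.4, 0.7], [0, 1]))  # ≈ 0.125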

This isn’t too bad, considering the ideal score is 0. That may seem strange, since only 1 of my 2 forecasts was correct. The reason is that I assigned a fairly high probability (70%) to the forecast I got right and a relatively low probability (40%) to the one I got wrong. In other words, the confidence I assign to my forecasts matters more than the raw number of forecasts I get correct. To see this, suppose I flip the probabilities above; this time my Brier score would be:

brier_score = ((actual_outcome_1 - forecast_1)^2 + (actual_outcome_2 - forecast_2)^2) / 2
            = ((0 - 0.7)^2 + (1 - 0.4)^2) / 2
            = 0.425
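
With the sketched function, the flipped forecasts give:

print(brier_score([0.7, 0.4], [0, 1]))  # ≈ 0.425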

This time I was punished for misplaced confidence: I assigned a high probability to the forecast I got wrong, which pushed my Brier score up.

This is the beauty of the Brier score: forecasters who know what they are talking about are rewarded, and those who don’t are punished. It demands well-calibrated confidence. In this way, the Brier score can help us learn from our mistakes and become better decision makers, so why not give it a try?

