Back in the mists of time – 4 May 2016, the day after Senator Ted Cruz withdrew from the race and Donald Trump effectively secured the Republican nomination for President – I conducted a snap poll amongst Australian diplomatic staff in Israel. The question posed: what odds did they now assign to Trump eventually winning the Presidency?
Some put his chances as high as 60%; some as low as 2%. The average was 27% – not a bad prediction, all things considered. (Nate Silver, on the eve of the election, gave Trump a 30% chance of winning.)
After this snap poll, I distributed copies of Philip Tetlock and Dan Gardner’s excellent book, Superforecasting: The Art & Science of Prediction to each member of staff, and started reading it myself.
As I did so, an obvious realisation began to dawn. All diplomats are part of the ‘forecasting’ profession, but we do not spend much time at all consciously thinking about how we go about our forecasting, or how we might improve it.
We make (and are paid to make) judgements and assessments all the time about what the future is likely to hold, and its implications for Australia and Australian interests. But are we any good at it? How often do we get things right? Can we improve our performance?
I came away from my reading of Superforecasting with several lessons and conclusions.
Firstly, the ‘forecasting’ business is deeply unprofessional, a little like the practice of medicine until the 20th century. There is no measurement, no data, no reviews or post-mortems (except in extreme cases – think Iraq WMD). Professional pundits are rarely assessed against their track record or held accountable for their failures of insight. With no assessment of effectiveness, there is no ability to identify which methods and tools work and which ones don’t, and hence no possibility of improvement.
Secondly, we offer judgements that are very squidgy. We tend to couch judgements and predictions in qualitative terms (“the risks of X are rising”), or as a description of factors at work (“on the one hand … on the other”), and often with indeterminate timeframes (“in the medium term”). The result is that our prediction, whatever it is, is so hedged with caveats, and so difficult to pin down, that it can never be proven wrong. Our backs are covered, but the value of such analysis to decision-makers is limited.
Thirdly, we need to be more comfortable with degrees of likelihood, or probability, and less insistent on binary judgements. We tend to say we think something is either likely to happen, or unlikely to happen, or could go either way (i.e. an even bet). But there is a world of difference (and hence implication) between saying something is 5% likely and saying it is 40% likely. And if we judge an event 70% unlikely to happen, and it in fact happens, our prediction was not necessarily wrong – the flip side is that we assigned the event a 30% likelihood of happening.
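That last point can be made concrete with a quick simulation – a sketch in Python, where the 30% figure and the number of trials are illustrative, not drawn from the Embassy exercise. A well-calibrated 30% forecast means the event should occur in roughly three out of ten comparable cases, so a single occurrence does not falsify the forecast:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

FORECAST = 0.30   # we judged the event 70% unlikely, i.e. 30% likely
TRIALS = 10_000   # illustrative number of comparable forecasts

# Simulate many independent events that each truly occur with 30% probability.
occurrences = sum(random.random() < FORECAST for _ in range(TRIALS))
observed_rate = occurrences / TRIALS

print(f"Event occurred in {observed_rate:.1%} of trials")
# A well-calibrated 30% forecast is vindicated when the event happens in
# roughly 30% of comparable cases -- not only when it never happens at all.
```

The judgement being tested is calibration across many forecasts, not correctness on any single one.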
Fourthly, crowd-sourced judgements are nearly always better than those of any individual, no matter how talented or well-informed. This is because information is held asymmetrically across a group, and because countervailing biases tend to cancel one another out in large enough groups.
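A sketch of why the averaging helps – the individual forecasts below are invented for illustration, not taken from the Embassy poll. When forecasters err in opposite directions around the true likelihood, their biases largely offset in the mean:

```python
# Hypothetical individual forecasts for an event whose "true" likelihood is 0.40.
# Some forecasters run hot, some run cold; the biases roughly cancel.
forecasts = [0.60, 0.20, 0.55, 0.25, 0.40]
true_likelihood = 0.40

crowd = sum(forecasts) / len(forecasts)  # simple crowd-sourced average

crowd_error = abs(crowd - true_likelihood)
individual_errors = [abs(f - true_likelihood) for f in forecasts]
mean_individual_error = sum(individual_errors) / len(individual_errors)

print(f"Crowd forecast: {crowd:.2f} (error {crowd_error:.2f})")
print(f"Average individual error: {mean_individual_error:.2f}")
```

In this toy case the crowd average lands on the true likelihood exactly; in general it simply tends to beat the typical individual, which is the weaker (and more defensible) claim.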
Newly endowed with the wisdom gained from Superforecasting, the Embassy staff agreed to embark on a forecasting exercise on questions and topics that fell squarely within our work responsibilities.
We posed ourselves 33 questions, from domestic politics (will the composition of Israel’s government change? will a Palestinian unity government be formed?), to the peace process (will direct, final-status negotiations resume?), to the risks of regional conflict involving Israel. Each question was time-bound, and for each we were required to give an answer as a percentage likelihood. We structured the scoring to incentivise more confident answers: someone who predicts an event with 90% likelihood earns more points than someone who predicts it with only 60% likelihood, but conversely loses more points if the prediction turns out to be wrong.
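The exact scoring rule the Embassy used is not specified here, but the Brier score used throughout Tetlock’s research has precisely the property described: a confident correct forecast scores better than a timid one, and a confident wrong forecast is punished far more. A minimal sketch:

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and the outcome (1 if the
    event happened, 0 if not). Lower is better: 0.0 is a perfect score and
    1.0 is the worst possible."""
    return (forecast - outcome) ** 2

# The event happens: the 90% forecaster beats the 60% forecaster.
print(round(brier_score(0.9, 1), 2))  # 0.01
print(round(brier_score(0.6, 1), 2))  # 0.16

# The event does not happen: the 90% forecaster is punished far more.
print(round(brier_score(0.9, 0), 2))  # 0.81
print(round(brier_score(0.6, 0), 2))  # 0.36
```

In a points-based league one would typically report 1 minus the Brier score (or some rescaling of it) so that higher means better; the ranking of forecasters comes out the same either way.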
One of the questions we asked ourselves, for instance (back in October), was the likelihood of the UN Security Council adopting a resolution condemning Israeli settlement activity before 20 January 2017. Some put the odds as high as 60%; others as low as 5%; the average (or crowd-sourced) answer was 33%. It in fact happened: UNSCR 2334 was adopted on 23 December 2016.
Most of the questions have a timeframe out to 1 May 2017, so we will only be able to assess our effectiveness, and identify any hidden ‘superforecasters’ amongst us, at that point.
But already the exercise has fostered some fresh curiosity, and instilled some extra discipline, in our work.
It has shone a light on which questions are the most important and consequential ones for us to answer, and on the links between some of them.
We now throw around probabilities when we talk about scenarios (“80% likely”), rather than reaching for highly ambiguous qualitative terms (“probable”).
We adjust our estimates in response to new developments and evidence, recognising that predictions cannot remain static.
Most importantly, it has instilled some humility. We’ve recognised that the future is, by definition, unknowable, and that even low-probability events will happen from time to time. As a result, we should never put too much faith in forecasts, and we should always be prepared for scenarios at odds with conventional wisdom. (And 2016 should have brought that home to all of us involved in some way in international politics.)
As a new Administration takes office in Washington, we have a whole new series of questions and scenarios to ponder, including in the Middle East. The forecasters will be busy.