How It Works: Part Two

This is the second article in a short series describing how the Buckley’s & None forecast works. In part one, I gave a brief introduction to the Australian electoral system and highlighted the importance of two-party preferred vote in Australian elections. If you haven’t already, I’d recommend you quickly read part one before continuing with part two.

Let’s start with a graph.

Two-Party Preferred vs. Seats

Here the horizontal axis is the two-party preferred margin. The two-party preferred vote is usually presented as two numbers, for example ALP 52% and Coalition 48%. These numbers always add up to 100% because only two parties are included. Instead of using two numbers, we can calculate the margin between the two parties and represent the same information as a single number. For example, we can subtract the ALP’s two-party preferred vote (52%) from the Coalition’s (48%) to get a two-party preferred margin of -4% (48% - 52%). The order of the parties is arbitrary; we could equally have swapped the ALP and the Coalition. The order I have chosen has the benefit of putting the ALP on the left side of the axis and the Coalition on the right, with Tony Abbott on the very far right.
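As a quick check of the sign convention, here is a minimal Python sketch of the margin calculation. The function name `tpp_margin` is hypothetical and chosen for illustration. Positive results favour the Coalition and negative results favour the ALP.

```python
def tpp_margin(alp_tpp: float, coalition_tpp: float) -> float:
    """Two-party preferred margin in percentage points.

    Positive values favour the Coalition and negative values favour the ALP,
    matching the horizontal axis of the graph.
    """
    # Both shares come from the same two-party count, so they must sum to 100.
    if abs(alp_tpp + coalition_tpp - 100.0) > 1e-9:
        raise ValueError("two-party preferred shares must sum to 100%")
    return coalition_tpp - alp_tpp


print(tpp_margin(52.0, 48.0))  # -4.0
```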

The vertical axis is also a margin: the difference in seats between the major parties, expressed as a percentage of all seats won by the major parties. We’re going to ignore the crossbench for now and deal with it in part four of this series. Again, I’ve (arbitrarily) defined positive as more Coalition seats and negative as more ALP seats.
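The same idea applies to seats. In the sketch below, crossbench seats are left out of the denominator. The function name and the seat counts in the example are hypothetical.

```python
def seat_margin(alp_seats: int, coalition_seats: int) -> float:
    """Seat margin as a percentage of seats won by the two major parties.

    Crossbench seats are excluded. Positive = more Coalition seats,
    negative = more ALP seats.
    """
    major_party_seats = alp_seats + coalition_seats
    return 100.0 * (coalition_seats - alp_seats) / major_party_seats


# Hypothetical result: ALP 77 seats, Coalition 68 seats.
print(f"{seat_margin(77, 68):.1f}")  # -6.2
```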

Perhaps the simplest way to model the relationship between two-party preferred margin and seat margin is to draw a straight “line of best fit” through these points.
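The article doesn’t say exactly how its line was fitted. A standard choice is ordinary least squares, which picks the line that minimises the squared vertical distances to the points. Here is a minimal sketch under that assumption. The helper names are hypothetical, and the points in the usage example are placeholders rather than the historical elections in the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Seat margin predicted by the fitted line at TPP margin x."""
    return intercept + slope * x


# Placeholder points (TPP margin, seat margin), not real election data.
xs = [-4.0, -2.0, 0.0, 2.0, 4.0]
ys = [-7.0, -3.5, 0.5, 4.0, 8.0]
slope, intercept = fit_line(xs, ys)
print(predict(slope, intercept, -4.0))
```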

Fitting a Line

This is great! Now if someone tells us the national two-party preferred vote, we can predict the seat margin between the two major parties. For example, if the national two-party preferred margin is -4% (ALP 52%, Coalition 48%), we’d predict the ALP will win 6.2% more seats than the Coalition. See if you can find this point on the graph. This works out to a margin of around 9 seats in the House of Representatives (there are 151 seats, and I’m assuming a handful of them go to the crossbench).
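The conversion from a percentage margin to seats is one multiplication. This sketch assumes a crossbench of 6 seats to stand in for “a handful”, which leaves 145 major-party seats. The function name is hypothetical.

```python
def margin_to_seats(seat_margin_pct: float, major_party_seats: int) -> float:
    """Convert a seat margin (% of major-party seats) into a seat difference.

    Uses the same sign convention: negative means more ALP seats.
    """
    return seat_margin_pct / 100.0 * major_party_seats


# Assumption: 151 seats, 6 of them to the crossbench.
print(round(margin_to_seats(-6.2, 151 - 6)))  # -9, i.e. ALP ahead by ~9 seats
```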

This is a great start, but drawing a line of best fit has some limitations. As you can see in the graph above, our line doesn’t fit the data perfectly: most of the points are some distance away from it. You can think of a line of best fit as our “best guess” at the outcome, but there will almost always be some error associated with that guess. To make an even better forecast, we want to measure that error and incorporate it into our predictions.
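One simple way to measure that error is the residual standard deviation. This is the typical vertical distance between the points and the line. The sketch below is an illustration, not necessarily the measure the forecast itself uses. It divides by n - 2 because the line’s two parameters were estimated from the same points.

```python
import math


def residual_sd(xs, ys, slope, intercept):
    """Residual standard deviation of points around the line y = intercept + slope * x."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three points")
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))
```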

At Buckley’s & None, we like to think probabilistically. Instead of asking “what is the most likely outcome?”, we ask “how likely is each of a wide range of possible outcomes?”. We use Bayes’ Theorem, a statistical method, to help answer this question. Explaining the mechanics of Bayes’ Theorem is beyond the scope of this article, but I hope to return to it soon.
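For readers who want the one-line version, here is Bayes’ Theorem applied to lines. Assume each election’s seat margin $y_i$ scatters around a line $\alpha + \beta x_i$ with normal noise of size $\sigma$. This form is an assumption; the article doesn’t spell out the model. Under it, a candidate line is scored as:

```latex
y_i \sim \mathcal{N}\!\left(\alpha + \beta x_i,\ \sigma^2\right), \qquad
p(\alpha, \beta, \sigma \mid \mathbf{x}, \mathbf{y}) \;\propto\;
p(\alpha, \beta, \sigma) \prod_{i} p(y_i \mid x_i, \alpha, \beta, \sigma)
```

In words, how plausible a line is after seeing the data is proportional to how plausible it was beforehand (the prior) times how well it explains the data (the likelihood). The normalising constant is the same for every line, so it can be ignored when comparing lines.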

In practice, instead of a single line of best fit, we fit many, many lines through our data points and then determine the likelihood of each of the lines using Bayes’ Theorem. Below is a visual representation of what we do.
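The article doesn’t describe how its lines are generated. In practice, Bayesian regressions are commonly fitted with Markov chain Monte Carlo. The sketch below uses a simpler technique, a grid approximation, that follows the description above quite literally: it enumerates many candidate lines, scores each with Bayes’ Theorem, and normalises the scores into probabilities. The flat prior, the grid ranges, the helper names, and the data points are all assumptions made for illustration.

```python
import math


def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive (n >= 2)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def grid_posterior(xs, ys, intercepts, slopes, sigmas):
    """Grid approximation to the posterior over candidate lines.

    Assumed model: y ~ Normal(intercept + slope * x, sigma), with a flat
    prior over the grid. Returns (weight, intercept, slope, sigma) tuples
    whose weights sum to 1.
    """
    candidates, log_posts = [], []
    for a in intercepts:
        for b in slopes:
            for s in sigmas:
                # Gaussian log-likelihood, dropping the constant -0.5*log(2*pi) terms.
                log_lik = sum(
                    -0.5 * ((y - (a + b * x)) / s) ** 2 - math.log(s)
                    for x, y in zip(xs, ys)
                )
                candidates.append((a, b, s))
                log_posts.append(log_lik)  # a flat prior only adds a constant
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_posts)
    weights = [math.exp(lp - top) for lp in log_posts]
    total = sum(weights)
    return [(w / total, a, b, s) for w, (a, b, s) in zip(weights, candidates)]


# Hypothetical points (TPP margin, seat margin), not real election data.
xs = [-6, -4, -2, 0, 2, 4, 6]
ys = [3 * x + 1 + e for x, e in zip(xs, [2, -2, 1, -1, 1.5, -1.5, 0])]

posterior = grid_posterior(
    xs, ys,
    intercepts=linspace(-5, 5, 41),
    slopes=linspace(0, 6, 61),
    sigmas=linspace(0.25, 5, 20),
)
mean_slope = sum(w * b for w, _, b, _ in posterior)
print(f"posterior mean slope: {mean_slope:.2f}")
```

A grid is easy to read but scales poorly as parameters are added, which is one reason sampling methods are the usual choice in real models.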

Fitting Lots of Lines

You can see lots and lots of lines underneath our data points, with more lines towards the middle of our data and fewer towards the edges. The density of the lines (how closely they are packed together) represents the relative likelihood of different outcomes.

Let’s look again at the situation where the ALP wins the two-party preferred vote by 4%. I’ve drawn a dashed line through this point on the graph above. Below, I’ve drawn a plot that represents the frequency of outcomes along that dashed line.
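A plot like this can be built by evaluating every sampled line at a TPP margin of -4 and binning the results. The sketch below also adds each line’s noise term, so the spread covers individual elections as well as uncertainty about the line itself. Whether the original plot includes that noise isn’t stated, so treat it as an assumption; dropping the `rng.gauss` term gives the spread of the lines alone. The three “sampled” lines are hypothetical stand-ins for real posterior draws.

```python
import math
import random
from collections import Counter


def predictive_draws(lines, x, n_per_line=200, seed=1):
    """Simulate seat margins at TPP margin x from sampled lines.

    Each line is (intercept, slope, sigma). Adding Normal(0, sigma) noise
    reflects the scatter of individual elections around a line.
    """
    rng = random.Random(seed)
    return [
        intercept + slope * x + rng.gauss(0.0, sigma)
        for intercept, slope, sigma in lines
        for _ in range(n_per_line)
    ]


def histogram(values, bin_width=2.0):
    """Share of values in each bin [edge, edge + bin_width), keyed by edge."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return {edge: n / len(values) for edge, n in sorted(counts.items())}


# Hypothetical posterior draws of (intercept, slope, sigma), illustration only.
lines = [(0.4, 1.6, 6.0), (-0.2, 1.5, 5.5), (0.1, 1.7, 6.5)]
draws = predictive_draws(lines, x=-4.0)  # ALP wins the TPP by 4%
for edge, share in histogram(draws).items():
    print(f"{edge:+6.1f}% {'#' * round(share * 100)}")
```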

What happens when ALP wins TPP by 4%?

The most likely outcome when the ALP wins the two-party preferred vote by 4% is the ALP winning 5% to 7% more seats than the Coalition. This matches our line of best fit estimate of 6.2% well. However, we now also have a much clearer idea of the range of possible outcomes. There’s a ~14% chance the ALP does more than double that estimate, winning over 14% more seats than the Coalition. I calculated this by adding together the heights of all the bars to the left of +14% ALP. Similarly, adding together all the blue bars on the right shows there’s almost a 20% chance the Coalition wins more seats than the ALP.
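Adding up bar heights is the same as counting the share of simulated outcomes beyond a threshold. With the sign convention used here, “ALP wins over 14% more seats” means a seat margin below -14, and “Coalition wins more seats” means a margin above 0. The function names and the ten outcomes below are hypothetical.

```python
def share_below(values, threshold):
    """Fraction of simulated outcomes strictly below threshold."""
    return sum(v < threshold for v in values) / len(values)


def share_above(values, threshold):
    """Fraction of simulated outcomes strictly above threshold."""
    return sum(v > threshold for v in values) / len(values)


# Hypothetical simulated seat margins (negative = more ALP seats).
outcomes = [-20.0, -15.0, -9.0, -7.0, -6.0, -5.0, -3.0, 1.0, 4.0, 2.0]
print(share_below(outcomes, -14.0))  # 0.2: ALP wins over 14% more seats
print(share_above(outcomes, 0.0))    # 0.3: Coalition wins more seats
```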

In summary, we plotted the same data three times and analysed it in two different ways: once using a line of best fit and once using “Bayesian linear regression”. Bayesian linear regression gives us a clear idea of the uncertainty in our forecast. Everything we’ve talked about in this article has assumed we know the two-party preferred vote; I’ve used ALP 52%, Coalition 48% throughout. The problem is, we don’t know the two-party preferred vote until election day. To make a forecast before an election, we have to estimate the two-party preferred vote using opinion polls. This might be the most exciting part of the forecast, and I’ll describe it in part three.