Method

The Irish Polling Indicator combines all national election polls into one estimate of political support for each party. The Irish Polling Indicator is maintained by Tom Louwerse (Associate Professor in Political Science, Leiden University, the Netherlands) and Stefan Müller (Assistant Professor and Ad Astra Fellow, School of Politics and International Relations, University College Dublin). Tom Louwerse created the Irish Polling Indicator when he was working at the Department of Political Science at Trinity College Dublin.

The approach used by the Irish Polling Indicator is described in detail in this article published in Irish Political Studies (Open Access).

The polls used are published national surveys by Behaviour & Attitudes, Ipsos MRBI, Ireland Thinks, Millward Brown, Panelbase and Red C Research.

Basic idea

The basic idea of the Irish Polling Indicator is to take all available polling information together to arrive at the best estimate of current support for parties. Polls are great tools for measuring public opinion, but because only a limited sample is surveyed, we need to take into account sampling error. By combining multiple polls, we can reduce this error.
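To illustrate why combining polls reduces sampling error, here is a minimal sketch that pools three polls by a simple sample-size-weighted average. This is not the Polling Indicator's model (it ignores house effects and changes over time), and the poll figures are made up for illustration.

```python
import math

# Hypothetical polls: (share for one party, sample size). Made-up numbers.
polls = [(0.21, 1000), (0.19, 900), (0.20, 1200)]


def standard_error(share: float, n: int) -> float:
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(share * (1 - share) / n)


# Weight each poll by its sample size, as if all respondents formed one sample.
total_n = sum(n for _, n in polls)
pooled_share = sum(share * n for share, n in polls) / total_n
pooled_se = standard_error(pooled_share, total_n)

for share, n in polls:
    print(f"poll:   {share:.3f} +/- {standard_error(share, n):.4f}")
print(f"pooled: {pooled_share:.3f} +/- {pooled_se:.4f}")
```

The pooled standard error is smaller than that of any single poll because it is based on all respondents together.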

Moreover, with so many polls going around, it is difficult to get a random sample of voters to participate in any one public opinion survey. And those who do participate might not have a clear idea who to vote for, something that polls often adjust for. Pollsters deal with these problems in different ways, which may lead to structural differences between the results of different polling companies, so-called house effects.

But how do you average polls if one was conducted today, another a week ago and yet another three weeks ago? Just take the average of the three? Weight the more recent ones more heavily perhaps, but by how much exactly? The Polling Indicator assumes that public opinion changes every day, but only by so much. If Labour was on 10% last week and turns out to poll 18% today, we might question whether one of these polls (or even both) is an outlier that, just by chance, contains many more or fewer Labour voters than there are in the general public. The Polling Indicator assumes that support for a party can go up or down, but that radical changes are quite rare. If one party is generally more volatile, however, the model will take this into account.

Model

This part is a little tricky and you probably need some statistical training to fully grasp it. The Irish Polling Indicator is a Bayesian statistical model that builds on the work of several political scientists (see Sources). It provides an estimate for each party's support on each day $$d$$. The percentage that this party gets in poll $$i$$ is called $$P_i$$: this is something we know. What we want to know is what this party's support among the whole electorate ($$A_d$$) is on each day. So how do we estimate this?

First, we know what happens if we draw many random samples from a population. If we had a population with 20% support for Fine Gael and drew a lot of random samples of size 1,000 from this population, most of these samples would yield a percentage for Fine Gael that is pretty close to 20%. But some would be further away. In fact, we know that the values we might obtain in all of these samples approximately follow a normal distribution with a mean of $$A_d$$ and a standard deviation of $$\sqrt{\frac{A_d (1-A_d)}{N}}$$. Here $$N$$ stands for the sample size, 1,000 in our example. Since we do not know $$A_d$$, we approximate the standard deviation by using $$P_i$$ instead, so the first part of the model looks like this:

\begin{aligned} P_i & \sim \mathcal{N}(A_d, \sqrt{\frac{P_i (1-P_i)}{N}} ) \end{aligned}

The percentage that we find in the poll comes from a normal distribution with a mean of the real party support on the day the poll was held ($$A_d$$) and a standard deviation which mainly depends on the sample size ($$N$$). We don't know $$A_d$$, but are going to estimate it.
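The sampling argument above can be checked with a small simulation. The sketch below assumes idealised simple random sampling (real polls are not this clean) and uses the Fine Gael example of 20% support and samples of 1,000. The function name and the number of simulated polls are illustrative choices.

```python
import math
import random


def simulate_poll(true_share: float, n: int, rng: random.Random) -> float:
    """Share of supporters found in one simple random sample of size n."""
    supporters = sum(rng.random() < true_share for _ in range(n))
    return supporters / n


rng = random.Random(42)  # fixed seed so the run is reproducible
true_share, n = 0.20, 1000
polls = [simulate_poll(true_share, n, rng) for _ in range(2000)]

# Standard deviation predicted by the formula sqrt(A(1-A)/N).
theoretical_sd = math.sqrt(true_share * (1 - true_share) / n)

# Mean and standard deviation actually observed across the simulated polls.
mean = sum(polls) / len(polls)
empirical_sd = math.sqrt(sum((p - mean) ** 2 for p in polls) / (len(polls) - 1))

print(f"theoretical sd: {theoretical_sd:.4f}")
print(f"empirical mean: {mean:.4f}, empirical sd: {empirical_sd:.4f}")
```

The simulated polls should centre on 20% with a spread close to the theoretical value of roughly 1.3 percentage points.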

The actual model is somewhat more complicated because it takes into account two other things. First, the standard deviation in the formula above (also called the standard error in this case) is only given by that simple formula if we have a simple random sample. Real-world polls usually have a more complicated strategy to select a sample, which may increase the standard error. On the other hand, by weighting their respondents (for example, if you have 75% men in the survey, you might want to weight that down to 50%), the error might be reduced. Therefore we allow the standard deviation to be a factor $$D$$ smaller or larger than the standard deviation $$F_i$$ that we would have with a simple random sample.

Secondly, there might be structural differences between pollsters which cause a certain polling company to overestimate or underestimate a certain party. So, they sample from a distribution with mean $$M_d$$, which is, in fact, a combination of the real percentage $$A_d$$ plus their house effect $$H_{b_i}$$. If their house effect is 0, they are polling from the 'correct' distribution and we only have to deal with sampling error. If their house effect is large, they might structurally underestimate or overestimate a party.

This yields the following model (for each party):

\begin{aligned} (1)~~ P_i & \sim \mathcal{N}(M_d, F_iD) \\ (2)~~ M_d & = A_d + H_{b_i} \end{aligned}
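To make equations (1) and (2) concrete, the following sketch evaluates the log-density of a single poll result for given values of $$A_d$$, the house effect and $$D$$. It treats the second argument of the normal distribution as a standard deviation, as in the text. The function name and the example numbers are hypothetical, and this is only the likelihood of one observation, not the full estimation procedure.

```python
import math


def poll_log_likelihood(p_i: float, n_i: int, a_d: float,
                        house_effect: float, d: float) -> float:
    """Log-density of poll result p_i under equations (1) and (2)."""
    f_i = math.sqrt(p_i * (1 - p_i) / n_i)  # simple-random-sample standard error
    sd = f_i * d                            # equation (1): scaled by the factor D
    m_d = a_d + house_effect                # equation (2): true support plus house effect
    z = (p_i - m_d) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


# Hypothetical poll: 22% in a sample of 1,000, while true support is 20%.
no_house = poll_log_likelihood(0.22, 1000, 0.20, house_effect=0.00, d=1.0)
with_house = poll_log_likelihood(0.22, 1000, 0.20, house_effect=0.02, d=1.0)
print(f"no house effect: {no_house:.3f}, house effect +2 points: {with_house:.3f}")
```

A poll that sits two points above true support is more plausible if the pollster has a house effect of +2 points than if it has none, which is how the model can attribute persistent deviations to the company rather than to sampling error.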

The next part of the model relates a party's percentage today ($$A_d$$) to its percentage yesterday ($$A_{d-1}$$). As explained above, we expect that day-to-day change in support is limited. To ensure that party support sums to 100%, these day-to-day changes are modelled in terms of the log-ratio of support (where the first party is fixed at a log-ratio of 0). For each day, the support is allowed to change somewhat up or down:

\begin{aligned} (3)~~ LA_{d} & \sim \mathcal{N}(LA_{d-1},\tau_{p}) \end{aligned}

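We can calculate the vote share for each party based on these log-ratios as follows, where the sum in the denominator runs over the log-ratios of all parties on day $$d$$ (the index $$i$$ here denotes parties, not polls):

\begin{aligned} (4)~~ A_{d} & =\frac{\exp(LA_{d})}{\sum \exp(LA_{i})} \end{aligned}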
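Equations (3) and (4) can be sketched as a small simulation. The code below assumes that $$\tau_p$$ is a standard deviation, uses made-up starting shares and volatilities for four parties, and is meant only to show that shares derived from log-ratios always sum to 100%. The function names are illustrative.

```python
import math
import random


def shares_from_log_ratios(log_ratios):
    """Equation (4): turn log-ratios into vote shares that sum to 1."""
    exps = [math.exp(x) for x in log_ratios]
    total = sum(exps)
    return [e / total for e in exps]


def next_day(log_ratios, taus, rng):
    """Equation (3): each log-ratio takes a normal step with sd tau.

    The first party is the reference and stays fixed at a log-ratio of 0.
    """
    moved = [rng.gauss(x, tau) for x, tau in zip(log_ratios[1:], taus[1:])]
    return [0.0] + moved


# Hypothetical starting shares for four parties (made-up numbers).
start = [0.25, 0.25, 0.20, 0.30]
log_ratios = [math.log(s / start[0]) for s in start]
taus = [0.0, 0.02, 0.02, 0.05]  # illustrative volatilities; last party is more volatile

rng = random.Random(1)
for _ in range(30):  # simulate 30 days of change
    log_ratios = next_day(log_ratios, taus, rng)

shares = shares_from_log_ratios(log_ratios)
print([round(s, 3) for s in shares], round(sum(shares), 6))
```

Because the random walk happens on the log-ratio scale, no extra normalisation step is needed: whatever the log-ratios are, equation (4) maps them back to shares that are positive and sum to one.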

Priors

For the statistical nerds: the Bayesian model has the following priors:

\begin{aligned} (5)~~ \tau_{p} & \sim \mathrm{Uniform}(0,0.2) \\ (6)~~ H_{b} & \sim \mathrm{Uniform}(-0.2,0.2) \\ (7)~~ D & \sim \mathrm{Uniform}(\sqrt{\frac{1}{3}},\sqrt{3}) \end{aligned}

The house effects $$H_{b_i}$$ are constrained to sum to zero over the companies $$b$$ to allow for model identification.

The model is estimated in JAGS 3.4. It is usually run with 6 chains, with 30,000 burn-in iterations and 60,000 iterations (with a thinning interval of 150), leaving 2,400 MCMC draws from the posterior distribution. Although the model is slow-mixing, this seems to be adequate and a good balance between speed and accuracy.
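As a quick check on these numbers, and assuming that the 60,000 iterations are run per chain after burn-in (the text does not say so explicitly, but this reading reproduces the stated total), the number of retained draws and the range implied by the prior on $$D$$ work out as follows:

\begin{aligned} \frac{6 \times 60{,}000}{150} & = 2{,}400 \\ \sqrt{\tfrac{1}{3}} \approx 0.577 & \le D \le \sqrt{3} \approx 1.732 \end{aligned}

In other words, the prior lets the sampling variance of a poll range from one third to three times the variance of a simple random sample of the same size.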

Sources

Fisher, S. D., Ford, R., Jennings, W., Pickup, M., & Wlezien, C. (2011). From polls to votes to seats: Forecasting the 2010 British general election. Electoral Studies, 30(2), 250-257.

Jackman, S. (2005). Pooling the polls over an election campaign. Australian Journal of Political Science, 40(4), 499-517.

Pickup, M. A., & Wlezien, C. (2009). On filtering longitudinal public opinion data: Issues in identification and representation of true change. Electoral Studies, 28(3), 354-367.

Pickup, M., & Johnston, R. (2008). Campaign trial heats as election forecasts: Measurement error and bias in 2004 presidential campaign polls. International Journal of Forecasting, 24(2), 272-284.

Pickup, M. (2011). Methodology. http://pollob.politics.ox.ac.uk/documents/methodology.pdf