Data: Size is not everything



It is called drinking your own Kool-Aid: wrongly taking a new insight as gospel because of an “uncommon” revelation in data. Data can easily mislead if the factors that matter to its integrity are not considered. Imagine polling 1,000 respondents from the South-South region and asking them who will be the next Nigerian President. The result will easily throw up Goodluck Jonathan. You might get the same result if you mix that group with a sample of the President’s kinsmen living in urban areas, or of people who believe that the woes of his administration are tied to the desperate return of the Northern oligarchy. You might get the reverse if you poll Northern commoners, or those who feel President Jonathan has been weak and indecisive on corruption.

To question the polls released by FutureForNG, it would help if they kept the metadata exhaust, mined it and released it to the public. Who are the respondents that chose Buhari? Twitter users? Facebook users who live in Lagos? Twitter users who are Northerners living in Lagos? What were the chances that folks in the South-East knew about the poll? How many responses came via SMS, email or social media? How many of the respondents actually vote? How many voted from the diaspora, and are the respondents a fair representation of the voting population that APC plans to attract? How many respondents voted using multiple numbers?

The questions above are meant to cleanse the data because, as the size gets bigger, data points multiply. This leads to the understanding that in any poll, size is not everything. One needs wide diversity to get quality feedback from a set of respondents that is representative enough of the final actors: the voters.
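To make the point concrete, here is a minimal sketch in Python. Every number in it is made up for illustration (two hypothetical regions, their share of voters and their support for a candidate); none of it comes from the FutureForNG poll. It compares what a large but lopsided sample would report against a small sample that mirrors the voting population.

```python
# All figures below are hypothetical, chosen only to illustrate the argument.
POPULATION_SHARE = {"region_a": 0.30, "region_b": 0.70}  # share of all voters
SUPPORT_RATE = {"region_a": 0.80, "region_b": 0.40}      # support for a candidate


def true_support(pop_share, support):
    """Support among all voters: each region counted by its real share."""
    return sum(pop_share[r] * support[r] for r in pop_share)


def expected_poll_result(respondents, support):
    """What a poll reports on average, given how many respondents
    came from each region (regions counted by their share of the sample)."""
    total = sum(respondents.values())
    return sum(respondents[r] * support[r] for r in respondents) / total


# 10,000 respondents, but 90% of them from region_a.
big_skewed = {"region_a": 9000, "region_b": 1000}
# Only 500 respondents, drawn in the same 30/70 proportion as the voters.
small_balanced = {"region_a": 150, "region_b": 350}

print(round(true_support(POPULATION_SHARE, SUPPORT_RATE), 2))    # 0.52
print(round(expected_poll_result(big_skewed, SUPPORT_RATE), 2))   # 0.76
print(round(expected_poll_result(small_balanced, SUPPORT_RATE), 2))  # 0.52
```

Under these assumed numbers, the sample that is twenty times larger lands 24 points away from the truth, while the small proportional one lands on it. The sketch ignores random sampling noise, which would be larger for the small sample, but noise shrinks as a sample grows while a skew does not.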


This is a challenge we will also contend with in the new era of Big Data. The rigorous approach of peering into the entire data footprint is the new cool. It will lead us to overlook certain connections across the broad sweep of data unless we are capable of exploring every node at machine scale.

I am interested in detailed and comprehensive data, but when it comes to polling, is it really a matter of size? How well are the respondents profiled? Depending on the question being sampled, the typical respondent profile changes. For an election sample, you might need a grassroots Northern voter, a South-East elite and a North-Central middle-class profile.

Ethnicity might be weighted heavily, given previous bloc voting across regions, but a poll on inequality calls for citizens classified by income level. Diversity of respondents is key in polling, especially for elections.
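One common way to act on this is post-stratification: reweight each respondent so that the groups in the sample match their share of the real population, using whichever profile variable suits the question. The sketch below is only an illustration. The function name `poststratified_share`, the `income` field and all the figures are assumptions of mine, not anything drawn from an actual poll.

```python
from collections import Counter


def poststratified_share(respondents, key, population_share, answer):
    """Share of respondents giving `answer`, after reweighting so that each
    group of `key` counts in proportion to `population_share`.

    Assumes every group in the sample appears in `population_share`.
    """
    counts = Counter(r[key] for r in respondents)
    n = len(respondents)
    total_weight = 0.0
    answer_weight = 0.0
    for r in respondents:
        sample_share = counts[r[key]] / n
        # Under-represented groups get a weight above 1, over-represented below 1.
        weight = population_share[r[key]] / sample_share
        total_weight += weight
        if r["answer"] == answer:
            answer_weight += weight
    return answer_weight / total_weight


# Hypothetical poll on inequality: 10 respondents, mostly high earners.
respondents = (
    [{"income": "low", "answer": "yes"}] * 2
    + [{"income": "high", "answer": "yes"}] * 2
    + [{"income": "high", "answer": "no"}] * 6
)
# Hypothetical population: 70% low income, 30% high income.
population_share = {"low": 0.7, "high": 0.3}

raw = sum(r["answer"] == "yes" for r in respondents) / len(respondents)
adjusted = poststratified_share(respondents, "income", population_share, "yes")
print(round(raw, 3), round(adjusted, 3))  # 0.4 0.775
```

For an election poll the same function could be called with a region or ethnicity field as `key` instead of income. Weighting can only correct for groups that were actually sampled and profiled; it cannot rescue a poll that never reached a group in the first place.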

My few thoughts. Just a word of caution.


