An interesting case study in using collated data to predict social trends hit the interwebs recently.
Showcasing the most popular hits of the past year, the annual Triple J Hottest 100 countdown is one of the most significant popular music charts in the country. Listeners submit their votes, and the final list is tabulated from those votes. This year, a group of four individuals used social media to compile their own prediction of which tracks would make the top 100. The details of how they went about collecting their data are described in this article, but in a nutshell, this is what the ‘project’ entailed:
1 – Every user is able to share their votes via the Triple J website itself, on a page of their own. The root URL of every voting page is the same; the only thing that distinguishes person A’s vote from person B’s is the ‘random number’ at the end of the URL.
2 – For instance, your voting page might look like this:
3 – and mine might look like this:
4 – So, because these root URLs are identical apart from that number, and because all this information is freely available, votes can actually be collated!
5 – The Warmest 100 was thus collated (for more details about the process, look it up here), drawing on about 35 thousand votes from roughly 3600 unique voters – a sample of a mere 2.7 percent of the voting total.
6 – The result: they accurately predicted the top 3 songs on the list, and several others in the top 20, give or take some errors and discrepancies here and there. Here is a spreadsheet detailing the differences between the predictions and the actual results.
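The collation step itself is straightforward once the vote pages have been scraped. Here is a minimal sketch in Python, assuming each scraped page has already been reduced to a plain list of song titles – the function names and the sample ballots are mine, not the project’s actual code:

```python
# A minimal sketch of the collation step: given ballots scraped from shared
# voting pages (whose URLs differ only by a trailing number), tally every
# song and rank the most-voted tracks. The scraping itself is omitted;
# each ballot here is simply a list of song titles.
from collections import Counter

def collate_votes(ballots):
    """Count every song across all ballots and return (song, votes)
    pairs, most-voted first. Each ballot is a list of song titles."""
    tally = Counter()
    for ballot in ballots:
        tally.update(ballot)
    return tally.most_common()

def predict_top_n(ballots, n=100):
    """Return just the predicted top-n song titles, in rank order."""
    return [song for song, _ in collate_votes(ballots)[:n]]

if __name__ == "__main__":
    # Three sample ballots standing in for ~3600 scraped voting pages.
    sample = [
        ["Song A", "Song B", "Song C"],
        ["Song A", "Song C", "Song D"],
        ["Song A", "Song B", "Song E"],
    ]
    print(predict_top_n(sample, n=3))  # Song A leads with 3 votes
```

The interesting part, statistically, is not the tally but the assumption that a 2.7 percent sample ranks songs in roughly the same order as the full voting pool.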
For such a small sample, the results obtained are impressive – so much so that the organisers have decided to change the voting system for next year’s countdown.
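Those figures also let us sanity-check the scale involved: if roughly 35 thousand votes represent 2.7 percent of the total, the countdown itself must have drawn on the order of 1.3 million votes. A quick back-of-envelope calculation (the variable names are mine, and the inputs are the rounded figures quoted above):

```python
# Back-of-envelope check on the sample figures quoted above.
sample_votes = 35_000      # votes the Warmest 100 collated
sample_fraction = 0.027    # their stated share of the voting total (2.7%)

implied_total = sample_votes / sample_fraction
print(f"Implied voting total: ~{implied_total:,.0f} votes")
# → Implied voting total: ~1,296,296 votes
```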
Lev Manovich addresses some of the issues and concerns raised by ‘Big Data’ in an age of social media proliferation and increasing digital presence. One of the concerns he raises is the authenticity of information shared over social networks. In light of this, the Warmest 100 provides an interesting example. Unlike other forms of digital information, such as photos shared on Flickr or Facebook wall posts, there is no doubting the authenticity of users’ countdown votes. In fact, users are even encouraged to share their votes via social media – Facebook, Twitter and Pinterest plugins are standard fare. Allowing this information to be freely available has, in the case of the Warmest 100, given users and voters themselves the ability to take the collating process into their own hands, eliminating (to a certain extent) the countdown’s element of mystery!
The objective of the Warmest 100 is clear cut: predicting the results of a countdown. When big data is collated for humanities-based projects, however, objectives and goals might not be so clear cut. With so much information about users freely available over the internet, the Warmest 100 shows just how powerful statistics can be in extrapolating social trends. Applied to humanities-based work, the emphasis comes back to interpretation. One might have all this data available, but so what? What can one do with it? What can the data tell us?
And I suppose that’s where humanities researchers step in: interpreting and analysing big data. Which brings me back to the roots of my doctoral research, where my first foray into digital humanities stemmed from a quantitative analysis of Shakespeare’s 154 sonnets.
Matt Shea, “The Inside Story of How Four Techs Broke Open Triple J’s Hottest 100”, The Vine, January 2013.
Lev Manovich, “Trending: The Promises and Challenges of Big Social Data”, Debates in the Digital Humanities, edited by Matthew K. Gold, University of Minnesota Press, 2012.