Earlier this year, Universities and Science Minister David Willetts announced that the UK government was releasing £73 million for research that would harness the economic and social potential of Big Data. This term carries a great deal of buzz in business and academic spheres, although data and their analysis are obviously not new: whether as Census statistics, interview texts, or household surveys, data come in many familiar forms to social scientists. But ‘bigness’ often refers to three characteristics that distinguish Big Data from so-called ‘small’ data: volume, variety, and velocity.
Volume relates to the sheer size of datasets that potentially include millions of records. These datasets can also feature information that is drawn from many different sources (variety) as well as instantaneously aggregated as new data are constantly being created (velocity). The usefulness of such datasets depends on your perspective and purpose. In his widely cited 2008 Wired Magazine article, Chris Anderson rhetorically asked ‘who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves’. But do they? Others argue that data never exist in ‘raw’ forms but rather are influenced by researchers who, whether intentionally or not, select and construct them in certain ways.
So why should social scientists—and particularly those working on international migration—care about the Big Data meme, and debates about data more generally? First, civil society organisations place great value on evidence and data, and increasingly demand both from academics. They also produce a range of visualisations and text about migration issues that provide windows into particular stories or narratives for their audiences. My ongoing research into the uptake, communication, and visualisation of migration data comes from observing how ideas like ‘data-driven research’, ‘data journalism’, and ‘evidence-based policy’ proliferate in media and policy contexts.
The International Organization for Migration publishes Migration Profiles ‘to enhance policy coherence, evidence-based policymaking, and the mainstreaming of migration into development plans’. Meanwhile, the Data Driven Journalism initiative of the European Journalism Centre (EJC) seeks to ‘enabl[e] more journalists to use data-sets as a source for reporting’. In 2013, the EJC and the UN Alliance of Civilizations published a comparative study of how migration was covered in five countries’ newspapers, describing it as ‘an experimental, data-driven research project, analysing migration reporting in the Netherlands, the United States, Canada, France, and Germany’. Finally, media outlets like the Guardian’s Datablog have published results from numerous data-driven journalism projects, including one that compared official UK migration figures to mentions in major newspaper headlines.
Second, debates about ‘Big’ data actually point to broader concerns that data of any size may reveal social phenomena only selectively and generate partial conclusions on which policies are based. If data generation and analysis are not entirely neutral but rather carry assumptions about what is ‘worthwhile’ or ‘acceptable’ to measure in the first place, then this raises questions about the values that underpin preferences for certain types of social and economic research. What meanings do phrases like ‘data-driven’ and ‘evidence-based’ signify or confer when they are attached to research outputs? Which data qualify as ‘evidence’ in the first place, or, to play on Anderson’s words, which data are allowed to speak for themselves in the realms of policy, media, and advocacy? When data are presented, how do their visual elements shape the way social scientific findings are subsequently communicated, and how do they build a particular narrative for the viewer? And perhaps most importantly, to what ends (political, reputational, rhetorical) do users employ these data and the research built on them?
These reasons have implications for academic practice, which partly relies upon the stated and real public impacts of research. In an economic and political environment where impact and knowledge exchange with non-academic users are increasingly emphasised as important components of research activity, it is important to reflect upon what kinds of public debates are created and sustained through the use of data, evidence, and the research predicated on them: more genuinely informed ones, or ones where ‘experts’ defend predetermined positions using selective and sometimes limited information. More practically, there is a need to address precisely what to do with the rafts of (admittedly sometimes dubious or limited) data available to researchers in the first place. Complexity theorist Emma Uprichard recently warned that ‘we cannot afford to let big data run away without good social theories about what to do with the masses of data we are producing’. On both fronts, social scientists are well positioned to make vital, interdisciplinary contributions that can help inform future practice; indeed, hubs like the LSE Impact of Social Sciences Blog provide valuable platforms for discussing such advances.
It remains to be seen whether Big Data approaches will develop and mature or remain nebulous and theoretically thin. However, concepts of evidence and data, both ‘big’ and ‘small’, will continue to play important roles in social research, especially as stakeholders in civil society, media, and policymaking spheres use and commission research. Given the current political economy of research, social scientists should care about this phenomenon because it can prompt critical reflection on the values underpinning their own practice, regardless of the ‘size’ of the data they use.
You can follow Will’s work on media portrayals of international migration, uptake of evidence-based research in civil society, and Big Data visualisations on Twitter @williamlallen and Academia.edu.