At the 2012 Annual Meeting of the Association of American Geographers, I presented during the session ‘Information Geographies: Online Power, Representation and Voice’, which was organised by Mark Graham (Oxford Internet Institute) and Matthew Zook (University of Kentucky). For an early morning session on a Saturday, it was well attended – and the papers were very interesting.
My presentation, titled “‘Nobody wants to do council estates’ – digital divide, spatial justice and outliers”, was the result of thinking about the nature of social information that is available on the Web, which I partially articulated in a response to a post on the GeoIQ blog. When Mark and Matt asked for an abstract, I provided the following:
The understanding of the world through digital representation (digiplace) and VGI is frequently carried out with the assumption that these are valid, comprehensive and useful representations of the world. A common practice throughout the literature on these issues is to mention the digital divide and, while accepting it as a social phenomenon, either ignore it for the rest of the analysis or expect that it will solve itself over time through technological diffusion. The almost deterministic belief in technological diffusion absolves the analyst from fully confronting the political implication of the divide.
However, what VGI and social media analysis reveals is that the digital divide is part of deep and growing social inequalities in Western societies. Worse still, digiplace amplifies and strengthens them.
In digiplace the wealthy, powerful, educated and mostly male elite is amplified through multiple digital representations. Moreover, the frequent decision of algorithm designers to highlight and emphasise those who submit more media, and the level of ‘digital cacophony’ that more active contributors create, means that a very small minority – arguably outliers in every analysis of normal distribution of human activities – are super empowered. Therefore, digiplace power relationships are arguably more polarised than outside cyberspace due to the lack of social checks and balances. This makes the acceptance of the disproportionate amount of information that these outliers produce as reality highly questionable.
The following notes might help in making sense of the slides.
Slide 2 takes us back 405 years to Mantua, Italy, where Claudio Monteverdi has just written one of the very first operas – L’Orfeo – as an after-dinner entertainment piece for Duke Vincenzo Gonzaga. Leaving aside the wonderful music – my personal recommendation is for Emmanuelle Haïm’s performance and I used the opening toccata in my presentation – there is a serious point about history. For a large portion of human history, and as recently as 400 years ago, we knew only about the rich and the powerful. We ignored everyone else because they ‘were not important’.
Slide 3 highlights two points about modern statistics. First, that it is a tool to gain an understanding about the nature of society as a whole. Second, when we look at the main body of society, it lies within the first 2 standard deviations of the centre of a normal distribution. The Index of Deprivation of the UK (Slide 4) is an example of this type of analysis. Even though it was designed to direct resources to the most needy, it analyses the whole population (and, by the way, is normalised).
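As a rough illustration of the point about standard deviations, the short sketch below computes how much of an idealised normal distribution lies within 1, 2 and 3 standard deviations of the centre. It assumes a perfectly normal distribution, which real social data only approximates, and the function name `share_within` is made up for this example.

```python
import math


def share_within(k: float) -> float:
    """Fraction of an idealised normal distribution lying within
    k standard deviations of the mean (assumes perfect normality)."""
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3):
        inside = share_within(k)
        print(f"within {k} SD: {inside:.2%}   outside: {1 - inside:.2%}")
```

Under that assumption, roughly 95% of the population sits within 2 standard deviations, which is the ‘main body of society’ referred to above, while only a fraction of a percent lies beyond 3.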
Slide 5 points out that on the Web, and in social media in particular, the focus is on ‘long tail’ distributions. My main issue is not with the pattern but with what it means in terms of analysing the information. This is where participation inequality (Slide 6) matters and the point of Nielsen’s analysis is that outlets such as Wikipedia (and, as we will see, OpenStreetMap) are suffering from even worse inequality than other communication media. Nielsen’s recent analysis in his newsletter (Slide 7) demonstrates how this is playing out on Facebook (FB). Notice the comment ‘these people have no life’ or, as Sherry Turkle put it, they got life on the screen…
Slides 8 and 9 demonstrate that participation inequality is strongly represented in OpenStreetMap, and we can expect it to play out in FourSquare, Google Map Maker, Waze and other GeoWeb social applications. Slide 10 focuses on other characteristics of the people that are involved in the contribution of content: men, highly educated, age 20-40. Similar characteristics have been shown in other social media and the GeoWeb by Monica Stephens & Antonella Rondinone, and by many other researchers.
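To make participation inequality concrete, here is a minimal sketch of how one might measure the share of all contributions that comes from the most active users. The counts are synthetic and invented purely for illustration (loosely shaped like Nielsen’s 90-9-1 rule of thumb); they are not measurements from OpenStreetMap or any other project, and `top_share` is a hypothetical helper name.

```python
def top_share(counts, fraction):
    """Share of all contributions made by the most active `fraction`
    of users (e.g. fraction=0.01 for the top 1%)."""
    ranked = sorted(counts, reverse=True)
    # Always include at least one user in the "top" group.
    n_top = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:n_top]) / total if total else 0.0


# Synthetic, made-up counts for 100 users (NOT real data):
# 1 very heavy contributor, 9 occasional ones, 90 who never contribute.
synthetic_counts = [1000] + [10] * 9 + [0] * 90

if __name__ == "__main__":
    for fraction in (0.01, 0.10):
        share = top_share(synthetic_counts, fraction)
        print(f"top {fraction:.0%} of users -> {share:.1%} of contributions")
```

With real contributor counts in place of the synthetic list, the same calculation gives a quick check of how far a dataset is dominated by a handful of people before any conclusions are drawn from it.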
In Slides 11-14, observed spatial biases in OpenStreetMap are noted – concentration on highly populated places, gap between rich and poor places (using the Index of Deprivation from Slide 4), and difference between rural and urban areas. These differences were also observed in other sources of Volunteered Geographic Information (VGI) such as photo sharing sites (in Vyron Antoniou’s PhD).
Taken together, participation inequality, demographic bias and spatial bias point to a very skewed group that is producing most of the content that we see on the GeoWeb. Look back at Slide 3, and it is a good guess that this minority falls some 3 standard deviations away from the centre. They are outliers – not representative of anything other than of themselves. Of course, given the large number of people online and the ability of outliers to ‘shout’ louder than anyone else, and converse among themselves, it is tempting to look at them as a population worth listening to. But it is, similarly to the opening point, a look at the rich and powerful (or super enthusiastic) and not the mainstream.
Strangely, when such a small group controls the economy, we see it as a political issue (Slide 15, which was produced by Mother Jones as part of the response to the Occupy movement). We should be just as concerned when it happens with digital content and sets the agenda of what we see and how we understand the world.
Now to the implication of this analysis, and the use of the GeoWeb and social media to understand society. Slide 17 provides the link to the GeoIQ post that argued that these outliers are worth listening to. They might be, but the issue is what you are trying to find out by looking at the data:
The first option is to ask questions about the resulting data such as ‘can it be used to update national datasets?’ – accepting the biases in the data collection as they are and exploring whether anything useful comes out of the outcomes (Slides 19-21, from the work of Vyron Antoniou and Thomas Koukoletsos). This should be fine as long as the researchers don’t try to state something general about the way society works from the data. Even so, researchers ought to analyse and point to biases and shortcomings (Slides 11-14 are doing exactly that).
The second option is to start claiming that we can learn something about social activities (Slides 22-23, from the work of Eric Fischer and Daniel Gayo-Avello, as well as Sean Gorman in the GeoIQ post). In this case, it is wrong to read too much into the data – as Gayo-Avello noted – as the outliers’ bias renders the analysis unrepresentative of society. Notice, for example, the huge gap between the social media noise during the Egyptian revolution and the outcomes of the elections, or the political differences that Gayo-Avello noted.
The third option is to find data that is representative (Slide 24, from the MIT Senseable City Lab), which looks at the ‘digital breadcrumbs’ that we leave behind on a large scale – phone calls, SMS, travel cards, etc. This data is representative, but provides observations without context. There is no qualitative or contextual information that comes with it and, because of the biases that are noted above, it is wrong to integrate it with the digital cacophony of the outliers. It is most likely to lead to erroneous conclusions.
Therefore, the understanding of the concept of digiplace (Slide 25) – the ordering of digital representation through software algorithms and GeoWeb portals – is, in fact, double filtered. The provision of content by outliers means that the algorithms will tend to amplify their point of view and biases. Not only that, digital inequality, which is happening on top of social and economic inequality, means that more and more of our views of the world are being shaped by this tiny minority.
When we add to the mix aspects of digital inequalities (some people can only afford a pay-as-you-go function phone, while a tiny minority consumes a lot of bandwidth over multiple devices), we should stop talking about the ‘digital divide’ as something that will close over time. This is some sort of imaginary trickle-down theory that is being proven not to withstand the test of reality. If anything, it grows as the ‘haves’ are using multiple devices to shape digiplace in their own image.
This is actually one of the core problems that differentiates two approaches to engagement in data collection. There is the laissez-faire approach to engaging society in collecting information about the world (Slides 27-28, showing OpenStreetMap mapping parties), which does not confront the biases. Opposite it, there are participatory approaches (Slides 29-30, showing participatory mapping exercises from the work of Mapping for Change), where the effort goes into making the activity inclusive.
This point about the biases, inequality and influence on the way we understand the world is important to repeat – as it is too often ignored by researchers who deal with these data.
10 thoughts on “‘Nobody wants to do council estates’ – digital divide, spatial justice and outliers – AAG 2012”
Good presentation and blog, Muki, thanks.
Of course while it’s true that in the main OpenStreetMap is created through a laissez faire approach, there are individuals and groups who aim to confront those biases, e.g. http://osm.org/go/euuvJAFWt-?layers=Q
It would be really fascinating to produce an overlay analysing data density/quality and IMD, to highlight poorer areas that are particularly well and particularly poorly mapped as a way of encouraging volunteers to do something about it.
For practitioners like Mapping for Change, and for techies who are interested in these biases, the question that follows from your post is: how can we get people who are interested in council estates to use OpenStreetMap? I am assuming that it’s a good objective, of course.
The only likely course seems to me to be that groups who already do related activities and who have an interest in council estates switch to OpenStreetMap for their maps, data and open tools, and fit a little bit of data maintenance into the work they already do. I’m thinking of people like Mapping for Change, the London Orchard Project, Capital Growth, TRAs, etc.
The barrier at the moment seems to be that groups like Mapping for Change still find it easier to use Google tools and map tiles than open tools and OpenStreetMap data, or lack the resources to make a switch if they are now suitable. A TRA member could knock up a Google Map of problems on their estate in an afternoon, whereas they would probably give up after 15 minutes if they arrived on http://www.openstreetmap.org looking for the tools.
Thank you for the thoughtful comment. Of course, the main target of this post is the academic community – who too often ignore the inherent biases and mine the data (see some of the links from the FuturICT project).
I agree that on the practical side it calls for a proactive approach to data collection and visualisation. Indeed, the usability of OpenStreetMap is creating an additional barrier, but that is an issue for another post!
The more I think about it, the more I like the idea of the map showing the biases by relating data to IMD scores. Mappers could identify areas to work on, and researchers looking to use data on relatively small areas could get a quick visual check of bias problems.
It would be great if you could encourage a colleague or student to try their hand at it.
Thank you, Muki, a very interesting post. Quality control over the source of information is always a must for scientists dealing with social phenomena. Being a good programmer or computer scientist, and being able to easily handle social media data, doesn’t make you a sociologist.
nice post. i agree with the general idea, but saying that “They are outliers – not representative of anything other than of themselves” is an oversimplified assessment of reality, not least because one won’t get these results http://www.cl.cam.ac.uk/~dq209/publications/quercia12talkcity.pdf
yet, i welcome more research on the demographics of social media users in uk. knowing user demographics allows researchers to interpret the results. unfortunately we don’t have uk sociologists who would do what US ones have already started to do a while back
Thanks Daniele, but your research is exactly the type that I’m criticising.
You might get your statistical significance for as long as you want, but considering that a minute percentage of tweets are geotagged, plus the participation inequality that is going on, which is amplified by your geographical sampling, your paper should have at least a page or two of caveats about all the issues that I’m highlighting and then justify why it is still relevant to explore the data. It tells us nothing about the communities in the way that a properly conducted social survey, one that tries to talk with the whole population, would.
What you are doing is predicting some sense of qualitative information within the minuscule group of people who tweet. So you have a self-selected group of 576 people. The tweets are not random, the people are not, and it is not a sample that should be treated as ‘objective data’ without plenty of care for the biases.
Sorry to be so blunt, but that is the point that I’m trying to highlight. The moment you are claiming that your results are generalisable (and you do that in your conclusions) you are making plenty of fallacies.
“the moment you are claiming that your results are generalisable (and you do that in your conclusions) you are making plenty of fallacies”
Ouch 🙂 I think that the conclusion of our paper “Limitations” speaks for itself 🙂 Having said that, I am aware of the problem you reported in your post, agreed with it in the paper (which is n-month old), and commented on it more than a month ago on mobblog – in that post, i tried to suggest what we should do after we realize that our data is not representative, which I would have thought to be well known… I might be wrong though …