Thursday marked the launch of The Conservation Volunteers (TCV) report on volunteering impact, which summarised a three-year project that explored motivations, changes in pro-environmental behaviour, wellbeing and community resilience. The report is worth a read, as it goes beyond the direct impact of TCV activities on the local environment and demonstrates how involvement in environmental volunteering can have multiple benefits. In a way, it adds ingredients to a more holistic understanding of ‘green volunteering’.
One of the interesting aspects of the report is the longitudinal analysis of volunteers’ motivations (copied here from the report). The comparison is based on 784 baseline surveys, 202 second surveys and 73 third surveys, which were completed by volunteers while they were involved with TCV. The second survey was taken after 4 volunteering sessions, and the third after 10 sessions.

The results of the surveys are interesting in the context of online activities (e.g. citizen science or VGI) because they provide an example of an activity that happens offline – in green spaces such as local parks and community gardens. Moreover, the people who participate come from all walks of life: previous analysis of TCV data demonstrated that it recruits volunteers across the socio-economic spectrum. So here is an activity that can be compared to online volunteering. This is valuable because, if the patterns in the TCV data are similar, we can understand online volunteering as part of general volunteering and not assume that technology changes everything.

The graph above attracted my attention because of its similarities to Nama Budhathoki’s work on the motivations of OpenStreetMap volunteers. First, there is a difference between the reasons that influence people who join just one session and those who stay involved for longer. Secondly, social and personal development aspects become more important over time.

There is a clear need to continue exploring the data – especially because the numbers surveyed at each stage are different – but this is an interesting finding, and there is surely more to discover. Some of it will be explored by Valentine Seymour in ExCiteS, who is working with TCV as part of her PhD.

It is also worth listening to the qualitative observations by volunteers, as expressed in the video that opened the event, which is provided below.

TCV Volunteer Impacts from The Conservation Volunteers on Vimeo.

Some ideas take a long time to mature into a form that you are finally happy to share. This is an example of such a thing.

I became interested in the philosophy of technology during my PhD studies and have continued to explore it since. During this journey, I found a lot of inspiration in, and links to, Andrew Feenberg’s work – for example, in my paper about neogeography and the delusion of democratisation. The links are mostly due to Feenberg’s attention to ‘hacking’, or appropriating technical systems for functions and activities outside what their designers or producers intended.

In addition to Feenberg, I became interested in the work of Albert Borgmann, because he explicitly analysed GIS, dedicating a whole chapter to it in Holding On to Reality. In particular, I was intrigued by his formulation of the Device Paradigm and the notion of Focal Things and Practices, which are linked to information systems in Holding On to Reality, where three forms of information are presented – natural information, cultural information and technological information. It took me some time to see that these five concepts are linked: technological information is a demonstration of the trouble with the Device Paradigm, while natural and cultural information are part of focal things and practices (more on these concepts below).

I first used Borgmann’s analysis as part of a ‘Conversations Across the Divide‘ session in 2005, which focused on complexity and emergence. In a joint contribution with David O’Sullivan about ‘complexity science and Geography: understanding the limits of narratives’, I used Borgmann’s classification of information. Later on, we tried to turn it into a paper, but in the end David wrote a much better analysis of complexity and geography, while the attempt to focus mostly on the information concepts was not fruitful.

The next opportunity to revisit Borgmann came in 2011, at an AAG pre-conference workshop on VGI, where I explored the links between the Device Paradigm, focal practices and VGI. By 2013, when I was invited to the ‘Thinking and Doing Digital Mapping‘ workshop organised by the ‘Charting the Digital‘ project, I was able to articulate the link between all five elements of Borgmann’s approach in my position paper. This week, I was able to come back to the topic in a seminar in the Department of Geography at the University of Leicester. Finally, I feel that I can link them in a coherent way.

So what is it all about?

Within the areas of VGI and citizen science, there is a tension between the different goals of the projects and the identification of practices in terms of what they mean for the participants – are we using people as ‘platforms for sensors’, or are we dealing with fuller engagement? Borgmann’s ideas can help in understanding the difference. He argues that modern technologies tend to adopt the myopic ‘Device Paradigm’, in which a specific interpretation of efficiency and productivity, and a reductionist view of human actions, take precedence over ‘Focal Things and Practices’ that bring people together in a way meaningful to human life. In Holding On to Reality (1999), he differentiates three types of information: natural, cultural and technological. Natural information is defined as information about reality: for example, scientific information on the movement of the earth or the functioning of a cell. This is information that was created in order to understand the functioning of reality. Cultural information is information that is used to shape reality, such as engineering design plans. Technological information is information as reality, and it leads to decreased human engagement with fundamental aspects of reality. Significantly, these categories do not relate to the common usage of the words ‘natural’, ‘cultural’ and ‘technological’; rather, they describe the changing relationship between information and reality at different stages of socio-technical development.

When we explore geographical information in general, we can see that some of it is technological information – for example, SatNavs and the way they communicate with the people who use them, or virtual globes that claim to be a representation of reality, with ‘current clouds’ and all. The paper map, on the other hand, provides a conduit to the experience of hiking and walking through the landscape, and is part of cultural information.

Things are especially interesting with VGI and citizen science, where information and practices need to be analysed in a more nuanced way. In some cases, the practices can become focal to the participants – for example in iSpot, where the experience of identifying a species in the field is also linked to the experiences of the amateurs and experts who discuss the classification. It is an activity that brings people together. On the other hand, crowdsourcing projects that grab information from SatNav devices are a demonstration of the Device Paradigm, with the potential of reducing a meaningful holiday journey to ‘getting from A to B in the shortest time’. The slides below go through these ideas and then explore the implications for GIS, VGI and citizen science.

Now for the next stage – turning this into a paper…

Following the two previous assertions, namely that:

‘you can be supported by a huge crowd for a very short time, or by few for a long time, but you can’t have a huge crowd all of the time (unless data collection is passive)’ (original post here)

And

‘All information sources are heterogeneous, but some are more honest about it than others’  (original post here)

The third assertion is about patterns of participation. It is one that I have mentioned before, and in some ways it is a corollary of the two assertions above.

‘When looking at crowdsourced information, always keep participation inequality in mind’ 

Because crowdsourced information, whether Volunteered Geographic Information or citizen science, is created through a socio-technical process, it is all too easy to forget the social side – especially when you are looking at the information without the metadata of who collected it and when. So when working with OpenStreetMap data, or viewing the distribution of bird species in eBird (below), even though the data source is expected to be heterogeneous, each observation is treated as similar to every other observation and assumed to be produced in a similar way.

Distribution of House Sparrow

Yet the data is not only heterogeneous in terms of consistency and coverage; it is also highly heterogeneous in terms of contribution. One of the most persistent findings from studies of various systems – for example Wikipedia, OpenStreetMap and even volunteer computing – is that there is a very distinctive heterogeneity in contribution. The phenomenon was termed Participation Inequality by Jakob Nielsen in 2006, and it is summarised succinctly in the diagram below (from the Visual Liberation blog): a very small number of contributors add most of the content, while most of the people who use the information will not contribute at all. Even when examining only those who actually contribute, in some projects over 70% contribute only once, with a tiny minority contributing most of the information.

Participation Inequality

Therefore, when looking at sources of information that were created through such a process, it is critical to remember the nature of the contribution. This has far-reaching implications for quality, as it depends on the expertise of the heavy contributors, on their spatial and temporal engagement, and even on their social interactions and practices (e.g. abrasive behaviour towards other participants).
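
To make the shape of this inequality concrete, here is a small simulation (my illustration – the parameters are assumptions, not figures from Nielsen or the studies above) that draws contribution counts from a heavy-tailed Zipf distribution of the kind typically reported for these systems:

```python
# A minimal sketch of participation inequality, assuming a Zipf-shaped
# distribution of activity. Not based on any specific project's data.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical community of 10,000 contributors with heavy-tailed activity.
contributions = rng.zipf(a=2.0, size=10_000)

total = contributions.sum()
sorted_desc = np.sort(contributions)[::-1]
top_1pct_share = sorted_desc[:100].sum() / total  # top 1% of contributors

print(f"one-time contributors: {(contributions == 1).mean():.0%}")
print(f"content from the top 1% of contributors: {top_1pct_share:.0%}")
```

Under these assumed parameters, roughly 60% of contributors appear only once, and the top 1% account for a disproportionate share of all content – the same qualitative pattern as in the diagram.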

Because of these factors, it is critical to remember the impact and implications of participation inequality on the analysis of the information. For some analyses it will have less impact, and for others a major one; in either case, it needs to be taken into account.
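
In practice, taking it into account can start with something as simple as profiling contributions per contributor before any analysis treats the records as uniform. A minimal sketch (my addition, with hypothetical column names rather than any real eBird or OpenStreetMap schema):

```python
# Surface the per-contributor breakdown of a crowdsourced dataset before
# analysis. The table and its columns are hypothetical, for illustration only.
import pandas as pd

obs = pd.DataFrame({
    "contributor_id": ["a", "a", "a", "a", "b", "c", "c", "d"],
    "species":        ["house sparrow"] * 8,
})

# Records per contributor, most active first.
counts = obs.groupby("contributor_id").size().sort_values(ascending=False)
print(counts)
print(f"share from the single most active contributor: {counts.iloc[0] / len(obs):.0%}")
```

Even on this toy table, half the records come from one contributor – exactly the kind of imbalance that disappears when observations are analysed without contributor metadata.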

Following the last post, which focused on an assertion about crowdsourced geographic information and citizen science, I continue with another observation. As was noted in the previous post, these can be treated as ‘laws’, as they seem to emerge as common patterns from multiple projects in different areas of activity – from citizen science to crowdsourced geographic information. The first assertion was about the relationship between the number of volunteers who can participate in an activity and the amount of time and effort that they are expected to contribute.

This time, I look at one aspect of data quality: consistency and coverage. Here the following assertion applies:

‘All information sources are heterogeneous, but some are more honest about it than others’

What I mean by that relates to the ongoing argument about authoritative versus crowdsourced information sources (Flanagin and Metzger 2008 frequently come up in this context), which was also at the root of the Wikipedia vs. Britannica debate, the mistrust of citizen science observations, and the constant questioning of whether participants can do ‘real research’.

There are many aspects to these concerns, so the assertion deals with comprehensiveness and consistency, which are used as reasons to dismiss crowdsourced information when comparing it to authoritative data. However, on closer inspection we can see that all these information sources are fundamentally heterogeneous. Despite all the effort to define precise standards for data collection in authoritative data, heterogeneity creeps in because of budget and time limitations, decisions about what is worth collecting and how, and the clash between reality and the specifications. Here are two examples:

Take one of the Ordnance Survey Open Data sources – the maps present themselves as consistent and as covering the whole country in an orderly way. However, dig into the details of the mapping and you discover that the Ordnance Survey uses different standards for mapping urban, rural and remote areas. Yet the derived products that are generalised and manipulated in various ways, such as Meridian or Vector Map District, do not provide a clear indication of which parts originated from which scale – so the heterogeneity of the source disappears in the final product.

The census is also heterogeneous, and it is a good case of specifications vs. reality. Not everyone fills in the forms, and even with the best efforts of enumerators it is impossible to collect all the data; therefore, statistical analysis and manipulation of the results are required to produce a well-reasoned assessment of the population. This is expected, even though it is not always understood.

Therefore, even the best information sources that we accept as authoritative are heterogeneous – but, as I have stated, they are just not completely honest about it. The ONS doesn’t release the full original set of data before all the manipulations, nor does it completely disclose all the assumptions that went into reaching the final values. The Ordnance Survey doesn’t tag every line with metadata about the date of collection and the scale.

Somewhat counter-intuitively, exactly because crowdsourced information is expected to be inconsistent, we approach it as such and ask questions about its fitness for use. In that way, it is more honest about its inherent heterogeneity.

Importantly, the assertion should not be taken as dismissive of authoritative sources, or as ignoring that the heterogeneity within crowdsourced information sources is likely to be much higher than in authoritative ones. Of course, all the investment in making things consistent and the effort to achieve universal coverage is worth it, and it would be foolish and counterproductive to suggest that such sources of information can simply be replaced, as has been suggested for the census, or that it is not worth investing in the Ordnance Survey to update the authoritative datasets.

Moreover, when commercial interests meet crowdsourced geographic information or citizen science, the ‘honesty’ disappears. For example, even though we know that Google Map Maker is now used in many parts of the world (see the figure), even in cases where Google provides access to the vector data, you cannot find out who contributed it, when and where. It is also presented as an authoritative source of information.

Despite the risk of misinterpretation, the assertion can be useful as a reminder that the differences between authoritative and crowdsourced information are not as big as they may seem.

Looking across the range of crowdsourced geographic information activities, some regular patterns are emerging, and it might be useful to start noticing them as a way to think about what is and is not possible in this area. Since I don’t like the concept of ‘laws’ – as in Tobler’s first law of geography, which states that ‘Everything is related to everything else, but near things are more related than distant things’ – I will call them assertions. There is also something fitting about using the word ‘assertion’ in the context of crowdsourced geographic information, as it echoes Mike Goodchild’s differentiation between asserted and authoritative information. So not laws, just assertions, or even observations.

The first one is a rephrasing of a famous quote:

‘you can be supported by a huge crowd for a very short time, or by few for a long time, but you can’t have a huge crowd all of the time (unless data collection is passive)’

So the Christmas Bird Count can have tens of thousands of participants for a short time, while the number of people who operate weather observation stations will be much smaller. The same is true for OpenStreetMap – for crisis mapping, which is a short-term task, you can get many contributors, but for the regular updating of an area under usual conditions there will be only a few.

The exception to the assertion is passive data collection, where information is collected automatically by logging data from a sensor – for example, recording GPS tracks to improve navigation information.
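
To illustrate what ‘passive’ means here, a minimal sketch (my addition; read_gps_fix is a hypothetical stand-in for a device positioning API, not a real library call):

```python
# Passive data collection: fixes are recorded on a timer, with no action
# required from the contributor beyond leaving the logger running.
import csv
import time
from datetime import datetime, timezone

def read_gps_fix():
    """Hypothetical placeholder for a device's positioning API."""
    return 51.5246, -0.1340  # a fixed point, for demonstration only

def log_track(path="track.csv", interval_s=5, n_fixes=10):
    """Append timestamped (lat, lon) fixes to a CSV file at a fixed interval."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for _ in range(n_fixes):
            lat, lon = read_gps_fix()
            writer.writerow([datetime.now(timezone.utc).isoformat(), lat, lon])
            time.sleep(interval_s)

log_track()
```

The point is that the contributor’s only act is to switch the logger on – which is why crowd size stops being the limiting factor in this mode of collection.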

OSM Haiyan

During the symposium ‘The Future of PGIS: Learning from Practice?’, held at ITC-University of Twente on 26 June 2013, I gave a talk titled ‘Keeping the spirit alive – preservation of participatory GIS values in the Geoweb’, which explored what the important values in participatory GIS are, and how they translate to the Geoweb, Volunteered Geographic Information and current interests in crowdsourcing. You can watch the talk below.


To see the rest of the presentations from the day, see https://vimeo.com/album/2475389; details of the event are available at http://www.itc.nl/Pub/Events-Conferences/2013/2013-June/Participatory-GIS-Symposium.html


Google Trends ‘Citizen Science’ (July 2013)

The term ‘Citizen Science’ is clearly gaining recognition and use. It now gets mentioned in radio and television broadcasts and social media channels, as well as in conferences and workshops. Some of the clearer signs of the growing attention include discussions of citizen science in policy-oriented conferences, such as UNESCO’s World Summit on the Information Society (WSIS+10) review meeting discussion papers (see page ), the Eye on Earth users conference (see the talks here), and the launch of the European Citizen Science Association at the recent EU Green Week conference.

There are more academic conferences and publications that cover citizen science, a Google Plus community dedicated to citizen science with 1,400 members, a clear upward trend in Google searches, and so on.

Another aspect of the expanding world of citizen science is the emerging questioning, by those who are involved in such projects or study them, of the efficacy of the term. As is very common with general terms, reflections on its accuracy are coming to the fore: Rick Bonney and colleagues suggest using ‘Public Participation in Scientific Research‘ (significantly, Bonney was the first to use ‘Citizen Science’, in 1995); Francois Grey coined ‘Citizen Cyberscience’ to describe projects that depend on the Internet; recently, Chris Lintott discussed some doubts about the term in the context of Zooniverse; and Katherine Mathieson asks if citizen science is just a passing fad. In our own group, there are also questions about the correct terminology, with Cindy Regalado’s suggestion to focus on ‘Publicly Initiated Scientific Research (PIScR)‘, and discussion of the meaning of ‘Extreme Citizen Science‘.

Gartner Hype Cycle

One way to explore what is going on is to consider the evolution of the ‘hype’ around citizen science through Gartner’s ‘Hype Cycle‘, which can be seen as a way to describe how technologies are adopted in a world of rapid communication and inflated expectations. Leaving aside Gartner’s own hype, the story the model tries to tell is this: once a new approach (technology) emerges – because it has become possible, or because someone has reconfigured existing elements and claims it is a new thing (e.g. Web 2.0) – it goes through rapid growth in attention and publicity. This goes on until it reaches the ‘peak of inflated expectations’, where the expectations from the technology are unrealistic (e.g. that it will revolutionise the way we use our fridges). This is followed by a slump, as more and more failures come to light and the promises are not fulfilled. At this stage, the disillusionment is so deep that even the useful aspects of the technology are forgotten. However, if it passes this stage, then after a realisation of what is actually possible, the technology is integrated into everyday life and practices and is used productively.

So does the hype cycle apply to citizen science?

If we look at Gartner’s cycle from last September, crowdsourcing is near the ‘peak of inflated expectations’, and some descriptions of citizen science as scientific crowdsourcing clearly match the same mindset.

Gartner 2012 Hype Cycle

There is growing evidence of academic researchers entering citizen science out of opportunism, without paying attention to the commitment and work that are required to carry out such projects. With some, it seems they decided that they could join in because someone around them knows how to make an app for smartphones, or a website that will work like Galaxy Zoo (failing to notice all the social aspects that Arfon Smith highlights in his talks). When you look at the emerging projects, you can start guessing which will succeed or fail by looking at the expertise and approach of the people behind them.

Another cause for concern is the expectations I have noticed in the more policy-oriented events about the ability of citizen science to solve all sorts of issues – from raising awareness to changing behaviour with limited professional involvement – or that it will reduce the resources needed for activities such as environmental monitoring, without an understanding that significant, sustained investment is required: a community coordinator, technical support and other aspects are needed here just as much. This concern is heightened by statements that promote citizen science as a mechanism to reduce the costs of research, creating a source of free labour, etc.

On the other hand, it can be argued that the hype cycle doesn’t apply to citizen science because of history. Citizen science has existed for many years, as Caren Cooper describes in her blog posts. Therefore, conceptualising it as a new technology is wrong, as there are already mechanisms, practices and institutions to support it.

In addition, and unlike the technologies on Gartner’s chart, the academic projects within which citizen science happens benefit from access to what is sometimes termed ‘patient capital’, without expectations of quick returns on investment. Even with the increasing expectations of research funding bodies for explanations of how research will lead to an impact on wider society, they do not expect the impact to be immediate (5-10 years is usually fine), and funding comes in chunks that cover 3-5 years, which provides the breathing space to overcome the ‘trough of disillusionment’ that is likely to occur within the technology sector regarding crowdsourcing.

And yet, I would guess that citizen science will suffer some examples of disillusionment from badly designed and executed projects. To get these projects right, you need a combination of domain knowledge in the specific scientific discipline; science communication to tell the story in an accessible way; technical ability to build mobile and web infrastructure; understanding of user interaction and user experience to build engaging interfaces; community management ability to nurture and develop your communities – and we can add further skills to the list (e.g. if you want gamification elements, you need experts in games, not amateurish attempts). In short, it needs to be taken seriously, with careful consideration and design. This is not a call for gatekeepers; more a realisation that the successful projects and groups are stating similar things.

Which brings us back to the issue of the definition of citizen science and its terminology. I have been following terminology arguments in my own discipline for over 20 years. I have seen people arguing about a data storage format for GIS and whether it should be raster or vector (answer: it doesn’t matter). Or arguing whether GIS is a tool or a science. Or being unhappy with ‘Geographic Information Science’ and resolutely calling it geoinformation, geoinformatics, etc. Even in the minute sub-discipline that deals with participation and computerised maps, there are arguments about Public Participation GIS (PPGIS) versus Participatory GIS (PGIS). Most recently, we have been debating the right term for the mass contribution of geographic information: volunteered geographic information (VGI), crowdsourced geographic information or user-generated geographic information.

It’s not that terminology and precision of definition are not useful – on the contrary. However, I’ve noticed that in most cases the more inclusive and, importantly, vague, broad-church definition won the day. Broad terminologies, especially when they are evocative (such as ‘citizen science’), are especially powerful. They convey a good message and are therefore useful. As long as we don’t try to force a canonical definition, and allow people to decide what they include in the term and to express clearly why what they are doing falls within citizen science, it should be fine. Some broad principles are useful and will help all those committed to working in this area to sail through the hype cycle safely.

The Consumers’ Association Which? magazine is probably not the first place to turn when you look for usability studies. Especially not if you’re interested in computer technology – for that, there are sources such as PC Magazine on the consumer side, and professional magazines such as Interactions from the Association for Computing Machinery (ACM) Special Interest Group on Computer-Human Interaction (SIGCHI).

And yet…

Over the past few years, Which? has been reviewing, testing and recommending satnavs (also known as Personal Navigation Devices – PNDs). Which? is an interesting case because it reaches over 600,000 households and because of the level of trust that it enjoys. If you look at their methodology for testing satnavs, you’ll find that it does resemble usability testing – click on the image to see the video from Which? about their methodology. The methodology is more about everyday use, and the opinions of the assessors seem to play an important role.

Link to Which Satnav video

Professionals in geographical information science or human-computer interaction might dismiss the study as unrepresentative, or not fitting their ways of evaluating technologies, but we need to remember that Which? is providing an insight into the experience of the people who are outside our usual professional and social context – people who go to a high street shop or download an app and start using it straightaway. Therefore, it’s worth understanding how they review the different systems and what the experience is like when you try to think like a consumer, with limited technical knowledge and understanding of maps.

There are also aspects that puncture the ‘filter bubble‘ of geoweb people – Google Maps is now probably the most used map on the web, but the satnav application using Google Maps was described as ‘bad, useful for getting around on foot, but traffic information and audio instructions are limited and there’s no speed limit or speed camera data‘. Waze, the crowdsourced application, received especially low marks, and the magazine noted that it ‘lets users share traffic and road info, but we found its routes and maps are inaccurate and audio is poor‘ (both citations from Which? Nov 2012, p. 38). It is also worth reading their description of OpenStreetMap when discussing map updates, as well as the opinions on willingness to pay for map updates.

There are many ways to receive information about the usability and the nature of interaction with geographical technologies, and some of them, while not traditional, can provide useful insights.

The Spatial Data Infrastructure Magazine (SDIMag.com) is a relatively new e-zine dedicated to the development of spatial data infrastructures around the world. Roger Longhorn, the editor of the magazine, conducted an email interview with me, which is now published.

In the interview, we cover the problematic terminology used to describe a wide range of activities; the need to consider social and technical aspects, as well as the goals of the participants; and, of course, the role of the information produced through crowdsourcing, citizen science and VGI within spatial data infrastructures.

The full interview can be found here.


At the 2012 Annual Meeting of the Association of American Geographers, I presented during the session ‘Information Geographies: Online Power, Representation and Voice’, which was organised by Mark Graham (Oxford Internet Institute) and Matthew Zook (University of Kentucky). For an early-morning session on a Saturday, it was well attended – and the papers in the session were very interesting.

My presentation, titled ‘“Nobody wants to do council estates” – digital divide, spatial justice and outliers’, was the result of thinking about the nature of the social information that is available on the Web, which I had partially articulated in a response to a post on the GeoIQ blog. When Mark and Matt asked for an abstract, I provided the following:

The understanding of the world through digital representation (digiplace) and VGI is frequently carried out with the assumption that these are valid, comprehensive and useful representations of the world. A common practice throughout the literature on these issues is to mention the digital divide and, while accepting it as a social phenomenon, either ignore it for the rest of the analysis or expect that it will solve itself over time through technological diffusion. The almost deterministic belief in technological diffusion absolves the analyst from fully confronting the political implications of the divide.

However, what VGI and social media analysis reveals is that the digital divide is part of deep and growing social inequalities in Western societies. Worse still, digiplace amplifies and strengthens them.

In digiplace, the wealthy, powerful, educated and mostly male elite is amplified through multiple digital representations. Moreover, the frequent decision of algorithm designers to highlight and emphasise those who submit more media, and the level of ‘digital cacophony’ that more active contributors create, mean that a very small minority – arguably outliers in any analysis of the normal distribution of human activities – are super-empowered. Therefore, digiplace power relationships are arguably more polarised than those outside cyberspace, due to the lack of social checks and balances. This makes the acceptance of the disproportionate amount of information that these outliers produce as reality highly questionable.

The following notes might help in making sense of the slides.

Slide 2 takes us back 405 years to Mantua, Italy, where Claudio Monteverdi has just written one of the very first operas – L’Orfeo – as an after-dinner entertainment piece for Duke Vincenzo Gonzaga. Leaving aside the wonderful music – my personal recommendation is Emmanuelle Haïm’s performance, and I used the opening toccata in my presentation – there is a serious point about history: for a large portion of human history, and as recently as 400 years ago, we knew only about the rich and the powerful. We ignored everyone else because they ‘were not important’.

Slide 3 highlights two points about modern statistics. First, it is a tool to gain an understanding of the nature of society as a whole. Second, when we look at the main body of society, it falls within the first 2 standard deviations of a normal distribution. The UK Index of Deprivation (Slide 4) is an example of this type of analysis. Even though it was designed to direct resources to the most needy, it analyses the whole population (and, by the way, is normalised).
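
As a quick numerical reminder of that second point, here is a short worked example (my addition, not part of the slides) computing how much of a normally distributed population falls within one, two and three standard deviations of the mean:

```python
# Share of a normal population within k standard deviations of the mean.
from math import erf, sqrt

def share_within(k: float) -> float:
    """P(|X - mean| <= k standard deviations) for a normal distribution."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%; within 2 SD: 95.4%; within 3 SD: 99.7%
```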

Slide 5 points out that on the Web, and in social media in particular, the focus is on ‘long tail’ distributions. My main issue is not with the pattern but with what it means for analysing the information. This is where participation inequality (Slide 6) matters, and the point of Nielsen’s analysis is that outlets such as Wikipedia (and, as we will see, OpenStreetMap) suffer from even worse inequality than other communication media. Nielsen’s recent analysis in his newsletter (Slide 7) demonstrates how this is playing out on Facebook (FB). Notice the comment ‘these people have no life‘ or, as Sherry Turkle put it, they got ‘life on the screen’.

Slides 8 and 9 demonstrate that participation inequality is strongly present in OpenStreetMap, and we can expect it to play out in FourSquare, Google Map Maker, Waze and other GeoWeb social applications. Slide 10 focuses on other characteristics of the people who contribute content: men, highly educated, aged 20-40. Similar characteristics have been shown in other social media and the GeoWeb by Monica Stephens & Antonella Rondinone, and by many other researchers.

In slides 11-14, observed spatial biases in OpenStreetMap are noted – concentration on highly populated places, a gap between rich and poor places (using the Index of Deprivation from Slide 4), and differences between rural and urban areas. These differences have also been observed in other sources of Volunteered Geographic Information (VGI), such as photo-sharing sites (in Vyron Antoniou’s PhD).

Taken together, participation inequality, demographic bias and spatial bias point to a very skewed group producing most of the content that we see on the GeoWeb. Look back at Slide 3, and it is a good guess that this minority falls well outside the first 2 standard deviations from the centre. They are outliers – not representative of anything other than themselves. Of course, given the large number of people online, and the ability of outliers to ‘shout’ louder than anyone else and converse among themselves, it is tempting to treat them as a population worth listening to. But it is, as with the opening point, a look at the rich and powerful (or super-enthusiastic), not the mainstream.

Strangely, when such a small group controls the economy, we see it as a political issue (Slide 15, which was produced by Mother Jones as part of the response to the Occupy movement). We should be just as concerned when it happens with digital content and sets the agenda of what we see and how we understand the world.

Now to the implication of this analysis, and the use of the GeoWeb and social media to understand society. Slide 17 provides the link to the GeoIQ post that argued that these outliers are worth listening to. They might be, but the issue is what you are trying to find out by looking at the data:

The first option is to ask questions about the resulting data, such as ‘can it be used to update national datasets?’ – accepting the biases in the data collection as they are, and exploring whether anything useful comes out of the results (Slides 19-21, from the work of Vyron Antoniou and Thomas Koukoletsos). This should be fine, as long as the researchers don’t try to state something general about the way society works from the data. Even so, researchers ought to analyse and point to biases and shortcomings (Slides 11-14 do exactly that).

The second option is to start claiming that we can learn something about social activities (Slides 22-23, from the work of Eric Fischer and Daniel Gayo-Avello, as well as Sean Gorman in the GeoIQ post). In this case, it is wrong to read too much into the data – as Gayo-Avello noted – since the outliers’ bias renders the analysis unrepresentative of society. Notice, for example, the huge gap between the social media noise during the Egyptian revolution and the outcomes of the elections, or the political differences that Gayo-Avello noted.

The third option is to find data that is representative (Slide 24, from the MIT Senseable City Lab), which looks at the ‘digital breadcrumbs’ that we leave behind on a large scale – phone calls, SMS, travel cards, etc. This data is representative, but provides observations without context. There is no qualitative or contextual information that comes with it and, because of the biases that are noted above, it is wrong to integrate it with the digital cacophony of the outliers. It is most likely to lead to erroneous conclusions.

Therefore, the understanding of the concept of digiplace (Slide 25) – the ordering of digital representation through software algorithms and GeoWeb portals – is, in fact, double-filtered. The provision of content by outliers means that the algorithms will tend to amplify their point of view and biases. Not only that, digital inequality, which is happening on top of social and economic inequality, means that more and more of our views of the world are being shaped by this tiny minority.

When we add to the mix aspects of digital inequality (some people can only afford a pay-as-you-go feature phone, while a tiny minority consumes a lot of bandwidth over multiple devices), we should stop talking about the ‘digital divide’ as something that will close over time. This is some sort of imaginary trickle-down theory that does not withstand the test of reality. If anything, the divide grows, as the ‘haves’ use multiple devices to shape digiplace in their own image.

This is actually one of the core problems that differentiates two approaches to engagement in data collection. There is the laissez-faire approach to engaging society in collecting information about the world (Slides 27-28, showing OpenStreetMap mapping parties), which does not confront the biases; and opposite it there are participatory approaches (Slides 29-30, showing participatory mapping exercises from the work of Mapping for Change), where the effort goes into making the activity inclusive.

This point about the biases, inequality and influence on the way we understand the world is important to repeat – as it is too often ignored by researchers who deal with these data.
