Looking across the range of crowdsourced geographic information activities, some regular patterns are emerging and it might be useful to start notice them as a way to think about what is possible or not possible to do in this area. Since I don’t like the concept of ‘laws’ – as in Tobler’s first law of geography which is  stated as ‘Everything is related to everything else, but near things are more related than distant things.’ – I would call them assertions. There is also something nice about using the word ‘assertion’ in the context of crowdsourced geographic information, as it echos Mike Goodchild’s differentiation between asserted and authoritative information. So not laws, just assertions or even observations.

The first one, is rephrasing a famous quote:

you can be supported by a huge crowd for a very short time, or by few for a long time, but you can’t have a huge crowd all of the time (unless data collection is passive)’

So the Christmas Bird Count can have tens of thousands of participants for a short time, while the number of people who operate weather observation stations will be much smaller. Same thing is true for OpenStreetMap – for crisis mapping, which is a short term task, you can get many contributors  but for the regular updating of an area under usual conditions, there will be only few.

The exception for the assertion is the case for passive data collection, where information is collected automatically through the logging of information from a sensor – for example the recording of GPS track to improve navigation information.

OSM Haiyan

There is something in the physical presence of book that is pleasurable. Receiving the copy of Introducing Human Geographies was special, as I have contributed a chapter about Geographic Information Systems to the ‘cartographies’ section.

It might be a response to Ron Johnston critique of Human Geography textbooks or a decision by the editors to extend the content of the book, but the book now contains three chapters that deal with maps and GIS. The contributions are the ‘Power of maps’ by Jeremy Crampton, a chapter about ‘Geographical information systems’ by me, and ‘Counter geographies’ by Wen Lin. To some extent, we’ve coordinated the writing, as this is a textbook for undergraduates in geography and we wanted to have a coherent message.

Overall, you’ll notice a lot of references to participatory and collaborative mapping, with OpenStreetMap and PPGIS.net mentioned several times.

In my chapter I have covered both the quantitative/spatial science face of GIS, as well as the critical/participatory one. As the introduction to the section describes:

“Chapter 14 focuses on the place of Geographical Information Systems (GIS) within contemporary mapping. A GIS involves the representation of geographies in digital computers. … GIS is now a widespread and varied form of mapping, both within the academy and beyond. In the chapter, he speaks to that variety by considering the use of GIS both within practices such as location planning, where it is underpinned by the intellectual paradigm of spatial science and quantitative data, and within emergent fields of ‘critical’ and ‘qualitative GIS’, where GIS could be focused on representing the experiences of marginalized groups of people, for example. Generally, Muki argues against the equation of GIS with only one sort of Human Geography, showing how it can be used as a technology within various kinds of research. More specifically, his account shows how current work is pursuing those options through careful consideration of both the wider issues of power and representation present in mapping and the detailed, technical and scientific challenges within GIS development.”

To preview the chapter on Google Book, use this link . I hope that it will be useful introduction to GIS to Geography students.

 

The Consumers’ Association Which? magazine  is probably not the first place to turn to when you look for usability studies. Especially not if you’re interested in computer technology – for that, there are sources such as PC Magazine on the consumer side, and professional magazines such as Interactions from Association for Computing Machinery (ACM) Special Interest Group on Computer-Human Interaction (SIGCHI).

And yet…

Over the past few years, Which? is reviewing, testing and recommending Satnavs (also known Personal Navigation Devices – PNDs). Which? is an interesting case because it reaches over 600,000 households and because of the level of trust that it enjoys. If you look at their methodology for testing satnavs , you’ll find that it does resemble usability testing – click on the image to see the video from Which? about their methodology. The methodology is more about everyday use and the opinion of the assessors seems to play an important role.

Link to Which Satnav video

Professionals in geographical information science or human-computer interaction might dismiss the study as unrepresentative, or not fitting their ways of evaluating technologies, but we need to remember that Which? is providing an insight into the experience of the people who are outside our usual professional and social context – people who go to a high street shop or download an app and start using it straightaway. Therefore, it’s worth understanding how they review the different systems and what the experience is like when you try to think like a consumer, with limited technical knowledge and understanding of maps.

There are also aspects that puncture the ‘filter bubble‘ of geoweb people – Google Maps are now probably the most used maps on the web, but the satnav application using Google Maps was described as ‘bad, useful for getting around on foot, but traffic information and audio instructions are limited and there’s no speed limit or speed camera data‘. Waze, the crowdsourced application received especially low marks and the magazine noted that it ‘lets users share traffic and road info, but we found its routes and maps are inaccurate and audio is poor‘ (both citations from Which? Nov 2012, p. 38). It is also worth reading their description of OpenStreetMap when discussing map updates, and also the opinions on the willingness to pay for map updates.

There are many ways to receive information about the usability and the nature of interaction with geographical technologies, and some of them, while not traditional, can provide useful insights.

At the 2012 Annual Meeting of the Association of American Geographers, I presented during the session Information Geographies: Online Power, Representation and Voice’, which was organised by Mark Graham (Oxford Internet Institute) and Matthew Zook (University of Kentucky). For an early morning session on a Saturday, the session was well attended – and the papers in the session were very interesting.

My presentation, titled ‘Nobody wants to do council estates’ – digital divide, spatial justice and outliers‘, was the result of thinking about the nature of social information that is available on the Web and which I partially articulated in a response to a post on GeoIQ blog. When Mark and Matt asked for an abstract, I provided the following:

The understanding of the world through digital representation (digiplace) and VGI is frequently carried out with the assumption that these are valid, comprehensive and useful representations of the world. A common practice throughout the literature on these issues is to mention the digital divide and, while accepting it as a social phenomenon, either ignore it for the rest of the analysis or expect that it will solve itself over time through technological diffusion. The almost deterministic belief in technological diffusion absolves the analyst from fully confronting the political implication of the divide.

However, what VGI and social media analysis reveals is that the digital divide is part of deep and growing social inequalities in Western societies. Worse still, digiplace amplifies and strengthens them.

In digiplace the wealthy, powerful, educated and mostly male elite is amplified through multiple digital representations. Moreover, the frequent decision of algorithm designers to highlight and emphasise those who submit more media, and the level of ‘digital cacophony’ that more active contributors create, means that a very small minority – arguably outliers in every analysis of normal distribution of human activities – are super empowered. Therefore, digiplace power relationships are arguably more polarised than outside cyberspace due to the lack of social check and balances. This makes the acceptance of the disproportional amount of information that these outliers produce as reality highly questionable.

The following notes might help in making sense of the slides.

Slide 2 takes us back 405 years to Mantua, Italy, where Claudio Monteverdi has just written one of the very first operas – L’Orfeo – as an after-dinner entertainment piece for Duke Vincenzo Gonzaga. Leaving aside the wonderful music – my personal recommendation is for Emmanuelle Haïm’s performance and I used the opening toccata in my presentation – there is a serious point about history. For a large portion of human history, and as recent as 400 years ago, we knew only about the rich and the powerful. We ignored everyone else because they ‘were not important’.

Slide 3 highlights two points about modern statistics. First, that it is a tool to gain an understanding about the nature of society as a whole. Second, when we look at the main body of society, it is within the first 2 standard deviations of a normalised distribution. The Index of Deprivation of the UK (Slide 4) is an example ofthis type of analysis. Even though it was designed to direct resources to the most needy, it analyses the whole population (and, by the way, is normalised).

Slide 5 points out that on the Web, and in social media in particular, the focus is on ‘long tail’ distributions. My main issue is not with the pattern but with what it means in terms of analysing the information. This is where participation inequality (Slide 6) matters and the point of Nielsen’s analysis is that outlets such as Wikipedia (and, as we will see, OpenStreetMap) are suffering from even worse inequality than other communication media. Nielsen’s recent analysis in his newsletter (Slide 7) demonstrates how this is playing out on Facebook (FB). Notice the comment ‘these people have no life‘ or, as Sherry Turkle put it, they got life on the screen

Slide 8 and 9 demonstrate that participation inequality is strongly represented in OpenStreetMap, and we can expect it to play out in FourSquare, Google Map Maker, Waze and other GeoWeb social applications. Slide 10 focuses on other characteristics of the people that are involved in the contribution of content: men, highly educated, age 20-40. Similar characteristics have been shown in other social media and the GeoWeb by Monica Stephens & Antonella Rondinone, and by many other researchers.

In slides 11-14, observed spatial biases in OpenStreetMap are noted – concentration on highly populated places, gap between rich and poor places (using the Index of Deprivation from Slide 4), and difference between rural and urban areas. These differences were also observed in other sources of Volunteer Geographic Information (VGI) such as photo sharing sites (in Vyron Antoniou’s PhD).

Taken together, participation inequality, demographic bias and spatial bias point to a very skewed group that is producing most of the content that we see on the GeoWeb. Look back at Slide 3, and it is a good guess that this minority falls within 3 standard deviations of the centre. They are outliers – not representative of anything other than of themselves. Of course, given the large number of people online and the ability of outliers to ‘shout’ louder than anyone else, and converse among themselves, it is tempting to look at them as a population worth listening to. But it is, similarly to the opening point, a look at the rich and powerful (or super enthusiastic) and not the mainstream.

Strangely, when such a small group controls the economy, we see it as a political issue (Slide 15, which was produced by Mother Jones as part of the response to the Occupy movement). We should be just as concerned when it happens with digital content and sets the agenda of what we see and how we understand the world.

Now to the implication of this analysis, and the use of the GeoWeb and social media to understand society. Slide 17 provides the link to the GeoIQ post that argued that these outliers are worth listening to. They might be, but the issue is what you are trying to find out by looking at the data:

The first option is to ask questions about the resulting data such as ‘can it be used to update national datasets?’ – accepting the biases in the data collection as they are and explore if there is anything useful that comes out of the outcomes (Slides 19-21, from the work of Vyron Antoniou and Thomas Koukoletsos). This should be fine as long as the researchers don’t try to state something general about the way society works from the data. Even so, researchers ought to analyse and point to biases and shortcomings (Slides 11-14 are doing exactly that).

The second option is to start claiming that we can learn something about social activities (Slides 22-23, from the work of Eric Fischer and Daniel Gayo-Avello, as well as Sean Gorman in the GeoIQ post). In this case, it is wrong to read too much into the dataas Gayo-Avello noted – as the outliers’ bias renders the analysis as not representative of society. Notice, for example, the huge gap between the social media noise during the Egyptian revolution and the outcomes of the elections, or the political differences that Gayo-Avello noted.

The third option is to find data that is representative (Slide 24, from the MIT Senseable City Lab), which looks at the ‘digital breadcrumbs’ that we leave behind on a large scale – phone calls, SMS, travel cards, etc. This data is representative, but provides observations without context. There is no qualitative or contextual information that comes with it and, because of the biases that are noted above, it is wrong to integrate it with the digital cacophony of the outliers. It is most likely to lead to erroneous conclusions.

Therefore, the understanding of the concept of digiplace (Slide 25) – the ordering of digital representation through software algorithms and GeoWeb portals – is, in fact, double filtered. The provision of content by outliers means that the algorithms will tend to amplify their point of view and biases.  Not only that, digital inequality, which is happening on top of social and economic inequality, means that more and more of our views of the world are being shaped by this tiny minority.

When we add to the mix aspects of digital inequalities (some people can only afford a pay-as-you-go function phone, while a tiny minority consumes a lot of bandwidth over multiple devices), we should stop talking about the ‘digital divide’ as something that will close over time. This is some sort of imaginary trickle-down  theory that is being proven not to withstand the test of reality. If anything, it grows as the ‘haves’ are using multiple devices to shape digiplace in their own image.

This is actually one of the core problems that differentiates to approaches to engagement in data collection. There is the laissez-faire approach to engaging society in collecting information about the world (Slides 27-28 showing OpenStreetMap mapping parties) which does not confront the biases and opposite it, there are participatory approaches (Slides 29-30 showing participatory mapping exercises from the work of Mapping for Change) where the effort is on making the activity inclusive.

This point about the biases, inequality and influence on the way we understand the world is important to repeat – as it is too often ignored by researchers who deal with these data.

As part of the Volunteered Geographic Information (VGI) workshop that was held in Seattle in April 2011, Daniel Sui, Sarah Elwood and Mike Goodchild announced that they will be editing a volume dedicated to the topic, published as ‘Crowdsourcing Geographic Knowledge‘  (Here is a link to the Chapter in Crowdsourcing Geographic Knowledge)

My contribution to this volume focuses on citizen science, and shows the links between it and VGI. The chapter is currently under review, but the following excerpt discusses different types of citizen science activities, and I would welcome comments:

“While the aim here is not to provide a precise definition of citizen science. Yet, a definition and clarification of what the core characteristics of citizen science are is unavoidable. Therefore, it is defined as scientific activities in which non-professional scientists volunteer to participate in data collection, analysis and dissemination of a scientific project (Cohn 2008; Silvertown 2009). People who participate in a scientific study without playing some part in the study itself – for example, volunteering in a medical trial or participating in a social science survey – are not included in this definition.

While it is easy to identify a citizen science project when the aim of the project is the collection of scientific information, as in the recording of the distribution of plant species, there are cases where the definition is less clear-cut. For example, the process of data collection in OpenStreetMap or Google Map Maker is mostly focused on recording verifiable facts about the world that can be observed on the ground. The tools that OpenStreetMap mappers use – such as remotely sensed images, GPS receivers and map editing software – can all be considered scientific tools. With their attempt to locate observed objects and record them on a map accurately, they follow the footsteps of surveyors such as Robert Hooke, who also carried out an extensive survey of London using scientific methods – although, unlike OpenStreetMap volunteers, he was paid for his effort. Finally, cases where facts are collected in a participatory mapping activity, such as the one that Ghose (2001) describes, should probably be considered a citizen science only if the participants decided to frame it as such. For the purpose of the discussion here, such a broad definition is more useful than a limiting one that tries to reject certain activities.

Notice also that, by definition, citizen science can only exist in a world in which science is socially constructed as the preserve of professional scientists in academic institutions and industry, because, otherwise, any person who is involved in a scientific project would simply be considered a contributor and potentially a scientist. As Silvertown (2009) noted, until the late 19th century, science was mainly developed by people who had additional sources of employment that allowed them to spend time on data collection and analysis. Famously, Charles Darwin joined the Beagle voyage, not as a professional naturalist but as a companion to Captain FitzRoy. Thus, in that era, almost all science was citizen science albeit mostly by affluent gentlemen scientists and gentlewomen. While the first professional scientist is likely to be Robert Hooke, who was paid to work on scientific studies in the 17th century, the major growth in the professionalisation of scientists was mostly in the latter part of the 19th and throughout the 20th centuries.

Even with the rise of the professional scientist, the role of volunteers has not disappeared, especially in areas such as archaeology, where it is common for enthusiasts to join excavations, or in natural science and ecology, where they collect and send samples and observations to national repositories. These activities include the Christmas Bird Watch that has been ongoing since 1900 and the British Trust for Ornithology Survey, which has collected over 31 million records since its establishment in 1932 (Silvertown 2009). Astronomy is another area where amateurs and volunteers have been on par with professionals when observation of the night sky and the identification of galaxies, comets and asteroids are considered (BBC 2006). Finally, meteorological observations have also relied on volunteers since the early start of systematic measurements of temperature, precipitation or extreme weather events (WMO 2001).

This type of citizen science provides the first type of ‘classic’ citizen science – the ‘persistence’ parts of science where the resources, geographical spread and the nature of the problem mean that volunteers sometimes predate the professionalisation and mechanisation of science. These research areas usually require a large but sparse network of observers who carry out their work as part of a hobby or leisure activity. This type of citizen science has flourished in specific enclaves of scientific practice, and the progressive development of modern communication tools has made the process of collating the results from the participants easier and cheaper, while inherently keeping many of the characteristics of data collection processes close to their origins.

A second set of citizen science activities is environmental management and, even more specifically, within the context of environmental justice campaigns. Modern environmental management includes strong technocratic and science oriented management practices (Bryant & Wilson 1998; Scott & Barnett 2009) and environmental decision making is heavily based on scientific environmental information. As a result, when an environmental conflict emerges – such as a community protest over a local noisy factory or planned expansion of an airport – the valid evidence needs to be based on scientific data collection. This aspect of environmental justice struggle is encouraging communities to carry out ‘community science’ in which scientific measurements and analysis are carried out by members of local communities so they can develop an evidence base and set out action plans to deal with problems in their area. A successful example of such an approach is the ‘Global Community Monitor’ method to allow communities to deal with air pollution issues (Scott & Barnett 2009). This is performed through a simple method of sampling air using plastic buckets followed by analysis in an air pollution laboratory, and, finally, the community being provided with instructions on how to understand the results. This activity is termed ‘Bucket Brigade’ and was used across the world in environmental justice campaigns. In London, community science was used to collect noise readings in two communities that are impacted by airport and industrial activities. The outputs were effective in bringing environmental problems to the policy arena (Haklay, Francis & Whitaker 2008). As in ‘classic’ citizen science, the growth in electronic communication has enabled communities to identify potential methods – e.g. through the ‘Global Community Monitor’ website – as well as find international standards , regulations and scientific papers that can be used together with the local evidence.
However, the emergence of the Internet and the Web as a global infrastructure has enabled a new incarnation of citizen science: the realisation of scientists that the public can provide free labour, skills, computing power and even funding, and, the growing demands from research funders for public engagement all contributing to the motivation of scientists to develop and launch new and innovative projects (Silvertown 2009; Cohn 2008). These projects utilise the abilities of personal computers, GPS receivers and mobile phones to double as scientific instruments.

This third type of citizen science has been termed ‘citizen cyberscience’ by Francois Grey (2009). Within it, it is possible to identify three sub-categories: volunteered computing, volunteered thinking and participatory sensing.

Volunteered computing was first developed in 1999, with the foundation of SETI@home (Anderson et al. 2002), which was designed to distribute the analysis of data that was collected from a radio telescope in the search for extra-terrestrial intelligence. The project utilises the unused processing capacity that exists in personal computers, and uses the Internet to send and receive ‘work packages’ that are analysed automatically and sent back to the main server. Over 3.83 million downloads were registered on the project’s website by July 2002. The system on which SETI@home is based, the Berkeley Open Infrastructure for Network Computing (BOINC), is now used for over 100 projects, covering Physics, processing data from the Large Hadron Collider through LHC@home; Climate Science with the running of climate models in Climateprediction.net; and Biology in which the shape of proteins is calculated in Rosetta@home.

While volunteered computing requires very little from the participants, apart from installing software on their computers, in volunteered thinking the volunteers are engaged at a more active and cognitive level (Grey 2009). In these projects, the participants are asked to use a website in which information or an image is presented to them. When they register onto the system, they are trained in the task of classifying the information. After the training, they are exposed to information that has not been analysed, and are asked to carry out classification work. Stardust@home (Westphal et al. 2006) in which volunteers were asked to use a virtual microscope to try to identify traces of interstellar dust was one of the first projects in this area, together with the NASA ClickWorkers that focused on the classification of craters on Mars. Galaxy Zoo (Lintott et al. 2008), a project in which volunteers classify galaxies, is now one of the most developed ones, with over 100,000 participants and with a range of applications that are included in the wider Zooniverse set of projects (see http://www.zooniverse.org/) .

Participatory sensing is the final and most recent type of citizen science activity. Here, the capabilities of mobile phones are used to sense the environment. Some mobile phones have up to nine sensors integrated into them, including different transceivers (mobile network, WiFi, Bluetooth), FM and GPS receivers, camera, accelerometer, digital compass and microphone. In addition, they can link to external sensors. These capabilities are increasingly used in citizen science projects, such as Mappiness in which participants are asked to provide behavioural information (feeling of happiness) while the phone records their location to allow the linkage of different locations to wellbeing (MacKerron 2011). Other activities include the sensing of air-quality (Cuff 2007) or noise levels (Maisonneuve et al. 2010) by using the mobile phone’s location and the readings from the microphone.”

At the State of the Map (EU) 2011 conference that was held in Vienna from 15-17 July, I gave a keynote talk on the relationships between the OpenStreetMap  (OSM) community and the GIScience research community. Of course, the relationships are especially important for those researchers who are working on volunteered Geographic Information (VGI), due to the major role of OSM in this area of research.

The talk included an overview of what researchers have discovered about OpenStreetMap over the 5 years since we started to pay attention to OSM. One striking result is that the issue of positional accuracy does not require much more work by researchers. Another important outcome of the research is to understand that quality is impacted by the number of mappers, or that the data can be used with confidence for mainstream geographical applications when some conditions are met. These results are both useful, and of interest to a wide range of groups, but there remain key areas that require further research – for example, specific facets of quality, community characteristics  and how the OSM data is used.

Reflecting on the body of research, we can start to form a ‘code of engagement’ for both academics and mappers who are engaged in researching or using OpenStreetMap. One such guideline would be  that it is both prudent and productive for any researcher do some mapping herself, and understand the process of creating OSM data, if the research is to be relevant and accurate. Other aspects of the proposed ‘code’ are covered in the presentation.

The talk is also available as a video from the TU Wien Matterhorn server

 

 

In March 2008, I started comparing OpenStreetMap in England to the Ordnance Survey Meridian 2, as a way to evaluate the completeness of OpenStreetMap coverage. The rational behind the comparison is that Meridian 2 represents a generalised geographic dataset that is widely use in national scale spatial analysis. At the time that the study started, it was not clear that OpenStreetMap volunteers can create highly detailed maps as can be seen on the ‘Best of OpenStreetMap‘ site. Yet even today, Meridian 2 provides a minimum threshold for OpenStreetMap when the question of completeness is asked.

So far, I have carried out 6 evaluations, comparing the two datasets in March 2008, March 2009, October 2009, March 2010, September 2010 and March 2011. While the work on the statistical analysis and verification of the results continues, Oliver O’Brien helped me in taking the results of the analysis for Britain and turn them into an interactive online map that can help in exploring the progression of the coverage over the various time period.

Notice that the visualisation shows the total length of all road objects in OpenStreetMap, so does not discriminate between roads, footpaths and other types of objects. This is the most basic level of completeness evaluation and it is fairly coarse.

The application will allow you to browse the results and to zoom to a specific location, and as Oliver integrated the Ordnance Survey Street View layer, it will allow you to see what information is missing from OpenStreetMap.

Finally, note that for the periods before September 2010, the coverage is for England only.

Some details on the development of the map are available on Oliver’s blog.

Follow

Get every new post delivered to your Inbox.

Join 2,661 other followers