Mike Goodchild’s NSF talk ‘From Community Mapping to Critical Spatial Thinking’

An interesting talk by Mike Goodchild, given as a lecture at the US NSF, is entitled ‘From Community Mapping to Critical Spatial Thinking’. The talk is a good overview of VGI, and it links VGI to the understanding of spatial concepts and to their integration into teaching and research.

The interesting issue raised in the talk is the link between people's ability to use spatial information and the development of spatial thinking. One vivid memory from the first State of the Map conference is a presentation by a person who was trying to use a simple GPS receiver far beyond its capabilities, and the tough questioning from the audience at the end, which basically told him that he had got it wrong and needed to rethink his project. What was clear was that, for people who are engaged in active data collection and tool development, critical spatial thinking and the understanding of the technology evolved. At the same time, the evidence from end-users of SatNav devices shows a reduction in spatial understanding due to the ‘tunnel vision’ that the user interface promotes.

Significantly, the latter group is larger than the former. So are we developing shallow spatial understanding without critical spatial thinking?


Interview in GPS Business News

The website GPS Business News published an interview with me in which I covered several aspects of OpenStreetMap and crowdsourced geographical information, including spatial data quality, patterns of data collection, inequality in coverage and the implications of these patterns for the wider area of Volunteered Geographical Information.

The interview is available here.

Linus’ Law and OpenStreetMap – evidence from large-scale analysis

One issue that remained open in the studies on the relevance of Linus’ Law for OpenStreetMap was that they looked only at areas with more than 5 contributors, so the link between the number of users and the quality was not conclusive, although the quality was above 70% for areas above this threshold.

Now, as part of writing up the GISRUK 2010 paper for journal publication, we had an opportunity to fill this gap, to some extent. Vyron Antoniou has developed a method to evaluate positional accuracy on a larger scale than we have done so far. The methodology uses the geometric position of the Ordnance Survey (OS) Meridian 2 road intersections to evaluate positional accuracy. Although Meridian 2 is created by applying a 20-metre generalisation filter to the centrelines of the OS Roads Database, this generalisation process does not affect the positional accuracy of node points, and thus their accuracy is the best available. An algorithm was developed to identify corresponding nodes in Meridian 2 and OSM, and the average positional error was calculated for each square kilometre in England. With this data, which provides an estimated positional accuracy for an area of over 43,000 square kilometres, it was possible to estimate the contribution that additional users make to the quality of the data.
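To make the per-tile measure concrete, here is a minimal Python sketch of the general idea. It is not Vyron Antoniou's actual algorithm: each Meridian 2 intersection is simply paired with its nearest OSM node, pairs farther apart than a cut-off are discarded, and the remaining distances are averaged per 1 km tile. The coordinates are assumed to be British National Grid metres, and the function names, the brute-force nearest-neighbour search and the 20 m `max_dist` cut-off are all illustrative assumptions.

```python
import math
from collections import defaultdict

# (easting, northing) in metres; British National Grid is assumed.
Point = tuple[float, float]


def match_nodes(reference: list[Point], osm: list[Point],
                max_dist: float = 20.0) -> list[tuple[Point, float]]:
    """Pair each reference intersection with its nearest OSM node.

    Brute-force search for clarity; a run over all of England would need a
    spatial index. Pairs farther apart than max_dist (an illustrative
    threshold) are treated as unmatched and dropped. Matching is not forced
    to be one-to-one.
    """
    matches = []
    for ref in reference:
        nearest = min(osm, key=lambda p: math.dist(ref, p), default=None)
        if nearest is None:
            continue  # no OSM nodes at all
        d = math.dist(ref, nearest)
        if d <= max_dist:
            matches.append((ref, d))
    return matches


def mean_error_per_tile(matches: list[tuple[Point, float]],
                        tile_size: float = 1000.0) -> dict[tuple[int, int], float]:
    """Average the matched distances within each tile (1 km by default)."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    counts: dict[tuple[int, int], int] = defaultdict(int)
    for (x, y), d in matches:
        key = (int(x // tile_size), int(y // tile_size))
        totals[key] += d
        counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}


# Invented coordinates, not study data.
meridian = [(530_100.0, 180_200.0), (530_900.0, 180_700.0), (531_500.0, 180_100.0)]
osm = [(530_103.0, 180_204.0), (530_906.0, 180_708.0), (531_700.0, 180_300.0)]
print(mean_error_per_tile(match_nodes(meridian, osm)))  # {(530, 180): 7.5}
```

In this toy run the third intersection has no OSM node within 20 m, so it is left out of the tile average rather than counted as a large error.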

As can be seen in the chart below, positional accuracy remains fairly level when the number of users is 13 or more – as we have seen in previous studies. On the other hand, up to 13 users, each additional contributor considerably improves the dataset’s quality. In grey you can see the maximum and minimum values, so the area represents the possible range of positional accuracy results. Interestingly, as the number of users increases, positional accuracy seems to settle close to 5m, which is somewhat expected when considering the source of the information – GPS receivers and aerial imagery. However, this is an aspect of the analysis that clearly requires further testing of the algorithm and the datasets.

It is encouraging to see that the number of contributors and the positional error are significantly correlated. For the full dataset the correlation is weak (-0.143) but significant at the 0.01 level (2-tailed). However, for the average values for each number of contributors (blue line in the graph), the correlation is strong (-0.844) and significant at the 0.01 level (2-tailed).

Linus' Law for OpenStreetMap
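Part of the gap between the weak correlation over all tiles and the strong correlation over the per-count averages comes from aggregation itself: averaging removes the tile-to-tile scatter. The sketch below uses made-up numbers rather than the study data. It groups tiles by contributor count, records the mean, minimum and maximum error for each count (the blue line and grey band in the chart), and computes Pearson's r on both the raw and the averaged values. The function names are hypothetical, and a two-tailed significance test would additionally need the t distribution, for example via `scipy.stats.pearsonr`.

```python
from collections import defaultdict
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarise_by_contributors(
        tiles: list[tuple[int, float]]) -> dict[int, tuple[float, float, float]]:
    """Group (contributors, error_m) pairs by contributor count.

    Returns {count: (mean, min, max)}: the means correspond to a line like
    the blue one in the chart, min and max to the grey band.
    """
    groups: dict[int, list[float]] = defaultdict(list)
    for n, err in tiles:
        groups[n].append(err)
    return {n: (mean(v), min(v), max(v)) for n, v in sorted(groups.items())}


# Invented example data: (number of contributors, mean positional error in m).
tiles = [(1, 12.0), (1, 20.0), (2, 10.0), (2, 6.0), (3, 7.0), (3, 5.0)]
summary = summarise_by_contributors(tiles)
r_all = pearson([n for n, _ in tiles], [e for _, e in tiles])
r_means = pearson(list(summary), [m for m, _, _ in summary.values()])
print(summary)  # {1: (16.0, 12.0, 20.0), 2: (8.0, 6.0, 10.0), 3: (6.0, 5.0, 7.0)}
print(round(r_all, 3), round(r_means, 3))  # -0.806 -0.945
```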

An important caveat is that the number of tiles with more than 10 contributors is fairly small, so that is another aspect that requires further exploration. Moreover, spatial data quality is not just positional accuracy, but also attribute accuracy, completeness, currency (how up to date the data is) and other properties. We can expect that these will exhibit behaviour similar to positional accuracy, but this requires further studies, as always.

However, as this is a large-scale analysis that adds to the evidence from the small-scale studies, it is becoming highly likely that Linus’ Law affects the quality of OSM data, and possibly of other so-called Volunteered Geographical Information (VGI) sources, and that the gain in positional accuracy diminishes once the number of contributors passes about 10 or so.

The paper has since appeared in the Cartographic Journal; see the following post.

Geographical Citizen Science

The London Citizen Cyberscience Summit in early September was a stimulating event, which brought together a group of people with an interest in this area. A report from the event, with a very good description of the presentations, including a reflection piece, is available on the ‘Strange Attractor’ blog.

During the summit, I discussed the aspects of ‘Extreme’ Citizen Science, where we move from usual science to participatory research. The presentation was partly based on a paper that I wrote and that I presented during the workshop on the value of Volunteered Geographical Information in advancing science, which was run as part of the GIScience 2010 conference towards the middle of September. Details about the workshop are available on the workshop’s website including a set of interesting position papers.

The presentation below covers the topics that I discussed in both workshops. Here, I provide a brief synopsis of the presentation, as it is somewhat different from the paper.

In the talk, I started by highlighting that by using different terminologies we can notice different facets of the practice of crowd data collection (VGI within the GIScience community, crowdsourcing, participatory mapping …).

The first way in which we can understand this information is in the context of Web 2.0 applications. These applications can be non-spatial (such as Wikipedia or Twitter), implicitly spatial (such as Flickr, where you need to be in a location before you can capture a photograph), or explicitly spatial, in applications that are about collecting geographical information, for example OpenStreetMap. When looking at VGI from the perspective of Web 2.0, it’s possible to identify the specific reasons why it emerged and how other similar applications influence its structure and practices.

The second way to view this information is as part of geographical information produced by companies who need mapping information (such as Google or TomTom). In this case, you notice that it’s about reducing the costs of labour and the need for active or passive involvement of the person who carries out the mapping.

The third, and arguably new, way to view VGI is as part of Citizen Science. These activities have been going on for a long time in ornithology and meteorology. However, there are new forms of Citizen Science that rely on ICT, such as movement-activated cameras (slide 11 on the left) that are left near animal trails and are operated by volunteers, or a network of accelerometers that forms a global earthquake monitoring network. Not all Citizen Science is spatial, and there are very effective non-spatial examples, especially in the area of Citizen Cyberscience. So in this framing of VGI we can pay special attention to the collection of scientific information. Importantly, as in the case of spatial applications, some volunteers become experts, such as Hanny van Arkel, who discovered a previously unknown type of astronomical object in Galaxy Zoo.

Slides 16-17 show the distribution of crowdsourced images, and emphasise the spatial distribution of information near population centres and tourist attractions. Slides 19-25 show the analysis of the data that was collected by OpenStreetMap volunteers and highlight bias towards highly populated and affluent areas.

Citizen Science is not just about data collection. There are also cultural problems regarding the trustworthiness of the data, but slides 28-30 show that the data is self-improving as more volunteers engage in the process (in this case, mapping in OpenStreetMap). On that basis, I question the assumption that volunteers are not trustworthy, and argue that we need to change the way we think about such projects. There are emerging examples of Citizen Science in which the engagement of participants is at a higher level. For example, the noise mapping that a community near London City Airport carried out (slides 34-39) shows that people can engage in science and are well placed to collect data when opportunities arise, such as measuring ‘background’ noise when flights were grounded by the ash cloud in April 2010. This would not have been possible without the help of communities.
Finally, slides 40 and 41 demonstrate that it is possible to engage non-literate users in environmental data collection.

So, in summary, a limitless Citizen Science is possible: we need to create the tools for it and understand how to run such projects, as well as study them.

“How good is VGI? A comparative study of OpenStreetMap and Ordnance Survey datasets” – published

The process of academic publication takes a long time, so my paper from 2008 is only now in print.

The paper should be cited as:
Haklay, M., 2010, “How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets”, Environment and Planning B: Planning and Design, 37(4), 682–703.

Its abstract is:
Within the framework of Web 2.0 mapping applications, the most striking example of a geographical application is the OpenStreetMap (OSM) project. OSM aims to create a free digital map of the world and is implemented through the engagement of participants in a mode similar to software development in Open Source projects. The information is collected by many participants, collated on a central database, and distributed in multiple digital formats through the World Wide Web. This type of information was termed ‘Volunteered Geographical Information’ (VGI) by Goodchild, 2007. However, to date there has been no systematic analysis of the quality of VGI. This study aims to fill this gap by analysing OSM information. The examination focuses on analysis of its quality through a comparison with Ordnance Survey (OS) datasets. The analysis focuses on London and England, since OSM started in London in August 2004 and therefore the study of these geographies provides the best understanding of the achievements and difficulties of VGI. The analysis shows that OSM information can be fairly accurate: on average within about 6 m of the position recorded by the OS, and with approximately 80% overlap of motorway objects between the two datasets. In the space of four years, OSM has captured about 29% of the area of England, of which approximately 24% are digitised lines without a complete set of attributes. The paper concludes with a discussion of the implications of the findings to the study of VGI as well as suggesting future research directions.

The paper can be found here. If you are interested in a copy of the published version, email me.
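The abstract's figure of roughly 80% overlap for motorways is a line-overlap measure. One simple way to approximate such a measure, sketched below under assumed parameters, is to sample points at regular intervals along one line and count the share that fall within a fixed buffer distance of the other line. This illustrates a buffer-style comparison in general and is not necessarily the procedure used in the paper; the 20 m buffer, the 1 m sampling step and all function names are assumptions.

```python
import math

Point = tuple[float, float]  # (easting, northing) in metres


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_line(p: Point, line: list[Point]) -> float:
    """Shortest distance from p to a polyline with at least two vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def sample_along(line: list[Point], step: float) -> list[Point]:
    """Points at most `step` apart along the polyline, vertices included."""
    points = [line[0]]
    for a, b in zip(line, line[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        for i in range(1, n + 1):
            t = i / n
            points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return points


def overlap_share(test_line: list[Point], reference_line: list[Point],
                  buffer_m: float = 20.0, step: float = 1.0) -> float:
    """Approximate share of test_line lying within buffer_m of reference_line."""
    samples = sample_along(test_line, step)
    inside = sum(1 for p in samples if distance_to_line(p, reference_line) <= buffer_m)
    return inside / len(samples)


# Invented geometry: the OS line covers only the first half of the OSM line.
osm_motorway = [(0.0, 0.0), (100.0, 0.0)]
os_motorway = [(0.0, 5.0), (50.0, 5.0)]
print(round(overlap_share(osm_motorway, os_motorway), 3))  # 0.693
```

Because samples are evenly spaced along each segment, the count of samples inside the buffer roughly tracks the length of line inside it; a finer `step` gives a closer approximation.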

The Tyranny of Place and OpenStreetMap

The slides below are from my presentation at State of the Map 2010 in Girona, Spain. While the conference is about OpenStreetMap, the presentation covers a range of spatially implicit and explicit crowdsourcing projects, as well as activities that we carried out in Mapping for Change, all of which show that, unlike in other crowdsourcing activities, geography (and place) both limits and motivates contributions.

In many ways, OpenStreetMap is similar to other open source and open knowledge projects, such as Wikipedia. These similarities include the patterns of contribution and the importance of participation inequality, in which a small group of participants contributes very significantly, while a very large group of participants contributes only occasionally; the general demographics of participants, with strong representation of educated young males; and the temporal patterns of engagement, in which some participants go through a peak of activity and then lose interest, while a small group joins and continues to invest its time and effort to help the progress of the project. These aspects have been identified by researchers who explored volunteering, leisure activities and crowdsourcing, as well as by those who explored commons-based peer production networks (Benkler & Nissenbaum 2006).
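Participation inequality is often summarised by the share of the work that the most active contributors account for. As a minimal sketch, with invented edit counts and a hypothetical `top_share` helper, the share of all edits made by the top 10% of contributors can be computed like this:

```python
def top_share(edits_per_user: list[int], top_fraction: float = 0.1) -> float:
    """Share of all edits made by the most active `top_fraction` of users."""
    counts = sorted(edits_per_user, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, round(len(counts) * top_fraction))  # always count at least one user
    return sum(counts[:k]) / total


# Invented edit counts for ten contributors.
edits = [500, 120, 40, 10, 5, 3, 2, 1, 1, 1]
print(f"{top_share(edits):.0%}")  # 73%
```

With these made-up numbers, one contributor out of ten accounts for almost three quarters of the edits, which is the shape of distribution described above.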

However, OpenStreetMap is a project about geography, and deals with the shape of features and information about places on the face of the Earth. Thus, the emerging question is ‘what influence does geography have on OSM?’ Does geography make some fundamental changes to the basic principles of crowdsourcing, or should OSM be treated as ‘Wikipedia for maps’?

In the presentation, which is based on my work, as well as the work of Vyron Antoniou and Nama Budhathoki, we argue that geography is playing a ‘tyrannical’ role in OSM and other projects that are based on crowdsourced geographical information and shapes the nature of the project beyond what is usually accepted.

The first influence of geography is on motivation. A survey of OSM participants shows that specific geographical knowledge, which a participant acquired at first hand, and the wish to use this knowledge and see it mapped well, are important factors in participation in the project. We found that participants are driven to mapping activities by their desire to represent the places they care about and to fix the errors on the map. Both of these motives require local knowledge.

A second influence is on the accuracy and completeness of coverage, with places that are highly populated, and therefore have a larger pool of potential participants, showing better coverage than suburban areas of well-mapped cities. Furthermore, there is an ongoing discussion within the OSM community about the value of mapping without local knowledge and the impact of such action on the willingness of potential contributors to fix errors and contribute to the map.

A third, and somewhat surprising, influence is the impact of mapping places that the participants have not visited or cannot visit, such as Haiti after the earthquake or Baghdad in 2007. Despite the willingness of participants to join in and help in the data collection process, the details that can be captured without being on the ground are fairly limited, even when multiple sources such as Flickr images, Google Street View and paper maps are used. The details are limited to what was captured at a certain point in time and by the limitations of the sensing device, so the mapping is, by necessity, incomplete.

We will demonstrate these and other aspects of what we termed ‘the tyranny of place’ and its impact on what can be covered by OSM without much effort and which locations will not be covered without a concentrated effort that requires some planning.

OpenStreetMap completeness evaluation – March 2010

The opening of Ordnance Survey datasets at the beginning of April 2010 is bound to fundamentally change the way OpenStreetMap (OSM) information is produced in the UK. So, just before this major change starts to influence OpenStreetMap, it is worth evaluating what has been achieved so far without this data. It is also time to update the completeness study, as the previous ones were conducted with data from March 2008 and March 2009.

Following the same method that was used in all the previous studies (which is described in detail here), the latest version of Meridian 2 from OS OpenData was downloaded and compared with OSM data downloaded from GeoFabrik. The processing is now streamlined with MapBasic scripts, PostGIS scripts and final processing in Manifold GIS, so it is possible to complete the analysis within 2 days. The colour scheme for the map is based on Cynthia Brewer and Mark Harrower’s ColorBrewer 2.

OSM Completeness 03/10

By the end of March 2010, OpenStreetMap coverage of England had grown to 69.8% from 51.2% a year earlier. When attribute information is taken into account, the coverage had grown to 24.3% from 14.7% a year earlier. The chart on the left shows how the coverage progressed over the past 2 years, using the 4 data points that were used for analysis: March 2008, March 2009, October 2009 and March 2010. Notice that, in terms of capturing the geometry, less than 5% is now significantly under-mapped when compared to Meridian 2. Another interesting aspect is the decline in empty cells, that is, grid cells that don’t have any feature in Meridian 2 but now have OSM features appearing in them. So, in terms of capturing road information for England, it seems that the goal of capturing the whole country with volunteer effort was within reach, even without the release of Ordnance Survey data.
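The grid comparison behind these percentages can be sketched as follows. This is a simplified illustration rather than the exact procedure from the earlier studies: each grid cell holds the total road length from OSM and from Meridian 2, a cell counts as complete when the OSM length is at least the Meridian 2 length, and coverage is taken as the share of cells containing Meridian 2 roads that are complete. The category names, the 50% threshold for ‘under-mapped’ and the choice of denominator are assumptions.

```python
from collections import Counter


def classify_cell(osm_len: float, meridian_len: float, under_ratio: float = 0.5) -> str:
    """Classify one grid cell by comparing total road length in metres.

    The categories and the under_ratio threshold are illustrative assumptions.
    """
    if meridian_len == 0:
        return "osm_only" if osm_len > 0 else "empty"
    if osm_len >= meridian_len:
        return "complete"
    if osm_len < under_ratio * meridian_len:
        return "under_mapped"
    return "partial"


def coverage_percent(cells: list[tuple[float, float]]) -> float:
    """Percentage of cells with Meridian 2 roads where OSM length >= Meridian 2 length."""
    with_roads = [(o, m) for o, m in cells if m > 0]
    if not with_roads:
        return 0.0
    complete = sum(1 for o, m in with_roads if o >= m)
    return 100.0 * complete / len(with_roads)


# Invented (OSM length, Meridian 2 length) pairs for five 1 km cells.
cells = [(1200.0, 1000.0), (300.0, 1000.0), (800.0, 1000.0), (50.0, 0.0), (0.0, 0.0)]
print(Counter(classify_cell(o, m) for o, m in cells))
print(f"{coverage_percent(cells):.1f}%")  # 33.3%
```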

On the other hand, when attributes are included in the analysis, the picture is very different.

OSM Completeness (with Attributes) 03/10

The progression of coverage is far from complete, and although the area that is empty of features that include a street or road name in Meridian 2 is much larger, the progress of OSM mappers in completing the information is much slower. While the geometry coverage went up by 18.6 percentage points over the past year, coverage with attributes went up by less than 10 (9.6 percentage points, to be precise). The reason for this is likely to be the need to carry out a ground survey to find the street name without using other copyrighted sources.
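The difference between the two maps comes down to which segments are counted. The sketch below, with a hypothetical `Segment` type and invented lengths, compares one cell twice: once on total road length and once counting only segments that carry a name. A cell can pass the geometry test while failing the attribute test, which is the gap discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    length_m: float
    name: Optional[str]  # street or road name; None when the attribute is missing


def total_length(segments: list[Segment], named_only: bool = False) -> float:
    """Sum segment lengths, optionally counting only segments that carry a name."""
    return sum(s.length_m for s in segments if s.name or not named_only)


def cell_status(osm: list[Segment], meridian: list[Segment]) -> tuple[bool, bool]:
    """(complete by geometry, complete with attributes) for one grid cell."""
    geometry = total_length(osm) >= total_length(meridian)
    attributes = total_length(osm, named_only=True) >= total_length(meridian, named_only=True)
    return geometry, attributes


# Invented segments for a single cell.
osm = [Segment(400.0, "High Street"), Segment(350.0, None), Segment(300.0, "Mill Lane")]
meridian = [Segment(420.0, "High Street"), Segment(330.0, "Mill Lane"), Segment(250.0, None)]
print(cell_status(osm, meridian))  # (True, False)
```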

The attribute area is the one where I would expect to see the benefits of the Ordnance Survey data release to OSM mapping. Products such as StreetView and VectorMap District can be used either to copy the street name (StreetView) or to write an algorithm that copies the street name and other attributes from a vector dataset, such as Meridian 2 or VectorMap District.
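As a rough illustration of the kind of algorithm mentioned above, the sketch below copies a name onto an unnamed OSM way from the nearest named way in a reference dataset, provided the reference line lies within a tolerance of the OSM way's vertex-averaged midpoint. Everything here is hypothetical: the data layout, the 15 m tolerance and the function names. A nearest-line rule like this will mislabel ways near junctions and between parallel roads, so its output would need checking.

```python
import math
from typing import Optional

Point = tuple[float, float]  # (easting, northing) in metres
Way = tuple[list[Point], Optional[str]]  # (vertices, name or None)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def point_line_distance(p: Point, line: list[Point]) -> float:
    """Shortest distance from p to a polyline (or to a single vertex)."""
    if len(line) < 2:
        return math.dist(p, line[0])
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def vertex_mean(line: list[Point]) -> Point:
    """Average of the vertices: a crude stand-in for a way's midpoint."""
    xs, ys = zip(*line)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def transfer_names(osm_ways: list[Way], reference_ways: list[Way],
                   max_dist: float = 15.0) -> list[Way]:
    """Fill missing OSM names from the nearest named reference way within max_dist."""
    result = []
    for geom, name in osm_ways:
        if name is None:
            mid = vertex_mean(geom)
            candidates = [(point_line_distance(mid, rgeom), rname)
                          for rgeom, rname in reference_ways if rname]
            if candidates:
                d, nearest_name = min(candidates)
                if d <= max_dist:
                    name = nearest_name
        result.append((geom, name))
    return result


# Invented ways: the third OSM way has no reference road nearby.
osm_ways = [([(0.0, 0.0), (100.0, 0.0)], None),
            ([(0.0, 50.0), (100.0, 50.0)], "Station Road"),
            ([(0.0, 500.0), (100.0, 500.0)], None)]
reference = [([(0.0, 3.0), (100.0, 3.0)], "High Street"),
             ([(0.0, 53.0), (100.0, 53.0)], "Station Road")]
print([name for _, name in transfer_names(osm_ways, reference)])
# ['High Street', 'Station Road', None]
```

Existing OSM names are never overwritten in this sketch; only missing ones are filled, and ways without a close enough match are left unnamed for a ground survey.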

Of course, this is a failure of the ‘crowd’ in the sense that this bit of information previously required an actual visit on the ground, which is a more challenging task than finding people who are happy to volunteer their time to digitise maps.

As in the previous cases, there are local variations, and the geography of the coverage is interesting. The information includes 4 time points, so the most appropriate visualisation is one that allows for comparison and transition between maps. Below is a presentation (you can download it from SlideShare) that provides maps for the whole of England as well as 5 regional maps, roughly covering the South West, London, Birmingham and the Midlands, Manchester and Liverpool, and Newcastle upon Tyne and the North East.

If you want to create your own visualisation, or use the results of this study, you can download the results in shapefile format from here.

For a very nice visualisation of Meridian 2 and OpenStreetMap data, see Ollie O’Brien’s SupraGeography blog.