1 June, 2014
‘More or Less’ is a good programme on BBC Radio 4. It regularly explores the numbers and the evidence behind news stories and other important things, and checks whether they stand up. However, the piece broadcast this week about golf courses and housing in the UK provides a nice demonstration of when not to use crowdsourced information. The issue discussed was how much space golf courses actually occupy compared with the space used for housing. All was well until they announced that the analysis would be done with clever software (read: GIS) by a statistical superhero. Interestingly, the data used for the analysis was OpenStreetMap – and because the news item was about Surrey, they started the analysis there.
For the analysis to be correct, you need to assume that all the building polygons and all the golf courses have been identified and mapped in OpenStreetMap. My own guess is that in Surrey this could be the case – especially given all the wonderful work that James Rutter catalysed. However, assuming that this is the case for the rest of the country is, well, a bit fanciful. I wouldn’t dare to state that OpenStreetMap is complete to such a level without lots of quality testing, which I haven’t seen. There is the road length analysis by ITO World! and other bits of analysis, but we don’t know how complete OSM is.
While I like OpenStreetMap very much, it is utterly unsuitable for any sort of statistical analysis that works at the building level and then sums up to the country level – because of the heterogeneity of the data. For that sort of thing, you have to use a consistent dataset, or at least one that attempts to be consistent, and that data comes from the Ordnance Survey.
As with other statistical affairs, the core case made in the rest of the clip about the assertion as a whole is relevant here. First, we should question the unit of analysis (is it right to compare the footprint of a house to the area of golf courses? Probably not) and what is to be gained by adding up individual buildings’ footprints to the level of the UK while ignoring roads, gardens and all the rest of the built environment. Just because it is possible to add up every building’s footprint doesn’t mean that you should. Second, this analysis is an example of the ‘Big Data’ fallacy, which goes: analyse first, then question (if at all) what the relationship is between the data and reality.
At the State of the Map (EU) 2011 conference, held in Vienna from 15-17 July, I gave a keynote talk on the relationships between the OpenStreetMap (OSM) community and the GIScience research community. Of course, these relationships are especially important for researchers working on Volunteered Geographic Information (VGI), due to the major role of OSM in this area of research.
The talk included an overview of what researchers have discovered about OpenStreetMap over the 5 years since we started to pay attention to OSM. One striking result is that the issue of positional accuracy does not require much more work by researchers. Another important outcome is the understanding that quality is affected by the number of mappers, and that the data can be used with confidence for mainstream geographical applications when certain conditions are met. These results are useful and of interest to a wide range of groups, but key areas still require further research – for example, specific facets of quality, community characteristics and how OSM data is used.
Reflecting on the body of research, we can start to form a ‘code of engagement’ for both academics and mappers who are engaged in researching or using OpenStreetMap. One such guideline would be that it is both prudent and productive for any researcher to do some mapping herself, and to understand the process of creating OSM data, if the research is to be relevant and accurate. Other aspects of the proposed ‘code’ are covered in the presentation.
In March 2008, I started comparing OpenStreetMap in England to the Ordnance Survey Meridian 2, as a way to evaluate the completeness of OpenStreetMap coverage. The rationale behind the comparison is that Meridian 2 represents a generalised geographic dataset that is widely used in national-scale spatial analysis. When the study started, it was not clear that OpenStreetMap volunteers could create highly detailed maps such as those that can be seen on the ‘Best of OpenStreetMap’ site. Yet even today, Meridian 2 provides a minimum threshold for OpenStreetMap when the question of completeness is asked.
So far, I have carried out 6 evaluations, comparing the two datasets in March 2008, March 2009, October 2009, March 2010, September 2010 and March 2011. While the work on the statistical analysis and verification of the results continues, Oliver O’Brien helped me take the results of the analysis for Britain and turn them into an interactive online map that can help in exploring how coverage progressed over the various time periods.
Notice that the visualisation shows the total length of all road objects in OpenStreetMap, so it does not discriminate between roads, footpaths and other types of objects. This is the most basic level of completeness evaluation, and it is fairly coarse.
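The per-cell length comparison described above can be illustrated with a minimal Python sketch. It assumes road geometries are already projected into metres (e.g. British National Grid), uses hypothetical 1 km cells, and assigns each segment to the cell containing its midpoint rather than clipping it exactly at cell boundaries. Expressing the comparison as the ratio of OSM length to Meridian 2 length is one simple choice made for illustration, not necessarily the formula used in the evaluations.

```python
import math
from collections import defaultdict

CELL_SIZE = 1000.0  # hypothetical 1 km grid cells; coordinates assumed to be in metres


def length_per_cell(lines, cell_size=CELL_SIZE):
    """Sum line length per grid cell.

    Each segment is assigned to the cell containing its midpoint; this is a
    simplification (a fuller analysis would clip segments at cell boundaries).
    """
    totals = defaultdict(float)
    for line in lines:
        for (x1, y1), (x2, y2) in zip(line, line[1:]):
            mx, my = (x1 + x2) / 2, (y1 + y2) / 2
            cell = (int(mx // cell_size), int(my // cell_size))
            totals[cell] += math.hypot(x2 - x1, y2 - y1)
    return dict(totals)


def completeness_ratio(osm_lines, reference_lines, cell_size=CELL_SIZE):
    """Ratio of OSM line length to reference (e.g. Meridian 2) length per cell.

    Cells with no reference roads are skipped. Because the OSM total includes
    footpaths and other objects, a high ratio can overstate road coverage.
    """
    osm = length_per_cell(osm_lines, cell_size)
    ref = length_per_cell(reference_lines, cell_size)
    return {cell: osm.get(cell, 0.0) / length for cell, length in ref.items() if length > 0}


# Toy data: each line is a list of (x, y) vertices in metres.
reference = [[(0, 500), (1000, 500)], [(1500, 200), (1500, 800)]]
osm = [[(0, 500), (500, 500)]]
print(completeness_ratio(osm, reference))  # {(0, 0): 0.5, (1, 0): 0.0}
```

Because it uses the coarse "all line objects" total, this sketch shares the limitation noted above: it says nothing about road types, names or positional accuracy.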
The application will allow you to browse the results and to zoom to a specific location, and as Oliver integrated the Ordnance Survey Street View layer, it will allow you to see what information is missing from OpenStreetMap.
Finally, note that for the periods before September 2010, the coverage is for England only.
Some details on the development of the map are available on Oliver’s blog.
23 March, 2011
This post reviews the two books about OpenStreetMap that appeared late in 2010: OpenStreetMap: Using and Enhancing the Free Map of the World (by F. Ramm, J. Topf & S. Chilton, 386 pages, £25) and OpenStreetMap: Be your own Cartographer (by J. Bennett, 252 pages, £25). The review was written by Thomas Koukoletsos, with some edits by me. The review first covers the Ramm et al. book, and then compares it to Bennett’s. It is fairly detailed, so if you just want the recommendation, scroll all the way down.
OpenStreetMap: Using and Enhancing the Free Map of the World is a comprehensive guide to OpenStreetMap (OSM), aimed at a wide range of readers, from those unfamiliar with the project to those who want to use its information and tools and integrate them with other applications. It is written in accessible language, starting from the basics and presenting things in an appropriate order for the reader to be able to follow, slowly building the necessary knowledge.
Part I, the introduction, consists of 3 chapters. It presents the OSM project in general, pointing to the later chapters where further details are provided. This includes how the project started, a short description of its main interface, how to export data, and some related services such as OpenStreetBugs and OpenRouteService. It concludes with a note on mapping parties and the OpenStreetMap Foundation. This gives someone new to OSM all the information needed to get a general idea, without becoming too technical.
Part II, addressing OSM contributors, follows with chapter 4 focusing on how GPS technology is used for OSM. The balance between the technical detail and accessibility continues, so all the necessary information for mapping is presented in an easily digested way even for those not familiar with mapping science. The following chapter covers the whole mapping process using a very comprehensive case study, through which the reader understands how to work in the field, edit and finally upload the collected data. Based on this overview, the next chapter is slightly more technical, describing the data model followed by OSM. The information provided is necessary to understand how the OSM database is structured.
Chapter 7 moves on to details, describing what objects need to be mapped and how this can be done using tags. The examples provided help the user to move from simpler to more complicated representations. The importance of this chapter, however, lies in emphasising that, although the proposed tagging framework is not compulsory, it would be wise to follow it, as this will increase the consistency of the OSM database. The chapter ends with a suggested set of mapping priorities, from ‘very important’ objects and attributes to ‘luxury’ ones. Chapter 8 continues with map features, covering all the other proposed mapping priorities. The split between the two chapters guides the user gradually from the most important features to those usually mapped by expert OSM users; otherwise, mapping might have been far too difficult a task for new participants.
Chapter 9 describes Potlatch, the most popular online editor. The description is simple and complete, and by the end the user is ready to contribute to the OSM database. The next chapter covers JOSM, an offline editor designed for advanced users, which is more powerful than Potlatch but more difficult to use – although the extensive instructions make this tool almost as easy to use as Potlatch. Chapter 11 concludes the review of editors by providing basic information on 5 other editors, suitable for desktop or mobile use. Chapter 12 presents some of the tools for mappers, designed to handle OSM data or perform quality assurance tests. Among the capabilities described are viewing data in layers, monitoring changes in an area and viewing roads with no names. The second part ends, in Chapter 13, with a description of the OSM licensing framework, giving the reader a detailed view of which sources of data should be avoided when updating OSM, to protect it from copyright violations.
Part III of Ramm et al. is far more technical, beginning with how to use OSM on web pages. After providing the necessary information on tiling used for the OSM map (Mapnik and Tiles@Home servers), chapter 14 moves on to the use of OSM with Google Maps or with OpenLayers. Code is provided to assist the learning process. Chapter 15 provides information on how to download data, including the ability to download only changes and update an already downloaded version, explained further in a following chapter.
The next three chapters dive into cartographic issues, with chapter 16 starting with Osmarender, a tool for visualising OSM data. With the help of many examples, the reader is shown how this tool can be used to render maps, and how to customise visualisation rules to create a personal map style. Chapter 17 continues with Mapnik, a more efficient tool than Osmarender for large datasets. Its efficiency is the result of reading the data from a PostgreSQL database. A number of other tools need to be installed for Mapnik; however, they are all listed with basic installation instructions. The chapter concludes with performance tips, including an example of selecting layers according to the zoom level so that rendering is faster. The final renderer, described in chapter 18, is Kosmos. It is a more user-friendly application than the previous two, and the only one with a Graphical User Interface (GUI). The rules used to transform OSM data into a map come from wiki pages, so anyone who wants a personal map style has to create a wiki page. There is a description of a tiling process using Kosmos, as well as of exporting and printing options. The chapter concludes by mentioning Maperitive, the successor to Kosmos, to be released shortly.
Chapter 19 is devoted to mobile use of OSM. After explaining the basics of navigation and route planning, there is a detailed description of how to create and install OSM data on Garmin GPS receivers. Additional applications for various types of devices are briefly presented (iPhones, iPods, Android), as well as other routing applications. Chapter 20 closes the third part of the book with an extensive discussion on licence issues of OSM data and its derivatives. The chapter covers the CC-BY-SA licence framework, as well as a comprehensive presentation of the future licence, without forgetting to mention the difficulties of such a change.
Part IV is the most technical part, aimed at those who want to integrate OSM into their applications. Chapter 21 reveals how OSM works, beginning with the OSM Subversion repository, where the software for OSM is managed. Chapter 22 explains how the OSM Application Programming Interface (API) works. Apart from the basic data handling modes (create, retrieve, update or delete objects and GPS tracks), other methods of access are described, as well as how to work with changesets. The chapter ends with OAuth, a method that allows third-party applications to authenticate with OSM on a user’s behalf. Chapter 23 continues with XAPI, a different API that, although it offers only read requests and its data may be a few minutes old, allows more complex queries, returns more data than the standard API (e.g. historic versions) and allows RSS feeds from selected objects. Next, the Name Finder and Nominatim search engines for gazetteer purposes are covered. Lastly, GeoNames is mentioned; although it is not part of the OSM family, it can be used in combination with other OSM tools.
Chapter 24 presents Osmosis, a tool to filter and convert OSM data. Apart from reading and writing XML files, this tool can also read from and write to PostgreSQL and MySQL databases. The chapter also describes how to create and process change files in order to continually update a local dataset or database from the OSM server. Chapter 25 moves on to more advanced editing, presenting the basics of large-scale and other automated changes. As such changes can affect many people and their contributions, the chapter begins with ‘a note of caution’, arguing that, although power editing is available to everyone, those whose data is to be changed should be contacted and consulted first.
Chapter 26 focuses on imports and exports, including some of the programs used for specific data types. The final chapter presents a more detailed overview of how to run an OSM server as well as a tile server, covering the requirements and installation. There is also a presentation of the API schema, and alternatives to the OSM API are mentioned.
The book ends with the appendix, consisting of two parts, covering geodesy basics, and specifically geographic coordinates, datum definition and projections; and information on local OSM communities for a few selected countries.
Overall, the book is accessible and comprehensive.
Turning to Bennett’s book, chapters 1 and 2 give a general description of the OSM project and correspond to the first three chapters of Ramm et al. The history of OSM is more detailed here. The description of the main OSM web page does not include related websites but, on the other hand, it does describe how to use the slippy map as well as how to interact with the data. The chapters also focus on the social aspect of the project, briefly presenting more details on a user’s account (e.g. personalising the user’s profile by adding a photo, or a home location to enable communication with other users in the area or notification of local events).
Chapter 3 corresponds to chapters 4 and 5 of the first book. There is a more detailed description of how GPS works, as well as of how to configure the receiver; however, the other ways of mapping are less detailed. A typical mapping example and a more comprehensive description of the types of GPS devices suitable for OSM contribution, which are provided in Ramm et al., are missing.
Chapter 4 corresponds to chapters 6, 7 and 8 of the first book. Some less important aspects, such as the history of the data model, are missing. More significantly, Ramm et al. is much more detailed on how to map objects, classifying them according to their importance and providing practical examples of how to do it, while this chapter provides only a brief description of tags. Both books succeed in communicating the significance of following the wiki suggestions when it comes to tagging, despite the ‘any tags you like’ freedom. An interesting point, which is missing from the first book, is the importance of avoiding tagging for the renderer, explained here with the use of a comprehensive example.
Chapter 5 describes the editors Potlatch, JOSM and Merkaartor, corresponding to chapters 9, 10 and 11 of Ramm et al. Having the three editors in one chapter allows for a comparison table, giving a much quicker insight. A practical example with a GPS trace file helps in understanding the basic operations of these editors. More attention is given to Potlatch, while the other two editors are described only briefly. No other editors are described or mentioned.
Chapter 6 provides a practical example of using the three editors and shows how to map objects, which was covered in chapters 6, 7 and 8 in the first book. While the first book is more detailed and includes a wider range of mapping cases, here the reader becomes more familiar with the editors and learns how to provide the corresponding information. In addition to the material in the first book, here we have an example of finding undocumented tags and using OSMdoc.
Chapter 7 corresponds to chapter 12 of the first book, with a detailed description of the four basic tools to check OSM data for errors. However, Ramm et al. offers a broader view by mentioning or briefly describing seven other error-checking tools.
Chapter 8 deals with map production, similar to chapters 2, 16 and 18 of Ramm et al. The Osmarender tool is described in detail in both books. Kosmos renderer, however, is described in much more detail here, although it is no longer developed. The chapter’s summary here is very useful, as it presents briefly the 3 rendering tools and compares them. What is missing from this book, however, is a description of Mapnik (chapter 17 of Ramm et al.) and also the use of tiling in web mapping.
Chapter 9 corresponds to chapters 15, 22 and 23 of Ramm et al. Regarding planet files, Bennett provides a description of a way to check the planet file’s integrity, which can be useful for automating data integration processes. Moving on to OSM’s API, this book is confined to describing ways of retrieving data from OSM, unlike the first book that also includes operations to create, update or delete data. XAPI, however, is more detailed in this book, including how to filter data. In this chapter’s summary a brief description and comparison of the ways to access data is helpful. On the other hand, Ramm et al. briefly describes additional APIs and web services that are not covered here.
Chapter 10 matches chapter 24 of the first book. In both cases Osmosis is described in detail, with examples of how to filter data. The first book includes a more complete description of command line options, classified according to the data streams (entity or change). This book, on the other hand, explains more fully how to extract data based on a predefined polygon, and further explains how to create and use a customised one. The first book mentions additional tasks, such as ‘log progress’, ‘report integrity’, ‘buffer’ and ‘sort’, while here only the last of these is used, in an example. An advantage of Bennett’s book, however, is that the use of Osmosis with a PostgreSQL database, as well as how to update data and how to automate a database update procedure, is explained more comprehensively and extensively.
The last chapter talks about future aspects of OSM. The OSM licence and its future development are explained comprehensively, corresponding to the end of chapter 20 of the first book, with some good examples that show where the present OSM licence is problematic. However, throughout Bennett’s book, licence issues are not covered as well as in Ramm et al. (chapters 13 and 20), and the reader needs to reach the end of the book to understand what is and is not allowed with OSM data. Moving on, MapCSS, a common stylesheet language for OSM, is explained in detail, while in the first book it is simply mentioned at the end of chapter 9 during a discussion of Potlatch 2. The book ends with the Mapzen POI Collector for iPhone, covered in chapter 11 of the first book.
When compared to the first book, what is missing here is the use of OSM for navigation on mobile devices (chapter 19), large-scale editing (chapter 25), writing or finding software for OSM (chapter 21) and how to run an OSM server (chapter 27). Another drawback is the lack of colour images; in some cases (e.g. the NoName layer in chapter 7) the figures are difficult to understand.
So which book is for me?
Both books deal with more or less the same information, as the chapter-by-chapter comparison shows.
Although there are areas where the two books are complementary, in most cases Ramm et al. provides a better understanding of the matters discussed, taking a broader and more extensive view. It addresses a wide range of readers, from those unfamiliar with OSM to advanced programmers who want to use it elsewhere, and is written with a progressive build-up of knowledge, which helps the learning process. It also benefits from a dedicated website where updates are provided. Bennett’s book, on the other hand, would be comparatively more difficult to read for someone who has not heard of OSM, as well as for those who need to use it but are not programming experts. There is a hidden assumption that the reader is fairly technically literate. It suffers somewhat from not being introductory enough, while at the same time not being sufficiently in-depth and detailed.
Since the two books sell at a similar price, the choice comes down to content: we liked the Ramm et al. book much more and would recommend it to our students.
How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information
10 January, 2011
The paper “How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information” has appeared in The Cartographic Journal. The proper citation for the paper is:
Haklay, M and Basiouka, S and Antoniou, V and Ather, A (2010) How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information. The Cartographic Journal, 47(4), 315–322.
The abstract of the paper is as follows:
In the area of volunteered geographical information (VGI), the issue of spatial data quality is a clear challenge. The data that are contributed to VGI projects do not comply with standard spatial data quality assurance procedures, and the contributors operate without central coordination and strict data collection frameworks. However, similar to the area of open source software development, it is suggested that the data hold an intrinsic quality assurance measure through the analysis of the number of contributors who have worked on a given spatial unit. The assumption that as the number of contributors increases so does the quality is known as ‘Linus’ Law’ within the open source community. This paper describes three studies that were carried out to evaluate this hypothesis for VGI using the OpenStreetMap dataset, showing that this rule indeed applies in the case of positional accuracy.
To access the paper on the journal’s website, use its DOI: 10.1179/000870410X12911304958827. If you don’t hold a subscription to the journal, a postprint of the paper is available in the UCL Discovery repository. If you would like a printed copy, email me.
21 October, 2010
One issue that remained open in the studies on the relevance of Linus’ Law for OpenStreetMap was that the previous studies looked at areas with more than 5 contributors, and the link between the number of users and quality was not conclusive – although quality was above 70% for areas with that many contributors or more.
Now, as part of writing up the GISRUK 2010 paper for journal publication, we had an opportunity to fill this gap, to some extent. Vyron Antoniou has developed a method to evaluate positional accuracy on a larger scale than we had managed so far. The methodology uses the geometric position of the Ordnance Survey (OS) Meridian 2 road intersections to evaluate positional accuracy. Although Meridian 2 is created by applying a 20-metre generalisation filter to the centrelines of the OS Roads Database, this generalisation process does not affect the positional accuracy of node points, so their accuracy is the best available. An algorithm was developed to identify corresponding nodes in Meridian 2 and OSM, and the average positional error was calculated for each square kilometre in England. With this data, which provides an estimated positional accuracy for an area of over 43,000 square kilometres, it was possible to estimate the contribution that additional users make to the quality of the data.
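The general idea of node-based evaluation can be sketched in a few lines of Python. This is not Vyron Antoniou's algorithm, whose details are not given here: it simply matches each reference intersection to the nearest OSM node within a hypothetical search radius (`max_dist`, 20 m in this sketch) using brute-force search, then averages the matched distances per 1 km cell. Coordinates are assumed to be projected and in metres. Unmatched intersections are dropped, which a real analysis would need to handle more carefully (for example by enforcing one-to-one matches and checking that matched nodes join the same roads).

```python
import math
from collections import defaultdict


def mean_error_per_cell(reference_nodes, osm_nodes, max_dist=20.0, cell_size=1000.0):
    """Average distance (metres) from each reference intersection to its nearest
    OSM node, grouped by the 1 km cell containing the reference node.

    max_dist is a hypothetical search radius: reference nodes with no OSM node
    within it are treated as unmatched and left out of the average.
    """
    errors = defaultdict(list)
    for rx, ry in reference_nodes:
        # Brute-force nearest neighbour; a spatial index would be needed at national scale.
        best = min((math.hypot(ox - rx, oy - ry) for ox, oy in osm_nodes), default=None)
        if best is not None and best <= max_dist:
            cell = (int(rx // cell_size), int(ry // cell_size))
            errors[cell].append(best)
    return {cell: sum(ds) / len(ds) for cell, ds in errors.items()}


meridian = [(100, 100), (900, 900), (1500, 100)]
osm = [(103, 104), (900, 906), (1700, 100)]
print(mean_error_per_cell(meridian, osm))  # {(0, 0): 5.5}
```

Here the first two intersections match at 5 m and 6 m, while the third has no OSM node within 20 m and is ignored, so cell (1, 0) gets no estimate at all; in practice, unmatched nodes are themselves a signal about completeness.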
As can be seen in the chart below, positional accuracy remains fairly level when the number of users is 13 or more – as we have seen in previous studies. On the other hand, up to 13 users, each additional contributor considerably improves the dataset’s quality. In grey you can see the maximum and minimum values, so the area represents the possible range of positional accuracy results. Interestingly, as the number of users increases, positional accuracy seems to settle close to 5m, which is somewhat expected when considering the source of the information – GPS receivers and aerial imagery. However, this is an aspect of the analysis that clearly requires further testing of the algorithm and the datasets.
It is encouraging to see that the number of contributors and positional error are significantly correlated. For the full dataset the correlation is weak (-0.143) but significant at the 0.01 level (2-tailed). However, for the average values for each number of contributors (blue line in the graph), the correlation is strong (-0.844) and significant at the 0.01 level (2-tailed).
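Why averaging by contributor count gives a much stronger correlation than the per-tile data can be illustrated with a minimal Python sketch. It computes Pearson's r on per-tile values and on group means, plus the standard t statistic used to test r against zero with n - 2 degrees of freedom. The data is invented for illustration and is not from the study. Converting t into a p-value needs a t-distribution CDF (for example from SciPy), which is left out to keep the sketch dependency-free.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def means_by_group(counts, errors):
    """Mean error for each distinct number of contributors, sorted by count."""
    groups = defaultdict(list)
    for c, e in zip(counts, errors):
        groups[c].append(e)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), tested with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy per-tile data: number of contributors and mean positional error (m).
contributors = [1, 1, 2, 2, 3]
error_m = [10.0, 8.0, 6.0, 6.0, 3.0]

r_tiles = pearson(contributors, error_m)  # about -0.96: scatter between tiles weakens r
keys, means = means_by_group(contributors, error_m)
r_means = pearson(keys, means)  # -1.0 for this toy data: averaging removes within-group scatter
print(round(r_tiles, 3), round(r_means, 3), round(t_statistic(r_tiles, len(contributors)), 2))
```

The design point is that the group-mean correlation describes the trend, not the spread: a strong r on averages says little about how predictable any single tile is, which is consistent with the weak correlation reported for the full dataset.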
An important caveat is that the number of tiles with more than 10 contributors is fairly small, so that is another aspect that requires further exploration. Moreover, spatial data quality is not just positional accuracy, but also attribute accuracy, completeness, update and other properties. We can expect that they will also exhibit similar behaviour to positional accuracy, but this requires further studies – as always.
However, as this large-scale analysis adds to the evidence from the small-scale studies, it is becoming highly likely that Linus’ Law affects the quality of OSM data, and possibly of other so-called Volunteered Geographical Information (VGI) sources, and that the gain in positional accuracy diminishes once the number of contributors passes about 10 or so.
5 October, 2010
The London Citizen Cyberscience Summit in early September was a stimulating event, which brought together a group of people with an interest in this area. A report from the event, with a very good description of the presentations, including a reflection piece, is available on the ‘Strange Attractor’ blog.
During the summit, I discussed aspects of ‘Extreme’ Citizen Science, where we move from conventional science to participatory research. The presentation was partly based on a paper that I wrote and presented at the workshop on the value of Volunteered Geographical Information in advancing science, which ran as part of the GIScience 2010 conference in mid-September. Details about the workshop, including a set of interesting position papers, are available on the workshop’s website.
The presentation below covers the topics that I discussed in both workshops. Here, I provide a brief synopsis for the presentation, as it is somewhat different from the paper.
In the talk, I started by highlighting that by using different terminologies we can notice different facets of the practice of crowd data collection (VGI within the GIScience community, crowdsourcing, participatory mapping …).
The first way in which we can understand this information is in the context of Web 2.0 applications. These applications can be non-spatial (such as Wikipedia or Twitter), implicitly spatial (such as Flickr – you need to be in a location before you can capture a photograph), or explicitly spatial, such as applications that are about collecting geographical information – for example OpenStreetMap. When looking at VGI from the perspective of Web 2.0, it’s possible to identify the specific reasons why it emerged and how other similar applications influence its structure and practices.
The second way to view this information is as part of geographical information produced by companies who need mapping information (such as Google or TomTom). In this case, you notice that it’s about reducing the costs of labour and the need for active or passive involvement of the person who carries out the mapping.
The third, and arguably new, way to view VGI is as part of Citizen Science. Such activities have been going on for a long time in ornithology and meteorology. However, there are new forms of Citizen Science that rely on ICT – such as movement-activated cameras (slide 11 on the left) that are left near animal trails and are operated by volunteers, or a network of accelerometers that form a global earthquake monitoring network. Not all Citizen Science is spatial, and there are very effective non-spatial examples, especially in the area of Citizen Cyberscience. So in this framing of VGI we can pay special attention to the collection of scientific information. Importantly, as in the case of spatial applications, some volunteers become experts, such as Hanny van Arkel, who discovered a type of galaxy in Galaxy Zoo.
Slides 16-17 show the distribution of crowdsourced images, and emphasise the spatial distribution of information near population centres and tourist attractions. Slides 19-25 show the analysis of the data that was collected by OpenStreetMap volunteers and highlight bias towards highly populated and affluent areas.
Citizen Science is not just about data collection. There are also cultural problems regarding the trustworthiness of the data, but slides 28-30 show that the data is self-improving as more volunteers engage in the process (in this case, mapping in OpenStreetMap). On that basis, I question the assumption about the trustworthiness of volunteers and argue that we need to change the way we think about projects. There are emerging examples of Citizen Science in which participants engage at a higher level. For example, the noise mapping activities carried out by a community near London City Airport (slides 34-39) show that people can engage in science and are well placed to take advantage of opportunities, such as the ash cloud in April 2010, to collect ‘background’ noise. Such data collection would not be possible without the help of communities.
Finally, slides 40 and 41 demonstrate that it is possible to engage non-literate users in environmental data collection.
So, in summary, a limitless Citizen Science is possible – we need to create the tools for it and understand how to run such projects, as well as study them.
Completeness in volunteered geographical information – the evolution of OpenStreetMap coverage (2008-2009)
13 August, 2010
The Journal of Spatial Information Science (JOSIS) is a new open access journal in GIScience, edited by Matt Duckham, Jörg-Rüdiger Sack, and Michael Worboys. In addition, the journal has adopted an open peer review process, so readers are invited to comment on a paper while it goes through formal peer review. So this seems to be the most natural outlet for a new paper that analyses the completeness of OpenStreetMap over 18 months – March 2008 to October 2009. The paper was written in collaboration with Claire Ellul. The abstract of the paper is provided below, and you are very welcome to comment on the paper on the JOSIS forum dedicated to it, where you can also download it.
Abstract: The ability of lay people to collect and share geographical information has increased markedly over the past 5 years as a result of the maturation of web and location technologies. This ability has led to a rapid growth in Volunteered Geographical Information (VGI) applications. One of the leading examples of this phenomenon is the OpenStreetMap project, which started in the summer of 2004 in London, England. This paper reports on the development of the project over the period March 2008 to October 2009 by focusing on the completeness of coverage in England. The methodology that is used to evaluate the completeness is comparison of the OpenStreetMap dataset to the Ordnance Survey dataset Meridian 2. The analysis evaluates the coverage in terms of physical coverage (how much area is covered), followed by estimation of the percentage of England’s population which is covered by completed OpenStreetMap data and finally by using the Index of Deprivation 2007 to gauge socio-economic aspects of OpenStreetMap activity. The analysis shows that within 5 years of project initiation, OpenStreetMap already covers 65% of the area of England, although when details such as street names are taken into consideration, the coverage is closer to 25%. Significantly, this 25% of England’s area covers 45% of its population. There is also a clear bias in data collection practices – more affluent areas and urban locations are better covered than deprived or rural locations. The implications of these outcomes for studies of volunteered geographical information are discussed towards the end of the paper.
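The area and population figures in the abstract can be illustrated with a small sketch. It assumes the comparison has already been reduced to per-grid-cell totals (OSM road length, named OSM road length, Meridian 2 road length, area and population). The `GridCell` class, the rule that a cell counts as complete once OSM road length reaches the Meridian 2 length, and the exclusion of cells with no Meridian 2 roads are illustrative assumptions, not the paper’s exact procedure, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    """Hypothetical per-cell totals, e.g. from overlaying both datasets on a grid."""
    osm_road_km: float        # all OSM road length in the cell
    osm_named_road_km: float  # OSM road length that carries a street name
    meridian_road_km: float   # Ordnance Survey Meridian 2 road length in the cell
    area_km2: float
    population: int


def is_complete(cell: GridCell, require_names: bool = False) -> bool:
    # Assumed rule: a cell is "complete" when OSM has at least as much road
    # length as the reference dataset (optionally counting named roads only).
    osm_km = cell.osm_named_road_km if require_names else cell.osm_road_km
    return osm_km >= cell.meridian_road_km


def coverage_summary(cells: list[GridCell], require_names: bool = False) -> tuple[float, float]:
    """Return (% of area, % of population) lying in complete cells."""
    # Assumption: cells without any reference roads cannot be judged, so skip them.
    eligible = [c for c in cells if c.meridian_road_km > 0]
    complete = [c for c in eligible if is_complete(c, require_names)]
    area_pct = 100 * sum(c.area_km2 for c in complete) / sum(c.area_km2 for c in eligible)
    pop_pct = 100 * sum(c.population for c in complete) / sum(c.population for c in eligible)
    return area_pct, pop_pct


# Made-up example data, not figures from the paper.
cells = [
    GridCell(osm_road_km=12.0, osm_named_road_km=11.0, meridian_road_km=10.0, area_km2=1.0, population=5000),
    GridCell(osm_road_km=8.0, osm_named_road_km=2.0, meridian_road_km=10.0, area_km2=1.0, population=300),
    GridCell(osm_road_km=5.0, osm_named_road_km=1.0, meridian_road_km=4.0, area_km2=1.0, population=200),
    GridCell(osm_road_km=0.0, osm_named_road_km=0.0, meridian_road_km=0.0, area_km2=1.0, population=0),
]

print(coverage_summary(cells))                      # road length only
print(coverage_summary(cells, require_names=True))  # named roads only
```

With these invented cells, counting only named roads halves the “complete” area while most of the population stays covered, which is the same kind of gap the abstract reports between the 65% and 25% figures.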
3 August, 2010
The process of academic publication takes a long time, so only now is my paper from 2008 finally in print.
The paper should be cited as:
Haklay, M., 2010, “How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets”, Environment and Planning B: Planning and Design 37(4) 682–703.
Its abstract is:
Within the framework of Web 2.0 mapping applications, the most striking example of a geographical application is the OpenStreetMap (OSM) project. OSM aims to create a free digital map of the world and is implemented through the engagement of participants in a mode similar to software development in Open Source projects. The information is collected by many participants, collated on a central database, and distributed in multiple digital formats through the World Wide Web. This type of information was termed ‘Volunteered Geographical Information’ (VGI) by Goodchild, 2007. However, to date there has been no systematic analysis of the quality of VGI. This study aims to fill this gap by analysing OSM information. The examination focuses on analysis of its quality through a comparison with Ordnance Survey (OS) datasets. The analysis focuses on London and England, since OSM started in London in August 2004 and therefore the study of these geographies provides the best understanding of the achievements and difficulties of VGI. The analysis shows that OSM information can be fairly accurate: on average within about 6 m of the position recorded by the OS, and with approximately 80% overlap of motorway objects between the two datasets. In the space of four years, OSM has captured about 29% of the area of England, of which approximately 24% are digitised lines without a complete set of attributes. The paper concludes with a discussion of the implications of the findings to the study of VGI as well as suggesting future research directions.
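The abstract reports positional accuracy (an average offset of about 6 m) and the overlap of motorway objects, but does not spell out the computation. A common way to express both is a buffer comparison: measure how far points along the OSM line lie from the Ordnance Survey line, and what share of the OSM line falls inside a buffer of a given width around it. The sketch below is a simplified, sampled approximation of that idea in plain Python, not the paper’s exact procedure. It assumes both lines are already in the same projected coordinate system in metres (for example British National Grid); the function names and the buffer widths in the example are illustrative.

```python
import math

Point = tuple[float, float]  # (x, y) in metres, assumed projected coordinates


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_line(p: Point, line: list[Point]) -> float:
    """Shortest distance from p to any segment of a polyline."""
    return min(point_segment_distance(p, line[i], line[i + 1]) for i in range(len(line) - 1))


def densify(line: list[Point], step: float) -> list[Point]:
    """Sample points along a polyline at most `step` metres apart."""
    samples = []
    for a, b in zip(line, line[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        samples.extend((a[0] + k / n * (b[0] - a[0]), a[1] + k / n * (b[1] - a[1])) for k in range(n))
    samples.append(line[-1])
    return samples


def compare_lines(test: list[Point], reference: list[Point], buffer_m: float, step_m: float = 1.0):
    """Return (mean offset in metres, share of test samples within buffer_m of reference)."""
    distances = [distance_to_line(p, reference) for p in densify(test, step_m)]
    mean_offset = sum(distances) / len(distances)
    share_within = sum(d <= buffer_m for d in distances) / len(distances)
    return mean_offset, share_within


# Hypothetical example: an OSM motorway drawn 5 m north of a straight OS line.
os_line = [(0.0, 0.0), (100.0, 0.0)]
osm_line = [(0.0, 5.0), (100.0, 5.0)]
print(compare_lines(osm_line, os_line, buffer_m=10.0))
print(compare_lines(osm_line, os_line, buffer_m=4.0))
```

Sampling the OSM line at small, near-uniform steps makes the share of samples inside the buffer a rough stand-in for the share of line length inside it; an exact version would intersect the line with a buffer polygon in a GIS.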
10 July, 2010
The slides below are from my presentation at State of the Map 2010 in Girona, Spain. While the conference is about OpenStreetMap, the presentation covers a range of spatially implicit and explicit crowdsourcing projects, as well as activities that we carried out in Mapping for Change. All of them show that, unlike other crowdsourcing activities, geography (and place) both limits and motivates contributions.
In many ways, OpenStreetMap is similar to other open source and open knowledge projects, such as Wikipedia. These similarities include the patterns of contribution and the importance of participation inequality, in which a small group of participants contributes a very large share of the content while a very large group contributes only occasionally; the general demographic of participants, with strong representation of educated young males; and the temporal patterns of engagement, in which some participants go through a peak of activity and then lose interest, while a small group joins and continues to invest its time and effort in the progress of the project. These aspects have been identified by researchers who explored volunteering, leisure activities and crowdsourcing, as well as by those who explored commons-based peer production networks (Benkler & Nissenbaum 2006).
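Participation inequality of this kind is often summarised by the share of contributions made by the most active participants, or by a Gini coefficient over per-participant contribution counts. The sketch below assumes you already have one edit count per contributor; the function names are illustrative and the counts in the example are invented, not OSM statistics.

```python
import math


def top_share(contributions: list[int], fraction: float) -> float:
    """Share of all contributions made by the most active `fraction` of participants."""
    ranked = sorted(contributions, reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def gini(contributions: list[int]) -> float:
    """Gini coefficient: 0 when everyone contributes equally, approaching 1 when one person does everything."""
    xs = sorted(contributions)
    n, total = len(xs), sum(xs)
    # Rank-weighted formula using 1-based ranks on the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Invented edit counts for ten contributors.
edits = [500, 120, 40, 10, 5, 3, 2, 1, 1, 1]
print(f"top 10% share: {top_share(edits, 0.10):.2f}")
print(f"Gini: {gini(edits):.2f}")
```

With these made-up counts, the single most active contributor accounts for nearly three quarters of all edits, the pattern of “a small group contributing very significantly” described above.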
However, OpenStreetMap is a project about geography, and deals with the shape of features and information about places on the face of the Earth. Thus, the emerging question is ‘what influence does geography have on OSM?’ Does geography make some fundamental changes to the basic principles of crowdsourcing, or should OSM be treated as ‘Wikipedia for maps’?
In the presentation, which is based on my work as well as the work of Vyron Antoniou and Nama Budhathoki, we argue that geography plays a ‘tyrannical’ role in OSM and in other projects that are based on crowdsourced geographical information, shaping the nature of these projects beyond what is usually accepted.
The first influence of geography is on motivation. A survey of OSM participants shows that first-hand knowledge of specific places, and the wish to use this knowledge and see it mapped well, are important factors in participation in the project. We found that participants are driven to mapping by their desire to represent the places they care about and to fix errors on the map. Both of these motives require local knowledge.
A second influence is on the accuracy and completeness of coverage: places that are highly populated, and therefore have a larger pool of potential participants, show better coverage than the suburban areas of well-mapped cities. Furthermore, there is an ongoing discussion within the OSM community about the value of mapping without local knowledge, and about the impact of such mapping on the willingness of potential contributors to fix errors and contribute to the map.
A third, and somewhat surprising, influence is the impact of mapping places that the participants haven’t or can’t visit, such as Haiti after the earthquake or Baghdad in 2007. Despite the willingness of participants to join in and help in the data collection process, the details that can be captured without being on the ground are fairly limited, even when multiple sources such as Flickr images, Google Street View and paper maps are used. The details are limited to what was captured at a certain point in time and to the limitations of the sensing device, so the mapping is, by necessity, incomplete.
We will demonstrate these and other aspects of what we have termed ‘the tyranny of place’, and its impact on what OSM can cover without much effort and which locations will not be covered without a concentrated, planned effort.