Linus’ Law and OpenStreetMap
7 November, 2009
One of the interesting questions that emerged from the work on the quality of OpenStreetMap (OSM) in particular, and Volunteered Geographical Information (VGI) in general, is the validity of the ‘Linus’ Law’ for this type of information.
The law came from Open Source software development and states that ‘Given enough eyeballs, all bugs are shallow’ (Raymond, 2001, p.19). For mapping, I suggest that this can be translated into the number of contributors that worked on a given area. The rationale behind it is that if there is only one contributor in an area he or she might inadvertently introduce some errors. For example, they might forget to survey a street or might position a feature in the wrong location. If there are several contributors, they might notice inaccuracies or ‘bugs’ and therefore the more users, the fewer ‘bugs’.
In my original analysis I looked only at the number of contributors per square kilometre as a proxy for accuracy, and provided a visualisation of the difference across England.
During the past year, Aamer Ather and Sofia Basiouka looked at this issue, by comparing the positional accuracy of OSM in 125 sq km of London. Aamer carried out a detailed comparison of OSM and the Ordnance Survey MasterMap Integrated Transport Network (ITN) layer. Sofia took the results from his study and divided them for each grid square, so it was possible to calculate an overall value for every cell. The value is the average of the overlap between OSM and OS objects, weighted by the length of the ITN object. The next step was to compare the results to the number of users at each grid square, as calculated from the nodes in the area.
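The per-cell calculation described above can be sketched roughly as follows. This is only an illustration, not the code used in the study: the function names, the 1 km cell size and the input record layouts are all assumptions made for the example.

```python
from collections import defaultdict


def cell_of(x, y, cell_size=1000.0):
    """Return the (column, row) index of the grid square containing a point.

    The 1000 m default cell size is an assumption for this sketch.
    """
    return (int(x // cell_size), int(y // cell_size))


def weighted_overlap_by_cell(segments):
    """Average overlap per cell, weighted by the length of each ITN object.

    `segments` is an iterable of (cell, itn_length, overlap_fraction) tuples,
    a hypothetical layout chosen for this example.
    """
    weighted_sum = defaultdict(float)
    total_length = defaultdict(float)
    for cell, length, overlap in segments:
        weighted_sum[cell] += length * overlap
        total_length[cell] += length
    return {
        cell: weighted_sum[cell] / total_length[cell]
        for cell in total_length
        if total_length[cell] > 0
    }


def users_by_cell(nodes, cell_size=1000.0):
    """Count distinct contributors per cell from (x, y, user_id) node records."""
    users = defaultdict(set)
    for x, y, user in nodes:
        users[cell_of(x, y, cell_size)].add(user)
    return {cell: len(ids) for cell, ids in users.items()}
```

With both dictionaries keyed by the same cell index, the overlap value of each grid square can be plotted against its number of users.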
The results show that, above 5 users, there is no clear pattern of improved quality. The graph below provides the details – but the pattern is that the quality, while generally very high, is not dependent on the number of users – so Linus’ Law doesn’t apply to OSM (and probably not to VGI in general).
From looking at OSM data, my hypothesis is that, due to the participation inequality in OSM contribution (some users contribute a lot while others don’t contribute very much), the quality is actually linked to a specific user, and not to the number of users.
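One way to start exploring this hypothesis would be to measure how much of each area comes from its single most active contributor. The sketch below is only an illustration under assumptions: the function name and the (cell, user_id) input layout are invented for the example and are not part of the analysis described here.

```python
from collections import Counter, defaultdict


def dominant_contributor_share(nodes):
    """For each cell, return the most active user and their share of the nodes.

    `nodes` is an iterable of (cell, user_id) pairs (a hypothetical layout).
    """
    counts = defaultdict(Counter)
    for cell, user in nodes:
        counts[cell][user] += 1

    result = {}
    for cell, counter in counts.items():
        top_user, top_count = counter.most_common(1)[0]
        result[cell] = (top_user, top_count / sum(counter.values()))
    return result
```

A cell where one user contributed most of the nodes would then be a candidate for checking whether its quality follows that user rather than the head count.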
Yet, I will qualify the conclusion with the statement that further research is necessary. Firstly, the analysis was carried out in London – so checking what is happening in other parts of the country where different users collected the data is necessary. Secondly, the analysis did not include the interesting range of 1 to 5 users, so it might be the case that there is rapid improvement in quality from 1 to 5 and then it doesn’t matter. Maybe the big change is from 1 to 3? Finally, the analysis focused on positional accuracy, and it is worth exploring the impact of the number of users on completeness.
Volunteered Geographical Information Research Network
17 July, 2009
Chris Parker, a PhD student at Loughborough University, organised a dedicated Volunteered Geographical Information research group site on ResearchGate. While I dislike the term – I usually interpret it as the version of ‘volunteered’ as in ‘mum volunteered me to help the old lady cross the street’ – there is no point in trying to change it. When Mike Goodchild coins an acronym, it will stick; it’s sort of a GIScience law!
If you are interested in user-generated geographical content, crowdsourced geographical information, commons-based peer-produced geographical information, or any other name for this phenomenon (for example VGI) – join the group. It will be good to keep in touch, share information and discuss research aspects.
If you are researching in this area you are also welcome to submit a paper to GISRUK 2010, which will be hosted at UCL – we are keen to have a VGI element in the programme, considering that UCL is the host of OpenStreetMap.
In June, Aamer Ather, an M.Eng. student at the department, completed his research comparing OpenStreetMap (OSM) to the Ordnance Survey MasterMap Integrated Transport Network (ITN) layer. This was based on the previous piece of research in which another M.Eng. student, Naureen Zulfiqar, compared OSM to Meridian 2.
The results are really surprising. The analysis shows that when A-roads, B-roads and a motorway from ITN are compared to OSM data, the overlap can reach values that are over 95%. When the comparison with MasterMap was completed, it became clear that OSM is of better quality than Meridian 2. It is also interesting to note that the results of higher overlap with ITN were achieved under stricter criteria for the buffering procedure that is used for comparison.
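To make the idea of a buffer-based overlap more concrete, here is a minimal sketch. It assumes the overlap is the share of the tested line that falls within a fixed-width buffer around the reference line, and it approximates that share by sampling points along the tested line. The function names, the sampling step and the buffer width are assumptions for the example; the actual study used GIS software and its exact parameters are not reproduced here.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def sample_polyline(line, step):
    """Return points spaced roughly `step` apart along a polyline."""
    points = []
    for a, b in zip(line, line[1:]):
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        n = max(1, math.ceil(length / step))
        for i in range(n):
            t = i / n
            points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    points.append(line[-1])
    return points


def buffer_overlap(test_line, reference_line, buffer_width, step=1.0):
    """Approximate fraction of test_line lying within the reference buffer."""
    samples = sample_polyline(test_line, step)
    ref_segments = list(zip(reference_line, reference_line[1:]))
    inside = sum(
        1
        for p in samples
        if min(point_segment_distance(p, a, b) for a, b in ref_segments)
        <= buffer_width
    )
    return inside / len(samples)
```

In this sketch, a narrower `buffer_width` is a stricter criterion: the same pair of lines can only score the same or lower as the buffer shrinks.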
As noted, in the original analysis, Meridian 2 was used as the reference dataset, the ground truth. However, comparing Meridian 2 and OSM is not like with like, because OSM is not generalised and Meridian 2 is. The justification for treating Meridian 2 as the reference dataset was that the nodes are derived from high-accuracy datasets and it was expected that the 20 metres filter would not change positions significantly. It turns out that the generalisation impacts the quality of Meridian 2 more than I anticipated. Yet, the advantage of Meridian 2 is that it allows comparisons for the whole of England, since the file size is still manageable, while the complexity of ITN would make an extensive comparison difficult and time-consuming.
For the 4 Ordnance Survey London tiles that we’ve compared, the results put OSM only 10-30% from the ITN centre line. Rather impressive when you consider the knowledge, skills and backgrounds of the participants. My presentation from the State of the Map conference, below, provides more details of this analysis – and the excellent dissertation by Aamer Ather, which is the basis for this analysis, is available to download here.
The one caveat that will need to be explored in future projects is that the comparison in London means that OSM mappers had access to very high-resolution imagery from Yahoo! which has been georeferenced and rectified. Therefore, the high precision might be a result of tracing these images, and the question is what happens in places where high-resolution images are not available. Thus, we need to test more tiles in other places to validate the results in other areas of the UK.
Another student is currently comparing OSM to 1:10,000 map of Athens, so by the end of the summer I hope that it will be possible to estimate quality in other countries. The comparison to ITN in other areas of the UK will wait for a future student who will be interested in this topic!
Terra Future 2009 – OpenStreetMap and Ordnance Survey
28 April, 2009
I have checked on Twitter to see how the follow-up meeting to Terra Future 2009, last Friday, went. It was a very pleasant surprise to see that the idea that I put forward in February, that the Ordnance Survey should consider hosting OpenStreetMap and donate some data to it, was voted the best idea that came out of Terra Future 2009. With this sort of peer-review of the idea, and with the added benefit of 2 months for rethinking, I still think that it is quite a good idea.
The most important aspect of this idea is to understand that OpenStreetMap and Ordnance Survey can both thrive in the GeoWeb era. Despite the imaginary competition, each has a clear value to certain parts of the marketplace. There are very clear benefits that the OpenStreetMap community can gain from working closely with the Ordnance Survey – such as some aspects of mapping that the Ordnance Survey are highly knowledgeable about, and vice versa, such as how to innovate in delivery of geographical information. A collaborative model might work after all…
I wonder how this idea will evolve now?
OSM Quality Assessment – S4 presentation
12 January, 2009
The following presentation is a summary of the OSM quality assessment paper that I posted here in August. It was presented at the UCL Centre for Advanced Spatial Analysis (CASA) S4 event, which was held on 8 January 2009.
The presentation does not include any analysis beyond what is included in the paper, apart from a graph that analyses the bias of coverage in comparison to the Index of Multiple Deprivation (Slide 37), which shows the analysis for urban areas only. In the slide, only areas with a size of up to a single standard deviation from the average are shown. By and large, this means that only urban areas are included.
Nestoria interview
2 November, 2008
Nestoria is a property search engine covering the European market, based on Web 2.0 technologies such as mashups; in this case, a Google Maps mashup to show the locations of the properties. The company blog runs a monthly interview and I had the pleasure of being the Nestoria interviewee for this month.
The interview addresses several aspects of neogeography, including the reasons for its rise and the implications for professional GISers. I comment on results from my evaluation of OpenStreetMap data and the implications of crowdsourced geographic information on businesses such as Nestoria.
The interview can be accessed on the Nestoria blog.
Earlier this year, in April, John Krumm from Microsoft Research, the editor of IEEE Pervasive Computing, commissioned me to write a paper about OpenStreetMap for the magazine. The paper was written together with Patrick Weber, and it is finally out. It went through the magazine peer review process, and it is part of a set of articles in the October-December issue of the magazine that are dedicated to aspects of user-generated content.
The article was written for a general audience, and aims to provide an easy-to-understand introduction to OSM that is suitable for technically minded readers (such as the readers of IEEE Pervasive Computing!). It provides some history and a description of the OSM geostack and how it operates, and ends with some open issues and challenges that the project is facing.
You can access the article from the IEEE website, and its full citation is:
Haklay, M. and Weber, P., 2008, ‘OpenStreetMap: User-Generated Street Maps’, IEEE Pervasive Computing, October–December 2008, pp. 12-18.
OpenStreetMap Quality evaluation and other comparisons
19 August, 2008
A comparison of my analysis of OpenStreetMap (OSM) quality evaluation to other examples of quality evaluation brings up some core issues about the nature of the new GeoWeb and the use of traditional sources. The examples that I’m referring to are from Etienne Cherdlu’s SOTM 2007 ‘OSM and the art of bicycle maintenance’, Dair Grant’s comparison of OSM to Google Maps and reality, Ed Johnson’s analysis this summer and Steven Feldman’s brief evaluation in Highgate.
The first observation is of the importance and abundance of well georeferenced, vector-derived public mapping sites, which make several of these comparisons possible (Cherdlu, Dair and Feldman). The previous generation of stylised street maps is not readily available for a comparison. In addition to the availability, the ease with which they can be mashed-up is also a significant enabling factor. Without this comparable geographical information, the evaluation would be much more difficult.
Secondly, when a public mapping website was used, it was Google Maps. If Microsoft’s Virtual Earth had also been used, it would arguably allow a three-way comparison as the Microsoft site uses Navteq information, while Google uses TeleAtlas information. Ordnance Survey (OS) OpenSpace is also a natural candidate for comparison. Was it familiarity that led to the selection of Google Maps? Or is it because the method of comparison is visual inspection, so adding a third source makes it more difficult? Notice that Google has the cachet of being a correct depiction of reality, which Etienne, Dair and Bob Barr demonstrated not to be the case!
Thirdly, and most significantly, only when vector data was used – in our comparison and in parts of what Ed Johnson has done – did a comprehensive analysis of large areas become possible. This shows the important role of formats in the GeoWeb – raster is fabulous for the delivery of cartographic representations, but it is vector that is suitable for analytical and computational work. Only OSM allows the user easy download of vector data – no other mass provider of public mapping does.
Finally, there is the issue of access to information, tools and knowledge. As a team that works at a leading research university (UCL), I and the people who worked with me got easy access to detailed vector datasets and the OS 1:10,000 raster. We also have at our disposal multiple GIS packages, so we can use whichever one performs the task with the least effort. The other comparisons had to rely on publicly available datasets and software. In such unequal conditions, it is not surprising that I will argue that the comparison that we carried out is more robust and consistent. The issue that is coming up here is the balance between amateurs and experts, which is quite central to Web 2.0 in general. Should my analysis be more trusted than those of Dair or Etienne, both of whom are very active in OSM? Does Steven’s familiarity with Highgate, which is greater than mine, make him more of an expert in that area than my consistent application of analysis?
I think that the answer is not clear cut; academic knowledge entails the consistent scrutiny of the data, and I do have the access and the training to conduct a very detailed geographical information quality assessment. In addition, my first job in 1988 was in geographical data collection and GIS development, so I also have professional knowledge in this area. Yet, local knowledge is just as valuable in a specific area and is much better than a mechanical, automatic evaluation. So what is happening is an exchange of knowledge, methods and experiences between the two sides in which both, I hope, can benefit.
OSM quality evaluation
7 August, 2008
In the past year I have worked on the evaluation of OpenStreetMap data. I was helped by Patrick Weber, Claire Ellul, and especially Naureen Zulfiqar, who carried out part of the analysis of motorways. The OSM data was compared against Ordnance Survey Meridian 2 and the 1:10,000 raster as they have enough similarity to justify a comparison. Now, as the fourth birthday of OSM is approaching, it is a good time to evaluate what was achieved. The analysis shows that, where OSM was collected by several users and benefited from some quality assurance, the quality of the data is comparable and can be fit for many applications. The positional accuracy is about 6 metres, which is expected for the data collection methods that are used in OSM. The comparison of motorways shows about 80% overlap between OSM and OS – but more research is required. The challenges are the many areas that are not covered – currently, OSM has good coverage for only 25% of the land area of England. In addition, in areas that are covered well, quality assurance procedures should be considered – and I’m sure that the OSM crowd will find great ways to make these procedures fun. OSM also doesn’t cover areas at the bottom of the deprivation scale as well as it covers areas that are wealthier. The map below shows the quality of coverage of the two datasets for England, with blue marking areas where OSM coverage is good and red where it is poor.
The full report is available here, and if someone is willing to sponsor further analysis – please get in touch! Notice that the paper is in the process of being peer-reviewed for publication, so comments and suggestions are welcomed.



