OSM quality evaluation

7 August, 2008

In the past year I have worked on the evaluation of OpenStreetMap data. I was helped by Patrick Weber, Claire Ellul, and especially Naureen Zulfiqar, who carried out part of the analysis of motorways. The OSM data was compared against Ordnance Survey Meridian 2 and the 1:10,000 raster, as they are similar enough to OSM to justify a comparison. Now, as the fourth birthday of OSM approaches, it is a good time to evaluate what has been achieved.

The analysis shows that, where OSM data was collected by several users and benefited from some quality assurance, its quality is comparable to that of the OS datasets and can be fit for many applications. The positional accuracy is about 6 metres, which is expected for the data collection methods used in OSM. The comparison of motorways shows about 80% overlap between OSM and OS, but more research is required.

The main challenge is the many areas that are not covered: currently, OSM has good coverage for only 25% of the land area of England. In addition, even in areas that are well covered, quality assurance procedures should be considered, and I’m sure that the OSM crowd will find great ways to make these procedures fun. OSM also covers areas at the bottom of the deprivation scale less well than it covers wealthier areas. The map below shows the quality of coverage of the two datasets for England, with blue marking areas where OSM coverage is good and red where it is poor.

Difference between OSM and OS Meridian for England
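To make the idea behind a coverage map like this concrete, here is a minimal sketch of a grid-based completeness comparison. It assumes both datasets are available as road polylines in a metric projected coordinate system (such as British National Grid), sums road length per 1 km cell in each, and reports the difference. The function names, the 1 km cell size, the tolerance and the midpoint-assignment shortcut are all illustrative assumptions, not the exact procedure used in the report.

```python
from collections import defaultdict
from math import floor, hypot


def length_per_cell(roads, cell_size=1000.0):
    """Sum road length (metres) per grid cell.

    `roads` is a list of polylines, each a list of (x, y) vertices in a
    projected, metre-based coordinate system. Each segment is assigned
    whole to the cell containing its midpoint (a simplifying assumption).
    """
    totals = defaultdict(float)
    for road in roads:
        for (x1, y1), (x2, y2) in zip(road, road[1:]):
            mx, my = (x1 + x2) / 2, (y1 + y2) / 2
            cell = (floor(mx / cell_size), floor(my / cell_size))
            totals[cell] += hypot(x2 - x1, y2 - y1)
    return dict(totals)


def coverage_difference(osm_roads, ref_roads, cell_size=1000.0):
    """Return {cell: OSM length - reference length} for every cell in either dataset."""
    osm = length_per_cell(osm_roads, cell_size)
    ref = length_per_cell(ref_roads, cell_size)
    return {c: osm.get(c, 0.0) - ref.get(c, 0.0) for c in osm.keys() | ref.keys()}


def classify(diff, tolerance=0.0):
    """Label a cell 'good' (blue) if OSM has at least the reference length
    minus `tolerance` metres, otherwise 'poor' (red). The threshold is hypothetical."""
    return {c: ("good" if d >= -tolerance else "poor") for c, d in diff.items()}


if __name__ == "__main__":
    osm = [[(0.0, 0.0), (500.0, 0.0)], [(1200.0, 100.0), (1900.0, 100.0)]]
    ref = [[(0.0, 0.0), (800.0, 0.0)], [(1500.0, 200.0), (1500.0, 700.0)]]
    diff = coverage_difference(osm, ref)
    print(sorted(diff.items()))            # [((0, 0), -300.0), ((1, 0), 200.0)]
    print(sorted(classify(diff).items()))  # [((0, 0), 'poor'), ((1, 0), 'good')]
```

Assigning each segment to the cell containing its midpoint avoids clipping geometry at cell edges; a more careful version would split segments at cell boundaries before summing.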

The full report is available here, and if someone is willing to sponsor further analysis – please get in touch!
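An overlap percentage such as the motorway figure can be computed with a buffer test: take a buffer of a chosen width around the reference line and measure what share of the OSM line's length falls inside it. The sketch below approximates that share by cutting the OSM line into short pieces and testing each piece's midpoint. The function names, the sampling step and the 5 m buffer width in the example are assumptions for illustration, and this is not necessarily the exact procedure used in the report.

```python
from math import ceil, hypot


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line):
    """Shortest distance from p to a polyline given as a list of (x, y) vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def overlap_fraction(test_line, ref_line, buffer_width, step=1.0):
    """Approximate share of test_line's length lying within buffer_width of ref_line.

    Each segment of test_line is cut into pieces no longer than `step`; a piece
    counts as inside the buffer if its midpoint is within buffer_width of ref_line.
    """
    inside = total = 0.0
    for (x1, y1), (x2, y2) in zip(test_line, test_line[1:]):
        seg_len = hypot(x2 - x1, y2 - y1)
        n = max(1, ceil(seg_len / step))
        piece = seg_len / n
        for i in range(n):
            t = (i + 0.5) / n
            mid = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            total += piece
            if distance_to_line(mid, ref_line) <= buffer_width:
                inside += piece
    return inside / total if total else 0.0


if __name__ == "__main__":
    reference = [(0.0, 0.0), (100.0, 0.0)]               # hypothetical OS line
    osm_line = [(0.0, 2.0), (100.0, 2.0), (100.0, 50.0)]  # hypothetical OSM line
    print(round(overlap_fraction(osm_line, reference, buffer_width=5.0), 3))  # 0.696
```

Running the test in both directions (OSM against OS, and OS against OSM) gives two overlap figures; the buffer width might reasonably be chosen with the expected positional accuracy in mind.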

The paper itself has since been published: Haklay M, 2010, “How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets”, Environment and Planning B: Planning and Design 37(4) 682–703
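The observation that quality is better where several users contributed depends on counting distinct contributors per area. A minimal sketch, assuming node records have already been reduced to projected (x, y) coordinates plus the user name recorded on each node, counts distinct users per grid cell. The function name and the 1 km cell size are hypothetical choices.

```python
from collections import defaultdict
from math import floor


def users_per_cell(nodes, cell_size=1000.0):
    """Count distinct users per grid cell.

    `nodes` is an iterable of (x, y, user) tuples, with x and y in a
    projected, metre-based coordinate system and `user` the name recorded
    on the node.
    """
    users = defaultdict(set)
    for x, y, user in nodes:
        users[(floor(x / cell_size), floor(y / cell_size))].add(user)
    return {cell: len(names) for cell, names in users.items()}


if __name__ == "__main__":
    sample = [(10.0, 10.0, "alice"), (20.0, 20.0, "bob"),
              (30.0, 30.0, "alice"), (1500.0, 10.0, "carol")]
    print(users_per_cell(sample))  # {(0, 0): 2, (1, 0): 1}
```

In a current snapshot each node carries only its most recent editor, so earlier contributors can drop out of this count; counting over the full edit history would avoid that.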

15 Responses to “OSM quality evaluation”

  1. Ed Says:

    Nice work. Very interesting to see a methodical and repeatable analysis, rather than just qualitative random sampling. How difficult would it be to apply this method of comparison to other countries?

    congrats,
    Ed

  2. mukih Says:

    Thanks Ed.
    The methodology can be used with other datasets that are considered comprehensive – such as Navteq or TeleAtlas sets. The only issue is to get hold of them…


  3. This is a really interesting paper. As far as I can tell, though, the research doesn’t take into account the history of elements in OSM? When you’re looking at the number of users that have worked on an area, you may actually be discounting users who worked on an area in the past but whose username was then replaced by later edits. I think it’s very important to take this into account, as the very peer-reviewing that you are demanding can actually result in the appearance of *fewer* usernames.

    Definitely nice work though and good to see OSM analysed in this way, also thanks for mentioning Liverpool ;-)

    John

  4. mukih Says:

    Thank you John.
    Actually, the reason that I’ve analysed the number of users according to the data at the node level is to be able to find information about the history of a location. If a person edits an area that was edited in the past, it is highly likely that they will insert some nodes that the first person omitted or positioned wrongly. In this case, you will see the second user appearing in the area on some nodes, but not on all of them.
    I would note, though, that another way to analyse this is to take into account all the nodes and ways that are included in an area. I’ve tried that for London and the results weren’t that different!


  5. [...] that I thought he may have missed relates to the way OSM relates the history of OSM entities. A comment I made on his blog follows: As far as I can tell though the research doesn’t take into account the [...]

  6. Dave Says:

    Very interesting paper.

    I think the availability of aerial imagery is skewing some of the node contribution stats though. The classic example here is TimSC who “owns” a large number of nodes as he’s sat down and traced most of London that wasn’t done already, and probably many other places. For many uses OSM data in these areas would probably be considered incomplete (no road names and often wrong connectivity and/or classification).
    Although other users are likely to appear in the users per grid metric as they actually map an area, they’ll probably never compete with the original number of nodes contributed.

    It would be interesting to see if it was possible to remove this effect from the stats to determine what kind of contribution people are making with on-the-ground mapping efforts. I wouldn’t be surprised by similar results, just with different people heading the lists!

  7. mukih Says:

    Thanks Dave.

    I think that you are right about the Yahoo! imagery, plus there are quality issues (positional and attribute) that are caused by trusting it without ground truth.

    It will indeed be interesting to see the ‘on the ground’ contributions, though it will require more processing – another thing for the ‘to do’ list…


  8. [...] student attempting to go his entire time at uni using only OSM maps. The result is that the OSM now compares favourably versus some professionally gathered geodata. Most impressive has been the takeup in Germany: 300 volunteers mapped 99.8% of Hamburg (German), [...]


  9. [...] student attempting to go his entire time at uni using only OSM maps. The result is that the OSM now compares favourably versus some professionally gathered geodata. Most impressive has been the takeup in Germany: 300 volunteers mapped 99.8% of Hamburg (German), [...]


  10. [...] spend his entire time at university using only OSM maps. The result is that OSM now compares favourably with professionally captured geodata. Even more impressive has been the uptake in Germany: 300 volunteers mapped 99.8% of [...]


  11. [...] but not least, according to a group of academics, OSM data is said to be as reliable as Ordnance Survey data as far as British territory is concerned, and [...]


  12. [...] information.  One great example of applying academic research to OSM is Muki Haklay’s 2008 OSM Quality Evaluation work, in which Muki compared OSM data to data sets produced by the UK Ordnance Survey (OS) – the UK [...]


  13. [...] database and the pains they take to represent real world features correctly. Moreover, at least one study shows that OSM data compares favorably with data collected by the Ordnance Survey, the UK’s [...]


  14. [...] some places duplicated, is for the majority of use cases, good enough. This was backed up by research undertaken by Muki Haklay of UCL which answered the perennial question of “how good is OSM data” with a pithy [...]


  15. [...] For one thing, Antonio Villena will present Determinación de la calidad de OpenStreetMap (OSM) para la Comunidad de Madrid (Determining the quality of OpenStreetMap (OSM) for the Community of Madrid), a study based on Muki Haklay’s comparison of OSM with the British Ordnance Survey data. [...]

