Horizon’s ‘Supporting the contextual footprint – Infrastructure Challenges’ presentation

The Digital Economy is a research programme of Research Councils UK, and as part of it the University of Nottingham is running the Horizon Digital Economy research centre. The centre organised a set of theme days, and the latest one focused on ‘supporting the contextual footprint – infrastructure challenges’. The day was excellent, covering topics such as a background on location issues with a review of location technology and a demonstration of a car pooling application; data ownership, privacy and control over your information; and finally crowdsourcing. I was asked to give a presentation providing some background on OpenStreetMap, discussing the motivation of contributors and mentioning the business models that are based on open geographical information.

For the purpose of this presentation, I teamed up with Nama Raj Budhathoki, who is completing his PhD research at the University of Illinois, Urbana-Champaign under the supervision of Zorica Nedović-Budić (now at University College Dublin). His research focuses on user-generated geographical information, and just before Christmas he ran a survey of OpenStreetMap contributors; I was involved in the design of the questionnaire (as well as being lucky enough to be on Nama’s advisory committee).

So here is the presentation; we plan to give more comprehensive feedback on the survey during State of the Map 2010.


Haiti – how can VGI help? Comparison of OpenStreetMap and Google Map Maker

As the relief effort to the crisis in Haiti unfolds, so does the response from mapping organisations with global reach. It is a positive development that free data is available from the Volunteered Geographic Information (VGI) community to assist humanitarian work on such a large scale, and good that there are now two sources. However, it is sad to discover that there seems to be friction between Google Map Maker and OpenStreetMap as to which organisation will prevail among governmental and NGO users. A key issue is surely to ascertain – and fast – which source of crowdsourced geographic information is most useful for which geographical area, and where the differences lie.

I did this assessment today, in the hope that it is useful for the emergency relief work now, and for the reconstruction work to follow. The data is current as of 18 January 2010, and the results are available here.

The evaluation of the coverage of Google Map Maker and OpenStreetMap for Haiti was done using the same methodology as for the comparison of OpenStreetMap and Ordnance Survey data. The shapefile’s projection is UTM zone 18N. In the map here, yellow means that there is a better coverage in Map Maker, and blue means that there is a better coverage in OpenStreetMap. The difference between the two datasets is expressed in metres.

OSM and Map Maker coverage - Haiti - 18 January 2010

Unlike the previous comparison, where it was assumed that one dataset was the more accurate, here it is not helpful to pursue a binary approach. Rather, there are differences between the two sources of data, and these may matter as the relief work is carried out. The evaluation question is: for each grid square, which of the datasets contains more information in terms of road length?

The file contains the total road length for both datasets. The difference between them was calculated using the equation:

∑(OSM roads length)-∑(Map Maker roads length)

for each 1km grid square.
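
For anyone who wants to reproduce this kind of per-square comparison, here is a minimal sketch using geopandas. The file names, the grid layer and the column names are hypothetical placeholders rather than the actual files behind the shapefile linked above, and the overlay-based clipping is a simplification of the full methodology described in the Meridian 2 post further down.

    import geopandas as gpd

    # Hypothetical inputs: both road layers and the 1 km grid are assumed to be in
    # the same projected CRS (UTM zone 18N here), so lengths come out in metres.
    osm = gpd.read_file("osm_roads_haiti.shp")
    mapmaker = gpd.read_file("mapmaker_roads_haiti.shp")
    grid = gpd.read_file("grid_1km_haiti.shp")   # one polygon per 1 km square, with a 'cell_id' column

    def length_per_cell(roads, grid):
        """Clip the roads to the grid squares and sum their length per square."""
        pieces = gpd.overlay(roads, grid[["cell_id", "geometry"]], how="intersection")
        pieces["length_m"] = pieces.geometry.length
        return pieces.groupby("cell_id")["length_m"].sum()

    osm_length = length_per_cell(osm, grid)
    mm_length = length_per_cell(mapmaker, grid)

    # Signed difference per square: positive = more OSM roads, negative = more Map Maker roads
    grid["diff_m"] = grid["cell_id"].map(osm_length).fillna(0) - grid["cell_id"].map(mm_length).fillna(0)
    grid.to_file("osm_mapmaker_diff.shp")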

The information in the file can be used for the following applications:

  • Users of these mapping products – it can help in judging which dataset to use for each area.
  • Users – it can facilitate conflation –  the process of merging datasets to create a better quality output.
  • Mappers – it can illuminate which areas to focus on, to improve coverage.

If you download the file, notice that the field OSMMMClose indicates that the two datasets are very close to one another – the value 1 is associated with grid squares where the difference between them is less than 200 metres. This might be useful as an indication that the two datasets agree with each other.
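
As a minimal usage example (a sketch only: the file name is a placeholder, and only the OSMMMClose field name is taken from the description above):

    import geopandas as gpd

    squares = gpd.read_file("haiti_osm_mapmaker_comparison.shp")   # placeholder file name

    # Grid squares where the two datasets are within 200 metres of each other
    agree = squares[squares["OSMMMClose"] == 1]
    print(f"{len(agree)} of {len(squares)} grid squares have near-identical coverage")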

I hope that this assessment is helpful for those using the data for the relief effort. If you have ideas on how I can help further in this way, please get in touch.

OpenStreetMap and Meridian 2 – releasing the outputs

Back in September, during AGI Geocommunity ’09, I had a chat with Jo Cook about the barriers to the use of OpenStreetMap data by people who are not experts in the ways the data was created and don’t have the time and resources to evaluate the quality of the information. One of the difficulties is to decide if the coverage is complete (or close to complete) for a given area.

To help with this problem, I obtained permission from the Ordnance Survey research unit to release the results of my analysis, which compares OpenStreetMap coverage to the Ordnance Survey Meridian 2 dataset (see below about the licensing conundrum that the analysis produced as a by-product).

Before using the data, it is necessary to understand how it was created. The methodology can be used for the comparison of completeness as well as for the systematic analysis of other properties of two vector datasets. It is based on the evaluation of two datasets A and B, where A is the reference dataset (Ordnance Survey Meridian 2 in this case) and B is the test dataset (OpenStreetMap), together with a dataset C which includes the spatial units that will be used for the comparison (1km grid squares across England).

The first step in the analysis is to decide on the spatial units that will be used in the comparison process (dataset C). This can be a reference grid with a standard cell size, or some other meaningful geographical unit such as census enumeration units or administrative boundaries (see the previous post, where lower layer super output areas were used). There are advantages to the use of a regular grid, as it avoids, to some extent, problems that arise from the Modifiable Areal Unit Problem (MAUP).

The two datasets (A and B) are then split along the boundaries of the geographical units, while preserving the attributes in each part of the object, to ensure that no information is lost. The splitting is necessary to support queries that address only objects that fall within each geographical unit.

The next step involves the creation of very small buffers around the geographical units. This is necessary because, due to computational errors in the algorithm that calculates the intersections and splits the objects, and in the implementation of operators in the specific GIS package used, the co-ordinates where an object was split might be near, but not exactly at, the boundary of the reference geographical unit. The buffers should be very small, to ensure that only objects that belong inside the unit’s area are included in the analysis. In our case, the buffers are 25cm around grid squares that are 1km in length.

Finally, spatial queries can be carried out to evaluate the total length, area or any other property of dataset A that falls within each unit, and to compare these values to the results of the analysis of dataset B. The whole process is described in the image above.
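
The tolerance issue in the buffering step can be illustrated with a small sketch (shapely; the coordinates and the few-millimetre offset are invented purely for illustration):

    from shapely.geometry import Polygon, LineString

    # One 1 km grid square (a unit from dataset C), in a projected CRS measured in metres
    cell = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])

    # A road piece that was split at the cell boundary; numerical error in the
    # intersection calculation left its end point a few millimetres outside the cell
    piece = LineString([(250, 400), (1000.003, 400)])

    print(piece.within(cell))                # False - rejected by a strict 'within' query
    print(piece.within(cell.buffer(0.25)))   # True  - captured once the 25cm buffer is applied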

The shapefile provided here contains values from -4 to +4, and these values correspond to the difference between OpenStreetMap and Meridian 2. In each grid square, the following equation was calculated:

∑(OSM roads length)-∑(Meridian roads length)

If the value is negative, then the total length of Meridian objects is bigger than the length of OpenStreetMap objects. A value of -1, for example, means that ‘there are between 0 and 1000 metres more Meridian 2’ in this grid square, whereas 1 means that ‘there are between 0 and 1000 metres more OpenStreetMap’. Importantly, 4 and -4 mean anything with a positive or negative difference of over 3000 metres. In general, the analysis shows that, if the difference is at level 3 or 4, then you can consider OpenStreetMap as complete, while 1 and 2 will usually mean that some minor roads are likely to be missing. Squares at -1 should also be easy to complete. In areas where the values are -2 to -4, the OpenStreetMap community needs to do more work to complete the map.
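
For anyone reproducing the classification, here is my reading of the binning as a small Python function; the exact treatment of the class boundaries and of a zero difference is an assumption on my part:

    import math

    def difference_class(diff_m):
        """Map a signed length difference in metres onto the -4..+4 classes
        (1000 m bins, capped at +/-4 for differences beyond 3000 m)."""
        if diff_m == 0:
            return 0
        sign = 1 if diff_m > 0 else -1
        return sign * min(4, math.ceil(abs(diff_m) / 1000))

    print(difference_class(650))     # 1: up to 1000 m more OpenStreetMap
    print(difference_class(-3500))   # -4: over 3000 m more Meridian 2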

Finally, a licensing conundrum that highlights problems both with the Ordnance Survey principles, which state that anything derived from its maps is Crown copyright and part of Ordnance Survey intellectual property, and with the use of the Creative Commons licence for OpenStreetMap data.

Look at the equation above. The left-hand side is indisputably derived from OpenStreetMap, so it is under the CC-By-SA licence. The right-hand side is indisputably derived from Ordnance Survey, so it is clearly Crown copyright. The equation, however, includes a lot of UCL’s work and, most importantly, does not contain any geometrical object from either dataset – the grid was created afresh. Yet, without ‘deriving’ the total length from each dataset, it is impossible to compute the results that are presented here – but they are not derived from one or the other. So what is the status of the resulting dataset? It is, in my view, UCL copyright – but it is an interesting problem, and I might be wrong.

You can download the data from here – the file includes a metadata document.

If you use the dataset, please let me know what you have done with it.

OpenStreetMap in Athens – as accurate as London

Most of the work that we carried out at UCL in evaluating the quality of OpenStreetMap has focused on England, and particularly on London. This was mainly due to the availability of comparative datasets: the Ordnance Survey research unit kindly provided me with the full Meridian 2 dataset for comparison. More detailed comparisons used MasterMap, obtained through the wonderful Digimap service, though the time it takes to process meant that we were limited in the size of the area used for the comparison.

One of the open questions that remained was the accuracy of data collection in other parts of the world. Luckily, Ourania (Rania) Kounadi, who studied on our MSc in GIS at UCL, had access to detailed maps of Athens. She used a 1:10,000 map from the Hellenic Military Geographic Service (HMGS) and focused on an area of 25 square kilometres at the centre of the city. The roads were digitised from the HMGS map, and then the Goodchild-Hunter procedure was used to evaluate the positional accuracy.
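
The Goodchild-Hunter procedure, in essence, buffers the reference line and measures what share of the tested line falls inside that buffer. A minimal sketch with shapely, where the coordinates and the buffer width are purely illustrative:

    from shapely.geometry import LineString

    def overlap_percentage(reference_road, tested_road, buffer_width=10.0):
        """Share of the tested line (e.g. OSM) that lies within a buffer of the
        given width, in metres, around the reference line (e.g. the HMGS road)."""
        buffer_zone = reference_road.buffer(buffer_width)
        return 100.0 * tested_road.intersection(buffer_zone).length / tested_road.length

    # Illustrative example in a projected CRS measured in metres
    reference = LineString([(0, 0), (500, 0)])
    osm = LineString([(0, 3), (250, 6), (500, 15)])
    print(f"{overlap_percentage(reference, osm):.1f}% of the OSM line falls within the buffer")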

The results show that for most of the roads in the evaluation area there was an overlap of 69% to 100% between the OSM and HMGS datasets. The average overlap was very close to 90%. Her analysis also included attribute and completeness evaluation, showing that the quality is high in these aspects too.

OSM positional accuracy for Athens

So a pattern is starting to emerge showing that the quality of OSM data is indeed good in terms of positional accuracy. This is surprising at first glance – how come people who are not necessarily trained in geographical data collection and do not use rigorous quality assurance processes produce data that is as good as the authoritative data?

My explanation for this, as I’ve written in my paper about OSM quality, is that it ‘demonstrates the importance of the infrastructure, which is funded by the private and public sector and which allows the volunteers to do their work without significant personal investment. The GPS system and the receivers allow untrained users to automatically acquire their position accurately, and thus simplify the process of gathering geographical information. This is, in a way, the culmination of the process in which highly trained surveyors were replaced by technicians, with the introduction of high-accuracy GPS receivers in the construction and mapping industries over the last decade. The imagery also provides such an infrastructure function – the images were processed, rectified and georeferenced by experts and thus, an OSM volunteer who uses this imagery for digitising benefits from the good positional accuracy which is inherent in the image. So the issue here is not to compare the work of professionals and amateurs, but to understand that the amateurs are actually supported by the codified professionalised infrastructure and develop their skills through engagement with the project.’
Rania’s dissertation is available to download from here.

OpenStreetMap and Ordnance Survey Meridian 2 – Progress maps

As part of an update of the work that I published in August 2008, I re-ran the comparison between the OpenStreetMap and Ordnance Survey Meridian 2 datasets. In a future post, I will provide a full report of this assessment. As I have now completed the evaluation for October 2009 and a re-evaluation of the data from March 2008, I decided to publish some outputs. The map below shows the completeness of OpenStreetMap across England for the two periods. Click on the map to enlarge.

Meridian 2 - OSM Comparison - Mar '08 / Oct '09

The second set of maps shows the estimation of completeness when attributes are considered. For this purpose, the calculation takes into account only line objects that are comparable to those in Meridian 2, thus excluding features such as footpaths. The following types of roads were used: motorway, motorway_link, primary, primary_link, secondary, secondary_link, trunk, trunk_link, tertiary, tertiary_link, minor, unclassified and residential.

In addition, a test verified that the ‘name’ field is not empty. This indicates that a street name or road number is included in the attributes of the object, and thus it can be considered complete with basic attributes. To make the comparison appropriate, only objects that contain a road name or number in Meridian 2 were included.
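
For anyone re-running this kind of attribute comparison, the filtering step might look something like the sketch below (geopandas; the file name is a placeholder, and the ‘highway’ and ‘name’ column names assume a common OSM export layout):

    import geopandas as gpd

    COMPARABLE_TYPES = {
        "motorway", "motorway_link", "primary", "primary_link",
        "secondary", "secondary_link", "trunk", "trunk_link",
        "tertiary", "tertiary_link", "minor", "unclassified", "residential",
    }

    osm = gpd.read_file("osm_roads_england.shp")   # placeholder file name

    # Keep only the line objects that are comparable to Meridian 2 road features...
    comparable = osm[osm["highway"].isin(COMPARABLE_TYPES)]

    # ...and, for the attribute comparison, only those with a non-empty 'name' field
    named = comparable[comparable["name"].notna() & (comparable["name"].str.strip() != "")]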

Meridian 2 - OSM with Attributes Comparison - Mar '08 / Oct '09

The growth within just over a year and a half is very impressive – rising from 27% in March 2008 to 65% in October 2009. When attributes are considered, completeness has risen from 7% to 25%. Notice that the criteria that I have set for this comparison are more stringent than those in the previous study, so the numbers – especially for attribute completeness – are lower than those published in August 2008.

Linus’ Law and OpenStreetMap

One of the interesting questions that emerged from the work on the quality of OpenStreetMap (OSM) in particular, and Volunteered Geographical Information (VGI) in general, is the validity of the ‘Linus’ Law for this type of information.

The law came from Open Source software development and states that ‘Given enough eyeballs, all bugs are shallow’ (Raymond, 2001, p.19). For mapping, I suggest that this can be translated into the number of contributors that have worked on a given area. The rationale behind it is that, if there is only one contributor in an area, he or she might inadvertently introduce some errors. For example, they might forget to survey a street or might position a feature in the wrong location. If there are several contributors, they might notice inaccuracies or ‘bugs’, and therefore the more users, the fewer ‘bugs’.

In my original analysis, I looked only at the number of contributors per square kilometre as a proxy for accuracy, and provided a visualisation of the difference across England.

MasterMap Comparison locations in London

During the past year, Aamer Ather and Sofia Basiouka looked at this issue, by comparing the positional accuracy of OSM in 125 sq km of London. Aamer carried out a detailed comparison of OSM and the Ordnance Survey MasterMap Integrated Transport Network (ITN) layer. Sofia took the results from his study and divided them for each grid square, so it was possible to calculate an overall value for every cell. The value is the average of the overlap between OSM and OS objects, weighted by the length of the ITN object. The next step was to compare the results to the number of users at each grid square, as calculated from the nodes in the area.
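
As a rough sketch of that aggregation step (pandas; the input table and its column names are hypothetical stand-ins for Aamer’s per-object overlap results joined to the per-square contributor counts):

    import pandas as pd

    # Hypothetical per-object results: one row per ITN object, with the grid square it
    # falls in, its length, the measured overlap with OSM, and the number of OSM
    # contributors in that square.
    results = pd.read_csv("itn_osm_overlap.csv")   # columns: cell_id, itn_length_m, overlap_pct, n_users

    def weighted_overlap(group):
        # average overlap for a grid square, weighted by the length of the ITN objects
        return (group["overlap_pct"] * group["itn_length_m"]).sum() / group["itn_length_m"].sum()

    per_cell = results.groupby("cell_id").apply(weighted_overlap).rename("overlap_pct")
    users = results.groupby("cell_id")["n_users"].first()

    # Positional accuracy against the number of contributors per square
    print(pd.concat([per_cell, users], axis=1).groupby("n_users")["overlap_pct"].mean())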

The results show that, above 5 users, there is no clear pattern of improved quality. The graph below provides the details – but the pattern is that the quality, while generally very high, is not dependent on the number of users – so Linus’ Law does not apply to OSM (and probably not to VGI in general).

Number of OSM Users and positional accuracy compared to ITN

From looking at OSM data, my hypothesis is that, due to the participation inequality in OSM contribution (some users contribute a lot while others don’t contribute very much), the quality is actually linked to a specific user, and not to the number of users.
Yet, I will qualify the conclusion with the statement that further research is necessary. Firstly, the analysis was carried out in London, so checking what is happening in other parts of the country where different users collected the data is necessary. Secondly, the analysis did not include the interesting range of 1 to 5 users, so it might be the case that there is rapid improvement in quality from 1 to 5 and then it doesn’t matter. Maybe the big change is from 1 to 3? Finally, the analysis focused on positional accuracy, and it is worth exploring the impact of the number of users on completeness.

Volunteered Geographical Information Research Network

Chris Parker, a PhD student at Loughborough University, organised a dedicated Volunteered Geographical Information research group site on ResearchGate. While I dislike the term – I usually interpret it as the sense of ‘volunteered’ as in ‘mum volunteered me to help the old lady cross the street’ – there is no point in trying to change it. When Mike Goodchild coins an acronym, it sticks; it’s sort of a GIScience law!

If you are interested in user-generated geographical content, crowdsourced geographical information, commons-based peer-produced geographical information, or whatever else you want to call this phenomenon (for example, VGI) – join the group. It will be good to keep in touch, share information and discuss research aspects.
If you are researching in this area, you are also welcome to submit a paper to GISRUK 2010, which will be hosted at UCL – we are keen to have a VGI element in the programme, considering that UCL is the host of OpenStreetMap.