29 January, 2010
After the publication of the comparison of OpenStreetMap and Google Map Maker coverage of Haiti, Nicolas Chavent from the Humanitarian OpenStreetMap Team contacted me and turned my attention to the UN Stabilization Mission in Haiti’s (known as MINUSTAH) geographical dataset, which is seen as the core set for the post earthquake humanitarian effort, and therefore a comparison with this dataset might be helpful, too. The comparison of the two Volunteered Geographical Information (VGI) datasets of OpenStreetMap and Google Map Maker with this core dataset also exposed an aspect of the usability of geographical information in emergency situations that is worth commenting on.
For the purpose of the comparison, I downloaded two datasets from GeoCommons – the detailed maps of Port-au-Prince and the Haiti road network. Both are reported on GeoCommons as originating from MINUSTAH. I combined them together, and then carried out the comparison. As in the previous case, the comparison focused only on the length of the roads, with the hypothesis that, if there is a significant difference in the length of the road at a given grid square, it is likely that the longer dataset is more complete. The other comparisons between established and VGI datasets give ground to this hypothesis, although caution must be applied when the differences are small. The following maps show the differences between the MINUSTAH dataset and OpenStreetMap and MINUSTAH and Google Map Maker datasets. I have also reproduced the original map that compares OpenStreetMap and Map Maker for the purpose of comparison and consistency, as well as for cartographic quality.
The maps show that MINUSTAH does provide fairly comprehensive coverage across Haiti (as expected) and that the volunteered efforts of OpenStreetMap and Map Maker provide further details in urban areas. There are areas that are only covered by one of the datasets, so they all have value.
The final comparison uses the 3 datasets together, with the same criteria as in the previous map – the dataset with the longest length of roads is the one that is considered the most complete.
It is interesting to note the south/north divide between OpenStreetMap and Google Map Maker, with Google Map Maker providing more details in the north, and OpenStreetMap in the south (closer to the earthquake epicentre). When compared over the areas in which there is at least 100 metres of coverage of MINUSTAH, OpenStreetMap is, overall, 64.4% complete, while Map Maker is 41.2% complete. Map Maker is covering further 354 square kilometres which are not covered by MINUSTAH or OpenStreetMap, and OpneStreetMap is covering further 1044 square kilometres that are missing from the other datasets, so clearly there is a benefit in integrating them. The grid that includes the analysis of the integrated datasets in shapefile format is available here, in case that it is of any use or if you like to carry out further analysis and or visualise it.
While working on this comparison, it was interesting to explore the data fields in the MINUSTAH dataset, with some of them included to provide operational information, such as road condition, length of time that it takes to travel through it, etc. These are the hallmarks of practical and operational geographical information, with details that are relevant directly to the end-users in their daily tasks. The other two datasets have been standardised for universal coverage and delivery, and this is apparent in their internal data structure. Google Map Maker schema is closer to traditional geographical information products in field names and semantics, exposing the internal engineering of the system – for example, including a country code, which is clearly meaningless in a case where you are downloading one country! OpenStreetMap (as provided by either CloudMade or GeoFabrik) keeps with the simplicity mantra and is fairly basic. Yet, the scheme is the same in Haiti as in England or any other place. So just like Google, it takes a system view of the data and its delivery.
This means that, from an end-user perspective, while these VGI data sources were produced in a radically different way to traditional GI products, their delivery is similar to the way in which traditional products were delivered, burdening the user with the need to understand the semantics of the different fields before using the data.
In emergency situations, this is likely to present an additional hurdle for the use of any data, as it is not enough to provide the data for download through GeoCommons, GeoFabrik or Google – it is how it is going to be used that matters. Notice that the maps tell a story in which an end-user who wants to have full coverage of Haiti has to combine three datasets, so the semantic interpretation can be an issue for such a user.
So what should a user-centred design of GI for an emergency situation look like? The general answer is ‘find the core dataset that is used by the first responders, and adapt your data to this standard’. In the case of Haiti, I would suggest that the MINUSTAH dataset is a template for such a thing. It is more likely to find users of GI on the ground who are already exposed to the core dataset and familiar with it. The fields are relevant and operational and show that this is more ‘user-centred’ than the other two. Therefore, it would be beneficial for VGI providers who want to help in an emergency situation to ensure that their data comply to the local de facto standard, which is the dataset being used on the ground, and bring their schema to fit it.
Of course, this is what GI ontologies are for, to allow for semantic interoperability. The issue with them is that they add at least two steps – define the ontology and figure out the process to translate the dataset that you have acquired to the required format. Therefore, this is something that should be done by data providers, not by end-users when they are dealing with the real situation on the ground. They have more important things to do than to find a knowledge engineer that can understand semantic interoperability…
As the relief effort to the crisis in Haiti unfolds, so does response from mapping organisations with global reach. It is a positive development that free data is available from the Volunteered Geographic Information (VGI) community to assist humanitarian work on such a large scale, and good that there are now two sources. However, it is sad to discover that there seems to be friction between Google Map Maker and OpenStreetMap as to which organisation will prevail among governmental and NGO users. A key issue is surely to ascertain – and fast - which source of crowdsourced geographic information is most useful for which geographical area, and where the differences lie.
I did this assessment today, in the hope that it is useful for the emergency relief work now, and for the reconstruction work to follow. The data is current for the 18th January 2010, and the results are available here.
The evaluation of the coverage of Google Map Maker and OpenStreetMap for Haiti was done using the same methodology as for the comparison of OpenStreetMap and Ordnance Survey data. The shapefile’s projection is UTM zone 18N. In the map here, yellow means that there is a better coverage in Map Maker, and blue means that there is a better coverage in OpenStreetMap. The difference between the two datasets is expressed in metres.
Unlike the previous comparison, where it was assumed that one dataset was the more accurate, here it is not helpful to pursue a binary approach. Rather, there are differences between the two sources of data, and these may matter as the relief work is carried out. The evaluation question is: for each grid square, which of the datasets contains more information in terms of roads length?
The file contains the total roads length for both datasets. The calculated difference between them using the equation:
∑(OSM roads length)-∑(Map Maker roads length)
for each 1km grid square.
The information in the file can be used for the following applications:
- Users of these mapping products - it can help in judging which dataset to use for each area.
- Users – it can facilitate conflation - the process of merging datasets to create a better quality output.
- Mappers - it can illuminate which areas to focus on, to improve coverage.
If you download the file, notice that the field OSMMMClose indicates that the two datasets are very close to one another – the value 1 is associated with grid squares where the difference between them is less than 200 metres. This might be useful as an indication that the two datasets agree with each other.
I hope that this assessment is helpful for those using the data for the relief effort. If you have ideas on how I can help further in this way, please get in touch.
18 January, 2010
Back in September, during AGI Geocommunity ’09, I had a chat with Jo Cook about the barriers to the use of OpenStreetMap data by people who are not experts in the ways the data was created and don’t have the time and resources to evaluate the quality of the information. One of the difficulties is to decide if the coverage is complete (or close to complete) for a given area.
To help with this problem, I obtained permission from the Ordnance Survey research unit to release the results of my analysis, which compares OpenStreetMap coverage to the Ordnance Survey Meridian 2 dataset (see below about the licensing conundrum that the analysis produced as a by-product).
Before using the data, it is necessary to understnad how it was created. The methodology can be used for the comparison of completeness as well as the systematic analysis of other properties of two vector datasets. The methodology is based on the evaluation of two datasets A and B, where A is the reference dataset (Ordnance Survey Meridian 2 in this case) and B is the test dataset (OpenStreetMap), and a dataset C which includes the spatial units that will be used for the comparison (1km grid square across England).
The first step in the analysis is to decide on the spatial units that will be used in the comparison process (dataset C). This can be a reference grid with standard cell size, or some other meaningful geographical unit such as census enumeration units or administrative boundaries (see previous post, where lower level super output areas were used). There are advantages to the use of a regular grid, as this avoids problems that arise from the Modifiable Areal Unit Problem (MAUP) to some extent.
The two datasets (A and B) are then split along the boundaries of the geographical units, while preserving the attributes in each part of the object, to ensure that no information is lost. The splitting is necessary to support queries that address only objects that fall within each geographical unit.
The next step involves the creation of very small buffers around the geographical units. This is necessary because, due to computational errors in the algorithm that calculates the intersections and splits the objects and implementation of operators in the specific GIS package used, the co-ordinates where the object was split might be near, but not at, the boundary of the reference geographical unit. The buffers should be very small so as to ensure that only objects that should be calculated inside the unit’s area will be included in the analysis. In our case, the buffers are 25cm over grid square units that are 1km in length.
Finally, spatial queries can be carried out to evaluate the total length, area or any other property of dataset A that falls within each unit, and to compare these values to the results of the analysis of dataset B. The whole process is described in the image above.
The shape file provided here contains values from -4 to +4, and these values correspond to the difference between OpenStreetMap and Meridian 2. In each grid square, the following equation was calculated:
∑(OSM roads length)-∑(Meridian roads length)
If the value is negative, then the total length of Meridian objects is bigger than the length of OpenStreetMap objects. A value of -1, for example, means that ‘there are between 0 and 1000 metres more Meridian 2’ in this grid square whereas 1 means that ‘there are between 0 and 1000 metres more OpenStreetMap’. Importantly, 4 and -4 mean anything with a positive of negative difference of over 3000 metres. In general, the analysis shows that, if the difference is at levels 3 or 4, then you can consider OpenStreetMap as complete, while 1 and 2 will usually mean that some minor roads are likely to be missing. Also, -1 should be easy to complete. In areas where the values are -2 to -4, the OpenStreetMap community needs to do complete the map.
Finally, a licensing conundrum that shows the problems with both Ordnance Survey principles, which state that anything that is derived from its maps is Crown copyright and part of Ordnance Survey intellectual property, and with the use of the Creative Commons licence for OpenStreetMap data.
Look at the equation above. The left-hand side is indisputably derived from OpenStreetMap, so it is under the CC-By-SA licence. The right-hand side is indisputably derived from Ordnance Survey, so it is clearly Crown copyright. The equation, however, includes a lot of UCL’s work, and, most importantly, does not contain any geometrical object from either datasets – the grid was created afresh. Yet, without ‘deriving’ the total length from each dataset, it is impossible to compute the results that are presented here – but they are not derived by one or the other. So what is the status of the resulting dataset? It is, in my view, UCL copyright – but it is an interesting problem, and I might be wrong.
You can download the data from here – the file includes a metadata document.
If you use the dataset, please let me know what you have done with it.