Ordnance Survey Code-Point Open

One of the surprises of the Ordnance Survey OpenData release at the beginning of April was the inclusion of the Code-Point Open dataset, which lists the location of every postcode in England, Wales and Scotland. This is clearly a very important dataset, because postcode geography drives many services and activities in the UK, and before the release the cost of using postcodes in geographical analysis was prohibitive for many small organisations.

So how usable is this free Code-Point data? The principle of ‘do not look a gift horse in the mouth’ doesn’t apply here: the whole point of releasing the data is to make it as useful as possible and to encourage innovation, so it should be made available in a way that makes it easy to reuse. I evaluated it while analysing a dataset of 11,000 volunteers’ postcodes that I received from a third sector organisation.

The download process is excellent and easy, apart from the fact that there is no clear, short, non-technical description next to each product. To find a description, you need to go to the product page – so you are at least two clicks away from the product details. It would be better to link from each product and include a brief description on the download page itself. We will see in a second why this is important…

The next step was the download itself and the opening of the zip file, which was clear and easy. There is an oddity with all Ordnance Survey data: the packages contain a redundant sub-directory, so in this case the data resides under \codepo_gb\Code-Point Open\ . The fact that the data is broken up into postcode areas instead of one big file of 157MB is fine, but it would be helpful to remind users that they can concatenate the files using simple commands – especially less tech-savvy users. An explanation that Windows users can open the command window with ‘cmd.exe’ and run ‘type a.csv b.csv > common.csv’ could save some people plenty of time.
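
For those who prefer a script, here is a minimal Python sketch along the same lines – the directory layout follows the description above, and the output file name is a placeholder:

# Concatenate the per-area Code-Point Open CSV files into a single file.
# A sketch only: adjust the path to wherever you extracted the zip file.
import glob

area_files = sorted(glob.glob(r"codepo_gb/Code-Point Open/*.csv"))

with open("codepoint_all.csv", "w", newline="") as out:
    for path in area_files:
        with open(path, newline="") as part:
            out.write(part.read())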

But the real unpleasant surprise was that nowhere in the downloaded package is there a description of the fields in the files! So you open the files and need to figure out what the fields are. The user manual hides four clicks away from the download page, and luckily I knew that it is stored under ‘technical information’ on the product page, which is not at all obvious on a first visit. Why not deliver the user manual with the product?!? The Doc directory is an obvious place to store it.

The user manual reveals that there are 19 fields in the file, of which 9 (nearly half!) are ‘not available in Code-Point Open’ – so why are they delivered at all? After figuring out the fields, I created a single header line that can be attached to the files before importing them into a GIS:

Postcode,Positional Quality,PR Delete,TP Delete,DQ Delete,RP Delete,BP Delete,PD Delete,MP Delete,UM Delete,Easting,Northing,Country,Regional Health Authority,Health Authority,County,District,Ward,LS Delete.

Of course, all the fields with ‘Delete’ in the name mean that they should be deleted once imported.
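
As an illustration only, here is a short Python sketch that attaches this header and strips the redundant columns in one pass (the file names are placeholders):

# Attach the header from the user manual and drop the 'Delete' columns.
import csv

FIELDS = ["Postcode", "Positional Quality", "PR Delete", "TP Delete",
          "DQ Delete", "RP Delete", "BP Delete", "PD Delete", "MP Delete",
          "UM Delete", "Easting", "Northing", "Country",
          "Regional Health Authority", "Health Authority", "County",
          "District", "Ward", "LS Delete"]
keep = [i for i, name in enumerate(FIELDS) if "Delete" not in name]

with open("codepoint_all.csv", newline="") as src, \
     open("codepoint_trimmed.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow([FIELDS[i] for i in keep])      # header row
    for row in csv.reader(src):
        writer.writerow([row[i] for i in keep])     # keep only the useful fields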

Interestingly, once you delete these fields, the total size of Code-Point Open drops from 157MB to 91MB – so trimming them before distribution would also save the Ordnance Survey bandwidth and reduce carbon emissions.

Another interesting point is that the user manual includes detailed instructions on how to convert the postcode field to a ‘single spaced postcode’, with step-by-step recipes for Excel, MapInfo and ArcGIS. This is the type of information that helps end-users start using the data faster. Finally, you can use this wonderful information to create lovely maps.
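
The manual’s recipes cover those packages only; as an illustration, here is an equivalent sketch in Python, assuming the usual convention that the inward code is always the last three characters of the postcode:

# Convert any spacing variant to a 'single spaced postcode'.
def single_spaced(postcode):
    compact = postcode.replace(" ", "").upper()
    return compact[:-3] + " " + compact[-3:]

print(single_spaced("EC1E6BT"))    # EC1E 6BT
print(single_spaced("n1  9gu"))    # N1 9GU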

All these problems are minor, apart from the missing description of the fields, which is a major usability error. Similar analysis can be carried out for any of the Ordnance Survey datasets to ensure that they are useful to their users. Some improvements are easy, such as including the user manual with the distribution, and I’m sure that, over time, the team at the Ordnance Survey will sort these issues out.

On 23rd March 2010, UCL hosted the second workshop on usability of geographic information, organised by Jenny Harding (Ordnance Survey Research), Sarah Sharples (Nottingham) and myself. This workshop extended the range of topics covered in the first one, which we reported on at the AGI conference last year. This time we had about 20 participants and it was an excellent day, covering a wide range of topics – from a presentation by Martin Maguire (Loughborough) on the visualisation and communication of climate change data, to Johannes Schlüter’s (Münster) discussion of the use of XO computers with schoolchildren, to a talk by Richard Treves (Southampton) on the impact of Google Earth tours on learning. Especially interesting was the combination of sound and other senses in the work of Nick Bearman (UEA) and Paul Kelly (Queen’s University Belfast).

Jenny’s introduction highlighted the different aspects of GI usability, from those that are specific to the data itself to issues with application interfaces. The integration of data with the software that creates the user experience in GIS was discussed throughout the day, and it is one of the reasons why the usability of the information itself is important in this field. The Ordnance Survey is currently running a project to explore how it can integrate usability into the design of its products – Michael Brown’s presentation discussed the development of a survey as part of this project. The integration of data and application was also central to Philip Robinson’s (GE Energy) presentation on the use of GI by utility field workers.

My presentation focused on some preliminary thoughts based on the analysis of the OpenStreetMap and Google Maps communities’ response to the earthquake in Haiti at the beginning of 2010. It discussed a set of issues that, if explored, will provide insights relevant beyond the specific case and that can illuminate issues relevant to the daily production and use of geographic information – for example, the very basic metadata that was provided on portals such as GeoCommons, and what users can do to evaluate the fitness for use of a specific dataset (see also Barbara Poore’s (USGS) discussion of the metadata crisis).

Interestingly, the day after giving this presentation I had a chance to discuss GI usability with MapAction volunteers who gave a presentation at GEO-10. Their presentation filled in some gaps, but also reinforced the value of researching GI usability for emergency situations.

For a detailed description of the workshop and abstracts – see this site. All the presentations from the conference are available on SlideShare and my presentation is below.

Back in September, during AGI Geocommunity ’09, I had a chat with Jo Cook about the barriers to the use of OpenStreetMap data by people who are not experts in the ways the data was created and don’t have the time and resources to evaluate the quality of the information. One of the difficulties is to decide if the coverage is complete (or close to complete) for a given area.

To help with this problem, I obtained permission from the Ordnance Survey research unit to release the results of my analysis, which compares OpenStreetMap coverage to the Ordnance Survey Meridian 2 dataset (see below about the licensing conundrum that the analysis produced as a by-product).

Before using the data, it is necessary to understand how it was created. The methodology can be used for comparing completeness, as well as for the systematic analysis of other properties of two vector datasets. It is based on the evaluation of two datasets, A and B, where A is the reference dataset (Ordnance Survey Meridian 2 in this case) and B is the test dataset (OpenStreetMap), together with a dataset C which contains the spatial units that will be used for the comparison (1km grid squares across England).

The first step in the analysis is to decide on the spatial units that will be used in the comparison process (dataset C). This can be a reference grid with a standard cell size, or some other meaningful geographical unit such as census enumeration units or administrative boundaries (see the previous post, where lower level super output areas were used). There are advantages to using a regular grid, as it avoids, to some extent, problems that arise from the Modifiable Areal Unit Problem (MAUP).
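
As an illustration of what dataset C might look like in code, here is a minimal sketch that builds a regular 1km grid with geopandas and shapely – the library choice and the example extent are mine, not part of the original workflow:

# Build dataset C: a regular grid of 1km squares in British National Grid.
import geopandas as gpd
from shapely.geometry import box

def make_grid(xmin, ymin, xmax, ymax, cell=1000):
    cells = [box(x, y, x + cell, y + cell)
             for x in range(int(xmin), int(xmax), cell)
             for y in range(int(ymin), int(ymax), cell)]
    return gpd.GeoDataFrame({"cell_id": range(len(cells))},
                            geometry=cells, crs="EPSG:27700")

grid = make_grid(500000, 150000, 560000, 200000)   # arbitrary example extent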

The two datasets (A and B) are then split along the boundaries of the geographical units, while preserving the attributes in each part of the object, to ensure that no information is lost. The splitting is necessary to support queries that address only objects that fall within each geographical unit.

The next step involves the creation of very small buffers around the geographical units. This is necessary because, due to computational errors in the algorithm that calculates the intersections and splits the objects, and to the implementation of operators in the specific GIS package used, the co-ordinates where an object was split might be near, but not exactly at, the boundary of the reference geographical unit. The buffers should be very small, to ensure that only objects that belong inside the unit’s area are included in the analysis. In our case, the buffers are 25cm around grid squares that are 1km on each side.

Finally, spatial queries can be carried out to evaluate the total length, area or any other property of dataset A that falls within each unit, and to compare these values to the results of the analysis of dataset B. The whole process is described in the image above.
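
Here is a rough geopandas sketch of the splitting and length-query steps, using the grid from the sketch above (the file names are placeholders, and the tiny buffer around each unit is noted in a comment rather than implemented):

# Split each road dataset along the grid cells and sum the length per cell.
import geopandas as gpd

meridian = gpd.read_file("meridian2_roads.shp").to_crs("EPSG:27700")
osm = gpd.read_file("osm_roads.shp").to_crs("EPSG:27700")

def length_per_cell(roads, grid, column):
    # overlay() splits the roads at the cell boundaries while keeping their
    # attributes; in the full method each cell is also buffered by ~25cm to
    # absorb computational errors at the split points (omitted here).
    pieces = gpd.overlay(roads, grid, how="intersection")
    pieces[column] = pieces.geometry.length
    return pieces.groupby("cell_id")[column].sum()

result = grid.set_index("cell_id")
result["meridian_m"] = length_per_cell(meridian, grid, "meridian_m")
result["osm_m"] = length_per_cell(osm, grid, "osm_m")
result["diff_m"] = result["osm_m"].fillna(0) - result["meridian_m"].fillna(0)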

The shape file provided here contains values from -4 to +4, and these values correspond to the difference between OpenStreetMap and Meridian 2. In each grid square, the following equation was calculated:

∑(OSM road length) − ∑(Meridian 2 road length)

If the value is negative, the total length of Meridian 2 objects is greater than the length of OpenStreetMap objects. A value of -1, for example, means that there are between 0 and 1000 metres more Meridian 2 in that grid square, whereas 1 means there are between 0 and 1000 metres more OpenStreetMap. Importantly, 4 and -4 cover anything with a positive or negative difference of over 3000 metres. In general, the analysis shows that if the difference is at level 3 or 4 you can consider OpenStreetMap complete, while 1 and 2 usually mean that some minor roads are likely to be missing; -1 should be easy to complete. In areas where the values are -2 to -4, the OpenStreetMap community still has work to do to complete the map.
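
For completeness, here is a small sketch of how such banded values could be derived from the raw difference in metres – the exact treatment of the bin boundaries is my assumption:

# Map a length difference in metres to the -4..+4 bands used in the shape file.
def band(diff_m):
    if diff_m == 0:
        return 0
    magnitude = min(4, int(abs(diff_m) // 1000) + 1)   # 4 = anything over 3000m
    return magnitude if diff_m > 0 else -magnitude

print(band(-2500))   # -3: 2000-3000 metres more Meridian 2
print(band(450))     #  1: 0-1000 metres more OpenStreetMap
print(band(5200))    #  4: over 3000 metres more OpenStreetMap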

Finally, there is a licensing conundrum that highlights problems both with the Ordnance Survey principles – which state that anything derived from its maps is Crown copyright and part of Ordnance Survey intellectual property – and with the use of the Creative Commons licence for OpenStreetMap data.

Look at the equation above. The left-hand side is indisputably derived from OpenStreetMap, so it is under the CC-BY-SA licence. The right-hand side is indisputably derived from the Ordnance Survey, so it is clearly Crown copyright. The equation, however, includes a lot of UCL’s work and, most importantly, does not contain any geometrical object from either dataset – the grid was created afresh. Yet, without ‘deriving’ the total length from each dataset, it is impossible to compute the results presented here – although they are not derived from one dataset or the other alone. So what is the status of the resulting dataset? It is, in my view, UCL copyright – but it is an interesting problem, and I might be wrong.

You can download the data from here – the file includes a metadata document.

If you use the dataset, please let me know what you have done with it.

OSM overlap with Master Map ITN for A and B roads

In June, Aamer Ather, an M.Eng. student in the department, completed his research comparing OpenStreetMap (OSM) to the Ordnance Survey Master Map Integrated Transport Network (ITN) layer. This built on a previous piece of research in which another M.Eng. student, Naureen Zulfiqar, compared OSM to Meridian 2.

The results are really surprising. The analysis shows that when A-roads, B-roads and a motorway from ITN are compared to OSM data, the overlap can reach values of over 95%. Once the comparison with Master Map was completed, it became clear that OSM is of better quality than Meridian 2. It is also interesting to note that the higher overlap with ITN was achieved under stricter criteria in the buffering procedure that is used for the comparison.
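
In outline, a buffer-based overlap comparison looks something like the sketch below; the 20m buffer width and the file names are illustrative assumptions, not the exact parameters of the original analysis:

# Buffer the reference centre lines (ITN) and measure the share of OSM length
# that falls inside the buffer.
import geopandas as gpd

itn = gpd.read_file("itn_ab_roads.shp").to_crs("EPSG:27700")
osm = gpd.read_file("osm_ab_roads.shp").to_crs("EPSG:27700")

buffer_zone = itn.geometry.buffer(20).unary_union        # merged buffer polygon
inside = osm.geometry.intersection(buffer_zone)

overlap_pct = 100 * inside.length.sum() / osm.geometry.length.sum()
print("OSM length within the ITN buffer: %.1f%%" % overlap_pct)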

As noted, in the original analysis Meridian 2 was used as the reference dataset – the ground truth. However, comparing Meridian 2 and OSM is not a like-for-like comparison, because OSM is not generalised and Meridian 2 is. The justification for treating Meridian 2 as the reference dataset was that its nodes are derived from high-accuracy datasets, and it was expected that the 20-metre generalisation filter would not change positions significantly. It turns out that the generalisation affects the quality of Meridian 2 more than I anticipated. The advantage of Meridian 2, though, is that it allows comparison across the whole of England, since the file size is still manageable, while the complexity of ITN would make such an extensive comparison difficult and time-consuming.

The results show that, for the 4 Ordnance Survey London tiles that we compared, OSM is only 10-30% away from the ITN centre line – rather impressive when you consider the knowledge, skills and backgrounds of the participants. My presentation from the State of the Map conference, below, provides more details of this analysis, and the excellent dissertation by Aamer Ather, which is the basis for it, is available to download here.

One caveat that will need to be explored in future projects is that the comparison was carried out in London, where OSM mappers had access to very high-resolution Yahoo! imagery that has been georeferenced and rectified. The high precision might therefore be a result of tracing these images, and the question is what happens in places where high-resolution imagery is not available. We need to test more tiles in other parts of the UK to validate the results.

Another student is currently comparing OSM to a 1:10,000 map of Athens, so by the end of the summer I hope that it will be possible to estimate quality in other countries. The comparison to ITN in other areas of the UK will have to wait for a future student who is interested in this topic!

I checked Twitter to see how last Friday’s follow-up meeting to Terra Future 2009 went. It was a very pleasant surprise to see that the idea I put forward in February – that the Ordnance Survey should consider hosting OpenStreetMap and donating some data to it – was voted the best idea to come out of Terra Future 2009. With this sort of peer review, and with the added benefit of two months of rethinking, I still think that it is quite a good idea.

The most important aspect of this idea is to understand that OpenStreetMap and the Ordnance Survey can both thrive in the GeoWeb era. Despite the imaginary competition, each has a clear value to certain parts of the marketplace. There are very clear benefits that the OpenStreetMap community can gain from working closely with the Ordnance Survey – such as the aspects of mapping about which the Ordnance Survey is highly knowledgeable – and vice versa, such as how to innovate in the delivery of geographical information. A collaborative model might work after all…

I wonder how this idea will evolve now?
