The Digital Divide of OpenStreetMap

In my previous analysis of OpenStreetMap (OSM) data, I compared it to the Index of Deprivation, as a way to understand if there is any socio-economic spatial pattern in the coverage of OSM. Following numerous interactions with various parts of the OSM community, I had suspected that there might be a bias, with the result that affluent areas might be mapped more completely than deprived areas. I explored this systematically, as only empirical analysis could provide evidence one way or another.

OSM completeness coverage compared to Index of Deprivation 2007

Here are the details of the analytical process that was used.

The core data that was used for the comparison is the UK government’s Index of Multiple Deprivation 2007 (IMD 2007), which is calculated from a combination of governmental datasets and provides a score for each Lower Layer Super Output Area (LSOA) in England. The position of each LSOA in the index was used to calculate its percentile position within the IMD 2007. Each percentile point includes about 325 LSOAs. Areas that are in the bottom percentile are the most deprived, while those at the 99th percentile are the most affluent places in England according to the index.
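As a minimal sketch of the percentile grouping described above: assume each LSOA has an IMD rank where 1 is the most deprived, and assume a total of 32,482 ranked LSOAs (a figure not stated in this post, chosen because it is consistent with about 325 LSOAs per percentile point). The function name and constant are illustrative, not part of the original analysis.

```python
TOTAL_LSOAS = 32_482  # assumption: number of LSOAs ranked in IMD 2007


def imd_percentile(rank: int, total: int = TOTAL_LSOAS) -> int:
    """Return the percentile group (0-99) for an IMD rank.

    Rank 1 is assumed to be the most deprived LSOA, so group 0 holds the
    most deprived areas and group 99 the most affluent.
    """
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be between 1 and {total}")
    # Integer arithmetic avoids floating-point surprises at group boundaries.
    return (rank - 1) * 100 // total


if __name__ == "__main__":
    print(imd_percentile(1))            # most deprived LSOA
    print(imd_percentile(TOTAL_LSOAS))  # most affluent LSOA
    # Size of the bottom percentile group under these assumptions.
    print(sum(1 for r in range(1, TOTAL_LSOAS + 1) if imd_percentile(r) == 0))
```

With these assumptions the bottom group contains 325 LSOAs, which matches the "about 325" quoted above.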

Following the same methodology that was used to evaluate completeness, the road datasets from OSM and from the Ordnance Survey’s Meridian 2 were clipped to each of the LSOAs, and then the total length of the two datasets was compared. Because the size of LSOAs varies, it is more meaningful to compare percentage completeness and not the absolute length.
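The comparison itself is a simple ratio. The sketch below assumes the road lengths have already been clipped to an LSOA and summed (the clipping step, done in a GIS, is not shown); the function name and the sample lengths are hypothetical.

```python
from typing import Optional


def completeness_pct(osm_length_m: float, reference_length_m: float) -> Optional[float]:
    """Return OSM road length as a percentage of the reference (Meridian 2) length.

    Returns None when the reference dataset has no roads in the LSOA,
    because the ratio is undefined in that case.
    """
    if reference_length_m <= 0:
        return None
    return 100.0 * osm_length_m / reference_length_m


if __name__ == "__main__":
    # Hypothetical clipped road lengths in metres: (OSM, Meridian 2).
    samples = {
        "small urban LSOA": (4200.0, 5600.0),
        "OSM more detailed than reference": (1200.0, 1000.0),
    }
    for name, (osm_m, ref_m) in samples.items():
        print(name, completeness_pct(osm_m, ref_m))
```

Because the result is a ratio, it can exceed 100% where OSM contains more road length than the reference dataset, which is why figures above 100% appear in the results.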

The analysis of data from March 2008 showed a clear difference between the LSOAs at the bottom of the scale and those at the top. While the LSOAs at the bottom were not neglected, the level of coverage was far lower, even when taking into account the variability in LSOA areas. I wanted to explore whether the situation had changed since then and undertook further analysis using the same methodology.

Has the situation changed during the 19 months from March 2008 to October 2009?

The graph above shows that things have changed, but not for the better. The graph shows the level of completeness for each group of LSOAs. To avoid confusion with rural areas, where the size of the LSOA becomes very large, only LSOAs that are within a standard deviation of area size are included. The effect of this is that the graph shows the results for mostly urban LSOAs.
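The area filter can be sketched as follows. This is one reading of "within a standard deviation of area size": keep an LSOA when its area differs from the mean area by no more than one standard deviation. The exact rule used in the analysis is not spelled out here, so treat this interpretation, the function name and the sample codes as assumptions.

```python
from statistics import mean, pstdev
from typing import Dict


def within_one_std(areas: Dict[str, float]) -> Dict[str, float]:
    """Keep LSOAs whose area lies within one standard deviation of the mean area."""
    values = list(areas.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation over all LSOAs
    return {code: area for code, area in areas.items() if abs(area - mu) <= sigma}


if __name__ == "__main__":
    # Hypothetical LSOA codes and areas in square kilometres.
    sample = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 100.0}
    print(sorted(within_one_std(sample)))
```

In the sample, the very large LSOA "E" is dropped while the four small ones are kept, which mirrors the effect described above of leaving mostly urban LSOAs in the graph.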

I compared 3 datasets: March 2008, March 2009 and October 2009. A rather alarming trend is visible. Instead of shrinking, the gap between affluent and deprived LSOAs is growing. The average completeness of the bottom percentile in March 2008 was 40.7%, grew to 65.7% a year later and to 71.8% by October 2009. For the most affluent percentile, completeness grew from 67.5% in March 2008 to 97.0% a year later and to 108.9% by October 2009. In other words, the gap between the top and the bottom has grown from 26.6 to 37.1 percentage points within the analysis period.
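The gap can be recomputed directly from the rounded averages quoted above. The sketch below uses only those figures; the dictionary layout and names are illustrative. Note that the rounded figures give 26.8 percentage points for March 2008, slightly different from the 26.6 quoted, so the quoted gap was presumably derived from other underlying values.

```python
# Average completeness (%) for the bottom and top IMD percentiles,
# as quoted in the text for each dataset.
completeness = {
    "March 2008": {"most_deprived": 40.7, "most_affluent": 67.5},
    "March 2009": {"most_deprived": 65.7, "most_affluent": 97.0},
    "October 2009": {"most_deprived": 71.8, "most_affluent": 108.9},
}


def gap(period: str) -> float:
    """Return the affluent-minus-deprived gap in percentage points, to one decimal."""
    row = completeness[period]
    return round(row["most_affluent"] - row["most_deprived"], 1)


if __name__ == "__main__":
    for period in completeness:
        print(period, gap(period))
```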

Within the OpenStreetMap community, there are activities such as those led by Mikel Maron to map informal settlements in Kenya and to ensure coverage of other marginalised parts of the world (see the posts on his blog). From the work that we are doing in Mapping for Change, it is clear to me that mapping can be an excellent motivator to encourage people to use digital tools, and therefore adding data to OSM can work as a way to increase digital inclusion. So maybe OSM coverage can be increased in the UK with some support from the government, which has stated an aim of increasing digital inclusion?

If you would like to explore the data by yourself, here is a spreadsheet with the information, including the LSOA codes, the position in IMD 2004 and IMD 2007, and the coverage percentage for March 2008, March 2009 and October 2009. Please note the terms and conditions for its use – and let me know what you have done with it!


Second workshop on geographic information usability – March 2010

In March 2009, Ordnance Survey, together with the Human Factors group at the University of Nottingham, ran a workshop on the usability of geographic information. Bringing together a new grouping of researchers from across the disciplines of Human Factors, HCI, Computer Science and Geographic Information Science, the aim of the workshop was to share perspectives on research challenges for investigating usability of data products – in particular geographic information products. In so doing we wished to help build an interdisciplinary network of contacts in this field and identify priority areas for further investigation.

Findings from the workshop were presented in a paper at AGI2009, and in a report which is available on the Ordnance Survey’s website. These confirmed that there is indeed a clear need to focus on the usability of information, as well as on the interfaces used to access information. The rationale centred on the fact that current research and established methodologies in the field of product usability focus on objects such as devices, and on computer interfaces, with much less focus on the usability of data products such as digital geographic information.

The March 2010 workshop

As with the 2009 workshop, this one-day workshop aims to bring together people researching usability of data/information across different disciplines, including Human Factors, HCI, Computer Science and Geographic Information Science.

The objective will be to share case studies on theory and/or application of methods for investigating usability of data or information, in particular geographic data/information.

We hope the workshop will:

  • Identify theoretical frameworks and methodologies, through a range of case studies, for applying usability evaluation to data or information.
  • Help to build further an interdisciplinary network of research contacts in this field.
  • Form the basis for a publication.

If you would like to participate…

Please send a short position paper (around 1000 words), based on a case study where you have addressed issues of usability of geographic information, to the contact details below by 29th January 2010.

A workshop agenda and venue details will be sent once we have all position papers.

Support for reasonable travel and accommodation costs may be provided – if you need assistance please contact me (details below).

Jenny Harding, Ordnance Survey Research Phone: +44 (0)23 8079 2052

Workstations or PCs for GIS? The long memory of the Internet

Over the past decade, different people have either hailed or criticised the growing inability to forget in the digital age in which we are living. Logging on to Google Dashboard and seeing every single search that I have carried out since 2006 is rather scary – especially as there is no guarantee that if I ask to delete my web history, it will also be deleted from Google’s servers rather than just anonymised, which is not much, really. An interesting point of view on the virtue of forgetting in today’s digital world is available in Viktor Mayer-Schonberger’s recent lecture at the RSA.

And then there is all the public information about you that is already on the open web and that is going to be there for as long as the servers and the web continue to be around. While looking for my earliest internet trails, I came across a posting to the Usenet group comp.infosystems.gis from 1994. Back then I was working on a large-scale GIS project for the Israel Electric Corporation and, as far as I can recall, I was asked to write a briefing about the direction that we should take regarding the hardware and software platforms that would be used by the client in the roll-out of the system, which was designed for IBM RS/6000 workstations. The requests that I sent to the list and the discussion are summarised in a posting that is still accessible on Google Groups – so anyone can find it and read it …

In terms of internet memory, it does expose certain aspects that I’m now much more aware of – such as my command of English back then. Glossing over the grammar and spelling mistakes, the analysis makes interesting reading from a 15-year perspective.

Firstly, it is interesting to note that the need for high-end computing in terms of operating systems and hardware for GIS remains a relevant issue. See, for example, Manifold GIS’s use of 64-bit operating systems or the issue of graphic capabilities and the use of General-Purpose computing on Graphics Processing Units (GPGPU) in GIS and Remote Sensing packages such as Geomatica. Another indication of the continued need for processing power is the description of ‘who might need this?’ for high-end workstations – although in 1994 no one in PC Magazine ever mentioned GIS.

However, for the vast majority of end-users who are using GIS for basic map making and processing, this is not true anymore and many are using standard desktop or laptop computers quite well. Over the next few years, as more of the processing migrates to the ‘cloud’, the number of GIS users who need high-end machines will continue to decline. In 1994 the expectation was that most users would need a workstation, whereas very soon they will happily use a low-powered netbook.

Secondly, it is interesting to see the changes in data sizes – I note in the text that 1GB of data caused us difficulties in backups and on the local network (10BASE-T). I recall complaints from the rest of the company, which was running mainframe systems with millions of alphanumeric operations, about the bandwidth that GIS processing consumed when we ran performance tests. This aspect of geographical information handling is still challenging, though usually not at the local level – even for large-scale processing, the cost of storage is so low that it’s not a problem. However, for the people who manage the backbone of large-scale applications, say Yahoo! Maps, this is still an issue – I assume that video, images and maps are now major consumers of bandwidth and disk storage that require special handling and planning.

Thirdly, there is a lesson about ‘disruptive technologies’. The PC was one such disruptive technology and, even over a decade after their introduction, PCs were not comparable to workstations in terms of memory, processing, multitasking and networking. The advantage of workstations was clear in 1994. Even as late as 1999, when we ran the Town Centres project on Sun workstations, there was still an advantage, but it was disappearing rapidly. Today, UNIX workstations occupy a very small niche.

This is an issue when we think forward to the way GIS will look in 2015 (as the AGI Foresight study is doing) or 2020. Some of the disruptions to the way GIS has operated for many years are gathering pace, such as the move to software and data as services, where organisations will receive the two bundled from a provider, or the use of more crowdsourced information.

So sometimes it is useful to come across old writing – it makes you re-evaluate the present and consider the future. At the same time, it is only because I forgot about the post that it was interesting to come across it – so Viktor Mayer-Schonberger is correct that there is a virtue in forgetting.