The Digital Divide of OpenStreetMap
28 December, 2009
In my previous analysis of OpenStreetMap (OSM) data, I compared it to the Index of Deprivation, as a way to understand if there is any socio-economic spatial pattern in the coverage of OSM. Following numerous interactions with various parts of the OSM community, I had suspected that there might be a bias, with the result that affluent areas might be mapped more completely than deprived areas. I explored this systematically, as only empirical analysis could provide evidence one way or another.
Here are the details of the analytical process that was used.
The core data used for the comparison is the UK government’s Index of Multiple Deprivation 2007 (IMD 2007), which is calculated from a combination of governmental datasets and provides a score for each Lower Layer Super Output Area (LSOA) in England. The position of each LSOA was used to calculate its percentile position within the IMD 2007. Each percentile point includes about 325 LSOAs. Areas in the bottom percentile are the most deprived, while those at the 99th percentile are the most affluent places in England according to the index.
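As a rough illustration of the percentile step, the sketch below maps a deprivation rank to a percentile between 0 and 99. It is not the code used for the analysis: the function name `imd_percentile`, the assumption that rank 1 is the most deprived area, and the total of 1,000 areas in the usage lines are all chosen here for illustration.

```python
def imd_percentile(rank: int, total: int) -> int:
    """Map a deprivation rank to a percentile position from 0 to 99.

    Assumes rank 1 is the most deprived area and rank `total` the least
    deprived, so percentile 0 holds the most deprived areas.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer arithmetic keeps the groups as equal in size as possible.
    return (rank - 1) * 100 // total


# Illustrative total only; the real analysis covers every LSOA in England.
print(imd_percentile(1, 1000))     # 0, the most deprived group
print(imd_percentile(1000, 1000))  # 99, the most affluent group
```

With 1,000 areas each percentile holds 10 of them; with the full set of LSOAs the same rule gives the roughly 325 areas per percentile mentioned above.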
Following the same methodology that was used to evaluate completeness, the road datasets from OSM and from the Ordnance Survey’s Meridian 2 were clipped to each of the LSOAs, and then the total road length in each dataset was compared. Because the size of LSOAs varies, it is more meaningful to compare percentage completeness (OSM length as a percentage of Meridian 2 length) than absolute length.
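The comparison for a single LSOA can be sketched as follows, assuming the clipping has already been done in a GIS and only the two total lengths remain. The function name `completeness_pct` and the lengths in the usage lines are hypothetical, not values from the analysis.

```python
def completeness_pct(osm_length_m: float, meridian_length_m: float) -> float:
    """Return OSM road length as a percentage of Meridian 2 road length.

    Both lengths are totals for roads clipped to the same LSOA. Values
    above 100 mean OSM holds more road length than the reference dataset.
    """
    if meridian_length_m <= 0:
        raise ValueError("reference length must be positive")
    return 100.0 * osm_length_m / meridian_length_m


# Hypothetical lengths: a small and a large LSOA with the same completeness.
print(completeness_pct(1_200, 2_000))    # small LSOA
print(completeness_pct(12_000, 20_000))  # large LSOA, same percentage
```

The two usage lines differ tenfold in absolute length but give the same percentage, which is why the percentage is the fairer measure across LSOAs of different sizes.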
The analysis of data from March 2008 showed a clear difference between the LSOAs at the bottom of the scale and those at the top. While the LSOAs at the bottom were not neglected, the level of coverage was far lower, even when taking into account the variability in LSOA areas. I wanted to explore whether the situation had changed since then and undertook further analysis using the same methodology.
Has the situation changed during the 19 months from March 2008 to October 2009?
The graph above shows that things have changed, but not for the better. The graph shows the level of completeness for each percentile group of LSOAs. To avoid confusion with rural areas, where the size of the LSOA becomes very large, only LSOAs whose area is within one standard deviation of the mean LSOA area are included. The effect of this is that the graph shows the results for mostly urban LSOAs.
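The filtering and grouping behind such a graph could look like the sketch below. It reads "within a standard deviation" as an absolute distance from the mean area of at most one population standard deviation, which is an assumption; the function name, the record layout and the sample values are also invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mean_completeness_by_percentile(records):
    """Average completeness per IMD percentile, ignoring very large LSOAs.

    `records` is an iterable of (percentile, area_km2, completeness_pct).
    An LSOA is kept only if its area lies within one standard deviation
    of the mean area, which in practice drops large rural LSOAs.
    """
    records = list(records)
    areas = [area for _, area, _ in records]
    centre, spread = mean(areas), pstdev(areas)

    groups = defaultdict(list)
    for percentile, area, completeness in records:
        if abs(area - centre) <= spread:
            groups[percentile].append(completeness)
    return {p: mean(values) for p, values in sorted(groups.items())}


# Hypothetical records; the 30 km2 LSOA stands in for a rural outlier.
sample = [
    (0, 0.20, 40.0), (0, 0.30, 50.0), (0, 0.25, 45.0),
    (99, 0.30, 100.0), (99, 0.20, 90.0), (99, 30.0, 20.0),
]
print(mean_completeness_by_percentile(sample))
```

In the sample, the single very large LSOA is excluded before averaging, so it does not drag down the figure for its percentile.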
I compared three datasets: March 2008, March 2009 and October 2009. A rather alarming trend is visible. Instead of shrinking, the gap between affluent and deprived LSOAs is growing. The average completeness of the bottom percentile was 40.7% in March 2008, grew to 65.7% a year later and reached 71.8% by October 2009. For the most affluent percentile, completeness grew from 67.5% in March 2008 to 97.0% a year later and to 108.9% by October 2009 (a figure above 100% means that OSM contains more road length than Meridian 2 in those areas). In other words, the gap between the top and the bottom has grown from 26.8 to 37.1 percentage points within the analysis period.
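The gap at each date can be recomputed from the completeness figures quoted in this post. The snippet is only a worked check of that arithmetic; the dictionary layout and the function name `gap_points` are chosen here for illustration.

```python
# Average completeness (%) for the bottom and top IMD percentiles,
# as quoted in the text for each dataset.
completeness = {
    "March 2008": {"most_deprived": 40.7, "most_affluent": 67.5},
    "March 2009": {"most_deprived": 65.7, "most_affluent": 97.0},
    "October 2009": {"most_deprived": 71.8, "most_affluent": 108.9},
}


def gap_points(figures: dict) -> float:
    """Difference between top and bottom percentile, in percentage points."""
    return round(figures["most_affluent"] - figures["most_deprived"], 1)


gaps = {date: gap_points(figures) for date, figures in completeness.items()}
for date, gap in gaps.items():
    print(f"{date}: {gap} percentage points")
```

The difference is expressed in percentage points because it is a subtraction of two percentages rather than a relative change.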
Within the OpenStreetMap community, there are activities such as those led by Mikel Maron to map informal settlements in Kenya and to ensure coverage of other marginalised parts of the world (see the posts on his blog). From the work that we are doing in Mapping for Change, it is clear to me that mapping can be an excellent motivator to encourage people to use digital tools, and therefore adding data to OSM can work as a way to increase digital inclusion. So maybe OSM coverage in the UK can be increased with some support from the government, which has stated an aim of increasing digital inclusion?
If you would like to explore the data by yourself, here is a spreadsheet with the information, including the LSOA codes, the position in IMD 2004 and IMD 2007, and the coverage percentage for March 2008, March 2009 and October 2009. Please note the terms and conditions for its use – and let me know what you have done with it!