The Digital Divide of OpenStreetMap

In my previous analysis of OpenStreetMap (OSM) data, I compared it to the Index of Deprivation, as a way to understand if there is any socio-economic spatial pattern in the coverage of OSM. Following numerous interactions with various parts of the OSM community, I had suspected that there might be a bias, with the result that affluent areas might be mapped more completely than deprived areas. I explored this systematically, as only empirical analysis could provide evidence one way or another.

OSM completeness coverage compared to Index of Deprivation 2007

Here are the details of the analytical process that was used.

The core data used for the comparison is the UK government’s Index of Multiple Deprivation 2007 (IMD 2007), which is calculated from a combination of governmental datasets and provides a score for each Lower Layer Super Output Area (LSOA) in England. The rank of each LSOA was used to calculate its percentile position within the IMD 2007. Each percentile point includes about 325 LSOAs. Areas in the bottom percentile are the most deprived, while those at the 99th percentile are the most affluent places in England according to the index.
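As a rough illustration of the percentile coding, the sketch below converts an IMD rank into a percentile between 0 and 99. It assumes that rank 1 is the most deprived LSOA and uses a round total of 32,500 LSOAs, so that each percentile holds 325 of them. The function name `imd_percentile` and the round total are illustrative assumptions, not the exact procedure used in the analysis.

```python
def imd_percentile(rank: int, total: int) -> int:
    """Map an IMD rank (assumed: 1 = most deprived) to a percentile 0..99."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer arithmetic keeps the bin edges exact.
    return (rank - 1) * 100 // total


TOTAL_LSOAS = 32_500  # assumed round number; about 325 LSOAs per percentile

print(imd_percentile(1, TOTAL_LSOAS))       # most deprived LSOA
print(imd_percentile(326, TOTAL_LSOAS))     # first LSOA of the next percentile
print(imd_percentile(32_500, TOTAL_LSOAS))  # most affluent LSOA
```

With these assumptions, ranks 1 to 325 fall in percentile 0 and the last 325 ranks fall in percentile 99.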

Following the same methodology that was used to evaluate completeness, the road datasets from OSM and from the Ordnance Survey’s Meridian 2 were clipped to each of the LSOAs, and then the total length of the two datasets was compared. Because the size of LSOAs varies, it is more meaningful to compare percentage completeness and not the absolute length.
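The comparison boils down to one ratio per LSOA. The sketch below shows it with invented road lengths; the function name `completeness_pct` and the figures are hypothetical, and the clipping itself (a GIS operation) is assumed to have been done already.

```python
def completeness_pct(osm_length_m: float, reference_length_m: float) -> float:
    """OSM road length as a percentage of the reference (Meridian 2) length."""
    if reference_length_m <= 0:
        raise ValueError("reference length must be positive")
    return 100.0 * osm_length_m / reference_length_m


# Two made-up LSOAs, each missing 2,000 m of road relative to the reference.
small_lsoa = completeness_pct(2_000, 4_000)    # small area, short road network
large_lsoa = completeness_pct(18_000, 20_000)  # large area, long road network

print(f"small LSOA: {small_lsoa:.1f}%")
print(f"large LSOA: {large_lsoa:.1f}%")
```

Both areas are short by the same absolute length, yet one is half mapped and the other is nearly complete, which is why the percentage is the more meaningful measure.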

The analysis of data from March 2008 showed a clear difference between the LSOAs at the bottom of the scale and those at the top. While the LSOAs at the bottom were not neglected, the level of coverage was far lower, even when taking into account the variability in LSOA areas. I wanted to explore whether the situation had changed since then and undertook further analysis using the same methodology.

Has the situation changed during the 19 months from March 2008 to October 2009?

The graph above shows that things have changed, but not for the better. The graph shows the level of completeness for each group of LSOAs. To avoid confusion with rural areas, where LSOAs become very large, only LSOAs whose area is within one standard deviation of the mean area are included. The effect of this is that the graph shows the results for mostly urban LSOAs.
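One possible reading of this filter is sketched below: keep an LSOA only if its area lies within one standard deviation of the mean area. The use of the population standard deviation, the function name `within_one_sd` and the sample areas are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def within_one_sd(areas_by_lsoa: dict[str, float]) -> dict[str, float]:
    """Keep LSOAs whose area is within one standard deviation of the mean."""
    values = list(areas_by_lsoa.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return {code: a for code, a in areas_by_lsoa.items() if abs(a - mu) <= sd}


# Hypothetical areas in square kilometres; "E" stands in for a large rural LSOA.
areas = {"A": 0.2, "B": 0.3, "C": 0.25, "D": 0.35, "E": 12.0}
kept = within_one_sd(areas)
print(sorted(kept))
```

In this toy example the single very large area is dropped and the four compact ones remain, which mirrors the way the filter leaves mostly urban LSOAs.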

I compared three datasets: March 2008, March 2009 and October 2009. A rather alarming trend is visible. Instead of shrinking, the gap between affluent and deprived LSOAs is growing. The average completeness of the bottom percentile in March 2008 was 40.7%, grew to 65.7% a year later and to 71.8% by October 2009. For the most affluent percentile, completeness grew from 67.5% in March 2008 to 97.0% a year later and to 108.9% by October 2009. In other words, the gap between the top and the bottom has grown from 26.6 to 37.1 percentage points within the analysis period.
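The gap can be recomputed directly from the averages quoted above. The sketch below uses only those rounded figures; the dictionary layout and key names are illustrative.

```python
# Average completeness (%) for the bottom and top percentiles, as quoted above.
completeness = {
    "March 2008": {"most_deprived": 40.7, "most_affluent": 67.5},
    "March 2009": {"most_deprived": 65.7, "most_affluent": 97.0},
    "October 2009": {"most_deprived": 71.8, "most_affluent": 108.9},
}

# Gap in percentage points between the most affluent and most deprived groups.
gaps = {
    date: round(v["most_affluent"] - v["most_deprived"], 1)
    for date, v in completeness.items()
}

for date, gap in gaps.items():
    print(f"{date}: {gap} percentage points")
```

From the rounded figures, the March 2008 gap comes out at 26.8 rather than the 26.6 quoted, presumably because the quoted gap was calculated from unrounded values; the October 2009 gap matches at 37.1.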

Within the OpenStreetMap community, there are activities such as those led by Mikel Maron to map informal settlements in Kenya and to ensure coverage of other marginalised parts of the world (see the posts on his blog). From the work that we are doing in Mapping for Change, it is clear to me that mapping can be an excellent motivator to encourage people to use digital tools, and therefore adding data to OSM can work as a way to increase digital inclusion. So maybe OSM coverage can be increased in the UK with some support from the government, which has stated an aim of increasing digital inclusion?

If you would like to explore the data yourself, here is a spreadsheet with the information, including the LSOA codes, the position in IMD 2004 and IMD 2007, and the coverage percentage for March 2008, March 2009 and October 2009. Please note the terms and conditions for its use – and let me know what you have done with it!


Indices of Deprivation 2007

Early in December, the new version of the Indices of Deprivation (also known as the Index of Multiple Deprivation or IMD) was released. The first IMD was published in 2000, with a new version in 2004 which has now been updated. Created by Oxford University’s Social Disadvantage Research Centre, the indices classify each Lower-Layer Super Output Area (LSOA) in England according to the level of deprivation in multiple domains. An LSOA is an areal unit that contains on average 1500 people – a neighbourhood unit more or less.

As this is a data set widely used in many of my research projects, it was useful to analyse it and see how it has changed in comparison to the previous version. There are some surprises, and, if the indices really reflect the changes in neighbourhoods, the implication is that it is difficult to escape deprivation at the bottom of the ladder.

The IMD is very useful and has significant political implications. There are hundreds of academic articles that are based on applications of the IMD, and far more significant is the role that the indices play in allocating resources to local authorities through various governmental programmes such as Sure Start, which assists children in their early years, or Decent Homes, which improves the quality of the social housing stock. Of special importance are the cut-off points for the 20% and 10% most deprived areas, as they are used widely in policy decisions. We use the IMD in the research with UnLtd to evaluate the location of projects and awardees, and in the Environmental Inequalities project with London 21 to show communities where they are positioned on the national scale.

After 7 years of use and acceptance at all levels of government in the UK (there are separate indices for Wales, Scotland and Northern Ireland), the creation of the new indices must have been a challenging task – a lot is at stake if a specific area moves up or down. The IMD is a league table of sorts, placing each of the LSOAs (and there are around 32,500 of them) in a position relative to others. For each LSOA that is declared as deprived, another one will move up the scale and out of the bottom 20%, which usually means fewer resources for the community. Therefore, it is interesting to analyse the changes in the 2007 edition in comparison to the 2004 one.

The Department for Communities and Local Government states that:

“The Index scores from 2004 cannot be compared with those from 2007. Though the two Indices are very similar, it is not valid to compare the scores between the two time points. An area’s score is affected by the scores of every other area; so it is impossible to tell whether a change in score is a real change in the level of deprivation in an area or whether it is due to the scores of other areas going up or down.” (see this document)

While this is true for each area, it is still valid to check the overall pattern of movement across the whole data set. To do that, each LSOA was coded with the percentile point in the IMD 2007 to which it belongs (in each percentile point there are about 325 LSOAs) and compared to its percentile position in 2004. The gap represents the relative change in the position of the LSOA – a positive change means that it is now less deprived, while a negative change means that the place is now more deprived compared to 2004.
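The sign convention can be made concrete with a small sketch. It assumes each LSOA already carries a percentile for both years, with higher percentiles meaning less deprived; the LSOA codes, the values and the function name `percentile_changes` are hypothetical.

```python
def percentile_changes(p2004: dict[str, int], p2007: dict[str, int]) -> dict[str, int]:
    """2007 percentile minus 2004 percentile, for LSOAs present in both years."""
    return {code: p2007[code] - p2004[code] for code in p2004 if code in p2007}


# Hypothetical LSOA codes and percentile positions, for illustration only.
p2004 = {"LSOA_A": 17, "LSOA_B": 60, "LSOA_C": 45}
p2007 = {"LSOA_A": 22, "LSOA_B": 52, "LSOA_C": 45}

changes = percentile_changes(p2004, p2007)
for code, change in changes.items():
    if change > 0:
        label = "less deprived than in 2004"
    elif change < 0:
        label = "more deprived than in 2004"
    else:
        label = "unchanged"
    print(code, change, label)
```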

Within the span of 3 years and due to the differences in the calculation method, it is expected that specific LSOAs will shift their place – especially when the investment that was put into them is taken into account. For the sake of the discussion, let’s assume that a change of 5 percentile points is not too big – although it can be significant if your LSOA belonged to the 17th percentile in 2004 and now belongs to the 22nd percentile. Thus, it is worth exploring where the LSOAs that moved more than 5 percentile points are. In IMD 2007, over 25% of LSOAs have shifted more than 5 percentile points and some LSOAs have moved over 20 percentile points.
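A figure such as "over 25% of LSOAs" is simply the share of percentile changes whose absolute value exceeds the threshold. A minimal sketch follows, using invented changes rather than the real data; the function name `share_moved` is hypothetical.

```python
def share_moved(changes: list[int], threshold: int = 5) -> float:
    """Percentage of LSOAs that moved by more than `threshold` percentile points."""
    if not changes:
        raise ValueError("changes must not be empty")
    moved = sum(1 for c in changes if abs(c) > threshold)
    return 100.0 * moved / len(changes)


# Invented percentile changes (2007 minus 2004) for eight LSOAs.
sample_changes = [0, 2, -3, 6, -8, 1, 5, -21]
print(f"{share_moved(sample_changes):.1f}% moved more than 5 points")
```

Note that a move of exactly 5 points, such as from the 17th to the 22nd percentile, is not counted as "more than 5" under this definition.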

The distribution of the LSOAs that moved is shown in the chart below. Notice that, although this might look like a normal distribution, the number of changes at the lowest percentiles is actually not equivalent to the number of changes at the top of the range. This might be because the indices are especially designed to locate deprived areas, and therefore located them accurately in 2004, and the situation hasn’t shifted in 2007. The problem with this is that it means that, between the periods of 2001-2 (on which IMD 2004 is based) and 2004-5 (on which IMD 2007 is based), not too many places were shifted out of deprivation, while the rest of the places happily shifted about. Is it possible that the IMD team was especially careful not to bump communities that were already included in the bottom 20%?

IMD 2007 Significant Change by Percentile

Another way to look at the data is of course through mapping. The following map represents the LSOAs that experienced a significant change of over 5 percentile points. You can download an A2 size PDF in which it is possible to zoom to a specific area to see the changes.

IMD 2007 Significant Change - Map

While most of the changes are not in the most deprived areas, it is fascinating to see the geographical pattern of change. For example, by zooming in to London, it is easy to see that Barnet, Brent and Harrow are some of the local authorities with the biggest change downward, while Camden and Westminster have seen significant change upward. As many of the changes are in the middle range, will they have policy implications?

A final point about this analysis is that it was fairly easy to run: the analysis was done in 4-5 hours, using an ageing laptop (a 4-year-old IBM X31), Excel 2007 and Manifold GIS 8.0. While the cartography can be improved, the ability of modern GIS to do this type of work so quickly helps in focusing on the task, and not spending the time waiting for the GIS to process data…