10 July, 2010
The slides below are from my presentation at State of the Map 2010 in Girona, Spain. While the conference is about OpenStreetMap, the presentation covers a range of spatially implicit and explicit crowdsourcing projects, as well as activities that we carried out in Mapping for Change. All of these show that, unlike in other crowdsourcing activities, geography (and places) both limits and motivates contribution.
In many ways, OpenStreetMap is similar to other open source and open knowledge projects, such as Wikipedia. These similarities include the patterns of contribution and the importance of participation inequalities, in which a small group of participants contributes very significantly, while a very large group of participants contributes only occasionally; the general demographic of participants, with strong representation from educated young males; and the temporal patterns of engagement, in which some participants go through a peak of activity and lose interest, while a small group joins and continues to invest its time and effort to help the progress of the project. These aspects have been identified by researchers who explored volunteering, leisure activities and crowdsourcing, as well as by those who explored commons-based peer production networks (Benkler & Nissenbaum 2006).
However, OpenStreetMap is a project about geography, and deals with the shape of features and information about places on the face of the Earth. Thus, the emerging question is ‘what influence does geography have on OSM?’ Does geography make some fundamental changes to the basic principles of crowdsourcing, or should OSM be treated as ‘wikipedia for maps’?
In the presentation, which is based on my work, as well as the work of Vyron Antoniou and Nama Budhathoki, we argue that geography is playing a ‘tyrannical’ role in OSM and other projects that are based on crowdsourced geographical information and shapes the nature of the project beyond what is usually accepted.
The first influence of geography is on motivation. A survey of OSM participants shows that specific geographical knowledge, which a participant acquired at first hand, and the wish to use this knowledge and see it mapped well is an important factor in participation in the project. We found that participants are driven to mapping activities by their desire to represent the places they care about and fix the errors on the map. Both of these motives require local knowledge.
A second influence is on the accuracy and completeness of coverage, with places that are highly populated, and therefore have a larger pool of potential participants, showing better coverage than suburban areas of well-mapped cities. Furthermore, there is an ongoing discussion within the OSM community about the value of mapping without local knowledge and the impact of such action on the willingness of potential contributors to fix errors and contribute to the map.
A third, and somewhat surprising, influence is the impact of mapping places that the participants haven’t or can’t visit, such as Haiti after the earthquake or Baghdad in 2007. Despite the willingness of participants to join in and help in the data collection process, the details that can be captured without being on the ground are fairly limited, even when multiple sources such as Flickr images, Google Street View and paper maps are used. The details are limited to what was captured at a certain point in time and to the limitations of the sensing device, so the mapping is, by necessity, incomplete.
We will demonstrate these and other aspects of what we termed ‘the tyranny of place’ and its impact on what can be covered by OSM without much effort and which locations will not be covered without a concentrated effort that requires some planning.
One of the surprises of the Ordnance Survey OpenData release at the beginning of April was the inclusion of the Code-Point Open dataset, which lists the location of all postcodes in England, Wales and Scotland. This was clearly a very important dataset because of the way postcode geography drives many services and activities in the UK. Before the release, the costs of using postcodes in geographical analysis were prohibitive for many small organisations.
So how usable is this free Code-Point data? The principle of ‘do not look a gift horse in the mouth’ doesn’t apply here. The whole point of releasing the data is to make it as useful as possible to encourage innovation, so it should be made available in a way that makes it easy to reuse. I evaluated it while analysing a dataset of 11,000 volunteers’ postcodes that I received from a third sector organisation.
The download process is excellent and easy, apart from the fact that there is no clear and short description of the products in a non-technical manner next to each product. To find a description, you need to go to the product page – so you are at least 2 clicks away from the product details. It would be better to have a link from each product and include a brief description in the download page. We will see in a second why this is important…
The next step was the download itself and the opening of the zip file, which was clear and easy. There is an oddity with all Ordnance Survey data: the packages contain a redundant sub-directory, so in this case the data resides under \codepo_gb\Code-Point Open\ . The fact that the data is broken up into one file per postcode area instead of one big file of 157MB is fine, but it would be helpful to remind users that they can concatenate files using simple commands – this is especially necessary for less tech-savvy users. So an explanation for Windows users that you can open the Command window using ‘cmd.exe’ and run ‘type a.csv b.csv > common.csv’ can save some people plenty of time.
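For readers who would rather script the concatenation than type file names by hand, here is a minimal Python sketch of the same idea. The function name `concatenate_csv` and the output name `common.csv` are my own choices for illustration, and the sketch assumes the per-area files have no header row (which matches the files as delivered).

```python
from pathlib import Path


def concatenate_csv(src_dir, out_file):
    """Append every .csv file in src_dir to out_file, in file-name order.

    Returns the number of files that were merged.
    """
    src = Path(src_dir)
    out = Path(out_file)
    # Build the list before creating the output, and skip the output file
    # itself in case it already exists in the same directory.
    parts = sorted(p for p in src.glob("*.csv") if p.resolve() != out.resolve())
    with out.open("wb") as dst:
        for part in parts:
            data = part.read_bytes()
            dst.write(data)
            # Make sure the last row of one file does not run into the
            # first row of the next one.
            if data and not data.endswith(b"\n"):
                dst.write(b"\n")
    return len(parts)
```

Calling `concatenate_csv("Code-Point Open", "common.csv")` from the directory above the extracted data should produce a single file, although the exact directory name depends on where you unzipped the package.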
But the real unpleasant surprise was that nowhere in the downloaded package is there a description of the fields in the files! So you open the files and need to figure out what the fields are. The user manual hides 4 clicks away from the download page, and luckily I knew that the ‘user manual’ is stored under ‘technical information’ on the product page, which is not that obvious at first visit. Why not deliver the user manual with the product?!? The Doc directory is an obvious place to store it.
The user manual reveals that there are 19 fields in the file, of which 9 (almost half!) are ‘not available in Code-Point Open’ – so why are they delivered? After figuring out the fields, I created a single line that can be attached to the files before importing them to a GIS:
Postcode,Positional Quality,PR Delete,TP Delete,DQ Delete,RP Delete,BP Delete,PD Delete,MP Delete,UM Delete,Easting,Northing,Country,Regional Health Authority,Health Authority,County,District,Ward,LS Delete
Of course, all the fields with ‘Delete’ in the name mean that they should be deleted once imported.
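If you prefer to drop the empty fields before the data reaches the GIS, the header line above can drive the clean-up. The following is a rough Python sketch, not an official Ordnance Survey tool: the function name `strip_unused_fields` is hypothetical, and it assumes every row has exactly the 19 fields listed above.

```python
import csv

# The header line from above: 19 field names, 9 of them marked 'Delete'.
HEADER = (
    "Postcode,Positional Quality,PR Delete,TP Delete,DQ Delete,RP Delete,"
    "BP Delete,PD Delete,MP Delete,UM Delete,Easting,Northing,Country,"
    "Regional Health Authority,Health Authority,County,District,Ward,LS Delete"
).split(",")


def strip_unused_fields(in_file, out_file):
    """Write a copy of in_file with a header row and without 'Delete' fields.

    Returns the number of data rows written.
    """
    keep = [i for i, name in enumerate(HEADER) if not name.endswith("Delete")]
    rows = 0
    with open(in_file, newline="") as src, open(out_file, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow([HEADER[i] for i in keep])
        for row in csv.reader(src):
            if len(row) != len(HEADER):
                raise ValueError(f"expected {len(HEADER)} fields, got {len(row)}")
            writer.writerow([row[i] for i in keep])
            rows += 1
    return rows
```

The output keeps the 10 populated fields, from Postcode through Ward, and can be imported directly with the first row as field names.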
Interestingly, once you delete these fields, the total size of Code-Point Open drops from 157MB to 91MB – which means that leaving them out of the distribution would save the Ordnance Survey bandwidth and carbon emissions.
Another interesting point is that the user manual includes detailed instructions on how to change the postcode to a ‘single spaced postcode’. The instructions are for Excel, MapInfo and ArcGIS. This is the type of information that can help end-users start using the data faster. Finally, you can use this wonderful information to create lovely maps.
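The manual's own instructions cover Excel, MapInfo and ArcGIS only. For anyone working in a script instead, a Python equivalent of the ‘single spaced postcode’ conversion might look like the sketch below. It is my own illustration, not taken from the manual, and it relies on the fact that the inward part of a UK postcode is always its last three characters. The function name `single_spaced` is hypothetical.

```python
def single_spaced(postcode):
    """Return the postcode in upper case with exactly one space before
    the three-character inward code, e.g. 'WC1E6BT' -> 'WC1E 6BT'."""
    # Remove every space, whatever the padding in the source file was.
    compact = "".join(postcode.split()).upper()
    # A full postcode has 5 to 7 characters once the spaces are removed.
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"not a full postcode: {postcode!r}")
    return compact[:-3] + " " + compact[-3:]
```

This makes the postcodes in Code-Point Open comparable with postcodes typed in by people, as long as the same function is applied to both sides before matching.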
All these problems are minor, apart from the missing description of the fields, which is a major usability error. Similar analysis can be carried out for any of the Ordnance Survey datasets, to ensure that they are useful to their users. There are some easy improvements, such as including the user manual with the distribution, and I’m sure that, over time, the team at the Ordnance Survey will find the time to sort out these issues.