Following successful funding for the European Union FP7 EveryAware and the EPSRC Extreme Citizen Science activities, the department of Civil, Environmental and Geomatic Engineering at UCL is inviting applications for a postdoctoral position and 3 PhD studentships. Please note that these positions are open to students from any EU country.
These positions are in the ‘Extreme Citizen Science’ (ExCiteS) research group. The group’s activities focus on the theory, methodologies, techniques and tools that are needed to allow any community to start its own bottom-up citizen science activity, regardless of the level of literacy of the users. Importantly, Citizen Science is understood in the widest sense, including perceptions and views – so participatory mapping and participatory geographic information are integral parts of the activities.
The research themes that the group explores include Citizen Science and Citizen Cyberscience; Community and participatory mapping/GIS; Volunteered Geographic Information (OpenStreetMap, Green Mapping, Participatory GeoWeb); Usability of geographic information and geographic information technology, especially with non-expert users; GeoWeb and mobile GeoWeb technologies that facilitate Extreme Citizen Science; and identifying scientific models and visualisations that are suitable for Citizen Science.
Research Associate in Extreme Citizen Science – a 2-year, postdoctoral research associate position commencing 1 May 2011.
The research associate will lead the development of an ‘Intelligent Map’ that allows non-literate users to upload data securely; and the system should allow the users to visualise their information with data from other users. Permissions need to be developed in accordance with cultural sensitivities. As uploaded data from multiple users sharing the same system increase over time, repeating patterns will begin to emerge that indicate particular environmental trends.
The role will also include some general project-management duties, guiding the PhD students who are working on the project. Travel to Cameroon to the forest communities that we are working with is necessary.
Complete details about this post and application procedure are available on the UCL jobs website.
PhD Studentship – understanding citizen scientists’ motivations, incentives and group organisation – a 3.5-year fully funded studentship. We are looking for applicants with a good honours degree (1st Class or 2:1 minimum), and an MA or MSc in anthropology, geography, sociology, psychology or related discipline. The applicant needs to be familiar with quantitative and qualitative research methods, and be able to work with a team that will include programmers and human-computer interaction experts who will design systems to be used in citizen science projects. Travel will be required as part of the project. A willingness to live for short periods in remote forest locations in simple lodgings, eating local food, will be necessary. French language skills are desirable.
The research itself will focus on motivations, incentives and understanding of the needs and wishes of participants in citizen science projects. We will specifically focus on engagement of non-literate people in such projects and need to understand how the process – from data collection to analysis – can be made meaningful and useful for their everyday life. The research will involve using quantitative methods to analyse large-scale patterns of engagement in existing projects, as well as ethnographic and qualitative study of participants. The project will include working with non-literate forest communities in Cameroon as well as marginalised communities in London.
Complete details about this post and application procedure are available on the UCL jobs website.
PhD Studentship in geographic visualisation for non-literate citizen scientists - a 3.5-year fully funded studentship. The applicant should possess a good honours degree (1st Class or 2:1 minimum), and an MSc in computer science, human-computer interaction, electronic engineering or related discipline. In addition, they need to be familiar with geographic information and software development, and be able to work with a team that will include anthropologists and human-computer interaction experts who will design systems to be used in citizen science projects. Travel will be required as part of the project. A willingness to live for short periods in remote forest locations in simple lodgings, eating local food, will be necessary. French language skills are desirable.
Complete details about this post and application procedure are available on the UCL jobs website.
In addition, we offer a PhD Studentship on How interaction design and mobile mapping influences participation in Citizen Science, which is part of the EveryAware project and is also open to any EU citizen.
How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information
10 January, 2011
The paper “How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information” has appeared in The Cartographic Journal. The proper citation for the paper is:
Haklay, M and Basiouka, S and Antoniou, V and Ather, A (2010) How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information. The Cartographic Journal, 47 (4), 315–322.
The abstract of the paper is as follows:
In the area of volunteered geographical information (VGI), the issue of spatial data quality is a clear challenge. The data that are contributed to VGI projects do not comply with standard spatial data quality assurance procedures, and the contributors operate without central coordination and strict data collection frameworks. However, similar to the area of open source software development, it is suggested that the data hold an intrinsic quality assurance measure through the analysis of the number of contributors who have worked on a given spatial unit. The assumption that as the number of contributors increases so does the quality is known as ‘Linus’ Law’ within the open source community. This paper describes three studies that were carried out to evaluate this hypothesis for VGI using the OpenStreetMap dataset, showing that this rule indeed applies in the case of positional accuracy.
To access the paper on the journal’s website, you can follow the link: 10.1179/000870410X12911304958827. However, if you don’t hold a subscription to the journal, a postprint of the paper is available at the UCL Discovery repository. If you would like to get hold of the printed version, email me.
29 November, 2010
The website GPS Business News published an interview with me in which I covered several aspects of OpenStreetMap and crowdsourced geographical information, including spatial data quality, patterns of data collection, inequality in coverage and the implications of these patterns for the wider area of Volunteered Geographical Information.
The interview is available here.
21 October, 2010
One issue that remained open in the studies on the relevance of Linus’ Law for OpenStreetMap was that the previous studies looked at areas with more than 5 contributors, and the link between the number of users and the quality was not conclusive – although the quality was above 70% for 5 contributors and more.
Now, as part of writing up the GISRUK 2010 paper for journal publication, we had an opportunity to fill this gap, to some extent. Vyron Antoniou has developed a method to evaluate the positional accuracy on a larger scale than we have done so far. The methodology uses the geometric position of the Ordnance Survey (OS) Meridian 2 road intersections to evaluate positional accuracy. Although Meridian 2 is created by applying a 20-metre generalisation filter to the centrelines of the OS Roads Database, this generalisation process does not affect the positional accuracy of node points and thus their accuracy is the best available. An algorithm was developed for the identification of the correct nodes between the Meridian 2 and OSM, and the average positional error was calculated for each square kilometre in England. With this data, which provides an estimated positional accuracy for an area of over 43,000 square kilometres, it was possible to estimate the contribution that additional users make to the quality of the data.
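The matching and aggregation steps described above can be sketched in a few lines. This is only an illustrative sketch, not the algorithm that was actually used: the function names, the 20-metre matching threshold, the brute-force nearest-neighbour search and the sample coordinates are all assumptions, and coordinates are assumed to be in metres on a projected grid such as the British National Grid.

```python
import math
from collections import defaultdict


def match_nodes(reference, candidates, max_dist=20.0):
    """Pair each reference intersection with its nearest candidate node.

    Nodes further away than max_dist metres are left unmatched.
    The threshold is an assumption made for this sketch.
    """
    pairs = []
    for rx, ry in reference:
        best, best_d = None, max_dist
        for cx, cy in candidates:
            d = math.hypot(cx - rx, cy - ry)
            if d <= best_d:
                best, best_d = (cx, cy), d
        if best is not None:
            pairs.append(((rx, ry), best, best_d))
    return pairs


def mean_error_per_cell(pairs, cell_size=1000.0):
    """Average the positional error of matched nodes per grid cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for (rx, ry), _, d in pairs:
        cell = (int(rx // cell_size), int(ry // cell_size))
        sums[cell][0] += d
        sums[cell][1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


# Made-up coordinates, for illustration only.
reference_nodes = [(100.0, 100.0), (500.0, 500.0), (1500.0, 200.0)]
osm_nodes = [(103.0, 104.0), (500.0, 506.0), (1600.0, 200.0)]

pairs = match_nodes(reference_nodes, osm_nodes)
errors = mean_error_per_cell(pairs)
print(errors)
```

With real data, a spatial index would replace the nested loop, and the matching would need extra checks (for example, comparing the road names or the number of roads meeting at each intersection) to avoid pairing the wrong nodes.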
As can be seen in the chart below, positional accuracy remains fairly level when the number of users is 13 or more – as we have seen in previous studies. On the other hand, up to 13 users, each additional contributor considerably improves the dataset’s quality. In grey you can see the maximum and minimum values, so the area represents the possible range of positional accuracy results. Interestingly, as the number of users increases, positional accuracy seems to settle close to 5m, which is somewhat expected when considering the source of the information – GPS receivers and aerial imagery. However, this is an aspect of the analysis that clearly requires further testing of the algorithm and the datasets.
It is encouraging to see that the results of the analysis are significantly correlated. For the full dataset the correlation is weak (-0.143) but significant at the 0.01 level (2-tailed). However, for the average values for each number of contributors (blue line in the graph), the correlation is strong (-0.844) and significant at the 0.01 level (2-tailed).
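The gap between the two coefficients is what you would expect when averaging removes the scatter within each group. A minimal sketch of the two calculations follows, using made-up tile values rather than the study's data; the `pearson` helper and the sample numbers are assumptions for illustration, and no significance test is included.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative (contributors, positional error in metres) pairs per tile.
tiles = [
    (1, 12.0), (1, 4.0), (1, 20.0),
    (2, 10.0), (2, 5.0), (2, 15.0),
    (3, 9.0), (3, 3.0), (3, 12.0),
    (4, 8.0), (4, 2.0), (4, 11.0),
]

# Correlation over every tile: the scatter within each group weakens it.
r_full = pearson([n for n, _ in tiles], [e for _, e in tiles])

# Correlation over the mean error for each number of contributors.
groups = defaultdict(list)
for n, e in tiles:
    groups[n].append(e)
counts = sorted(groups)
means = [sum(groups[n]) / len(groups[n]) for n in counts]
r_means = pearson(counts, means)

print(f"all tiles: {r_full:.3f}, group means: {r_means:.3f}")
```

Both coefficients are negative, since error falls as contributors increase, but the one computed on the group means is much closer to -1.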
An important caveat is that the number of tiles with more than 10 contributors is fairly small, so that is another aspect that requires further exploration. Moreover, spatial data quality is not just positional accuracy, but also attribute accuracy, completeness, update and other properties. We can expect that they will also exhibit similar behaviour to positional accuracy, but this requires further studies – as always.
However, as this is a large-scale analysis that adds to the evidence from the small-scale studies, it is becoming highly likely that Linus’ Law is affecting the quality of OSM data and possibly of other so-called Volunteered Geographical Information (VGI) sources. There is also a decreased gain in terms of positional accuracy when the number of contributors passes about 10 or so.
5 October, 2010
The London Citizen Cyberscience Summit in early September was a stimulating event, which brought together a group of people with an interest in this area. A report from the event, with a very good description of the presentations, including a reflection piece, is available on the ‘Strange Attractor’ blog.
During the summit, I discussed the aspects of ‘Extreme’ Citizen Science, where we move from usual science to participatory research. The presentation was partly based on a paper that I wrote and that I presented during the workshop on the value of Volunteered Geographical Information in advancing science, which was run as part of the GIScience 2010 conference towards the middle of September. Details about the workshop are available on the workshop’s website including a set of interesting position papers.
The presentation below covers the topics that I discussed in both workshops. Here, I provide a brief synopsis for the presentation, as it is somewhat different from the paper.
In the talk, I started by highlighting that by using different terminologies we can notice different facets of the practice of crowd data collection (VGI within the GIScience community, crowdsourcing, participatory mapping …).
The first way in which we can understand this information is in the context of Web 2.0 applications. These applications can be non-spatial (such as Wikipedia or Twitter), or implicitly spatial (such as Flickr – you need to be in a location before you can capture a photograph), or explicitly spatial, in applications that are about collecting geographical information – for example OpenStreetMap. When looking at VGI from the perspective of Web 2.0 it’s possible to identify the specific reasons that it emerged and how other similar applications influence its structure and practices.
The second way to view this information is as part of geographical information produced by companies who need mapping information (such as Google or TomTom). In this case, you notice that it’s about reducing the costs of labour and the need for active or passive involvement of the person who carries out the mapping.
The third, and arguably new, way to view VGI is as part of Citizen Science. These activities have been going on for a long time in ornithology and in meteorology. However, there are new forms of Citizen Science that rely on ICT – such as movement-activated cameras (slide 11 on the left) that are left near animal trails and are operated by volunteers, or a network of accelerometers that form a global earthquake monitoring network. Not all Citizen Science is spatial, and there are very effective non-spatial examples, especially in the area of Citizen Cyberscience. So in this framing of VGI we can pay special attention to the collection of scientific information. Importantly, as in the case of spatial applications, some volunteers become experts, such as Hanny van Arkel who has discovered a type of galaxy in Galaxy Zoo.
Slides 16-17 show the distribution of crowdsourced images, and emphasise the spatial distribution of information near population centres and tourist attractions. Slides 19-25 show the analysis of the data that was collected by OpenStreetMap volunteers and highlight bias towards highly populated and affluent areas.
Citizen Science is not just about data collection. There are also cultural problems regarding the trustworthiness of the data, but slides 28-30 show that the data is self-improving as more volunteers engage in the process (in this case, mapping in OpenStreetMap). On that basis, I do question the assumption about trustworthiness of volunteers and the need to change the way we think about projects. There are emerging examples of such Citizen Science where the engagement of participants is at a higher level. For example, the noise mapping activities that a community near London City Airport carried out (slides 34-39) show that people can engage in science and are well placed, when there are opportunities such as the ash cloud in April 2010, to collect ‘background’ noise. This is not possible without the help of communities.
Finally, slides 40 and 41 demonstrate that it is possible to engage non-literate users in environmental data collection.
So in summary, a limitless Citizen Science is possible – we need to create the tools for it and understand how to run such projects, as well as study them.
Completeness in volunteered geographical information – the evolution of OpenStreetMap coverage (2008-2009)
13 August, 2010
The Journal of Spatial Information Science (JOSIS) is a new open access journal in GIScience, edited by Matt Duckham, Jörg-Rüdiger Sack, and Michael Worboys. In addition, the journal adopted an open peer review process, so readers are invited to comment on a paper while it goes through the formal peer review process. So this seems to be the most natural outlet for a new paper that analyses the completeness of OpenStreetMap over 18 months – March 2008 to October 2009. The paper was written in collaboration with Claire Ellul. The abstract of the paper is provided below, and you are very welcome to comment on the paper on the JOSIS forum that is dedicated to it, where you can also download it.
Abstract: The ability of lay people to collect and share geographical information has increased markedly over the past 5 years as results of the maturation of web and location technologies. This ability has led to a rapid growth in Volunteered Geographical Information (VGI) applications. One of the leading examples of this phenomenon is the OpenStreetMap project, which started in the summer of 2004 in London, England. This paper reports on the development of the project over the period March 2008 to October 2009 by focusing on the completeness of coverage in England. The methodology that is used to evaluate the completeness is comparison of the OpenStreetMap dataset to the Ordnance Survey dataset Meridian 2. The analysis evaluates the coverage in terms of physical coverage (how much area is covered), followed by estimation of the percentage of England population which is covered by completed OpenStreetMap data and finally by using the Index of Deprivation 2007 to gauge socio-economic aspects of OpenStreetMap activity. The analysis shows that within 5 years of project initiation, OpenStreetMap already covers 65% of the area of England, although when details such as street names are taken into consideration, the coverage is closer to 25%. Significantly, this 25% of England’s area covers 45% of its population. There is also a clear bias in data collection practices – more affluent areas and urban locations are better covered than deprived or rural locations. The implications of these outcomes to studies of volunteered geographical information are discussed towards the end of the paper.
3 August, 2010
The process of academic publication takes a long time, so only now is my paper from 2008 finally in print.
The paper should be cited as:
Haklay, M., 2010, “How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets”, Environment and Planning B: Planning and Design 37(4) 682–703
Its abstract is:
Within the framework of Web 2.0 mapping applications, the most striking example of a geographical application is the OpenStreetMap (OSM) project. OSM aims to create a free digital map of the world and is implemented through the engagement of participants in a mode similar to software development in Open Source projects. The information is collected by many participants, collated on a central database, and distributed in multiple digital formats through the World Wide Web. This type of information was termed ‘Volunteered Geographical Information’ (VGI) by Goodchild, 2007. However, to date there has been no systematic analysis of the quality of VGI. This study aims to fill this gap by analysing OSM information. The examination focuses on analysis of its quality through a comparison with Ordnance Survey (OS) datasets. The analysis focuses on London and England, since OSM started in London in August 2004 and therefore the study of these geographies provides the best understanding of the achievements and difficulties of VGI. The analysis shows that OSM information can be fairly accurate: on average within about 6 m of the position recorded by the OS, and with approximately 80% overlap of motorway objects between the two datasets. In the space of four years, OSM has captured about 29% of the area of England, of which approximately 24% are digitised lines without a complete set of attributes. The paper concludes with a discussion of the implications of the findings to the study of VGI as well as suggesting future research directions.
10 July, 2010
The slides below are from my presentation at State of the Map 2010 in Girona, Spain. While the conference is about OpenStreetMap, the presentation covers a range of spatially implicit and explicit crowdsourcing projects and also activities that we carried out in Mapping for Change, all of which show that, unlike in other crowdsourcing activities, geography (and place) both limits and motivates contribution to them.
In many ways, OpenStreetMap is similar to other open source and open knowledge projects, such as Wikipedia. These similarities include the patterns of contribution and the importance of participation inequalities, in which a small group of participants contribute very significantly, while a very large group of occasional participants contribute only occasionally; the general demographic of participants, with strong representation from educated young males; or the temporal patterns of engagements, in which some participants go through a peak of activity and lose interest, while a small group joins and continues to invest its time and effort to help the progress of the project. These aspects have been identified by researchers who explored volunteering and leisure activities, and crowdsourcing as well as those who explored commons-based peer production networks (Benkler & Nissenbaum 2006).
However, OpenStreetMap is a project about geography, and deals with the shape of features and information about places on the face of the Earth. Thus, the emerging question is ‘what influence does geography have on OSM?’ Does geography make some fundamental changes to the basic principles of crowdsourcing, or should OSM be treated as ‘wikipedia for maps’?
In the presentation, which is based on my work, as well as the work of Vyron Antoniou and Nama Budhathoki, we argue that geography is playing a ‘tyrannical’ role in OSM and other projects that are based on crowdsourced geographical information and shapes the nature of the project beyond what is usually accepted.
The first influence of geography is on motivation. A survey of OSM participants shows that specific geographical knowledge, which a participant acquired at first hand, and the wish to use this knowledge and see it mapped well is an important factor in participation in the project. We found that participants are driven to mapping activities by their desire to represent the places they care about and fix the errors on the map. Both of these motives require local knowledge.
A second influence is on the accuracy and completeness of coverage, with places that are highly populated, and therefore have a larger pool of potential participants, showing better coverage than suburban areas of well-mapped cities. Furthermore, there is an ongoing discussion within the OSM community about the value of mapping without local knowledge and the impact of such action on the willingness of potential contributors to fix errors and contribute to the map.
A third, and somewhat surprising, influence is the impact of mapping places that the participants haven’t or can’t visit, such as Haiti after the earthquake or Baghdad in 2007. Despite the willingness of participants to join in and help in the data collection process, the details that can be captured without being on the ground are fairly limited, even when multiple sources such as Flickr images, Google Street View and paper maps are used. The details are limited to what was captured at a certain point in time and to the limitations of the sensing device, so the mapping is, by necessity, incomplete.
We will demonstrate these and other aspects of what we termed ‘the tyranny of place’ and its impact on what can be covered by OSM without much effort and which locations will not be covered without a concentrated effort that requires some planning.
4 April, 2010
The opening of Ordnance Survey datasets at the beginning of April 2010 is bound to fundamentally change the way OpenStreetMap (OSM) information is produced in the UK. So just before this major change starts to influence OpenStreetMap, it is worth evaluating what has been achieved so far without this data. It is also the time to update the completeness study, as the previous ones were conducted with data from March 2008 and March 2009.
Following the same method that was used in all the previous studies (which is described in detail here), the latest version of Meridian 2 from OS OpenData was downloaded and compared to OSM data which was downloaded from GeoFabrik. The processing is now streamlined with MapBasic scripts, PostGIS scripts and final processing in Manifold GIS so it is possible to complete the analysis within 2 days. The colour scheme for the map is based on Cynthia Brewer and Mark Harrower’s ColorBrewer 2.
By the end of March 2010, OpenStreetMap coverage of England had grown to 69.8%, from 51.2% a year earlier. When attribute information is taken into account, the coverage had grown to 24.3%, from 14.7% a year earlier. The chart on the left shows how the coverage progressed over the past 2 years, using the 4 data points that were used for analysis – March 2008, March 2009, October 2009 and March 2010. Notice that, in terms of capturing the geometry, less than 5% is now significantly under-mapped when compared to Meridian 2. Another interesting aspect is the decline in empty cells – that is, grid cells that don’t have any feature in Meridian 2 but now have features from OSM appearing in them. So in terms of capturing road information for England, it seems like the goal of capturing the whole country with volunteer effort was within reach, even without the release of Ordnance Survey data.
On the other hand, when attributes are included in the analysis, the picture is very different.
The progression of coverage is far from complete, and although the area that is empty of features that include street or road name in Meridian 2 is much larger, the progress of OSM mappers in completing the information is much slower. While the geometry coverage has gone up by 18.6 percentage points over the past year, less than 10 points (9.6 to be precise) were added when attributes are taken into account. The reason for this is likely to be the need to carry out a ground survey to find the street name without using other copyrighted sources.
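The year-on-year figures quoted in this post are differences in percentage points between the March 2009 and March 2010 coverage values. A small sketch of the arithmetic, using only the numbers reported above (the variable names are mine), also shows the relative growth, which is a different measure:

```python
# Coverage of England in percent, as reported in this post.
geometry = {"March 2009": 51.2, "March 2010": 69.8}
with_attributes = {"March 2009": 14.7, "March 2010": 24.3}

# Absolute change in percentage points.
geometry_gain = round(geometry["March 2010"] - geometry["March 2009"], 1)
attribute_gain = round(
    with_attributes["March 2010"] - with_attributes["March 2009"], 1
)

# Relative growth compared with the March 2009 starting value.
geometry_growth = geometry_gain / geometry["March 2009"] * 100
attribute_growth = attribute_gain / with_attributes["March 2009"] * 100

print(f"geometry: +{geometry_gain} points ({geometry_growth:.0f}% relative)")
print(f"attributes: +{attribute_gain} points ({attribute_growth:.0f}% relative)")
```

In relative terms the attribute coverage grew faster than the geometry coverage, but it started from a much lower base, so in absolute terms it remains far behind.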
The attribute area is the one that I would expect will show the benefits of Ordnance Survey data release to OSM mapping. Products such as StreetView and VectorMap District can be used to either copy the street name (StreetView) or write an algorithm that will copy the street name and other attributes from a vector data set – such as Meridian 2 or VectorMap District.
Of course, this is a failure of the ‘crowd’ in the sense that this bit of information previously required an actual visit on the ground, which was a more challenging task than finding people who are happy to volunteer their time to digitise maps.
As in the previous cases, there are local variations, and the geography of the coverage is interesting. The information includes 4 time points, so the most appropriate visualisation is one that allows for comparison and transition between maps. Below is a presentation (you can download it from SlideShare) that provides maps for the whole of England as well as 5 regional maps, roughly covering the South West, London, Birmingham and the Midlands, Manchester and Liverpool, and Newcastle upon Tyne and the North East.
If you want to create your own visualisation, or use the results of this study, you can download the results in a shapefile format from here.
For a very nice visualisation of Meridian 2 and OpenStreetMap data – see Ollie O’Brien’s SupraGeography blog.
Usability of VGI in Haiti earthquake response and the 2nd workshop on usability of geographic information
27 March, 2010
On the 23rd March 2010, UCL hosted the second workshop on usability of geographic information, organised by Jenny Harding (Ordnance Survey Research), Sarah Sharples (Nottingham), and myself. This workshop extended the range of topics that we covered in the first one, on which we reported during the AGI conference last year. This time, we had about 20 participants and it was an excellent day, covering a wide range of topics – from a presentation by Martin Maguire (Loughborough) on the visualisation and communication of Climate Change data, to Johannes Schlüter’s (Münster) discussion on the use of XO computers with schoolchildren, to a talk by Richard Treves (Southampton) on the impact of Google Earth tours on learning. Especially interesting was the combination of sound and other senses in the work of Nick Bearman (UEA) and Paul Kelly (Queen’s University, Belfast).
Jenny’s introduction highlighted the different aspects of GI usability, from those that are specific to data to issues with application interfaces. The integration of data with software that creates the user experience in GIS was discussed throughout the day, and it is one of the reasons that the issue of the usability of the information itself is important in this field. The Ordnance Survey is currently running a project to explore how they can integrate usability into the design of their products – Michael Brown’s presentation discusses the development of a survey as part of this project. The integration of data and application was also central to Philip Robinson’s (GE Energy) presentation on the use of GI by utility field workers.
My presentation focused on some preliminary thoughts that are based on the analysis of the OpenStreetMap and Google Map communities’ response to the earthquake in Haiti at the beginning of 2010. The presentation discussed a set of issues that, if explored, will provide insights that are relevant beyond the specific case and that can illuminate issues that are relevant to daily production and use of geographic information. Examples include the very basic metadata that was provided on portals such as GeoCommons and what users can do to evaluate the fitness for use of a specific data set (see also Barbara Poore’s (USGS) discussion on the metadata crisis).
Interestingly, the day after giving this presentation I had a chance to discuss GI usability with Map Action volunteers who gave a presentation at GEO-10. Their presentation filled in some gaps, but also reinforced the value of researching GI usability for emergency situations.