COST ENERGIC meeting – Tallinn 21-22 May

The COST Energic network is progressing in its 3rd year. The previous post showed one output from the action – a video that describes the links between volunteered geographic information and indigenous knowledge.

The people who came to the meeting represent the variety of interests in crowdsourced geographic information: people with backgrounds in geography and urban planning, and many people with an interest in computing – from semantic representation of information to cloud computing, data mining and similar areas where VGI represents an ‘interesting’ dataset.

Part of the meeting focused on the next output of the network, an Open Access book titled ‘European Handbook of Crowdsourced Geographic Information’. The book will be made up of short chapters that are going through peer review by people within the network. The chapters will cover topics such as theoretical and social aspects, quality (criteria and methodologies), data analysis, and finally applied research and case studies. We are also creating a combined reference list that will be useful for researchers in the field. There will be about 25 chapters. Different authors gave a quick overview of their topics, with plenty to explore – from Smart Cities to concepts on the nature of information.

COST ‘actions’ (as these projects are called) operate through working groups. In COST Energic, there are 3 working groups, focusing on human and societal issues; spatial data quality and infrastructures; and data mining, semantics and VGI.

Working Group 1 looked at an example of big data from Alg@line – 22 years of ferry data from the Baltic Sea – with 17 million observations a year. This data can be used for visualisation and for exploring its properties. Another case study that the working group is considering is the engagement of schoolchildren with VGI – with activities in Portugal, Western Finland, and Italy. These activities integrate citizen science and VGI, and use free and open source software and data. In the coming year, they are planning specific activities on big data and urban planning, and a crowd atlas on urban biodiversity.

Working Group 2 has been progressing in its activities linking VGI quality with citizen science, and in how to produce reliable information from it. The working group collaborates with another COST action (TD1202), which is called ‘Mapping and the Citizen Sensor’. They carried out work on topics of information quality – especially with vernacular gazetteers. In their forthcoming activities, they will contribute to ISSDQ 2015 (International Symposium on Spatial Data Quality) with a set of special sessions. Future work will focus on quality tools and quality visualisation.

Prof. Cristina Capineri opening the meeting

Working Group 3 also highlighted ISSDQ 2015 and will have a good presence at the conference. The group aims to plan a hackathon in which people will work on VGI, with a distributed event for people to work with data over time. Another plan is to focus on research around the repository. The working group’s data repository contains ways of getting data and code; it is mostly about how to get at the data.

There is also a growing repository of bibliography on VGI in CiteULike. The repository is open to other researchers in the area of VGI, and WG3 aims to manage it as a curated resource.

VGI and indigenous knowledge – COST Energic Video

The COST Energic network has been running now for 3 years, and one of the outputs from the network is the video below, which explores a very valuable form of Volunteered Geographic Information (VGI). This is information that comes from participatory projects between researchers and indigenous communities, and this short film provides examples from Bolivia, British Columbia, and the Congo Basin, where researchers in the network are working with local communities to collect information about their areas and the issues that concern them.

The video was produced by Lou del Bello, and includes some stock photos and footage. The images that are marked with titles are from COST Energic activities. Lou has also created a short video on the work of the Extreme Citizen Science group in her report on Mapping the Congo on SciDev.

The video is released just before a meeting of the COST Network, held in Tallinn, and hosted by the Interaction Design Lab of Tallinn University.

Spatial Conversation – #VGIday #COSTEnergic

The COST Energic network (see VGIBox.eu) is running a two-day geolocated Twitter chat, titled ‘Volunteered Geographic Information Day’, so the hashtag is #VGIDay. The conversation will take place on 14th and 15th May 2015, and we are universalists – join from anywhere in the world!
Joining is easy and requires 3 steps:

  1. Follow the @COST_Energic profile
  2. Enable your phone to disclose your position – this will allow your tweets to be geocoded.
  3. To participate in the discussion, use at least one of the dedicated hashtags in your tweets: #COSTEnergic, #VGIday

What are we trying to do?

Discussions will be started by @COST_Energic. Through this twitter handle, we will share resources, results and ideas about the topic of VGI and geographic crowdsourcing. You can join the discussions, bring your ideas and links, and involve your contacts, and this will spread this event through the Twittersphere (and beyond?).
At the end of the experiment, we will produce a report of the generated discussion for our ENERGIC repository, and the dataset of tweets can then be used by researchers who want to visualise, analyse and try to do things with it. It might end up as teaching material, or in IronSheep.

AAG 2015 notes – day 4 – Citizen Science & OpenStreetMap Studies

The last day of AAG 2015 is about citizen science and OpenStreetMap studies.

The session Beyond motivation? Understanding enthusiasm in citizen science and volunteered geographic information was organised together with Hilary Geoghegan. We were interested to ‘explore and debate current research and practice moving beyond motivation, to consider the associated enthusiasm, materials and meanings of participating in citizen science and VGI.’

As Hilary couldn’t attend the conference, we started the session with a discussion about experiences of enthusiasm – for example, my own experience with IBM World Community Grid. Jeroen Verplanke raised the addictive side of volunteer thinking projects, such as logging in to a Zooniverse or Tomnod project and finding that time flies by. Mairead de Roiste described mapping wood pigeons in New Zealand – the public got involved because they wanted to help, but when they hear that the data wasn’t used, they might lose interest. Urgency can also be a factor influencing participation.

Britta Ricker – University of Washington Tacoma – Look what I can do! Harnessing drone enthusiasm for increased motivation to participate. This is ongoing research. Looking at the Geoweb – it allows people to access information and has made imagery available to the public, but the data is at the whim of whoever gives it to us. With drones, we can send them up when we want or need to. Citizen science is deeply related to the Geoweb – the challenge is to get people involved and keep them involved. We can harness drone enthusiasm – drones evoke negative connotations, but people are also thinking about them for good, as in humanitarian applications. Evidence for the enthusiasm is provided by YouTube, where there are plenty of drone videos – 3.44M – with lots of action photography: the surfing community and GoPro development. People are attached to their drones – jumping into the water to save them. So how can the enthusiasm for drones be harnessed to help participatory mapping? We need to design a workflow around stages: pre-flight, flight, and post-processing. She partnered with water scientists to explore local issues. There are considerations of cost and popularity, and a quadcopter was selected on that basis: the DJI Phantom Vision 2+. With drones, you need to read the manual and plan the flight. There are legal issues of where it is OK to fly, and Esri & MapBox provide information on where you can fly them. You need to think about the camera angle, correct the fisheye distortion, and then process the images. Stitching imagery can be done manually (MapKnitter/QGIS/ArcGIS). It is possible to do it in automated software, but open source (e.g. OpenDroneMap) is not yet good enough in terms of ease of use. Software such as Pix4D is useful but expensive. Working with raster data is difficult, drones require practice, and software/hardware is expensive – not yet ready for everyone. NGOs can start using it. Idea: sharing photos and having volunteers classify images together.

Brittany Davis – Allegheny College – Motivated to Kill: Lionfish Derbies, Scuba Divers, and Citizen Science. Lionfish are stunning under water – it is challenging to differentiate between the two sub-species, but that doesn’t matter if you’re trying to catch them. They are an invasive species without predators, and their numbers exploded – especially from 2010. There are many information campaigns encouraging people to hunt them, especially in dive centres – telling people that it is a way to save Caribbean reefs. People transform themselves from a ‘benign environmental activity’ to ‘you tell me that I can hunt? cool!’. Lionfish is tasty, so having the meat for dinner is a motivation. Then there are ‘lionfish derbies’ – how many can you kill in a day? She has seen a lot of enthusiasm for lionfish derbies. There are attempts to get people to sign up where they go, but they are not recording where they hunt the lionfish. People go to another site for the competition as they want to capture more. REEF is trying to encourage a protocol for capturing them, and there are cash prizes for the hunting. They use the catch to encourage people to hunt lionfish. Derbies are increasing in size – 14,832 were removed from 2009 to 2014, and there is some evidence for the success of the methodology. There was pressure to ‘safely and humanely capture and euthanase these fish’ – a challenge for PADI, who run special scuba courses that are linked to conservation. People hear about the hunting, and that motivates them to go diving. There is a very specific process for a REEF-sanctioned lionfish derby, which tries to include recording and public information. But there are challenges below the depth reached by recreational divers. She also explored whether it is possible to improve data collection for scientists.

Cheryl Gilge – University of Washington – The rhetorical flourish of citizen participation (or, the formation of cultural fascism?). She offered a theoretical analysis of citizen science and Web 2.0 as part of a wider project to understand labour relationships and power. She argues that the average citizen has agency to link to their environment. The ability to contribute and to receive information is part of Web 2.0. As a technology layer, it changes both the individual and societal levels. Collaboration and participation in Web 2.0 are framed around entrepreneurialism, efficiency, and innovation. The web offers many opportunities to help wider projects, where amateur and expert knowledge are both valued. However, there is a risk of reducing the politics of participation to a semblance of agency. There is democratic potential – but co-opting of the spirit is also in evidence. There are plenty of examples of inducing individuals to contribute data and information, and researchers are eager to understand motivation over a long period. A rational system to explain what is going on can’t explain the competing goals and values that are in action. The desire to participate has varied sources – fun, boredom etc. Approaches range from understanding people as ‘snowflakes’ to unashamed exploitation. Why do people contribute to the wider agenda? As a provocation: harnessing crowd potential to the neoliberalisation agenda of universities. We give freedom to the efficiency and promise of digital tools. Governments promise ‘open government’ or ‘smart cities’ that put efficiency as the top value. A deep libertarian desire for small government is expressed through technology. Governments have sensors that reduce the cost of monitoring what is happening. In the academic environment – reduced funding, hiring freezes, increased pressure to publish – there is an assumption that it is possible to mechanically produce top research. Trading in ideas is less valued.
There is a desire for information-processing capacity, or for dealing with humanitarian efforts – projects like Galaxy Zoo require more people to analyse the masses of data that research produces, and mapathons are used to deal with emergencies. Participants are induced to do more through commitment to the project and by harnessing their enthusiasm, adding inducements for the participants. She introduced the concept of micro-fascism from Guattari – taking over freedoms in the hope of future promises. It enables large group formation to happen – e.g. identities such as ‘I’m a Mac/PC’ – from which it is harder to disconnect. Fascism can be defined as an ideology that relies on the masses believing in the larger goals; here, the unquestioned authority of data in Web 2.0. Belief in technology induces researchers to get data and participation regardless of the costs. Open source is presented as democracy, but there are also similarities with fascism: participation in the movement, and participants must continue to perform. It brings uncomfortable participation – putting hope in these activities – and it happens top down and bottom up, and in Web 2.0. What is the ethical role of researchers who are involved in these projects? How do we value this labour? We need to admit that it is political.

In a final comment, Teresa Scassa pointed out that we need to consider the implications of legitimising drones, killing fish or employing unpaid labour – underlying all is a moral discomfort.

In the afternoon came the two sessions on OpenStreetMap that Alan McConchie and I organised. Taking the 10th birthday of OSM as a starting point, the sessions surveyed the state of geographical research on OpenStreetMap, recognising that OSM studies are different from VGI. The sessions were supported by the European COST Energic (COST Action IC1203) network: European Network Exploring Research into Geospatial Information Crowdsourcing.

OpenStreetMap Studies 1 

Jennings Anderson, Robert Soden, Mikel Maron, Marina Kogan & Ken Anderson – University of Colorado, Boulder – The Social Life of OpenStreetMap: What Can We Know from the Data? New Tools and Approaches. OSM provides a platform to understand human-centred computing. There is very valuable information in the OSM history file, and they built a framework (EPIC OSM) that can run spatial and temporal queries and produce JSON output that can then be analysed. They use existing tools and software frameworks to deliver it. The framework was demonstrated: you can ask questions by day or by month, and even bin them by week and in other ways. The questions are evaluated in Ruby, so it is easy to add more questions and change them. They have already used the framework in a CHI paper about the Haiti earthquake (see video below). Once they had created the underlying framework, they also developed an interface – OSM Markdown – in which you can embed code and see changesets, cumulative nodes collected and classification by type of user. They also provide information on tags. When analysing the Haiti response, they see a spike in nodes added, and also in what is recorded for buildings – the tag collapse=yes.
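To make the idea of binning temporal queries concrete, here is a minimal sketch in Python. It is not the EPIC OSM framework (which, per the talk, evaluates its questions in Ruby), and the timestamps, the `bin_key` and `count_edits` helpers are all hypothetical, invented for illustration. It only shows how edit timestamps might be grouped by day, week or month and emitted as JSON.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical edit timestamps (ISO 8601); made up, not real OSM data.
SAMPLE_EDITS = [
    "2010-01-12T22:10:00",
    "2010-01-13T08:30:00",
    "2010-01-13T19:45:00",
    "2010-01-20T11:00:00",
    "2010-02-02T09:15:00",
]


def bin_key(ts, granularity):
    """Return the label of the bin that a timestamp falls into."""
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        # ISO week numbering: weeks start on Monday.
        year, week, _weekday = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if granularity == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown granularity: {granularity}")


def count_edits(timestamps, granularity):
    """Count edits per bin and return a dict sorted by bin label."""
    counts = Counter(
        bin_key(datetime.fromisoformat(t), granularity) for t in timestamps
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # JSON output, in the spirit of a framework whose results are analysed later.
    print(json.dumps(count_edits(SAMPLE_EDITS, "week"), indent=2))
```

Changing the second argument to "day" or "month" re-bins the same edits, which is roughly the flexibility that the demonstration highlighted.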

Christian Bittner – Diverse crowds, diverse VGI? Comparing OSM and Wikimapia in Jerusalem. Christian looked at differences between Wikimapia and OSM as sources of VGI. He is especially interested in social implications such as the way exclusion plays out in VGI – the challenges between Palestine/Israel – two contradicting stories that play out in a contested space, with conflicts and fights over the narratives that the two sides enact in different areas. With new tools, there is a ‘promise’ of democratisation – a narrative of collaboration and participation. In crowdsourced geographic information we can ask: who is the crowd, and who is not? Social bias in OSM is a topic that is being discussed in the literature. The process is to look at the OSM database, analysing the data and metadata within the municipal boundaries of Jerusalem. In a simplified representation of the city, regions are classified by majority – Arab or Jewish. He then used cartograms scaled according to the size of the population and the amount of information collected. In OSM, Jewish areas are over-represented, while Arab areas are under-represented. Participants are biased toward males from privileged socio-economic backgrounds. In Wikimapia, the process is tagging places, and it uses visual information from Google. Wikimapia is about qualitative information, so objects are messy and overlap, with no definition of what constitutes a place. In Wikimapia, there are many more descriptions of the Arab areas, which are over-represented. The amount of information in Wikimapia is smaller – 2,679 objects, compared to 33,411 ways in OSM. In OSM there is little Arabic and more Hebrew, though Latin script is the most used. Wikimapia is the other way around, with Hebrew in the minority. The crowd is different between projects. There are wider implications – diverse crowds, so diverse VGI? VGI is a diverse form of data, produced in different ways by different knowledge cultures.
He calls for very specific studies of each community before claiming that VGI is a general form of information.

Tim Elrick & Georg Glasze – University of Erlangen-Nuremberg, Germany – A changing mapping practices? Representation of Places of Worship in OpenStreetMap and other sources. The starting point was noticing that churches are presented on official maps but mosques are not, and noticing how maps are used to produce specific narratives. What happens in new forms of mapping? In Google Maps, the mosque is presented, but not the church; in OSM both are mapped. What is happening? For the old topographic maps, the official NMAs argue that they provide a precise representation – but they fail to do so in terms of religious differences. Some states do not include non-Christian places of worship – the federal mapping agency came up with symbols for such places (mosques, synagogues), but the preference of the state NMAs was for a generic mark for all non-Christian places that does not differentiate between religions. USGS has just a single mark for a house of worship – with a cross. The USGS suggested carrying out crowdsourcing to identify places of worship, so they are willing to change. In OSM there is free tagging, including tags for religion, but the rendering dictates that only some tags are shown. In 2007 there was a suggestion to change the rendering of non-Christian places, and Steve Chilton then created cartographic symbols for the change. The OSM do-ocracy can lead to change, but in other places that use OSM this was not accepted – there are different symbols in OpenCycleMap. In Germany, there are conflicts about non-visible places of worship (e.g. a mosque in a social club). There is an adaptive approach to dealing with location in OSM. In Google there is a whole set of data sources that are used, as well as crowdsourcing that goes to moderators at Google – with no accountability or local knowledge. How places of worship are handled is not transparent. Categorisation and presentation change with new actors – corporate and open data. Google uses the economy of attention.

Alan McConchie – University of British Columbia – Map Gardening in Practice: Tracing Patterns of Growth and Maintenance in OpenStreetMap. Looking at the history of OSM. Editing existing features is as important as adding new ones – it means having to collaborate and deal with other people’s data. In the US, OSM is a mix of volunteered and imported data – an ongoing aspect of the project. Questions: do the ‘explorers’ – the people who like empty spaces – stick around? Do imports hinder the growth of the community? And does activity shift to ‘gardening’? The TIGER import in 2007 has been significant to the growth of the project. There are also many other imports – addresses in Denmark, French land cover, incomplete land cover imports in Canada. There was a community backlash from people who were concerned about the impact of imports (e.g. Crowe 2011; Frederik Ramm, 2012; Tobias Knerr, 2015). The debate is also between different regional factions. There is an assumption that only empty areas are exciting, which is problematic for someone joining now in Germany. New best practices are evolving: imports in Seattle were used to encourage the community and build it. Zielstra et al. 2013 explored imports and show different growth patterns, but it is not so simple as to pin them on imports. Alan takes the ‘Wiki Gardening’ concept – people who like to keep things tidy and well maintained. He analyses small areas, identifying blank spots, and tries to normalise across cities in the world – e.g. using population from the Gridded Population of the World. Exploring edits per month, we see many imports happening all the time. At the individual city level, he explores the behaviour of explorers and of those who never mapped the unknown. In London, new mappers are coming in, while in Vancouver the original mappers are the ones that continue to maintain the map. There are power law effects that trump anything else, and the shift to new contributors is not clear cut.
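As a rough illustration of what normalising across cities could look like, the sketch below converts monthly edit counts into edits per 1,000 residents. This is my own simplification, not Alan’s method: the `edits_per_capita` helper, the city labels and all of the numbers are hypothetical, and real work would take population from a gridded dataset for the exact study area.

```python
def edits_per_capita(monthly_edits, population, per=1000):
    """Scale monthly edit counts to edits per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return {month: count * per / population
            for month, count in monthly_edits.items()}


# Hypothetical cities with made-up edit counts and populations.
CITIES = {
    "city_a": {"population": 250_000, "edits": {"2014-01": 500, "2014-02": 750}},
    "city_b": {"population": 2_000_000, "edits": {"2014-01": 1_000, "2014-02": 3_000}},
}

if __name__ == "__main__":
    for name, info in CITIES.items():
        print(name, edits_per_capita(info["edits"], info["population"]))
```

With these invented figures, the smaller city shows more activity per resident even though the larger one has more edits in absolute terms, which is the kind of distortion that normalisation is meant to remove.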

Monica G. Stephens – University at Buffalo – Discussant: she started looking at OSM only a few years ago, because of a statement from Mike Goodchild that women are not included, so she carried out a survey of internet users about Google Maps and OSM. She found that geotagging is much more male – more than just sharing images. In her survey she noticed gender bias in OSM. Maps are biased by the norms, traditions, assumptions and politics of the map maker (Harley 1989). The biases are those of the map maker – bikes in Denver (what interests them), the uneven representation of Hebrew in Jerusalem, or religious attributes. There is also the question of how the community makes decisions – how to display information? What to import? There are issues of ethos – there are fundamental differences between the UK and German communities and the US mapping communities. This leads to interesting conversations between these communities. There are also comparisons – Wikimapia, Google Maps, topo maps – that tell us what OSM is doing. OSM democracy is more efficient and responsive to communities’ ideas. The proposal for tagging childcare was rejected, but the discussions led to a remapping of tags in response to the critique. Compare this to Google Maps – who was creating local knowledge? In Google Maps 96% of reviewers are male (in Google Map Maker 2012), so the question is who is the authority that governs Wikimapia.

OpenStreetMap Studies 2 included the following:

Martin Loidl – Department of Geoinformatics, University of Salzburg – An intrinsic approach for the detection and correction of attributive inconsistencies and semantic heterogeneity in OSM data. Martin comes from a data modelling perspective, accepting that OSM is based on a bottom-up approach, with flat data modelling and attributes, and no restriction on tag usage. There are attributive inconsistencies. Semantic heterogeneity influences visualisation, statistics and spatial analysis. He suggests improving results by harmonisation and by correction through estimation. There have been many comparisons of OSM quality over the years, but there is little work on attribute information. Martin suggested an intrinsic approach that relies on the data in OSM – expecting major roads to be connected and consistent. He showed how you can assess attribute completeness. Most of the roads in OSM are local roads, and there is high heterogeneity, but we need them and we should care about them. There are issues with keeping the freedom to tag – it exposes the complexity of OSM.
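The expectation that major roads are connected and consistent can be sketched as a simple neighbourhood check. The Python below is only an illustration under my own assumptions, not Martin’s actual method: the segment data is invented, and the rule (flag a segment when all of its connected neighbours agree on a tag value that differs from its own, and suggest that value as the estimate) is a deliberately naive stand-in for correction through estimation.

```python
from collections import defaultdict

# Hypothetical road segments: id -> (start node, end node, tags).
SEGMENTS = {
    1: ("a", "b", {"highway": "primary"}),
    2: ("b", "c", {"highway": "residential"}),  # breaks an otherwise primary road
    3: ("c", "d", {"highway": "primary"}),
    4: ("d", "e", {"highway": "primary"}),
}


def neighbours(segments):
    """Map each segment id to the ids of segments sharing an end node."""
    by_node = defaultdict(set)
    for seg_id, (start, end, _tags) in segments.items():
        by_node[start].add(seg_id)
        by_node[end].add(seg_id)
    adjacent = {seg_id: set() for seg_id in segments}
    for ids in by_node.values():
        for seg_id in ids:
            adjacent[seg_id] |= ids - {seg_id}
    return adjacent


def suggest_corrections(segments, key="highway"):
    """Suggest a tag value where all neighbours agree and the segment differs.

    Only segments with at least two tagged neighbours are considered, so
    dead ends and ambiguous junctions are left untouched.
    """
    suggestions = {}
    for seg_id, adjacent in neighbours(segments).items():
        values = [segments[n][2].get(key) for n in adjacent]
        if len(values) < 2 or None in values or len(set(values)) != 1:
            continue
        if segments[seg_id][2].get(key) != values[0]:
            suggestions[seg_id] = values[0]
    return suggestions


if __name__ == "__main__":
    print(suggest_corrections(SEGMENTS))
```

The check uses nothing but the data itself, which is what makes it intrinsic; whether a flagged segment is really an error still needs local knowledge, given the freedom to tag.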

Peter A. Johnson – University of Waterloo – Challenges and Constraints to Municipal Government Adoption of OpenStreetMap. The collaboration of MapBox with NYC – an agreement on data sharing – was his starting point and motivation to explore how we can connect government and citizens to share data. Potentially, the OSM community will help with official data, improve it and send it back. Just delivering municipal data over an OSM base map is not much – maybe we need to look at mirroring. Questions about currency, improvement of services, and whether data is cheaper/easier to get are core questions. Evaluating official data and OSM data: he interviewed governments in Canada, with a range of sizes – easy in large cities, basic steps in medium ones and little progress in rural places. There is no official use of OSM, but they do make data available to the OSM community, and there is anecdotal evidence of using it unofficially for different jobs. They do not see benefits in mirroring data; they are the authoritative source for information, and no other data is relevant. Constraints: they are not sure that OSM is more accurate, and there is a risk-averse culture. They question the fit with organisational needs, note that required attributes are lacking, and see costs in altering existing data. OSM might be relevant to rural areas and small cities where data is not being updated.

Muki Haklay – University College London – COST Energic – A European Network for research of VGI: the role of OSM/VGI/Citizen Science definitions. I’ve used some of the concepts that I first presented in SOTM 2011 in Vienna, and extended them to the general area of citizen science and VGI. I argued that academics need to be ‘critical friends’, in a nice way, to OSM and other communities. The different talks and Monica’s points about changes in tagging demonstrate that this approach is effective and helpful.

Discussant: Alan McConchie – University of British Columbia. The later session looked at intrinsic or extrinsic analysis of OSM. As with Martin’s work on internal consistency, there are issues of knowing the specific person in each part of the process who can lead to change. There is a very tiny group of people who make the decisions, but there is a slow opening towards accountability (e.g. OSM rendering style on GitHub). There is translation of knowledge and representation happening in different groups, and a need to identify how to get the information right. There is a sense of ‘no one has got the right answer’. Industry and NGOs also need to act as critical friends – it will make it a better project. There are also critical GIS conversations – is there a ‘fork’ within OSM studies? We can have conversations about these issues.

Follow-up questions explored the privacy of the participants, and whether the research should be mentioned to participants and the community. They also covered the researcher’s position as a participant, or as someone who alters the data, as well as a researcher – the implications of participatory observation.

Geographic Information Science and Citizen Science

Thanks to invitations from UNIGIS and Edinburgh Earth Observatory / AGI Scotland, I had an opportunity to reflect on how Geographic Information Science (GIScience) can contribute to citizen science, and what citizen science can contribute to GIScience.

Despite the fact that it’s 8 years since the term Volunteered Geographic Information (VGI) was coined, I didn’t assume that everyone in the audience was aware of how it came about or of the range of sources of VGI. I also didn’t assume knowledge of citizen science, which is a far less familiar term for a GIScience audience. Therefore, before going into a discussion about the relationship between the two areas, I opened with a short introduction to both, starting with VGI, and then moving to citizen science. After the introduction to the two areas, I suggested the relationships between them – there are types of citizen science that overlap with VGI – biological recording and environmental observations, as well as community (or civic) science – while other types, such as volunteer thinking, include many projects that are non-geographical (think EyeWire or Galaxy Zoo).

However, I didn’t just list a catalogue of VGI and citizen science activities. Personally, I find trends a useful way to make sense of what happens. I’ve learned that from the writing of Thomas Friedman, who used them in several of his books to help the reader understand where the changes that he covers came from. Trends are, of course, speculative, as it is very difficult to demonstrate causality or to be certain about the contribution of each trend to the end result. With these caveats in mind, there are several technological and societal trends that I used in the talk to explain where VGI (and the VGI element of citizen science) came from.

Of all these trends, I keep coming back to one technical and one societal that I see as critical. The removal of selective availability of GPS in May 2000 is my top technical change, as the cascading effect from it led to the deluge of good enough location data which is behind VGI and citizen science. On the societal side, it is the Flynn effect, as a signifier of the educational shift in the past 50 years, that explains how the ability to participate in scientific projects has increased.

In terms of the reciprocal contributions between the fields, I suggest the following:

GIScience can support citizen science by considering data quality assurance methods that are emerging in VGI. There are also plenty of spatial analysis methods that take heterogeneity into account and are therefore useful for citizen science data. The areas of geovisualisation and human-computer interaction studies in GIS can assist in developing more effective and useful applications for citizen scientists and people who use their data. There is also plenty to do in considering semantics, ontologies, interoperability and standards. Finally, since critical GIScientists have been looking for a long time into the societal aspects of geographical technologies, such as privacy, trust, inclusiveness, and empowerment, they have plenty to contribute to citizen science activities on how to do them in more participatory ways.

On the other hand, citizen science can contribute to GIScience, and especially VGI research, in several ways. First, citizen science can demonstrate the longevity of VGI data sources, with some projects going back hundreds of years. It provides challenging datasets in terms of their complexity, ontology, heterogeneity and size. It can bring questions about scale and how to deal with large, medium and local activities, while merging them into a coherent dataset. It also provides opportunities for GIScientists to contribute to critical societal issues such as climate change adaptation or biodiversity loss. It provides some of the most interesting usability challenges, such as tools for non-literate users, and finally, plenty of opportunities for interdisciplinary collaboration.

The slides from the talk are available below.

International Encyclopedia of Geography – Quality Assurance of VGI

The Association of American Geographers is coordinating an effort to create an International Encyclopedia of Geography. Plans started in 2010, with an aim to see the 15-volume project published in 2015 or 2016. Interestingly, this shows that publishers and scholars still see the value in creating subject-specific encyclopedias. On the other hand, the weird decision by Wikipedians that Geographic Information Science doesn’t exist outside GIS shows that geographers need a place to define their practice by themselves. You can find more information about the AAG International Encyclopedia project in an interview with Doug Richardson from 2012.

As part of this effort, I was asked to write an entry on ‘Volunteered Geographic Information, Quality Assurance’ as a short piece of about 3000 words. To do this, I have looked around for mechanisms that are used in VGI and in citizen science. These are covered in OpenStreetMap studies and similar work in GIScience, and in the area of citizen science there are reviews such as the one by Andrea Wiggins and colleagues of mechanisms to ensure data quality in citizen science projects, which clearly demonstrated that projects are using multiple methods to ensure data quality.

Below you’ll find an abridged version of the entry (but still long). The citation for this entry will be:

Haklay, M., Forthcoming. Volunteered geographic information, quality assurance. in D. Richardson, N. Castree, M. Goodchild, W. Liu, A. Kobayashi, & R. Marston (Eds.) The International Encyclopedia of Geography: People, the Earth, Environment, and Technology. Hoboken, NJ: Wiley/AAG

In the entry, I have identified 6 types of mechanisms that are used for quality assurance when the data has a geographical component, either VGI or citizen science. If I have missed a type of quality assurance mechanism, please let me know!

Here is the entry:

Volunteered geographic information, quality assurance

Volunteered Geographic Information (VGI) originates outside the realm of professional data collection by scientists, surveyors and geographers. Quality assurance of such information is important for people who want to use it, as they need to identify if it is fit-for-purpose. Goodchild and Li (2012) identified three approaches for VGI quality assurance: the ‘crowdsourcing’ approach, which relies on the number of people who edited the information; the ‘social’ approach, which is based on gatekeepers and moderators; and the ‘geographic’ approach, which uses broader geographic knowledge to verify that the information fits into existing understanding of the natural world. In addition to the approaches that Goodchild and Li identified, there are also the ‘domain’ approach, which relates to the understanding of the knowledge domain of the information; the ‘instrumental observation’ approach, which relies on technology; and the ‘process oriented’ approach, which brings VGI closer to industrialised procedures. First we need to understand the nature of VGI and the source of concern with quality assurance.

While the term volunteered geographic information (VGI) is relatively new (Goodchild 2007), the activities that this term describes are not. Another relatively recent term, citizen science (Bonney 1996), which describes the participation of volunteers in collecting, analysing and sharing scientific information, provides the historical context. While the term is relatively new, the collection of accurate information by non-professional participants turns out to have been an integral part of scientific activity since the 17th century and likely before (Bonney et al 2013). Therefore, when approaching the question of quality assurance of VGI, it is critical to see it within the wider context of scientific data collection and not to fall into the trap of novelty by assuming that it is without precedent.

Yet, this integration needs to take into account the insights that emerged within geographic information science (GIScience) research over the past decades. Within GIScience, it is the body of research on spatial data quality that provides the framing for VGI quality assurance. Van Oort’s (2006) comprehensive synthesis of various quality standards identifies the following elements of spatial data quality discussions:

  • Lineage – description of the history of the dataset.
  • Positional accuracy – how well the coordinate value of an object in the database relates to the reality on the ground.
  • Attribute accuracy – as objects in a geographical database are represented not only by their geometrical shape but also by additional attributes.
  • Logical consistency – the internal consistency of the dataset.
  • Completeness – how many objects are expected to be found in the database but are missing as well as an assessment of excess data that should not be included.
  • Usage, purpose and constraints – this is a fitness-for-purpose declaration that should help potential users in deciding how the data should be used.
  • Temporal quality – this is a measure of the validity of changes in the database in relation to real-world changes and also the rate of updates.

While some of these quality elements might seem independent of a specific application, in reality they can only be evaluated within a specific context of use. For example, when carrying out an analysis of street-lighting in a specific part of town, the question of completeness becomes specific to the recording of all street-light objects within the bounds of the area of interest; whether the data set includes these features for another part of the settlement, or is complete there, is irrelevant for the task at hand. The scrutiny of information quality within a specific application to ensure that it is good enough for the needs is termed ‘fitness for purpose’. As we shall see, fit-for-purpose is a central issue with respect to VGI.

To understand the reason that geographers are concerned with quality assurance of VGI, we need to recall the historical development of geographic information, and especially the historical context of geographic information systems (GIS) and GIScience development since the 1960s. For most of the 20th century, geographic information production became professionalised and institutionalised. The creation, organisation and distribution of geographic information was done by official bodies such as national mapping agencies or national geological bodies who were funded by the state. As a result, the production of geographic information became an industrial scientific process in which the aim is to produce a standardised product – commonly a map. Due to financial, skills and process limitations, products were engineered carefully so they could be used for multiple purposes. Thus, a topographic map can be used for navigation but also for urban planning and for many other purposes. Because the products were standardised, detailed specifications could be drawn, against which the quality elements could be tested and quality assurance procedures could be developed. This was the backdrop to the development of GIS, and to the conceptualisation of spatial data quality.

The practices of centralised, scientific and industrialised geographic information production lend themselves to quality assurance procedures that are deployed through organisational or professional structures, and explain the perceived challenges with VGI. Centralised practices also supported employing people with a focus on quality assurance, such as going to the field with a map and testing that it complies with the specifications that were used to create it. In contrast, most of the collection of VGI is done outside organisational frameworks. The people who contribute the data are not employees and seemingly cannot be put into training programmes, asked to follow quality assurance procedures, or expected to use standardised equipment that can be calibrated. The lack of coordination and top-down forms of production raises questions about ensuring the quality of the information that emerges from VGI.

Considering quality assurance within VGI requires understanding some underlying principles that are common to VGI practices and differentiate it from organised and industrialised geographic information creation. For example, some VGI is collected under conditions of scarcity or abundance in terms of data sources, number of observations or the amount of data that is being used. As noted, the conceptualisation of geographic data collection before the emergence of VGI was one of scarcity, where data is expensive and complex to collect. In contrast, in many applications of VGI the situation is one of abundance. For example, in applications that are based on micro-volunteering, where the participant invests very little time in a fairly simple task, it is possible to give the same mapping task to several participants and statistically compare their independent outcomes as a way to ensure the quality of the data. Another way of considering abundance as a framework is in the development of software for data collection. While in previous eras there was inherently one application that was used for data capture and editing, in VGI there is a need to consider multiple applications, as different designs and workflows can appeal to and be suitable for different groups of participants.
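As a rough illustration of comparing independent outcomes in a micro-volunteering setting, the sketch below takes the answers that several participants gave to the same task and accepts the most common one only when enough of them agree. The function name, the 0.6 agreement threshold and the example labels are assumptions made for this illustration, not part of any specific project.

```python
from collections import Counter


def consensus_label(labels, min_agreement=0.6):
    """Return (label, share) for the most common answer.

    The label is None when the share of participants who gave the
    most common answer is below min_agreement (an assumed threshold).
    """
    if not labels:
        return None, 0.0
    label, count = Counter(labels).most_common(1)[0]
    share = count / len(labels)
    return (label if share >= min_agreement else None), share


# Hypothetical answers from five participants to the same mapping task
answers = ["building", "building", "road", "building", "building"]
label, share = consensus_label(answers)
print(label, share)
```

A task with no clear consensus can then be sent to further participants or passed to a more experienced reviewer, which is one way the crowdsourcing and social approaches described later can be combined.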

Another underlying principle of VGI is that since the people who collect the information are not remunerated or in contractual relationships with the organisation that coordinates data collection, a more complex relationship between the two sides is required, with consideration of incentives, motivations to contribute and the tools that will be used for data collection. Overall, VGI systems need to be understood as socio-technical systems in which the social aspect is as important as the technical part.

In addition, VGI is inherently heterogeneous. In large scale data collection activities such as the census of population, there is a clear attempt to capture all the information about the population over a relatively short time and in every part of the country. In contrast, because of its distributed nature, VGI will vary across space and time, with some areas and times receiving more attention than others. An interesting example has been shown in temporal scales, where some citizen science activities exhibit ‘weekend bias’ as these are the days when volunteers are free to collect more information.
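One simple way to look for the ‘weekend bias’ mentioned above is to compare the share of observations recorded on Saturdays and Sundays with the 2/7 that an even spread over the week would give. This is only a sketch: the function name and the example dates are invented for illustration.

```python
from datetime import date


def weekend_share(dates):
    """Fraction of observations recorded on a Saturday or Sunday."""
    if not dates:
        return 0.0
    # date.weekday() returns 0 for Monday through 6 for Sunday
    weekend = sum(1 for d in dates if d.weekday() >= 5)
    return weekend / len(dates)


# Hypothetical observation dates
observations = [date(2015, 5, 16), date(2015, 5, 17), date(2015, 5, 18), date(2015, 5, 23)]
share = weekend_share(observations)
print(f"weekend share: {share:.2f}, even spread would give {2 / 7:.2f}")
```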

Because of the difference in the organisational settings of VGI, a different approach to quality assurance is required, although as noted, in general such approaches have been used in many citizen science projects. Over the years, several approaches emerged and these include ‘crowdsourcing’, ‘social’, ‘geographic’, ‘domain’, ‘instrumental observation’ and ‘process oriented’. We now turn to describe each of these approaches.

The ‘crowdsourcing’ approach builds on the principle of abundance. Since there is a large number of contributors, quality assurance can emerge from repeated verification by multiple participants. Even in projects where the participants actively collect data in an uncoordinated way, such as the OpenStreetMap project, it has been shown that with enough participants actively collecting data in a given area, the quality of the data can be as good as authoritative sources. The limitation of this approach is when local knowledge or verification on the ground (‘ground truth’) is required. In such situations, the ‘crowdsourcing’ approach will work well in central, highly populated or popular sites where there are many visitors and therefore the probability that several of them will be involved in data collection rises. Even so, it is possible to encourage participants to record less popular places through a range of suitable incentives.

The ‘social’ approach also builds on the principle of abundance in terms of the number of participants, but with a more detailed understanding of their knowledge, skills and experience. In this approach, some participants are asked to monitor and verify the information that was collected by less experienced participants. The social method is well established in citizen science programmes such as bird watching, where some participants who are more experienced in identifying bird species help to verify observations by other participants. To deploy the social approach, there is a need for a structured organisation in which some members are recognised as more experienced, and are given the appropriate tools to check and approve information.

The ‘geographic’ approach uses known geographical knowledge to evaluate the validity of the information that is received from volunteers. For example, by using existing knowledge about the distribution of streams from a river, it is possible to assess if mapping that was contributed by volunteers of a new river is comprehensive or not. A variation of this approach is the use of recorded information, even if it is out-of-date, to verify the information by comparing how much of the information that is already known also appears in a VGI source. Geographic knowledge can potentially be encoded in software algorithms.
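The comparison against already-recorded information can be sketched as follows: for a set of known reference points, count how many have at least one VGI point nearby. This is a simplified illustration for point features only; the function names, the 20 metre tolerance and the coordinates are assumptions, and real comparisons of lines or polygons need more elaborate matching.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def matched_share(reference, vgi, tolerance_m=20.0):
    """Share of reference points with at least one VGI point within tolerance_m."""
    if not reference:
        return 0.0
    matched = sum(
        1
        for rlat, rlon in reference
        if any(haversine_m(rlat, rlon, vlat, vlon) <= tolerance_m for vlat, vlon in vgi)
    )
    return matched / len(reference)


# Hypothetical (lat, lon) points: an older reference dataset and a VGI source
reference = [(59.4370, 24.7536), (59.4380, 24.7550), (59.4400, 24.7600)]
vgi = [(59.43701, 24.75361), (59.4381, 24.7550)]
print(matched_share(reference, vgi))
```

A low share does not by itself show that the VGI is wrong, since the reference may be out-of-date, but it indicates where closer inspection is worthwhile.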

The ‘domain’ approach is an extension of the geographic one, and in addition to geographical knowledge uses specific knowledge that is relevant to the domain in which information is collected. For example, in many citizen science projects that involve collecting biological observations, there will be some body of information about species distribution both spatially and temporally. Therefore, a new observation can be tested against this knowledge, again algorithmically, and help in ensuring that new observations are accurate.
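A minimal sketch of such an algorithmic test is shown below, assuming that the domain knowledge about a species can be reduced to a bounding box and a set of months in which it is expected. The `SpeciesProfile` class, the `plausibility_flags` function and all values are hypothetical; real projects would use far richer distribution data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SpeciesProfile:
    """Simplified, hypothetical summary of where and when a species is expected."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    active_months: frozenset  # months (1-12) in which sightings are expected


def plausibility_flags(profile, lat, lon, when):
    """Return a list of reasons why an observation deserves a closer look."""
    flags = []
    in_range = (profile.min_lat <= lat <= profile.max_lat
                and profile.min_lon <= lon <= profile.max_lon)
    if not in_range:
        flags.append("outside known range")
    if when.month not in profile.active_months:
        flags.append("outside expected season")
    return flags


# Invented profile and observation, for illustration only
profile = SpeciesProfile("example species", 50.0, 60.0, 20.0, 30.0, frozenset({5, 6, 7, 8}))
print(plausibility_flags(profile, 40.0, 24.7, date(2015, 1, 10)))
```

Flagged observations are not necessarily errors. They are candidates for verification, for example by the more experienced participants of the social approach.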

The ‘instrumental observation’ approach removes some of the subjective aspects of data collection by a human who might make an error, and relies instead on the availability of equipment that the person is using. Because of the increased availability of accurate-enough equipment, such as the various sensors that are integrated in smartphones, many people keep in their pockets mobile computers with the ability to collect location, direction, imagery and sound. For example, image files that are captured in smartphones include in the file the GPS coordinates and time-stamp, which for a vast majority of people are beyond their ability to manipulate. Thus, the automatic instrumental recording of information provides evidence for the quality and accuracy of the information.

Finally, the ‘process oriented’ approach brings VGI closer to traditional industrial processes. Under this approach, the participants go through some training before collecting information, and the process of data collection or analysis is highly structured to ensure that the resulting information is of suitable quality. This can include provision of standardised equipment, online training or instruction sheets and a structured data recording process. For example, volunteers who participate in the US Community Collaborative Rain, Hail & Snow network (CoCoRaHS) receive a standardised rain gauge, instructions on how to install it and online resources to learn about data collection and reporting.

Importantly, these approaches are not used in isolation and in any given project it is likely to see a combination of them in operation. Thus, an element of training and guidance to users can appear in a downloadable application that is distributed widely, and therefore the method that will be used in such a project will be a combination of the process oriented with the crowdsourcing approach. Another example is the OpenStreetMap project, which in general offers only limited guidance to volunteers in terms of the information that they collect or the location in which they collect it. Yet, a subset of the information in the OpenStreetMap database, about wheelchair access, is collected through the highly structured process of the WheelMap application, in which the participant is required to select one of four possible settings that indicate accessibility. Another subset of the information, which is recorded for humanitarian efforts, follows the social model, in which the tasks are divided between volunteers using the Humanitarian OpenStreetMap Team (H.O.T) task manager, and the data that is collected is verified by more experienced participants.

The final, and critical, point for quality assurance of VGI that was noted above is fitness-for-purpose. In some VGI activities the information has a direct and clear application, in which case it is possible to define specifications for the quality assurance elements that were listed above. However, one of the core aspects that was noted above is the heterogeneity of the information that is collected by volunteers. Therefore, before using VGI for a specific application there is a need to check for its fitness for this specific use. While this is true for all geographic information, and even so called ‘authoritative’ data sources can suffer from hidden biases (e.g. lack of updates of information in rural areas), the situation with VGI is that variability can change dramatically over short distances – so while the centre of a city will be mapped by many people, a deprived suburb near the centre will not be mapped and updated. There are also limitations that are caused by the instruments in use – for example, the GPS positional accuracy of the smartphones in use. Such aspects should also be taken into account, ensuring that the quality assurance is also fit-for-purpose.

References and Further Readings

Bonney, Rick. 1996. Citizen Science – a lab tradition, Living Bird, Autumn 1996.
Bonney, Rick, Shirk, Jennifer, Phillips, Tina B. 2013. Citizen Science, Encyclopaedia of science education. Berlin: Springer-Verlag.
Goodchild, Michael F. 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), 211–221.
Goodchild, Michael F., and Li, Linna. 2012. Assuring the quality of volunteered geographic information. Spatial Statistics, 1, 110–120.
Haklay, Mordechai. 2010. How Good is volunteered geographical information? a comparative study of OpenStreetMap and ordnance survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703.
Sui, Daniel, Elwood, Sarah and Goodchild, Michael F. (eds), 2013. Crowdsourcing Geographic Knowledge, Berlin:Springer-Verlag.
Van Oort, Pepijn A.J. 2006. Spatial data quality: from description to application, PhD Thesis, Wageningen: Wageningen Universiteit, p. 125.

OpenStreetMap studies (and why VGI not equal OSM)

As far as I can tell, Nelson et al. (2006) ‘Towards development of a high quality public domain global roads database‘ and Taylor & Caquard (2006) Cybercartography: Maps and Mapping in the Information Era are the first peer-reviewed papers that mention OpenStreetMap. Since then, OpenStreetMap has received plenty of academic attention. More ‘conservative’ search engines such as ScienceDirect or Scopus find 286 and 236 peer reviewed papers (respectively) that mention the project. The ACM digital library finds 461 papers in the areas that are relevant to computing and electronics, while Microsoft Academic Research finds only 112. Google Scholar lists over 9000 (!). Even with the most conservative version from Microsoft, we can see an impact on fields ranging from social science to engineering and physics. So lots to be proud of as a major contribution to knowledge beyond producing maps.

Michael Goodchild, in his 2007 paper that started the research into Volunteered Geographic Information (VGI), mentioned OpenStreetMap (OSM), and since then there has been a lot of conflation of OSM and VGI. In some recent papers you can find statements such as ‘OpenstreetMap is considered as one of the most successful and popular VGI projects’ or ‘the most prominent VGI project OpenStreetMap’, so, at some level, the boundary between the two is being blurred. I’m part of the problem – for example, with the title of my 2010 paper ‘How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets’. However, the more I think about it, the more uncomfortable I am with this equivalence. I feel that the recent line from Neis & Zielstra (2014) is more accurate: ‘One of the most utilized, analyzed and cited VGI-platforms, with an increasing popularity over the past few years, is OpenStreetMap (OSM)’. I’ll explain why.

Let’s look at the whole area of OpenStreetMap studies. Over the past decade, several types of research paper have emerged.

First, there is a whole set of research projects that use OSM data because it’s easy to use and free to access (in computer vision or even string theory). These studies are not part of ‘OSM studies’ or VGI, as, for them, this is just data to be used.

Second, there are studies about OSM data: quality, evolution of objects and other aspects from researchers such as Peter Mooney, Pascal Neis, Alex Zipf and many others.

Third, there are studies that also look at the interactions between the contribution and the data – for example, in trying to infer trustworthiness.

Fourth, there are studies that look at the wider societal aspects of OpenStreetMap, with people like Martin Dodge, Chris Perkins, and Jo Gerlach contributing in interesting discussions.

Finally, there are studies of the social practices in OpenStreetMap as a project, with the work of Yu-Wei Lin, Nama Budhathoki, Manuela Schmidt and others.

[Unfortunately, due to academic practices and publication outlets, many of these papers are locked behind paywalls, but that is another issue…]

In short, there is a significant body of knowledge regarding the nature of the project, the implications of what it produces, and ways to understand the information that emerges from it. Clearly, we now know that OSM produces good data and are aware of the patterns of contribution. What is also clear is that many of these patterns are specific to OSM. Because of the importance of OSM to so many application areas (including illustrative maps in string theory!) these insights are very important. Some of these insights are expected to also be present in other VGI projects (hence my suggestions for assertions about VGI) but this needs to be done carefully, only when there is evidence from other projects that this is the case. In short, we should avoid conflating VGI and OSM.