19 September, 2014
The Association of American Geographers is coordinating an effort to create an International Encyclopedia of Geography. Plans started in 2010, with an aim to see the 15-volume project published in 2015 or 2016. Interestingly, this shows that publishers and scholars still see value in creating subject-specific encyclopedias. On the other hand, the weird decision by Wikipedians that Geographic Information Science doesn’t exist outside GIS shows that geographers need a place to define their practice by themselves. You can find more information about the AAG International Encyclopedia project in an interview with Doug Richardson from 2012.
As part of this effort, I was asked to write an entry on ‘Volunteered Geographic Information, Quality Assurance’ as a short piece of about 3000 words. To do this, I have looked around for mechanisms that are used in VGI and in Citizen Science. These are covered in OpenStreetMap studies and similar work in GIScience. In the area of citizen science, there are reviews such as the one by Andrea Wiggins and colleagues of mechanisms to ensure data quality in citizen science projects, which clearly demonstrated that projects use multiple methods to ensure data quality.
Below you’ll find an abridged version of the entry (but still long). The citation for this entry will be:
Haklay, M., Forthcoming. Volunteered geographic information, quality assurance. in D. Richardson, N. Castree, M. Goodchild, W. Liu, A. Kobayashi, & R. Marston (Eds.) The International Encyclopedia of Geography: People, the Earth, Environment, and Technology. Hoboken, NJ: Wiley/AAG
In the entry, I have identified 6 types of mechanisms that are used to assure quality when the data has a geographical component, in either VGI or citizen science. If I have missed a type of quality assurance mechanism, please let me know!
Here is the entry:
Volunteered geographic information, quality assurance
Volunteered Geographic Information (VGI) originates outside the realm of professional data collection by scientists, surveyors and geographers. Quality assurance of such information is important for people who want to use it, as they need to identify if it is fit-for-purpose. Goodchild and Li (2012) identified three approaches for VGI quality assurance: the ‘crowdsourcing’ approach, which relies on the number of people that edited the information; the ‘social’ approach, which is based on gatekeepers and moderators; and the ‘geographic’ approach, which uses broader geographic knowledge to verify that the information fits into existing understanding of the natural world. In addition to the approaches that Goodchild and Li identified, there are also the ‘domain’ approach, which relates to the understanding of the knowledge domain of the information; the ‘instrumental observation’ approach, which relies on technology; and the ‘process oriented’ approach, which brings VGI closer to industrialised procedures. First we need to understand the nature of VGI and the source of concern with quality assurance.
While the term volunteered geographic information (VGI) is relatively new (Goodchild 2007), the activities that this term describes are not. Another relatively recent term, citizen science (Bonney 1996), which describes the participation of volunteers in collecting, analysing and sharing scientific information, provides the historical context. While the term is relatively new, the collection of accurate information by non-professional participants turns out to have been an integral part of scientific activity since the 17th century and likely before (Bonney et al 2013). Therefore, when approaching the question of quality assurance of VGI, it is critical to see it within the wider context of scientific data collection and not to fall into the trap of novelty by considering it to be without precedent.
Yet, this integration needs to take into account the insights that emerged within geographic information science (GIScience) research over the past decades. Within GIScience, it is the body of research on spatial data quality that provides the framing for VGI quality assurance. Van Oort’s (2006) comprehensive synthesis of various quality standards identifies the following elements of spatial data quality discussions:
- Lineage – description of the history of the dataset.
- Positional accuracy – how well the coordinate value of an object in the database relates to the reality on the ground.
- Attribute accuracy – how accurately the additional attributes are recorded, as objects in a geographical database are represented not only by their geometrical shape but also by additional attributes.
- Logical consistency – the internal consistency of the dataset.
- Completeness – how many objects are expected to be found in the database but are missing as well as an assessment of excess data that should not be included.
- Usage, purpose and constraints – this is a fitness-for-purpose declaration that should help potential users in deciding how the data should be used.
- Temporal quality – this is a measure of the validity of changes in the database in relation to real-world changes and also the rate of updates.
While some of these quality elements might seem independent of a specific application, in reality they can only be evaluated within a specific context of use. For example, when carrying out analysis of street-lighting in a specific part of town, the question of completeness becomes specific to the recording of all street-light objects within the bounds of the area of interest, and whether the dataset is complete for another part of the settlement is irrelevant for the task at hand. The scrutiny of information quality within a specific application to ensure that it is good enough for the needs is termed ‘fitness for purpose’. As we shall see, fit-for-purpose is a central issue with respect to VGI.
To understand the reason that geographers are concerned with quality assurance of VGI, we need to recall the historical development of geographic information, and especially the historical context of geographic information systems (GIS) and GIScience development since the 1960s. For most of the 20th century, geographic information production became professionalised and institutionalised. The creation, organisation and distribution of geographic information was done by official bodies such as national mapping agencies or national geological bodies who were funded by the state. As a result, the production of geographic information became an industrial scientific process in which the aim is to produce a standardised product – commonly a map. Due to financial, skills and process limitations, products were engineered carefully so that they could be used for multiple purposes. Thus, a topographic map can be used for navigation but also for urban planning and for many other purposes. Because the products were standardised, detailed specifications could be drawn, against which the quality elements can be tested and quality assurance procedures could be developed. This was the backdrop to the development of GIS, and to the conceptualisation of spatial data quality.
The practices of centralised, scientific and industrialised geographic information production lend themselves to quality assurance procedures that are deployed through organisational or professional structures, and explain the perceived challenges with VGI. Centralised practices also supported employing people with a focus on quality assurance, such as going to the field with a map and testing that it complies with the specifications that were used to create it. In contrast, most of the collection of VGI is done outside organisational frameworks. The people who contribute the data are not employees and seemingly cannot be put into training programmes, asked to follow quality assurance procedures, or expected to use standardised equipment that can be calibrated. The lack of coordination and top-down forms of production raises questions about ensuring the quality of the information that emerges from VGI.
To consider quality assurance within VGI requires understanding some underlying principles that are common to VGI practices and differentiate it from organised and industrialised geographic information creation. For example, some VGI is collected under conditions of scarcity or abundance in terms of data sources, number of observations or the amount of data that is being used. As noted, the conceptualisation of geographic data collection before the emergence of VGI was one of scarcity, where data is expensive and complex to collect. In contrast, in many applications of VGI the situation is one of abundance. For example, in applications that are based on micro-volunteering, where the participant invests very little time in a fairly simple task, it is possible to give the same mapping task to several participants and statistically compare their independent outcomes as a way to ensure the quality of the data. Another form of considering abundance as a framework is in the development of software for data collection. While in previous eras there would inherently be one application that was used for data capture and editing, in VGI there is a need to consider multiple applications, as different designs and workflows can appeal to, and be suitable for, different groups of participants.
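As an illustration of comparing independent outcomes from several participants, here is a minimal sketch of a majority-vote check in Python. The function name, the labels and the 0.6 agreement threshold are all hypothetical choices for this example and are not taken from any specific VGI project; real projects may use more elaborate statistics.

```python
from collections import Counter


def consensus_label(labels, min_agreement=0.6):
    """Return (label, agreement) for one task classified by several volunteers.

    `labels` holds the independent answers given for the same task.
    The label is None when the most common answer does not reach the
    (assumed, illustrative) agreement threshold.
    """
    if not labels:
        return None, 0.0
    # most_common(1) gives the single most frequent answer and its count.
    label, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    if agreement >= min_agreement:
        return label, agreement
    return None, agreement


# Three volunteers classified the same map tile.
result = consensus_label(["building", "building", "road"])
split = consensus_label(["building", "road"])
```

Tasks that come back with no agreed label can be passed to additional participants or to a more experienced reviewer, rather than being accepted or discarded outright.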
Another underlying principle of VGI is that since the people who collect the information are not remunerated or in contractual relationships with the organisation that coordinates data collection, a more complex relationship between the two sides is required, with consideration of incentives, motivations to contribute and the tools that will be used for data collection. Overall, VGI systems need to be understood as socio-technical systems in which the social aspect is as important as the technical part.
In addition, VGI is inherently heterogeneous. In large scale data collection activities such as the census of population, there is a clear attempt to capture all the information about the population over a relatively short time and in every part of the country. In contrast, because of its distributed nature, VGI will vary across space and time, with some areas and times receiving more attention than others. An interesting example has been shown in temporal scales, where some citizen science activities exhibit ‘weekend bias’ as these are the days when volunteers are free to collect more information.
Because of the difference in the organisational settings of VGI, different approaches to quality assurance are required, although as noted, in general such approaches have been used in many citizen science projects. Over the years, several approaches emerged and these include ‘crowdsourcing’, ‘social’, ‘geographic’, ‘domain’, ‘instrumental observation’ and ‘process oriented’. We now turn to describe each of these approaches.
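A simple way to look for ‘weekend bias’ in a set of observations is to compare the share recorded on Saturdays and Sundays with the 2/7 (about 0.29) that would be expected if observations were spread evenly over the week. The sketch below assumes that each observation carries a calendar date; the function name and the sample dates are hypothetical.

```python
from datetime import date


def weekend_share(observation_dates):
    """Return the fraction of observations recorded on a Saturday or Sunday."""
    if not observation_dates:
        return 0.0
    # date.weekday() numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
    weekend = sum(1 for d in observation_dates if d.weekday() >= 5)
    return weekend / len(observation_dates)


# Hypothetical observation dates: a Saturday, a Sunday, a Monday and a Wednesday.
sample = [date(2014, 7, 12), date(2014, 7, 13), date(2014, 7, 14), date(2014, 7, 16)]
share = weekend_share(sample)
expected_if_even = 2 / 7
```

A share well above 2/7 suggests that weekday conditions are under-represented, which matters when the phenomenon being recorded itself varies through the week.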
The ‘crowdsourcing’ approach builds on the principle of abundance. Since there is a large number of contributors, quality assurance can emerge from repeated verification by multiple participants. Even in projects where the participants actively collect data in an uncoordinated way, such as the OpenStreetMap project, it has been shown that with enough participants actively collecting data in a given area, the quality of the data can be as good as authoritative sources. The limitation of this approach is when local knowledge or verification on the ground (‘ground truth’) is required. In such situations, the ‘crowdsourcing’ approach will work well in central, highly populated or popular sites where there are many visitors and therefore the probability that several of them will be involved in data collection rises. Even so, it is possible to encourage participants to record less popular places through a range of suitable incentives.
The ‘social’ approach also builds on the principle of abundance in terms of the number of participants, but with a more detailed understanding of their knowledge, skills and experience. In this approach, some participants are asked to monitor and verify the information that was collected by less experienced participants. The social method is well established in citizen science programmes such as bird watching, where some participants who are more experienced in identifying bird species help to verify observations by other participants. To deploy the social approach, there is a need for a structured organisation in which some members are recognised as more experienced, and are given the appropriate tools to check and approve information.
The ‘geographic’ approach uses known geographical knowledge to evaluate the validity of the information that is received from volunteers. For example, by using existing knowledge about the distribution of streams from a river, it is possible to assess if mapping that was contributed by volunteers of a new river is comprehensive or not. A variation of this approach is the use of recorded information, even if it is out-of-date, to verify the information by comparing how much of the information that is already known also appears in a VGI source. Geographic knowledge can potentially be encoded in software algorithms.
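The comparison with already recorded information can be sketched as follows: for each point in a reference dataset, check whether a VGI point lies within a tolerance distance, and report the matched share. This is a minimal sketch under stated assumptions: features are simple latitude/longitude points, distances use the haversine formula on a spherical Earth, and the 25 m tolerance and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def matched_share(reference_points, vgi_points, tolerance_m=25.0):
    """Fraction of reference points with at least one VGI point within tolerance."""
    if not reference_points:
        raise ValueError("reference_points must not be empty")
    matched = sum(
        1
        for rlat, rlon in reference_points
        if any(haversine_m(rlat, rlon, vlat, vlon) <= tolerance_m for vlat, vlon in vgi_points)
    )
    return matched / len(reference_points)


# Hypothetical coordinates: one reference point has a nearby VGI point, one does not.
reference = [(51.5, -0.1), (51.6, -0.1)]
vgi = [(51.5001, -0.1)]
share = matched_share(reference, vgi)
```

The nested loop is fine for small samples; for large datasets a spatial index would be the usual choice. A low share may also reflect an out-of-date reference dataset rather than poor VGI, so the result needs interpreting in context.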
The ‘domain’ approach is an extension of the geographic one, and in addition to geographical knowledge uses specific knowledge that is relevant to the domain in which information is collected. For example, in many citizen science projects that involve collecting biological observations, there will be some body of information about species distribution both spatially and temporally. Therefore, a new observation can be tested against this knowledge, again algorithmically, and help in ensuring that new observations are accurate.
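To make the idea of testing an observation against known species distribution concrete, here is a minimal sketch. It assumes that the domain knowledge can be reduced to a bounding box and a set of months in which the species is normally seen; the class, the function and the example profile are hypothetical and the values are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesProfile:
    """Simplified, hypothetical summary of where and when a species is expected."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    active_months: frozenset  # month numbers, 1-12


def flag_observation(profile, lat, lon, month):
    """Return a list of reasons why an observation deserves a second look."""
    reasons = []
    in_range = (
        profile.min_lat <= lat <= profile.max_lat
        and profile.min_lon <= lon <= profile.max_lon
    )
    if not in_range:
        reasons.append("outside known range")
    if month not in profile.active_months:
        reasons.append("outside known season")
    return reasons


# Invented profile for illustration; not real distribution data.
example = SpeciesProfile("example species", 50.0, 55.0, -5.0, 2.0, frozenset({4, 5, 6, 7, 8}))
plausible = flag_observation(example, 52.0, 0.0, 6)
unusual = flag_observation(example, 60.0, 0.0, 12)
```

A flagged observation is not necessarily wrong, as ranges and seasons shift; in the spirit of combining approaches, flagged records can be routed to experienced participants for verification.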
The ‘instrumental observation’ approach removes some of the subjective aspects of data collection by a human who might make an error, and relies instead on the equipment that the person is using. Because of the increased availability of accurate-enough equipment, such as the various sensors that are integrated in smartphones, many people keep in their pockets mobile computers with the ability to record location, direction, imagery and sound. For example, image files that are captured on smartphones include the GPS coordinates and time-stamp, which the vast majority of people lack the ability to manipulate. Thus, the automatic instrumental recording of information provides evidence for the quality and accuracy of the information.
Finally, the ‘process oriented’ approach brings VGI closer to traditional industrial processes. Under this approach, the participants go through some training before collecting information, and the process of data collection or analysis is highly structured to ensure that the resulting information is of suitable quality. This can include provision of standardised equipment, online training or instruction sheets and a structured data recording process. For example, volunteers who participate in the US Community Collaborative Rain, Hail & Snow network (CoCoRaHS) receive a standardised rain gauge, instructions on how to install it and online resources to learn about data collection and reporting.
Importantly, these approaches are not used in isolation and in any given project it is likely to see a combination of them in operation. Thus, an element of training and guidance to users can appear in a downloadable application that is distributed widely, and therefore the method that will be used in such a project will be a combination of the process oriented with the crowdsourcing approach. Another example is the OpenStreetMap project, which in general provides only limited guidance to volunteers in terms of the information that they collect or the location in which they collect it. Yet, a subset of the information that is collected in the OpenStreetMap database about wheelchair access is gathered through the highly structured process of the WheelMap application, in which the participant is required to select one of four possible settings that indicate accessibility. Another subset of the information, which is recorded for humanitarian efforts, follows the social model in which the tasks are divided between volunteers using the Humanitarian OpenStreetMap Team (H.O.T) task manager, and the data that is collected is verified by more experienced participants.
The final, and critical, point for quality assurance of VGI that was noted above is fitness-for-purpose. In some VGI activities the information has a direct and clear application, in which case it is possible to define specifications for the quality assurance elements that were listed above. However, one of the core aspects that was noted above is the heterogeneity of the information that is collected by volunteers. Therefore, before using VGI for a specific application there is a need to check for its fitness for this specific use. While this is true for all geographic information, and even so called ‘authoritative’ data sources can suffer from hidden biases (e.g. lack of updates of information in rural areas), the situation with VGI is that variability can change dramatically over short distances – so while the centre of a city will be mapped by many people, a deprived suburb near the centre will not be mapped and updated. There are also limitations that are caused by the instruments in use – for example, the GPS positional accuracy of the smartphones in use. Such aspects should also be taken into account, ensuring that the quality assurance is also fit-for-purpose.
References and Further Readings
Bonney, Rick. 1996. Citizen Science – a lab tradition, Living Bird, Autumn 1996.
Bonney, Rick, Shirk, Jennifer, Phillips, Tina B. 2013. Citizen Science, Encyclopaedia of science education. Berlin: Springer-Verlag.
Goodchild, Michael F. 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), 211–221.
Goodchild, Michael F., and Li, Linna. 2012. Assuring the quality of volunteered geographic information. Spatial Statistics, 1, 110–120.
Haklay, Mordechai. 2010. How Good is volunteered geographical information? a comparative study of OpenStreetMap and ordnance survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703.
Sui, Daniel, Elwood, Sarah and Goodchild, Michael F. (eds), 2013. Crowdsourcing Geographic Knowledge, Berlin:Springer-Verlag.
Van Oort, Pepijn A.J. 2006. Spatial data quality: from description to application, PhD Thesis, Wageningen: Wageningen Universiteit, p. 125.
6 September, 2014
When you look at the discussions that are emerging around the term ‘Citizen Science’, you can often find discussion about the ‘Citizen’ part of the term. What about the ‘Science’ part? This is something that once you start being involved in Citizen Science you are forced to contemplate. As Francois Grey likes to note, ‘Science is too important to be left out to scientists’ and we need to find a way to make it more inclusive as a process and practice. Sometimes, Citizen Science challenges ‘established’ science and protocols. This can be about small things – such as noticing that diffusion tubes are installed at 2.5m (while the area of real concern is 1-1.5m), or bigger things, such as noticing that a lot of noise measurement is about what is possible to measure (sound) and avoiding what is difficult (noise). Even more challenging is the integration of local, lay and traditional knowledge within the citizen science framework with scientific knowledge. In short, there is value in considering what we mean by ‘science’.
For me, the challenge that evolved was ‘how can we have a definition of science that recognises that it’s a powerful form of knowledge, while allowing other forms of knowledge to work with it?’. After experimenting with different ideas in the past year, I ended up with the following, directly paraphrasing the famous quote* from Winston Churchill about democracy as the least worst form of government. So the current, work-in-progress definition that I’m using is the following:
“Science is the least worst method to accumulate human knowledge about the natural world (and it needs to work, in a respectful way, with other forms of knowledge)”
What I am trying to do with this definition is first to recognise that knowledge is produced collaboratively and, ideally, in a democratic process. For that, the original form of the phrase is useful. Second, I wanted to note that science is not infallible but meandering, getting into blind alleys and all the rest, which ‘least worst’ captures better than ‘the best’. Third, it allows the recognition that science is a very effective and powerful form of human knowledge.
Does it work? Is it suitable?
* I always like to find the correct source, and if you look at the Hansard, you’ll see that Churchill was more forthright and said: “Many forms of Government have been tried, and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed, it has been said that democracy is the worst form of Government except all those other forms that have been tried from time to time;”. Now that I know that, it’s tempting to try and replace democracy with science and government with knowledge…
12 July, 2014
The Vespucci initiative has been running for over a decade, bringing together participants from a wide range of academic backgrounds and experiences to explore, in a ‘slow learning’ way, various aspects of geographic information science research. The Vespucci Summer Institutes are week-long summer schools, most frequently held at Fiesole, a small town overlooking Florence. This year, the focus of the first summer institute was on crowdsourced geographic information and citizen science.
The workshop was supported by COST ENERGIC (a network that links researchers in the area of crowdsourced geographic information, funded by the EU research programme), the EU Joint Research Centre (JRC), Esri and our Extreme Citizen Science research group. The summer school included about 30 participants and facilitators, ranging from master’s students who are about to start their PhD studies to established professors who came to learn and share knowledge. This is a common feature of Vespucci Institutes, and the funding from the COST network allowed more early career researchers to participate.
Apart from the pleasant surroundings, Vespucci Institutes are characterised by the relaxed, yet detailed discussions that can be carried over long lunches and coffee breaks, as well as team work in small groups on a task that each group presents at the end of the week. Moreover, the programme is very flexible, so changes and adaptations in response to the requests of the participants and to the general progression of the learning are part of the process.
This is the second time that I am participating in Vespucci Institutes as a facilitator, and in both cases it was clear that participants take the goals of the institute seriously, and make the most of the opportunities to learn about the topics that are explored, explore issues in depth with the facilitators, and work with their groups beyond the timetable.
The topics that were covered in the school were designed to provide a holistic overview of geographical crowdsourcing or citizen science projects, especially in the area where these two types of activities meet. This can be when a group of citizens want to collect and analyse data about local environmental concerns, or oceanographers want to work with divers to record water temperature, or when details that are emerging from social media are used to understand cultural differences in the understanding of border areas. These are all examples that were suggested by participants from projects that they are involved in. In addition, citizen participation in flood monitoring and water catchment management, sharing information about local food and exploring data quality of spatial information that can be used by wheelchair users also came up in the discussion. The crossover between the two areas provided a common ground for the participants to explore issues that are relevant to their research interests.
The holistic aspect that was mentioned before was a major goal for the school – so to consider the tools that are used to collect information, engaging and working with the participants, managing the data that is provided by the participants and ensuring that it is useful for other purposes. To start the process, after introducing the topics of citizen science and volunteered geographic information (VGI), the participants learned about data collection activities, including noise mapping, OpenStreetMap contribution, bird watching and balloon and kite mapping. As can be expected, the balloon mapping raised a lot of interest and excitement, and this exercise in local mapping was linked to OpenStreetMap later in the week.
The experience with data collection provided the context for discussions about data management and interoperability and design aspects of citizen science applications, as well as more detailed presentations from the participants about their work and research interests. With all these details, the participants were ready to work on their group task: to suggest a research proposal in the area of VGI or Citizen Science. Each group of 5 participants explored the issues that they agreed on – 2 groups focused on citizen science projects, another 2 focused on data management and sustainability, and finally another group explored the area of perception mapping and a more social science oriented project.
Some of the most interesting discussions were initiated at the request of the participants, such as the exploration of ethical aspects of crowdsourcing and citizen science. This is possible because of the flexibility in the programme.
Now that the institute is over, it is time to build on the connections that started during the wonderful week in Fiesole, and see how the network of Vespucci alumni develops the ideas that emerged this week.
At the beginning of April, the European Citizen Science Association (ECSA) held its first Annual General Meeting in Copenhagen. In the meeting, which lasted a long afternoon and an evening, many topics were covered – from membership (it’s now possible to join) to reports from the working groups. With an aim to be transparent and open, ECSA has published all the material from the AGM on its website – including the slides from presentations and talks and the main points from the discussion. I have been involved in the committee ‘Principles and standards in citizen science: sharing best practice and building capacity’, which is led by Lucy Robinson from the UK Natural History Museum. One of the first activities that Lucy guided was the development of 10 principles of citizen science, with the aim that they can help ECSA in defining what types of projects to endorse. The tentative principles were shared between the people in the committee and are now provided on the AGM site – see her presentation. However, they are of wider interest and we are, as a group, looking for comments. So the principles are:
- Citizen science projects involve citizens who actively contribute to scientific research. Citizens can act as contributors, collaborators, or project leaders and have a meaningful role in the research project (they are not simply research subjects).
- Citizen science projects have a genuine scientific question or goal, if possible resulting from discussions between citizens and professional scientists.
- Citizens are encouraged to participate in multiple stages of the scientific process, from developing the research question to co-designing the research process, gathering and analysing data, co-evaluating the research results and finally publishing the results for different audiences.
- The data gathered and/or analysed are shared and made publicly available either during or after the project, unless there are security or privacy concerns that prevent this. If the results are published academically, where possible this should be in an open access format.
- Participants receive feedback from the project lead on how their contribution adds to the project e.g. how their data will be used and what the research findings are. This adds both reward and opportunity to learn more about the science. The more communication and two-way engagement, the better!
- Citizen science activities celebrate and value the contributions of the citizen, and these are actively acknowledged in project results and publications.
- Citizen science programmes are characterised by mutual respect and acknowledgement of different skills and perspectives. Where possible, steering committees should integrate both scientists and citizen delegates. The scientists and organisers should be mindful of the power relations that exist within this social interaction.
- Citizen science projects should be inclusive. Where possible, inclusiveness should be proactive and not only reactive. Considerations of inclusiveness should include (but are not limited to) level of education, gender, age, religious belief, socio-economic factors and access to technologies.
- Being at the frontier between science and society, citizen science programmes have the opportunity to actively promote transdisciplinarity and links between natural and social sciences.
- Citizen science programmes should be evaluated for their scientific output, data quality, and the impact on participants.
The principles are open for discussion – they are not set in stone. In the discussion that followed the presentation and in a meeting of the ‘committee’ (more like sitting on the floor in a corner of the building), we explored the need for policy connection and how the aims of the project interact with these principles – for example, how applied ecological observations influence their applications. We’re still looking out for comments to develop these principles until they become part of ECSA ‘code of practice’. Comments are welcomed and will be passed to the working group.
Some ideas take long time to mature into a form that you are finally happy to share them. This is an example for such thing.
I got interested in the area of Philosophy of Technology during my PhD studies, and continue to explore it since. During this journey, I found a lot of inspiration and links to Andrew Feenberg’s work, for example, in my paper about neogeography and the delusion of democratisation. The links are mostly due to Feenberg’s attention to ‘hacking’ or appropriating technical systems to functions and activities that they are outside what the designers or producers of them thought.
In addition to Feenberg, I became interested in the work of Albert Borgmann and because he explicitly analysed GIS, dedicating a whole chapter to it in Holding on to Reality. In particular, I was intrigues by his formulation to The Device Paradigm and the notion of Focal Things and Practices which are linked to information systems in Holding on to Reality where three forms of information are presented – Natural Information, Cultural Information and Technological Information. It took me some time to see that these 5 concepts are linked, with technological information being a demonstration of the trouble with the device paradigm, while natural and cultural information being part of focal things and practices (more on these concepts below).
I first used Borgmann’s analysis as part of ‘Conversations Across the Divide‘ session in 2005, which focused on Complexity and Emergence. In a joint contribution with David O’Sullivan about ‘complexity science and Geography: understanding the limits of narratives’, I’ve used Borgmann’s classification of information. Later on, we’ve tried to turn it into a paper, but in the end David wrote a much better analysis of complexity and geography, while the attempt to focus mostly on the information concepts was not fruitful.
The next opportunity to revisit Borgmann came in 2011, for an AAG pre-conference workshop on VGI where I explored the links between The Device Paradigm, Focal Practices and VGI. By 2013, when I was invited to the ‘Thinking and Doing Digital Mapping’ workshop that was organised by the ‘Charting the Digital’ project, I was able to articulate the link between all five elements of Borgmann’s approach in my position paper. This week, I was able to come back to the topic in a seminar in the Department of Geography at the University of Leicester. Finally, I feel that I can link them in a coherent way.
So what is it all about?
Within the areas of VGI and Citizen Science, there is a tension between the different goals of the projects and the identification of practices in terms of what they mean for the participants – are we using people as ‘platforms for sensors’ or are we dealing with fuller engagement? The use of Borgmann’s ideas can help in understanding the difference. He argues that modern technologies tend to adopt the myopic ‘Device Paradigm’, in which a specific interpretation of efficiency, productivity and a reductionist view of human actions take precedence over ‘Focal Things and Practices’ that bring people together in a way meaningful to human life. In Holding On to Reality (1999), he differentiates three types of information: natural, cultural and technological. Natural information is defined as information about reality: for example, scientific information on the movement of the earth or the functioning of a cell. This is information that was created in order to understand the functioning of reality. Cultural information is information that is being used to shape reality, such as engineering design plans. Technological information is information as reality and leads to decreased human engagement with fundamental aspects of reality. Significantly, these categories do not relate to the common usage of the words ‘natural’, ‘cultural’ and ‘technological’, but rather describe the changing relationship between information and reality at different stages of socio-technical development.
When we explore general geographical information, we can see that some of it is technological information, for example SatNavs and the way that they communicate with the people who use them, or virtual globes that try to claim to be a representation of reality with ‘current clouds’ and all. The paper map, on the other hand, provides a conduit to the experience of hiking and walking through the landscape, and is part of cultural information.
Things are especially interesting with VGI and Citizen Science. In them, information and practices need to be analysed in a more nuanced way. In some cases, the practices can become focal to the participants – for example in iSpot, where the experience of identifying a species in the field is also linked to the experiences of the amateurs and experts who discuss the classification. It’s an activity that brings people together. On the other hand, in crowdsourcing projects that grab information from SatNav devices, there is a demonstration of The Device Paradigm, with the potential of reducing a meaningful holiday journey to ‘getting from A to B in the shortest time’. The slides below go through the ideas and then explore the implications for GIS, VGI and Citizen Science.
Now for the next stage – turning this into a paper…
The Guardian’s Political Science blog post by Alice Bell about the Memorandum of Understanding between the UK Natural Environment Research Council and Shell reminded me of a nagging issue that has concerned me for a while: to what degree has GIS contributed to anthropogenic climate change? And, more importantly, what should GIS professionals do?
I’ll say from the start that the reason it concerns me is that I don’t have easy answers to these questions, especially not to the second one. While I personally would like to live in a society that moves very rapidly to renewable energy resources, I also take flights, drive to the supermarket and benefit from the use of fossil fuels – so I’m in the Hypocrites in The Air position, as Kevin Anderson defined it. At the same time, I feel that I do have responsibility as someone who teaches future generations of GIS professionals how they should use the tools and methods of GIScience responsibly. The easy way would be to tell myself that since, for the past 20 years, I’ve been working on ‘environmental applications’ of GIS, I’m on the ‘good’ side as far as sustainability is concerned. After all, the origins of the biggest player in our industry are environmental (environmental systems research, even!), we talk regularly about ‘Design With Nature’ as a core text that led to the overlays concept in GIS, and we praise the foresight of the designers of the UNEP Global Resource Information Database in the early 1980s. Even better, Google Earth brings Climate Change information and education to anyone who wants to download the information from the Met Office.
But technologies are not value-free, and do encapsulate certain values in them. That’s what critical cartography and critical GIS have highlighted since the late 1990s. Nadine Schuurman’s review is still a great starting point to this literature, but most of it analysed the link of the history of cartography and GIS to military applications, or, in the case of the volume ‘Ground Truth’, the use of GIS in marketing and classification of people. To the best of my knowledge, Critical GIScience has not set its sights on oil exploration and extraction. Of course, issues such as pollution, environmental justice or environmental impacts of oil pipes are explored, but do we need to take a closer look at the way that GIS technology was shaped by the needs of the oil industry? For example, we use, without a second thought, the EPSG (European Petroleum Survey Group) definitions of coordinate reference systems in many tools. There are histories of products that are used widely, such as Oracle Spatial, where some features were developed specifically for the oil & gas industry. There are secretive and proprietary projections and datums, and GIS products that are unique to this industry. One of the most common spatial analysis methods, Kriging, was developed for the extractive industry. I’m sure that there is much more to explore.
So, what is the problem with that, you might ask?
Fossil fuels – oil, coal, gas – are at the centre of the process that leads to climate change. Another important thing about them is that once they’ve been extracted, they are likely to be used. That’s why there are calls to leave them in the ground. When you look at the way exploration and production work, such as the image here from ‘Well Architect’, you realise that geographical technologies are critical to the ability to find and extract oil and gas. They must have played a role in the ability of the industry to identify, drill and extract in places that were not feasible a few decades ago. I remember my own amazement the first time that I saw the complexity of the information that is being used and the routes that wells take underground, such as what is shown in the image (I’ll add that this was during an MSc project sponsored by Shell). In another project (sponsored by BP), it was just as fascinating to see how paleogeography is used for oil exploration. Therefore, within the complex process of finding and extracting fossil fuels, which involves many engineering aspects, geographical technologies do have an important role, but how important? Should Critical GIScientists or the emerging Critical Physical Geographers explore it?
This brings up the thornier issue of the role of GIS professionals today, and more so of people who are entering the field, such as the students who are studying for an MSc in GIS and similar programmes. If we accept that most of the fossil fuels should stay underground and not be extracted, then what should we say to students? If a person who is involved in working to help increase oil production does not accept the science of climate change, or doesn’t accept that there is an imperative to leave fossil fuels in the ground, I may accept and respect their personal view. After all, as Mike Hulme noted, the political discussion is more important now than the science and we can disagree about it. On the other hand, we can take the point of view that we should deal with climate change urgently and go on the path towards reducing extraction rapidly. In terms of action, we see students joining campaigns for fossil free universities, with which I do have sympathy. However, we’re hitting another difficult point. We need to consider the personal cost of higher education and the opportunity for well paid jobs, which include tackling interesting and challenging problems. With the closure of many other jobs in GIS, what is the right thing to do?
I don’t have an easy answer, nor can I say categorically that I will never work with the extractive sector. But when I was asked recently by a student in the oil and gas industry to provide a reference letter, I felt obliged to state that ‘I can completely understand why you have chosen this career, I just hope that you won’t regret it when you talk with your grandchildren one day in the future.’
Following the last post, which focused on an assertion about crowdsourced geographic information and citizen science, I continue with another observation. As was noted in the previous post, these can be treated as ‘laws’ as they seem to emerge as common patterns from multiple projects in different areas of activity – from citizen science to crowdsourced geographic information. The first assertion was about the relationship between the number of volunteers who can participate in an activity and the amount of time and effort that they are expected to contribute.
This time, I look at one aspect of data quality, which is about consistency and coverage. Here the following assertion applies:
‘All information sources are heterogeneous, but some are more honest about it than others’
What I have in mind is the ongoing argument about authoritative and crowdsourced information sources (Flanagin and Metzger 2008 frequently comes up in this context), which was also at the root of the Wikipedia vs. Britannica debate, and the mistrust of citizen science observations and the constant questioning of whether citizen scientists can do ‘real research’.
There are many aspects to these concerns, so the assertion deals with the aspects of comprehensiveness and consistency, which are used as a reason to dismiss crowdsourced information when comparing it to authoritative data. However, at a closer look we can see that all these information sources are fundamentally heterogeneous. Despite all the effort to define precise standards for data collection in authoritative data, heterogeneity creeps in because of budget and time limitations, decisions about what is worth collecting and how, and the clash between reality and the specifications. Here are two examples:
Take one of the Ordnance Survey Open Data sources – the maps present themselves as consistent and covering the whole country in an orderly way. However, dig into the details of the mapping, and you discover that the Ordnance Survey uses different standards for mapping urban, rural and remote areas. Yet, the derived products that are generalised and manipulated in various ways, such as Meridian or Vector Map District, do not provide a clear indication of which parts originated from which scale – so the heterogeneity of the source disappears in the final product.
The census is also heterogeneous, and it is a good case of specifications vs. reality. Not everyone fills in the forms, and even with the best effort of enumerators it is impossible to collect all the data, and therefore statistical analysis and manipulation of the results are required to produce a well reasoned assessment of the population. This is expected, even though it is not always understood.
Therefore, even the best information sources that we accept as authoritative are heterogeneous, but as I’ve stated, they are just not completely honest about it. The ONS doesn’t release the full original set of data before all the manipulations, nor completely disclose all the assumptions that went into reaching the final value. The Ordnance Survey doesn’t tag every line with metadata about the date of collection and scale.
Somewhat counter-intuitively, exactly because crowdsourced information is expected to be inconsistent, we approach it as such and ask questions about its fitness for use. So in that way it is more honest about the inherent heterogeneity.
Importantly, the assertion should not be taken to be dismissive of authoritative sources, or to ignore that the heterogeneity within crowdsourced information sources is likely to be much higher than in authoritative ones. Of course all the investment in making things consistent and the effort to get universal coverage is indeed worth it, and it would be foolish and counterproductive to consider that such sources of information can be replaced, as has been suggested for the census, or that it’s not worth investing in the Ordnance Survey to update the authoritative data sets.
Moreover, when commercial interests meet crowdsourced geographic information or citizen science, the ‘honesty’ disappears. For example, even though we know that Google Map Maker is now used in many parts of the world (see the figure), even in cases when access to vector data is provided by Google, you cannot find out who contributed, when and where. It is also presented as an authoritative source of information.
Despite the risk of misinterpretation, the assertion can be useful as a reminder that the differences between authoritative and crowdsourced information are not as big as they may seem.