A new book has just been published about OpenStreetMap and Geographic Information Science. The book, edited by Jamal Jokar Arsanjani, Alexander Zipf, Peter Mooney, and Marco Helbich, is “OpenStreetMap in GIScience: Experiences, Research, and Applications”. It contains 16 chapters on different aspects of OpenStreetMap in GIScience, including 1) Data Management and Quality, 2) Social Context, 3) Network Modeling and Routing, and 4) Land Management and Urban Form.
If you follow the discussion (search Twitter for #geothink) you can see how it evolved and which issues were covered.
At one point, I asked the question:
It is always intriguing, and frustrating at the same time, when a discussion on Twitter takes on a life of its own and moves away from the context in which the topic was originally raised. At the same time, this is the nature of the medium. Here are the answers that came up to this question:
You can see that the only legal expert around said that it’s a tough question, but of course, everyone else shared their (lay) view on the basis of moral judgement and their own worldview rather than legality, and that’s also valuable. The reason I raised the question was that during the discussion, we started exploring the duality in digital technology between ownership and responsibility – or rights and obligations. It seems that technology companies are very quick to emphasise ownership (expressed in strong intellectual property rights arguments) without responsibility for the consequences of technology use (as expressed in EULAs and the general attitude towards users). So the nub of the issue for me was agency. Software does have agency of its own, but that doesn’t absolve the human agents – be they software developers or the companies – from responsibility for what it is doing.
In ethics discussions with engineering students, the cases of the Ford Pinto or the Thiokol O-rings in the Challenger Shuttle disaster come up as useful examples to explore the responsibility of engineers towards their end users. Codes of ethics exist for GIS – e.g. the URISA code of ethics, or the material online about ethics for GIS professionals and in Esri publications. Somehow, the growth of the geoweb took us backward. The limited degree to which awareness of ethics is internalised within a discourse of ‘move fast and break things‘, a software/hardware development culture of perpetual beta, a lack of duty of care, and a search for a fast ‘exit’ (and therefore IBG-YBG) make me wonder which mechanisms we need to put in place to ensure the reintroduction of strong ethical notions into the geoweb. As some of the responses to my question demonstrate, people will accept the changes in societal behaviour and view them as normal…
After a very full first day, the second day opened with a breakfast that provided an opportunity to meet the board of the Citizen Science Association (CSA), and to talk with and welcome people who got up early (starting at 7am) for another full day of citizen science. Around the breakfast tables, new connections were emerging. As in the registration queue on the first day, people were open and friendly, starting conversations with new acquaintances and sharing their interest in citizen science. An indication of the enthusiasm was that people continued talking as they departed to the morning sessions.
The session explored the use of different data collection tools to capture and share traditional knowledge. Dawn Wright, Esri chief scientist, started with Emerging Citizen Science Initiatives at Esri. Dawn opened with Esri’s view of science – beyond fundamental scientific understanding, it is important to see science as protecting life, enabling stewardship, and sharing information about how the Earth works, how it should look (geodesign), and how we should look at the Earth. As we capture data with various mobile devices – from mobile phones to watches and sensors – we are becoming more geoaware and geoenabled. The geotechnologies that enable this – apps and abilities such as storytelling – are very valuable. Esri views geoliteracy as a combination of understanding geography and scientific data – issues are more compelling when they are mapped and visualised. Collector for ArcGIS provides the ability to collect data in the field; it has been used by scouts, and in Malawi it is used by indigenous farmers to help manage local agriculture. There is also the ability to collect information in the browser with ‘GeoForm’, which supports such data collection; maps were used, for example, to collect information about street light coverage, buffering the range that is covered. A third method is StoryMaps.arcgis.com, which allows telling information with a narrative. Snap2Map is an app that links data collection and puts it directly into story maps. There is also crowdsource.storymaps.arcgis.com, which allows collection of information directly from the browser.
Michalis Vitos, UCL – Sapelli, a data collection platform for non-literate citizen scientists in the rainforest. Michalis described the Extreme Citizen Science group, which was set up with the aim of providing tools for communities all over the world. In the Congo basin, communities face challenges from illegal logging and poaching; forest people are in direct competition for resources such as the trees that they use, and with the FLEGT obligations in the Republic of Congo, some protection is emerging. The team collaborates with local NGOs which work with local communities, and there are challenges including literacy, energy, and communication. The Sapelli collector is an application that works through different levels of choices to guide data collection. The Sapelli launcher locks the interface of the phone and allows only specific functions to be exposed to the user. The issue of connectivity was addressed with communication procedures that use SMS. Providing electricity can be done in different ways – including while cooking. There is a procedure for engaging with a community – starting with Free and Prior Informed Consent. The process starts with icons, using them in printed form to make sure that they are understood; after agreement on the icons, there is an introduction to the smartphones – how to touch, how to tap, and the rest of the basics. The next stage is to try it in the field. Sapelli is now available on Google Play. The next stage is to ensure that we can show the participants what they collected, but as satellite images are difficult to obtain, the group is experimenting with drone imagery and mapping to provide the information back to the community. In terms of results for the community, the project is moving from development to deployment with a logging company. The development of the icons is based on working with anthropologists, who discuss the issues with the community and lead the development of the icons.
Not all the icons work, and sometimes they need to be changed. The process involved compensating the community for the time and effort that they put in.
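The levels of icon choices described above form a decision tree: each screen of icons either descends to a more specific screen or records an observation. A minimal sketch of that idea in Python (the categories and field names here are illustrative only – real Sapelli projects are defined in an XML project file, not in code like this):

```python
# Sketch of an icon-driven, hierarchical choice form, in the spirit of
# Sapelli's decision trees. Labels are invented for illustration.

class Choice:
    """One screen of icons; tapping an icon either descends to a
    child screen or records a leaf value."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def is_leaf(self):
        return not self.children

# Example tree: reporting a threat to forest resources.
tree = Choice("threat", [
    Choice("logging", [Choice("chainsaw heard"), Choice("trees felled")]),
    Choice("poaching", [Choice("snare found"), Choice("gunshot heard")]),
])

def walk(choice, taps):
    """Follow a sequence of icon taps (child indices) to a recorded value."""
    for i in taps:
        choice = choice.children[i]
    assert choice.is_leaf(), "navigation must end on a leaf"
    return choice.label

print(walk(tree, [0, 1]))  # -> trees felled
```

Structuring the form as a tree of pictures rather than a list of text questions is what makes the interface usable without literacy, and it maps directly onto the icon-agreement process described above.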
Sam Sudar, University of Washington – Collecting data with Open Data Kit (ODK). Sam gave a background on the tool – the current version and the coming ODK 2.0. ODK is a set of information management tools for collecting and storing data and making it usable, targeted at resource-constrained environments – anywhere with limited connectivity – without assuming smartphone literacy. It is used all over the world: in Kenya, by the Jane Goodall Institute (JGI) in Tanzania, by the Surui tribe in Brazil to gain carbon credits, by the Carter Center in Egypt for election monitoring, and by WWF in Rwanda. The technology is used in very diverse ways, and we need to consider how technology empowers data collection. The ODK workflow is: first, build the form; then collect the data; and finally, aggregate the results. ODK Build or ODK XLSForm (the latter built in Excel) create the form, ODK Collect renders the forms, and ODK Aggregate can run locally or on Google App Engine. There is a strong community around ODK with much support for it. In ODK 1.0 there is no data update on the mobile device, as it replicated the paper process, and there are limitations on customising the interface or linking to sensors. ODK 2.0 provides better abilities and allows syncing of information via the cloud. ODK Survey replaces ODK Collect, and ODK Tables is a way to interact with data on the device. The intention is to make it possible to interact with the data in an easier way.
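To make the “build the form” step concrete: an XLSForm is a spreadsheet whose `survey` sheet has `type`, `name`, and `label` columns, one row per question. A sketch of a minimal wildlife-observation form, shown here as CSV rows generated in Python for brevity (real XLSForms are Excel workbooks, and the question names below are invented for illustration):

```python
import csv
import io

# Rows of a minimal XLSForm 'survey' sheet. 'geopoint' records a GPS
# fix, 'select_one species' refers to a choice list that a real form
# would define on a separate 'choices' sheet.
survey_rows = [
    ("type",               "name",     "label"),
    ("geopoint",           "location", "Record the location"),
    ("select_one species", "species",  "Which species did you see?"),
    ("integer",            "count",    "How many individuals?"),
    ("image",              "photo",    "Take a photo"),
]

buf = io.StringIO()
csv.writer(buf).writerows(survey_rows)
print(buf.getvalue())
```

A form like this is converted once and then rendered question-by-question by ODK Collect on the device, which is why the paper-form analogy in ODK 1.0 holds: the form definition is fixed at build time.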
A question from the audience asked whether local communities worry about the data collected about them. ODK is used with a lot of medical information, but the team doesn’t go on the ground, so it is left to whoever uses the system to ensure ethical guidelines are followed. Michalis noted that there are not only problems with external bodies, but also cultural sensitivities about what data should be seen by whom, and there is an effort to develop tools that are responsive to this.
Tanya Birch, Google Earth Outreach – Community-based field data collection and Google mapping tools. The video covered Jane Goodall’s work in Tanzania with chimpanzees: due to habitat loss, there are fewer than 300,000 chimpanzees left in the wild. In the video, Lillian Pintea (JGI) noted the importance of satellite images that show all the bare hills in the area of Tanzania. The approach is to improve the lives of the local villagers so they become partners in conservation. The local communities are essential – they share the status of the work with the people in the village. The forest monitors’ role is to work across the area, collecting data with ODK and monitoring it. Location information is easier to collect on a tablet; the data is then uploaded to Google and shared with the global effort to monitor forests. Gombe National Park is the laboratory for scaling up across the area of chimpanzee habitat, using Google’s abilities and reach to share it widely.
Another question that came up was: how have you used the tools with youth, and what are the challenges of working with young people? Dawn noted that in engagement with youth, the term digital native rings true, and they end up teaching the teachers how to improve the apps. The presentations discussed the simplicity of the technology, so you don’t need to know what is going on in the background. Another question was: do people want to change the scale of analysis – standing at a point and taking a picture of a mountain – and how do we address different scales? Dawn noted that the map, as part of the collection tool, allows people to see the data as they collect it and, for example, to indicate the scale of what they viewed. Michalis noted that there is also an option in Sapelli to measure scale in football pitches, and Luis noted that in CyberTracker there is an option to indicate that the information was collected in a different place from where the observer is. Data sharing is important, but make sure that it can be exported in something as simple as
6E Symposium: Human-Centred Technologies for Citizen Science
Kevin Crowston (Syracuse U.) & Andrea Wiggins (U. Maryland & symposium convener): Project diversity and design implications described a survey in which most attention had previously been paid to small projects; by surveying a wider range of projects, they discovered different practices. To evaluate the design implications, they suggested that we need to understand the goal of the project and the participation activities – from science and conservation to photography – the different things that people are doing, with observations being the most common type of contribution (see their First Monday paper). Data quality comes up in all the projects, and there are different strategies to deal with it. There is diversity of engagement – from conferences and meetings to social media. There are also rewards for participation – some projects offer no rewards at all, others provide volunteer appreciation, training, or equipment, and another approach is competitive rewards in leaderboards. There is also socialisation – and even formal education. Funding is diverse – from grants and private contributions to sponsorship – and sustainability is an issue.
Mobile and Social Technologies
-Anne Bowser (U. Maryland): Gamifying phenology with the Floracaching app – geocaching for plants. The application focuses on phenology, and an earlier version was developed for Project BudBurst. Traditional volunteers focus their contribution on science, while millennials might be more interested in a mobile app based on games. Embedded maps can be used to create a cache, and there is a leaderboard and points. Floracaching was created from paper prototyping and focus groups. They found that the perception of gamification was important to millennials, who also enjoyed competition. They also wanted to be told what to do and to get feedback on how they’d done. ‘I’m not going to drive an hour to see a plant bloom’. Missions can be added to the design and help people learn the application and the data collection.
-Michalis Vitos (UCL): Sapelli, a mobile data collection platform for non-literate indigenous communities. Michalis covered Sapelli and the importance of the interface design (see previous session). The design of the icons is worked out through, effectively, paper prototyping.
-Muki Haklay (UCL): Geographical human-computer interaction for citizen science apps (I’ll blog it later!)
-Matt Germonprez, Alan Kolok (U. Nebraska Omaha), & Matt Levy (San Francisco State U.): Enacting citizen science through social media. Matt comes from a technology angle – he suggested that social media provides a different form of information, and asked whether it can be integrated into citizen science projects. The science project monitors Atrazine and started in 2012, with a process similar to a litmus test. The project worked, but they wanted to use social media in the social setting in which they work. Facebook wasn’t used beyond sharing information, but Twitter and Instagram were used to report observations publicly. The problem – no social conversations – so maintaining social conversation is the next goal. The project can be found by searching for Lil’ Miss Atrazine.
-Jen Hammock (Smithsonian Institution): An infrastructure for data distribution and use. The aim of the project, looking at snails, is to address a findability problem. A tool that they want to develop is for data search – following different sources of information, merging taxa and location, and providing alerts about interests. Notifications will be provided to the researcher and to the contributor, and there can be knowledge about the person who contributes the information. There are technical and social barriers – will researchers and experienced naturalists be interested in sharing information?
-Yurong He (U. Maryland): Improving biodiversity data sharing among diverse communities – looking at biodiversity and the Encyclopedia of Life. There are content partners who provide the data. She looked at 259 content partners and found 6 types of data providers. First, professional organisations that operate over time, such as IUCN and NHM. The second type is repositories – professional databases that emerged in the 1990s. Third, citizen science initiatives and communities of interest, such as Xeno-Canto for bird song. Fourth, social media platforms such as Wikipedia. Fifth, education communities who add information while focusing on education; and finally, subsidiaries. We need to know the practices of the providers better to support sharing of information.
-Stuart Lynn (Adler Planetarium & Zooniverse): Developing tools for the next scientific data deluge. Stuart discussed their online community – they have 1.2m users. The challenge for the future is that there will be many projects and data sources that produce huge amounts of data. The aim is to partner with machine-learning algorithm developers, but how do you keep the crowd interested, and not just give them the most difficult cases with no opportunity to learn or progress? Gamification can be stressful, so they try to give more information and learning. They also try to create a community and discuss the issues. There is a huge distribution of comments – and deepening engagement. There is no one size fits all, and we need to model and understand participants better.
Contributors and Communities
-Jenny Preece (U. Maryland): Motivating and demotivating factors for long-term participation – what motivates people to come back again and again. On the different motivational aspects, she described the work of the late Dana Rotman, who collected information in the US, India and Costa Rica – 142 surveys from the US, 156 from India, and interviews in the three countries. She used a grounded theory approach and developed an initial framework; for long-term participation there are internal and external motivations. Demotivators – time, problems with technology, and long commitment to the task.
-Carsten Oesterlund, Gabriel Mugar, & Kevin Crowston (Syracuse U.): Technology features and participant motivations – given the heterogeneity and variety of participants, how might we approach them? Do people change over time? Looking at Zooniverse – specifically Planet Hunters – there are annotations, Talk, and other sources of information. On the Talk pages, newcomers are encouraged to annotate and comment on the image, and also to look at what other people have done; they also find people who are more experienced. Use of Talk changes over time: people start putting in comments, then tail off and stop commenting, and later on start putting in more information. There is also role discovery in terms of engagement and what they do in their community.
-Charlene Jennett (UCL): Identifying and promoting creativity – creativity is a puzzling question, debated in psychology, with some people looking for breakthrough moments while others look at everyday creativity. There are examples of projects that led to creativity – such as Foldit. On everyday creativity in citizen cyberscience, interviews with volunteers turned up results including artwork from the Old Weather forum, the Galaxy Zoo Peas, and the EyeWire chatbots that were created for members. People who are engaged in the project contribute more to it. Providing feedback on progress is important, as are regular communication and personal feedback in blogs and replies on Twitter. Events help, and there is also a need for role management.
-Carl Lagoze (U. Michigan): Inferring participant expertise and data quality – focusing on eBird; there is a paper in Big Data & Society. The standard way is to control the provenance of the data, but the library is becoming a ‘porous zone’, so today there is less control over the whole area. The barriers between novices and experts are breaking down. How can we tell experts from non-experts? This happens across areas, and it is a sort of distributed sensor network with weak sensors. Are there signals in the data that help you to identify people and the quality of their information?
7C Panel: Citizen Science and Disasters: The Case of OpenStreetMap
Robert Soden (University of Colorado, Boulder) described the GFDRR Open Cities project to collect data for resilience planning, and explained the reasons for selecting OpenStreetMap for it. Kathmandu is recognised as an at-risk place, and there was an aim to identify schools that are at risk, but the basic mapping needed to be done first. There was a local partnership with universities in the area. There was a challenge in figuring out the data model – number of stories, usage, roof type, wall type, age. Students needed to collect information that would help in modelling the risk, and the team produced a lot of training material. The project was successful in collecting the data and enriching the information. The process helped in creating an OpenStreetMap community, and a local NGO (Kathmandu Living Labs) was then launched. Trust in the data was important, and there was a risk of the data being discredited – to deal with that, they involved targeted users early, spot-checked the data, and did a fuller assessment of it. They are launching similar projects in Jamaica, Vietnam and Madagascar. They want to engage people in more than just data collection, and to consider how they can support growing the community.
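The attributes in the data model above (stories, usage, roof type, wall type, age) map naturally onto OSM’s key–value tagging. A hedged sketch of how one surveyed school might be represented, with a toy screening rule of the kind risk modellers might apply – the tag keys follow common OSM conventions, but the actual Open Cities Kathmandu data model and any real risk criteria may differ:

```python
# One surveyed building as OSM-style key/value tags. Keys are common
# OSM conventions; values and the school name are illustrative.
school = {
    "building": "school",
    "building:levels": "3",          # number of stories
    "building:material": "brick",    # wall type
    "roof:material": "metal",        # roof type
    "start_date": "1995",            # approximate age
}

def risk_flags(tags):
    """Toy screening rule (not a real risk model): flag masonry
    buildings of three or more stories for closer assessment."""
    flags = []
    if tags.get("building:material") in {"brick", "stone"} \
            and int(tags.get("building:levels", "1")) >= 3:
        flags.append("tall masonry")
    return flags

print(risk_flags(school))  # -> ['tall masonry']
```

Keeping survey results as plain OSM tags is what let the project feed both the risk modellers and the public map from a single shared database.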
Mikel Maron (Humanitarian OpenStreetMap Team). Mikel covered what OpenStreetMap (OSM) is – the OSM Foundation is a different entity from Wikimedia, which is confusing. OSM is a very wide community of many thousands of people who continue to contribute. The Humanitarian OpenStreetMap Team (HOT) follows the ‘cute cat theory for humanitarian maps’ – use something that is alive and that people are used to contributing to, when you need it in emergency situations. OSM is used in many organisations and projects in government, and attempting to map all these organisations is challenging. In Bangladesh, there are 6 OSM projects that require cooperation between agencies – at least all the projects contribute to the same database. Organisations find it challenging that they need to support something they can’t control. Starting with Gaza in 2009, the OSM community mapped the area although there was no specific request; OSM was eventually used to create a local tourist map. The community in Gaza didn’t continue – providing long-term support is difficult. Haiti in 2010 helped in producing the data, but it was difficult to coordinate, which led to the Tasking Manager. MapGive provides support to the crowd through imagery – a way to support OSM by utilising the DigitalGlobe database. There are developments linking OSM and citizen science. There is very rich data in OSM, and a need to understand the social science and data research around it.
8E Symposium: Ethical Dimensions of Citizen Science Research
Caren Cooper opened with a list of issues: participation vs exploitation; beneficence, non-maleficence, autonomy and justice; incentives vs manipulation; IP and data ownership; data misuse, sharing and accessibility; openness vs privacy and security; cultural competence.
Holly Menninger leads yourwildlife.org – the project that she focused on is the home microbiome. Volunteers share dust samples from their homes, and the team looks at the content. Volunteers want to understand their home, but also the science. There was the issue of reporting back to participants – they want to understand the information, and while some information was provided, it was a challenge to translate the scientific information into something useful. People are interested in the information about their home, sometimes due to personal issues – e.g. a request for results because someone in the house is ill. There is a lag of 2 years between samples and results, and this needs to be explained to the participants. There is also the issue that the science is exploratory, which means there are no specific answers that can be given to participants.
Madhusudan Katti explored the appropriation of citizens’ knowledge. In the realm of IP, traditional knowledge is discussed a lot: appropriating local knowledge and then publishing it, when the information came from local knowledge gathered through interviews – the scientists get the fame. There is also collecting information about endangered species where there is risk from the local community. He mentioned the film Living with Elephants, which focuses on the conflicts between humans and elephants but might also help poachers.
Janet Stemwedel looked at ethics in participant-led and DIY citizen science. DIY science is about self-efficacy and controlling the process – so if the participants are running the show, what can go wrong? Who better to protect my autonomy than me? The answer is that autonomy is tricky: it needs good information about potential risks and benefits; your current choices can hurt future prospects for choosing freely (don’t use autonomy to get addicted, or be careless with what you do with your personal information); and finally, our exercise of autonomy can impact others’ prospects of free choice (DNA analysis has an impact on your wider family). An Institutional Review Board (IRB) is a mechanism for thinking this through – potential consequences (good and bad), who could be impacted, and strategies for answering the questions. Reasons to resist IRBs – they are not legally required, academic scientists complain about them, and there may be no access to an IRB.
The reason to get over the resistance is that unintentional harm is not a good thing; getting feedback from more eyes also helps in learning about tools and approaches. Ethical objectivity means going beyond gut feeling and discussing with other people.
Anne Bowser discussed the ethics of gamification – the use of game design elements in non-game contexts (such as leaderboards). Old Weather had an element of games, and Floracaching is another example. There is a labour/exploitation issue too – playing a game such as Civilization II is done for fun, while you learn about history, but online games use different approaches to extract more from their users. Does contribution to science cleanse the ethical issues because it’s not for profit motives? Crowdsourcing has been critiqued in different ways. There are also tracking and privacy concerns: gamified apps reveal habits and all sorts of details about their users (e.g. in Foursquare) – Salesforce gives badges to encourage employees to act in specific ways. Ethical citizen science: treat participants as collaborators; don’t waste volunteer time; volunteers are not computers (Prestopnik & Crowston 2012). Ethical design allows participants to be aware of the implications and decide if they want gamification or not.
Lea Shanley – covering data privacy. Her awareness came from working with Native American tribes on participatory mapping. Tribes started to use participatory GIS; there were many things they wanted to map, and the participants had differences in views about whether to share the data. Some places were careful and some were not. In disaster response, there is all the social media curation, and many open data evangelists started sharing the locations of first aiders, actually putting them at risk. In citizen science, there is a lack of attention to location – the places where things were recorded, and even real-time information that risks the physical security of participants. Face recognition is possible. Information collected by volunteers can reveal medical information that can harm people’s prospects, as well as sensitive information, sacred site locations, and endangered species. Toxic environments can put volunteers at risk. There are also issues with who interprets and manages the data, and with social norms and reinforcing social norms. An emerging area is the security of social media – crowdsourcing teams were hacked in the DARPA Red Balloon Challenge. There can be issues with deliberate hacking of citizen science by people who don’t like it.
Dianne Quigley – the Northeast Ethics Education Partnership, which came out of issues of environmental and social justice, aims to improve the ethical knowledge of researchers. When researchers start with a community, they start with a discussion of risks/benefits and consider who is getting something out of it. Graduate students are trained to know how to work with communities: avoiding harm (non-maleficence); informed consent when working with communities; protecting data. Justice is a way to think of linguistic diversity, respect for local knowledge, and recruitment that is fair in terms of representation. Data management and protocols matter. There is a need to learn humility – to respect the needs and practices of the community.
There are ideas to start an ethics group in the CSA and to consider a code of ethics or a participant bill of rights. Do we need to extend IRB oversight? A co-created common rule? Is there value in a code of ethics, or will it be a dead letter? The discussion explored the need for bottom-up projects, which also need to consider their impacts and outputs; communication with the public and promising only what the research will deliver; and the investment of time in citizen science by early-career researchers, which can impact their career prospects. These challenges are common in community participatory research.
The panel specifically reflected on the citizenship aspects of citizen science. Citizen science is a significant phenomenon, and there is a feeling that it needs a critical voice within it. What is the place of the citizen in citizen science? There are questions about governance, practices and methodologies. How does it connect to the wider democratisation of knowledge?
Eugenia Rodrigues (University of Edinburgh, UK) asked: what model of citizenship does it promote? One way is to look at the demographics, but we can also ask about the term – it is possible to use volunteer, amateur, or extended peer community (as in post-normal science). The term citizen includes autonomy, creativity, liberty, responsibility, having a stake, and other meanings. What are the citizens doing, and are we constructing a story that recognises the citizen scientist as a citizen? Such a story is appearing in work in the North-East of England dealing with water pollution in a local woodland, where participants noted that the Environment Agency was not doing things in a satisfactory way, so the needs of their local habitat were overlooked. In this case, contextual/experiential knowledge and expert monitoring skills led to a change. Citizen science can be seen as counter-expertise. We need to be inclusive – some classifications try to control the role of the citizens, and the need to control levels of participation to improve quality does not give space for participants to exercise their citizenship fully.
Shannon Dosemagen (Public Lab) – in Public Lab there is specific attention to environmental monitoring, and a need to re-imagine roles. Public Lab prefers to use civic science or community science rather than citizen science, because the term can be controversial or mean different things in different places. They also avoid thinking of scientists and non-scientists in a supplicant way, and consider how to engage people in the whole process. Different roles play out in different ways – they want to be active about it. There are different roles within the Public Lab community, but it is an egalitarian approach to roles.
Esther Turnhout (Wageningen University) looked at expertise and quality control in citizen science networks for biodiversity knowledge. Biodiversity knowledge exists among amateur naturalists, who have started using the term citizen science. To conceptualise this: there are complex relationships with mainstream science. Biodiversity recording has been around for a long time, and the data is in increasing demand for decision making. This brought with it a demand to professionalise and to increase standards and quality. Validation happens in the complex networks of amateurs, experts, professionals and decision makers – looking at the actors in the network. Validation is done in different places with different motivations – there are hierarchical networks inside the naturalist groups, enforced with novices. The digitised data is compared with existing observations, and there is reciprocity between the observer and the process of collecting and organising the data. There are lots of elements – butterflies, the community of observers, the field guide – and the process is circular. But increasingly, validation is imposed and procedural: validation ceases to be collective, and the records no longer circulate. The main concern is to keep a check on where the data goes; it belongs to the observer. Citizenship depends on not just turning the data into probabilities. There is a need to maintain control over the data.
Rick Hall (Ignite!, UK): there have been different learned societies around the country – the learned societies that emerged in the 18th century – while the acts of enclosure and the workhouses enslaved large groups in society. Today, we can ask whether Internet barons are trying to do the same as the mill owners. There is a cultural entitlement in the human rights declaration. For the current president of the Royal Society, finding things out for yourself is at the very heart of science. It matters where it takes place – for example, in a pop-up shop that hosts community curiosity labs where people explore questions that matter to them, or spaces in schools where young people can take ownership of their investigations. Spaces like Lab_13 are places to learn how to become a scientist. The issue is asking young people what they want to know. We need spaces where citizens learn not just science but how to become scientists… We need more community and civic citizen scientists, because the world needs more curious minds.
Erinma Ochu (University of Manchester, UK) – as a neuroscientist, she found in her research that it requires empathy and stories, given that science evolved in a way that is powerful and controlling. What happens when you bring science to the public realm? How do we ensure that it is inclusive for women and minorities?
For me, the discussion highlighted that it was mostly about collective action and egalitarianism in the production of knowledge – so expertise without hierarchy.
Another observer raised the issue of democratisation and what notion of political action we would like to see within citizen science.
The final keynote was from Amy Robinson of EyeWire: Why Do Gamers Enjoy Mapping the Brain? – demonstrating the game and how it works. Lessons from EyeWire – it has been running for two years, and a lot has been learned. The idea that ‘if we build it, they will play’ didn’t happen. Instead, the community was carefully crafted and slowly built – creating the tools and learning about how things are used. Media is crucial – 60% of EyeWire registrations came within 5 days of a major media event. A major media event spills over into Facebook, Twitter and other social media – suddenly things are coming from all directions. A Facebook page can convert viewers into participants. Media relations are an active engagement, not just waiting for journalists – share all sorts of things, including funny ones. Reaching out to the media also requires being prepared for it – you need to cope with the attention and capture it. Create internal analytics to understand how the project works. Engagement is also a major issue – there is a huge drop-off after two months. Creating games and missions can provide a reason to capture people’s interest. Prestige within the community can motivate participants – changing a user’s handle colour can demonstrate recognition by the project. There are also specific challenges, and participants set their own challenges. Accuracy and efficiency – using the power players in the game to have a bigger role in the project. How do you recognise potential power players in your game? The design of the entry page is critical – the page is minimalist and reduces the amount of information needed to enter the system. They have created all sorts of interesting collaborations, such as fascinating visualisations. There is also a need to take risks and see whether they are going to work or not.
Abe Miller-Rushing closed the conference, asking people to share talks and links; the posters will also come online. We are aiming to create a community and serve its needs. The new board chair, Greg Newman, continued with some takeaways, which completed the conference.
Another account of the conference is available at https://wildlifesnpits.wordpress.com/2015/02/12/power-of-the-people-thoughts-from-the-first-citizen-science-association-conference/
The Open University, with support from the Nominet Trust and UTC Sheffield, has launched the nQuire-it.org website, which seems to have great potential for running citizen science activities. The nQuire platform allows participants to create science inquiry ‘missions’. It is accompanied by an Android app called Sense-it that exposes all the sensors that are integrated in a smartphone and lets you see what they are doing and the values that they are showing.
The process of setting up a project on the nQuire-it site is fairly quick, and you can figure it out in a few clicks. Joining the project that you’ve created on the phone is also fairly simple, and the integration with Google, Facebook and Twitter accounts means that linking the profiles is quick. You can then get a few friends to start using it; the Sense-it app lets you collect the data and then share it with other participants in the project on the nQuire website. Participants can comment on the data, ask questions about how it was produced, and vote it up or down. All of this makes nQuire a very suitable place for experimenting with sensors in smartphones and for prototyping citizen science activities. It also provides an option for recording geographic location, and it’s good to see that this is disabled by default, so the project designer needs to actively switch it on.
Thanks to invitations from UNIGIS and Edinburgh Earth Observatory / AGI Scotland, I had an opportunity to reflect on how Geographic Information Science (GIScience) can contribute to citizen science, and what citizen science can contribute to GIScience.
Despite the fact that it’s 8 years since the term Volunteered Geographic Information (VGI) was coined, I didn’t assume that the whole audience was aware of how it came about or of the range of sources of VGI. Nor did I assume knowledge of citizen science, which is a far less familiar term for a GIScience audience. Therefore, before going into a discussion about the relationship between the two areas, I opened with a short introduction to both, starting with VGI and then moving to citizen science. After introducing the two areas, I suggested the relationships between them – there are types of citizen science that overlap with VGI, such as biological recording and environmental observations, as well as community (or civic) science, while other types, such as volunteer thinking, include many projects that are non-geographical (think EyeWire or Galaxy Zoo).
However, I don’t just list a catalogue of VGI and citizen science activities. Personally, I found trends a useful way to make sense of what happen. I’ve learned that from the writing of Thomas Friedman, who used it in several of his books to help the reader understand where the changes that he covers came from. Trends are, of course, speculative, as it is very difficult to demonstrate causality or to be certain about the contribution of each trends to the end result. With these caveats in mind, there are several technological and societal trends that I used in the talk to explain how VGI (and the VGI element of citizen science) came from.
Of all these trends, I keep coming back to one technical and one societal change that I see as critical. The removal of the selective availability of GPS in May 2000 is my top technical change, as the cascading effect from it led to the deluge of good-enough location data that is behind VGI and citizen science. On the societal side, it is the Flynn effect, as a signifier of the educational shift of the past 50 years, that explains how the ability to participate in scientific projects has increased.
In terms of the reciprocal contributions between the fields, I suggest the following:
GIScience can support citizen science by offering the data quality assurance methods that are emerging in VGI; there are also plenty of spatial analysis methods that take heterogeneity into account and are therefore useful for citizen science data. The areas of geovisualisation and human-computer interaction studies in GIS can assist in developing more effective and useful applications for citizen scientists and for the people who use their data. There is also plenty to do in considering semantics, ontologies, interoperability and standards. Finally, since critical GIScientists have long been looking into the societal aspects of geographical technologies – such as privacy, trust, inclusiveness, and empowerment – they have plenty to contribute to citizen science activities and to carrying them out in more participatory ways.
On the other hand, citizen science can contribute to GIScience, and especially to VGI research, in several ways. First, citizen science demonstrates the longevity of VGI data sources, with some projects going back hundreds of years. It provides challenging datasets in terms of their complexity, ontology, heterogeneity and size. It brings questions about scale and how to deal with large, medium and local activities while merging them into a coherent dataset. It also provides opportunities for GIScientists to contribute to critical societal issues such as climate change adaptation or biodiversity loss. It offers some of the most interesting usability challenges, such as tools for non-literate users, and, finally, plenty of opportunities for interdisciplinary collaborations.
The slides from the talk are available below.
If you have been reading the literature on citizen science, you must have noticed that many papers that describe citizen science start with an historical narrative, something along the lines of:
As Silvertown (2009) noted, until the late 19th century, science was mainly developed by people who had additional sources of employment that allowed them to spend time on data collection and analysis. Famously, Charles Darwin joined the Beagle voyage, not as a professional naturalist but as a companion to Captain FitzRoy[*]. Thus, in that era, almost all science was citizen science albeit mostly by affluent gentlemen and gentlewomen scientists[**]. While the first professional scientist is likely to be Robert Hooke, who was paid to work on scientific studies in the 17th century, the major growth in the professionalisation of scientists was mostly in the latter part of the 19th and throughout the 20th century.
Even with the rise of the professional scientist, the role of volunteers has not disappeared, especially in areas such as archaeology, where it is common for enthusiasts to join excavations, or in natural science and ecology, where they collect and send samples and observations to national repositories. These activities include the Christmas Bird Watch that has been ongoing since 1900 and the British Trust for Ornithology Survey, which has collected over 31 million records since its establishment in 1932 (Silvertown 2009). Astronomy is another area in which amateurs and volunteers have been on a par with professionals when observation of the night sky and the identification of galaxies, comets and asteroids are considered (BBC 2006). Finally, meteorological observations have also relied on volunteers since the early start of systematic measurements of temperature, precipitation or extreme weather events (WMO 2001). (Haklay 2013 emphasis added)
The general messages of this historical narrative are: first, that citizen science is a legitimate part of scientific practice, as it was always there – we just ignored it for 50+ years; second, that some citizen science is exactly as it was – continuous participation in ecological monitoring or astronomical observations – only that now we use smartphones or the Met Office WOW website rather than pen, paper and postcards.
The second aspect of this argument is one that I was wondering about while writing a version of the historical narrative for a new report. This was done within a discussion of how the educational and technological transitions of the past century reshaped citizen science. I have argued that the demographic and educational transitions in many parts of the world – especially the rapid growth in the percentage and absolute numbers of people with higher education degrees, who are potential participants – are highly significant in explaining the popularity of citizen science. To demonstrate that this is a large-scale and consistent change, I used the evidence of the Flynn effect, which is the rapid increase in IQ test scores across the world during the 20th century.
However, while looking at the issue recently, I came across Jim Flynn’s TED talk ‘Why our IQ levels are higher than our grandparents‘ (below). At 3:55, he raises a very interesting point, which also appears in his 2007 ‘What is Intelligence?‘ on pages 24-26. In essence, Flynn argues that the use of cognitive skills has changed dramatically over the last century: from thinking that ties concepts to concrete relationships with everyday life as the main way of understanding the world, to one that emphasises scientific categories and abstractions. He uses the example of a study from the early 20th century in which participants were asked about commonalities between fish and birds. He highlights that it was not the case that in the ‘pre-scientific’ worldview people didn’t know that both are animals, but rather that this categorisation was not helpful for dealing with concrete problems, and therefore was not common sense. Today, with a scientific worldview, a categorisation such as ‘these are animals’ comes first.
This point of view has implications for the way we interpret and understand the historical narrative. If correct, then the people who participated in William Whewell’s tide measurement work (see Caren Cooper’s blog post about it) cannot be expected to have thought about contributing to science, but they could systematically observe concrete events in their area. While Whewell’s view of participants as ‘subordinate labourers’ is still elitist and class-based, it is somewhat understandable. Moreover, when talking about projects that show continuity over the 20th century – such as the Christmas Bird Count or phenology projects – we have to consider the possibility that the worldview of the person doing it in 1910 was ‘how many birds are there in my area?’, while in 2010 the framing is ‘in order to understand the impact of climate change, we need to watch out for bird migration patterns’. Maybe we can explore historical material to check for this change in framing? I hope that projects such as Constructing Scientific Communities, which looks at citizen science in the 19th and 21st centuries, will shed light on such differences.
[*] Later I found that this is not such a simple fact – see van Wyhe 2013 “My appointment received the sanction of the Admiralty”: Why Charles Darwin really was the naturalist on HMS Beagle
[**] And we shouldn’t forget that this was to the exclusion of people such as Mary Anning
The Association of American Geographers is coordinating an effort to create an International Encyclopedia of Geography. Plans started in 2010, with an aim to see the 15-volume project published in 2015 or 2016. Interestingly, this shows that publishers and scholars still see value in creating subject-specific encyclopedias. On the other hand, the weird decision by Wikipedians that Geographic Information Science doesn’t exist outside GIS shows that geographers need a place to define their practice by themselves. You can find more information about the AAG International Encyclopedia project in an interview with Doug Richardson from 2012.
As part of this effort, I was asked to write an entry on ‘Volunteered Geographic Information, Quality Assurance‘ as a short piece of about 3000 words. To do this, I looked around for mechanisms that are used in VGI and in citizen science. These are covered in OpenStreetMap studies and similar work in GIScience, and in the area of citizen science there are reviews such as the one by Andrea Wiggins and colleagues of mechanisms to ensure data quality in citizen science projects, which clearly demonstrated that projects use multiple methods to ensure data quality.
Below you’ll find an abridged version of the entry (but still long). The citation for this entry will be:
Haklay, M., Forthcoming. Volunteered geographic information, quality assurance. in D. Richardson, N. Castree, M. Goodchild, W. Liu, A. Kobayashi, & R. Marston (Eds.) The International Encyclopedia of Geography: People, the Earth, Environment, and Technology. Hoboken, NJ: Wiley/AAG
In the entry, I have identified six types of mechanisms that are used to ensure quality when the data has a geographical component, in either VGI or citizen science. If I have missed a type of quality assurance mechanism, please let me know!
Here is the entry:
Volunteered geographic information, quality assurance
Volunteered Geographic Information (VGI) originates outside the realm of professional data collection by scientists, surveyors and geographers. Quality assurance of such information is important for people who want to use it, as they need to identify whether it is fit for purpose. Goodchild and Li (2012) identified three approaches for VGI quality assurance: a ‘crowdsourcing‘ approach that relies on the number of people who have edited the information, a ‘social’ approach that is based on gatekeepers and moderators, and a ‘geographic’ approach that uses broader geographic knowledge to verify that the information fits into existing understanding of the natural world. In addition to the approaches that Goodchild and Li identified, there are also a ‘domain’ approach that relates to understanding of the knowledge domain of the information, an ‘instrumental observation’ approach that relies on technology, and a ‘process oriented’ approach that brings VGI closer to industrialised procedures. First, we need to understand the nature of VGI and the source of the concern with quality assurance.
While the term volunteered geographic information (VGI) is relatively new (Goodchild 2007), the activities that it describes are not. Another relatively recent term, citizen science (Bonney 1996), which describes the participation of volunteers in collecting, analysing and sharing scientific information, provides the historical context. While that term, too, is relatively new, the collection of accurate information by non-professional participants turns out to have been an integral part of scientific activity since the 17th century and likely before (Bonney et al. 2013). Therefore, when approaching the question of quality assurance of VGI, it is critical to see it within the wider context of scientific data collection, and not to fall into the trap of novelty by considering it to be without precedent.
Yet this integration needs to take into account the insights that have emerged within geographic information science (GIScience) research over the past decades. Within GIScience, it is the body of research on spatial data quality that provides the framing for VGI quality assurance. Van Oort’s (2006) comprehensive synthesis of various quality standards identifies the following elements of spatial data quality discussions:
- Lineage – a description of the history of the dataset.
- Positional accuracy – how well the coordinate value of an object in the database relates to the reality on the ground.
- Attribute accuracy – objects in a geographical database are represented not only by their geometrical shape but also by additional attributes.
- Logical consistency – the internal consistency of the dataset.
- Completeness – how many objects that are expected to be found in the database are missing, as well as an assessment of excess data that should not be included.
- Usage, purpose and constraints – a fitness-for-purpose declaration that should help potential users decide how the data should be used.
- Temporal quality – a measure of the validity of changes in the database in relation to real-world changes, and also the rate of updates.
While some of these quality elements might seem independent of a specific application, in reality they can only be evaluated within a specific context of use. For example, when carrying out an analysis of street lighting in a specific part of town, the question of completeness becomes specific to the recording of all street-light objects within the bounds of the area of interest; whether the dataset includes other features, or is complete for another part of the settlement, is irrelevant for the task at hand. The scrutiny of information quality within a specific application, to ensure that it is good enough for the needs, is termed ‘fitness for purpose’. As we shall see, fitness for purpose is a central issue with respect to VGI.
To understand why geographers are concerned with quality assurance of VGI, we need to recall the historical development of geographic information, and especially the historical context of geographic information systems (GIS) and GIScience development since the 1960s. For most of the 20th century, geographic information production became professionalised and institutionalised. The creation, organisation and distribution of geographic information was done by official bodies, such as national mapping agencies or national geological bodies, funded by the state. As a result, the production of geographic information became an industrial scientific process whose aim is to produce a standardised product – commonly a map. Due to financial, skills and process limitations, products were engineered carefully so that they could be used for multiple purposes. Thus, a topographic map can be used for navigation but also for urban planning and many other purposes. Because the products were standardised, detailed specifications could be drawn up, against which the quality elements could be tested and quality assurance procedures developed. This was the backdrop to the development of GIS, and to the conceptualisation of spatial data quality.
The practices of centralised, scientific and industrialised geographic information production lend themselves to quality assurance procedures that are deployed through organisational or professional structures, and this explains the perceived challenges with VGI. Centralised practices also supported employing people with a focus on quality assurance, such as going into the field with a map and testing that it complies with the specifications that were used to create it. In contrast, most of the collection of VGI is done outside organisational frameworks. The people who contribute the data are not employees and seemingly cannot be put through training programmes, asked to follow quality assurance procedures, or expected to use standardised equipment that can be calibrated. The lack of coordination and top-down forms of production raises questions about ensuring the quality of the information that emerges from VGI.
Considering quality assurance within VGI requires understanding some underlying principles that are common to VGI practices and that differentiate them from organised and industrialised geographic information creation. For example, some VGI is collected under conditions of scarcity or abundance in terms of data sources, number of observations, or the amount of data that is being used. As noted, the conceptualisation of geographic data collection before the emergence of VGI was one of scarcity, where data is expensive and complex to collect. In contrast, in many applications of VGI the situation is one of abundance. For example, in applications that are based on micro-volunteering, where the participant invests very little time in a fairly simple task, it is possible to give the same mapping task to several participants and statistically compare their independent outcomes as a way to ensure the quality of the data. Another way in which abundance serves as a framework is in the development of software for data collection. While in previous eras there would typically be a single application used for data capture and editing, in VGI there is a need to consider multiple applications, as different designs and workflows can appeal to, and be suitable for, different groups of participants.
Another underlying principle of VGI is that, since the people who collect the information are not remunerated or in contractual relationships with the organisation that coordinates data collection, a more complex relationship between the two sides is required, with consideration of incentives, motivations to contribute, and the tools that will be used for data collection. Overall, VGI systems need to be understood as socio-technical systems in which the social aspect is as important as the technical part.
In addition, VGI is inherently heterogeneous. In large-scale data collection activities, such as the census of population, there is a clear attempt to capture all the information about the population over a relatively short time and in every part of the country. In contrast, because of its distributed nature, VGI will vary across space and time, with some areas and times receiving more attention than others. An interesting example has been shown at the temporal scale, where some citizen science activities exhibit a ‘weekend bias’, as these are the days when volunteers are free to collect more information.
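As an illustration of how such temporal heterogeneity can be detected, here is a minimal sketch that measures the share of records contributed at weekends; the observation dates are hypothetical, and a share well above the 2/7 expected from a uniform distribution would suggest a weekend bias:

```python
from collections import Counter
from datetime import date

def weekday_distribution(observation_dates):
    """Count observations per weekday (0=Monday ... 6=Sunday)."""
    return Counter(d.weekday() for d in observation_dates)

def weekend_share(observation_dates):
    """Fraction of observations recorded on a Saturday or Sunday."""
    counts = weekday_distribution(observation_dates)
    return (counts[5] + counts[6]) / len(observation_dates)

# Hypothetical records: most observations fall on weekend days
dates = [date(2015, 2, 7), date(2015, 2, 8), date(2015, 2, 8),
         date(2015, 2, 10), date(2015, 2, 14)]
print(round(weekend_share(dates), 2))  # 0.8 - well above the ~0.29 expected
```

In a real project, one would of course test the observed distribution statistically rather than eyeball a single fraction, but the same weekday counting is the starting point.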
Because of the difference in the organisational settings of VGI, different approaches to quality assurance are required, although, as noted, such approaches have in general been used in many citizen science projects. Over the years, several approaches have emerged, and these include ‘crowdsourcing’, ‘social’, ‘geographic’, ‘domain’, ‘instrumental observation’ and ‘process oriented’. We now turn to describe each of these approaches.
The ‘crowdsourcing’ approach builds on the principle of abundance. Since there is a large number of contributors, quality assurance can emerge from repeated verification by multiple participants. Even in projects where the participants actively collect data in an uncoordinated way, such as the OpenStreetMap project, it has been shown that with enough participants actively collecting data in a given area, the quality of the data can be as good as authoritative sources. The limitation of this approach arises where local knowledge or verification on the ground (‘ground truth’) is required. In such situations, the ‘crowdsourcing’ approach will work well in central, highly populated or popular sites, where there are many visitors and therefore the probability that several of them will be involved in data collection rises. Even so, it is possible to encourage participants to record less popular places through a range of suitable incentives.
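The repeated-verification idea can be sketched as a simple majority vote over independent contributions; the labels and the 60% agreement threshold below are hypothetical illustrations, not values taken from any real project:

```python
from collections import Counter

def majority_label(labels, min_agreement=0.6):
    """Accept a crowdsourced classification only if enough contributors agree.

    Returns (label, agreed), where `agreed` is False when the most common
    answer falls below the agreement threshold.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    agreed = votes / len(labels) >= min_agreement
    return label, agreed

# Hypothetical: five participants independently classify the same feature
print(majority_label(["footpath", "footpath", "cycleway",
                      "footpath", "footpath"]))
# ('footpath', True) - 4 of 5 contributors agree
```

Records that fail the threshold would typically be routed to further contributors or to an experienced reviewer, which is where this approach shades into the ‘social’ one described next.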
The ‘social’ approach also builds on the principle of abundance in terms of the number of participants, but with a more detailed understanding of their knowledge, skills and experience. In this approach, some participants are asked to monitor and verify the information that was collected by less experienced participants. The social method is well established in citizen science programmes such as bird watching, where some participants who are more experienced in identifying bird species help to verify observations by other participants. To deploy the social approach, there is a need for a structured organisation in which some members are recognised as more experienced and are given the appropriate tools to check and approve information.
The ‘geographic’ approach uses known geographical knowledge to evaluate the validity of the information that is received from volunteers. For example, by using existing knowledge about the distribution of streams in a river network, it is possible to assess whether the mapping of a new river contributed by volunteers is comprehensive or not. A variation of this approach is the use of recorded information, even if it is out of date, to verify new information by comparing how much of what is already known also appears in a VGI source. Geographic knowledge can potentially be encoded in software algorithms.
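One way such geographic knowledge might be encoded algorithmically is a simple proximity test against known features; the coordinates and the 1 km threshold below are hypothetical illustrations of the general idea:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def plausible(obs, known_features, max_km=1.0):
    """Flag a volunteered observation as plausible if it lies near any
    feature in the existing (possibly out-of-date) reference data."""
    return any(haversine_km(obs[0], obs[1], f[0], f[1]) <= max_km
               for f in known_features)

# Hypothetical known stream points and two volunteered observations
known = [(51.52, -0.13), (51.50, -0.12)]
print(plausible((51.521, -0.131), known))  # True - near a known point
print(plausible((52.5, 0.5), known))       # False - over 100 km away
```

Implausible records need not be rejected outright – they may be genuinely new features – but they can be queued for review rather than accepted automatically.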
The ‘domain’ approach is an extension of the geographic one; in addition to geographical knowledge, it uses specific knowledge that is relevant to the domain in which the information is collected. For example, in many citizen science projects that involve collecting biological observations, there will be some body of information about species distribution, both spatially and temporally. A new observation can therefore be tested against this knowledge, again algorithmically, helping to ensure that new observations are accurate.
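Such a domain check could be encoded along the following lines; the species name, bounding box and season window are made-up illustrations of the kind of domain knowledge a recording scheme might hold:

```python
from datetime import date

# Hypothetical domain knowledge: per-species spatial range and active months
SPECIES_RULES = {
    "swallow": {"bbox": (49.0, 61.0, -8.0, 2.0),  # min/max lat, min/max lon
                "months": range(4, 11)},           # seen roughly April-October
}

def domain_check(species, lat, lon, when):
    """Test a new observation against known spatial and temporal range."""
    rule = SPECIES_RULES.get(species)
    if rule is None:
        return False  # unknown species: route to an expert instead
    min_lat, max_lat, min_lon, max_lon = rule["bbox"]
    in_range = min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    in_season = when.month in rule["months"]
    return in_range and in_season

print(domain_check("swallow", 51.5, -0.1, date(2015, 6, 1)))   # True
print(domain_check("swallow", 51.5, -0.1, date(2015, 1, 15)))  # False - out of season
```

Real schemes use far richer distribution models than a bounding box, but the principle – flag records that fall outside the species’ known spatial and temporal envelope – is the same.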
The ‘instrumental observation’ approach removes some of the subjective aspects of data collection by a human who might make an error, relying instead on the equipment that the person is using. Because of the increased availability of accurate-enough equipment, such as the various sensors that are integrated in smartphones, many people carry in their pockets mobile computers with the ability to record location, direction, imagery and sound. For example, image files that are captured on smartphones include the GPS coordinates and a time-stamp, which the vast majority of people would be unable to manipulate. Thus, the automatic instrumental recording of information provides evidence for the quality and accuracy of the information.
Finally, the ‘process oriented’ approach brings VGI closer to traditional industrial processes. Under this approach, the participants go through some training before collecting information, and the process of data collection or analysis is highly structured to ensure that the resulting information is of suitable quality. This can include the provision of standardised equipment, online training or instruction sheets, and a structured data recording process. For example, volunteers who participate in the US Community Collaborative Rain, Hail & Snow network (CoCoRaHS) receive a standardised rain gauge, instructions on how to install it, and online resources to learn about data collection and reporting.
Importantly, these approaches are not used in isolation, and in any given project a combination of them is likely to be in operation. Thus, an element of training and guidance for users can appear in a downloadable application that is distributed widely, so the method used in such a project will combine the process oriented and the crowdsourcing approaches. Another example is the OpenStreetMap project, which in general provides only limited guidance to volunteers in terms of the information that they collect or the locations in which they collect it. Yet a subset of the information in the OpenStreetMap database, about wheelchair access, is collected through the highly structured process of the WheelMap application, in which the participant is required to select one of four possible settings that indicate accessibility. Another subset of the information, recorded for humanitarian efforts, follows the social model, in which tasks are divided between volunteers using the Humanitarian OpenStreetMap Team (H.O.T) task manager, and the data that is collected is verified by more experienced participants.
The final and critical point for quality assurance of VGI, noted above, is fitness for purpose. In some VGI activities, the information has a direct and clear application, in which case it is possible to define specifications for the quality assurance elements that were listed above. However, one of the core aspects noted above is the heterogeneity of the information that is collected by volunteers. Therefore, before using VGI for a specific application, there is a need to check its fitness for that specific use. While this is true for all geographic information, and even so-called ‘authoritative’ data sources can suffer from hidden biases (e.g. lack of updates of information in rural areas), with VGI the variability can change dramatically over short distances – so while the centre of a city will be mapped by many people, a deprived suburb near the centre may not be mapped and updated. There are also limitations caused by the instruments in use – for example, the GPS positional accuracy of the smartphones in use. Such aspects should also be taken into account, ensuring that the quality assurance itself is fit for purpose.
References and Further Readings
Bonney, Rick. 1996. Citizen Science – a lab tradition, Living Bird, Autumn 1996.
Bonney, Rick, Shirk, Jennifer, Phillips, Tina B. 2013. Citizen Science, Encyclopaedia of science education. Berlin: Springer-Verlag.
Goodchild, Michael F. 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), 211–221.
Goodchild, Michael F., and Li, Linna. 2012. Assuring the quality of volunteered geographic information. Spatial Statistics, 1, 110–120.
Haklay, Mordechai. 2010. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703.
Sui, Daniel, Elwood, Sarah and Goodchild, Michael F. (eds). 2013. Crowdsourcing Geographic Knowledge. Berlin: Springer-Verlag.
Van Oort, Pepijn A.J. 2006. Spatial data quality: from description to application. PhD Thesis, Wageningen: Wageningen Universiteit, p. 125.