DITOs, Doing It TOgether Science – introductory video

The Doing It Together Science (DITOs) project is now in its 20th month. It is a 3-year project, funded by the EU Horizon 2020 programme, that aims to increase awareness of and participation in citizen science across Europe and beyond. As such, it is focused on communication, coordination, and support of citizen science activities. The project promotes the sharing of best practices among existing networks to achieve greater public and policy engagement with citizen science through a wide range of events and activities. Some of these activities include doing citizen science, as ‘engaging by doing’ is central to the effort of the project. Other activities, both online and offline, are focused on communicating different facets of citizen science, from in-depth engagement with small and organised groups to large-scale engagement via social media.
DITOs supports existing and new projects across the landscape of citizen science: top-down projects, in which people join an activity that is designed and coordinated by scientists; bottom-up science activities, in which people, scientifically trained or not, organise a research project around a problem of direct concern (this is sometimes known as DIY (Do It Yourself) science); as well as collaborative projects that are created jointly by scientists and participants.

In collaboration across the consortium, the Waag Society produced a short video of less than 3 minutes about the project. It was made from material from our events, and it is good to have such a short introduction to explain what the project is about…


Citizen Science & Scientific Crowdsourcing – week 5 – Data quality

This week, in the “Introduction to Citizen Science & Scientific Crowdsourcing“, our focus was on data management, to complete the first part of the course (the second part starts in a week’s time since we have a mid-term “Reading Week” at UCL).

The part that I enjoyed developing most was the segment that addresses the data quality concerns that are frequently raised about citizen science and geographic crowdsourcing. Here are the slides from this segment, and below them a rationale for the content and detailed notes.

I’ve written a lot on this blog about data quality, and in many talks that I have given about citizen science and crowdsourced geographic information, the question about data quality is the first one to come up. It is a valid question, and it has led to useful research – for example on OpenStreetMap – and I recall the early conversations, 10 years ago, during a journey to the Association for Geographic Information (AGI) conference, about the quality and the longevity potential of OSM.

However, when you are asked the same question again, and again, and again, at some point you start asking yourself “why am I being asked this question?”. Especially when you know that it has been over 10 years since it was demonstrated that the quality is beyond “good enough”, and that there are over 50 papers on citizen science quality. So why is the problem so persistent?

Therefore, the purpose of the segment was to explain the concerns about citizen science data quality and their origin, then to explain a core misunderstanding (that the same quality assessment methods that are used in “scarcity” conditions work in “abundance” conditions), and then to cover the main approaches to ensuring quality (based on my article for the International Encyclopedia of Geography). The aim is to equip the students with a suitable explanation of why citizen science projects need to be approached differently, and then to inform them of the available methods. Quite a lot for 10 minutes!

So here are the notes from the slides:

[Slide 1] When it comes to citizen science, it is very common to hear suggestions that the data is not good enough and that volunteers cannot collect data of good quality because, unlike with trained researchers, we don’t know who they are – a perception that we know little about the people who are involved and therefore know little about their abilities. There are also perceptions that, like Wikipedia, it is all very loosely coordinated and therefore there are no strict data quality procedures. However, even in the Wikipedia case, the scientific journal Nature showed over a decade ago (2005) that Wikipedia’s quality was similar to that of Encyclopaedia Britannica, and we will see that OpenStreetMap produces data of a similar quality to professional services.
In citizen science where sensing and data collection from instruments is included, there are also concerns over the quality of the instruments and their calibration – the ability to compare the results with high-end instruments.
The opening of the Hunter et al. paper (which offers some solutions) summarises the concerns that are raised over data quality.

[Slide 2] Based on conversations with scientists and concerns that appear in the literature, there are also cultural aspects at play, which are expressed in many ways, with data quality being used as an outlet to express them. This can be similar to the concerns that were raised in “The Cult of the Amateur” (which we saw in week 2 regarding the critique of crowdsourcing) to protect the position of professional scientists and to avoid the need to change practices. There are also special concerns when citizen science is connected to activism, as this seems to “politicise” science or make the data suspicious – we will see in the next lecture that the story is more complex. Finally, and more kindly, we can also notice that because scientists are used to top-down mechanisms, they find alternative ways of collecting data and ensuring quality unfamiliar and untested.

[Slide 3] Against this background, it is not surprising to see that checking data quality in citizen science is a popular research topic. Caren Cooper has identified over 50 papers that compare citizen science data with data collected by professionals – as she points out: “To satisfy those who want some nitty gritty about how citizen science projects actually address data quality, here is my medium-length answer, a brief review of the technical aspects of designing and implementing citizen science to ensure the data are fit for intended uses. When it comes to crowd-driven citizen science, it makes sense to assess how those data are handled and used appropriately. Rather than question whether citizen science data quality is low or high, ask whether it is fit or unfit for a given purpose. For example, in studies of species distributions, data on presence-only will fit fewer purposes (like invasive species monitoring) than data on presence and absence, which are more powerful. Designing protocols so that citizen scientists report what they do not see can be challenging which is why some projects place special emphasize on the importance of “zero data.”
It is a misnomer that the quality of each individual data point can be assessed without context. Yet one of the most common way to examine citizen science data quality has been to compare volunteer data to those collected by trained technicians and scientists. Even a few years ago I’d noticed over 50 papers making these types of comparisons and the overwhelming evidence suggested that volunteer data are fine. And in those few instances when volunteer observations did not match those of professionals, that was evidence of poor project design. While these studies can be reassuring, they are not always necessary nor would they ever be sufficient.” (http://blogs.plos.org/citizensci/2016/12/21/quality-and-quantity-with-citizen-science/)

[Slide 4] One way to examine the issue of data quality is to think of the clash between two concepts and systems of thinking on how to address quality issues. We can consider standard scientific research conditions as ones of scarcity: limited funding, a limited number of people with the necessary skills, limited laboratory space, and expensive instruments that need to be used in a very specific way – sometimes unique instruments.
The conditions of citizen science, on the other hand, are of abundance – we have a large number of participants, with multiple skills, but the cost per participant is low, they bring their own instruments, use their own time, and are also distributed in places that we usually don’t get to (backyards, across the country – we talked about it in week 2). Conditions of abundance are different and require different thinking for quality assurance.

[Slide 5] Here are some of the differences. Under conditions of scarcity, it is worth investing in long training to ensure that the data collection is as good as possible the first time it is attempted, since time is scarce. We would also try to maximise the output from each activity that our researchers carry out, and put in place procedures and standards to ensure “once & good” or even “once & best” optimisation. We can also require all the people in the study to use the same equipment and software, as this streamlines the process.
On the other hand, in abundance conditions we need to assume that people come with a whole range of skills and that training can be variable – some people will get trained on the activity over a long time, while to get the process started we would want people to receive light training and join in. We also think of activities differently – e.g. conceiving the data collection as micro-tasks. We might also have multiple procedures and even different ways to record information to cater for different audiences. We will also need to expect a whole range of instrumentation, sometimes with limited information about the characteristics of the instruments.
Once we understand the new conditions, we can come up with appropriate data collection procedures that ensure data quality that is suitable for this context.

[Slide 6] There are multiple ways of ensuring the quality of citizen science data. Let’s briefly look at each one of these. The first three methods were suggested by Mike Goodchild and Linna Li in a paper from 2012.

[Slide 7] The first method for quality assurance is crowdsourcing – the use of multiple people who carry out the same work, in effect doing peer review or replication of the analysis, which is desirable across the sciences. As Watson and Floridi argued, using the example of Zooniverse, the approaches that are used in crowdsourcing give these methods a stronger claim to accuracy and scientifically correct identification because they compare multiple observers who work independently.
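To make the logic of independent replication concrete, here is a minimal Python sketch of consensus among volunteers who classify the same item, in the spirit of (but not taken from) Zooniverse-style aggregation. The function name `consensus` and the `min_votes` and `threshold` parameters are hypothetical choices for illustration; real projects use more elaborate aggregation.

```python
from collections import Counter


def consensus(labels, min_votes=5, threshold=0.6):
    """Aggregate independent volunteer classifications of one item.

    Returns (label, agreement): label is the majority answer if enough
    volunteers took part and enough of them agree, otherwise None.
    """
    if len(labels) < min_votes:
        # Too few independent observers to say anything yet.
        return None, 0.0
    label, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    return (label if agreement >= threshold else None), agreement


print(consensus(["zebra"] * 7 + ["wildebeest"] * 3))  # ('zebra', 0.7)
print(consensus(["zebra"] * 5 + ["wildebeest"] * 5))  # (None, 0.5)
```

The point of the sketch is that no single volunteer has to be trusted: an answer is accepted only when several independent observers converge on it, and contested items are held back for further review rather than discarded.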

[Slide 8] The social form of quality assurance uses more and less experienced participants as a way to check the information and ensure that the data is correct. This is fairly common in many areas of biodiversity observation and is integrated into iSpot, but it also exists in other areas, such as mapping, where some information gets moderated (we saw that in Google Local Guides, when a place is deleted).
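The social approach can be sketched as giving more weight to agreements from more experienced participants. This is a hypothetical illustration loosely in the spirit of reputation systems such as the one in iSpot, not its actual algorithm; the function `is_confirmed`, the reputation scores and the threshold are all assumptions.

```python
def is_confirmed(proposed_species, agreements, reputation, threshold=10.0):
    """Return True when reviewers agreeing with the proposed species carry enough combined weight.

    agreements: list of (reviewer, species) pairs submitted by other participants.
    reputation: dict mapping reviewer -> experience score (hypothetical scale);
                reviewers without a score count as newcomers with weight 1.0.
    """
    support = sum(
        reputation.get(reviewer, 1.0)
        for reviewer, species in agreements
        if species == proposed_species
    )
    return support >= threshold


reputation = {"experienced_entomologist": 9.0, "regular_volunteer": 2.0}
agreements = [
    ("experienced_entomologist", "Bombus terrestris"),
    ("new_participant", "Bombus terrestris"),
    ("regular_volunteer", "Bombus lapidarius"),
]
print(is_confirmed("Bombus terrestris", agreements, reputation))  # True (9.0 + 1.0)
```

A single experienced reviewer does most of the work here, but a newcomer’s agreement still counts; tuning the weights and threshold is a design decision for each project.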

[Slide 9] The geographical rules are especially relevant to information about mapping and locations. Because we know things about the nature of geography – the most obvious in this example is land and sea – we can use this knowledge to check that the information provided makes sense, such as this example of two bumblebees that were recorded in OPAL in the middle of the sea. While it might be the case that someone saw them while sailing or on some other vessel, we can integrate a rule into our data management system and ask for more details when we get observations in such a location. There are many other such rules – about streams, lakes, slopes and more.
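A land/sea rule of this kind can be sketched with a standard ray-casting point-in-polygon test. This is a minimal Python illustration, assuming a toy rectangular “land” polygon in place of a real coastline dataset and treating longitude/latitude as planar coordinates; `needs_more_details` and the records are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside a polygon given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray running from (x, y) towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def needs_more_details(record, land_polygon):
    """Geographic rule: a terrestrial species reported outside the land polygon is queried, not deleted."""
    return not point_in_polygon(record["lon"], record["lat"], land_polygon)


# Hypothetical toy 'land' rectangle in (lon, lat) degrees, not a real coastline.
LAND = [(-6.0, 50.0), (2.0, 50.0), (2.0, 58.0), (-6.0, 58.0)]

records = [
    {"species": "Bombus terrestris", "lon": -0.13, "lat": 51.52},  # inside the toy land area
    {"species": "Bombus terrestris", "lon": 5.50, "lat": 54.00},   # outside it, i.e. 'at sea'
]
for r in records:
    print(r["lon"], r["lat"], "needs more details" if needs_more_details(r, LAND) else "ok")
```

Note that the rule only triggers a request for more information, which matches the point above: an observation at sea may still be genuine.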

[Slide 10] The ‘domain’ approach is an extension of the geographic one: in addition to geographical knowledge, it uses specific knowledge that is relevant to the domain in which information is collected. For example, in many citizen science projects that involve collecting biological observations, there will be some body of information about species distribution, both spatially and temporally. Therefore, a new observation can be tested against this knowledge, again algorithmically, helping to ensure that new observations are accurate. If we see a monarch butterfly within the marked area, we can assume that it will not harm the dataset even if it was a case of mistaken identity, while an outlier (temporally, geographically, or in other characteristics) should stand out.
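The domain check can be sketched as testing each new record against a reference of where and when a species is usually seen. In this minimal Python sketch the reference table is hypothetical: the latitude/longitude envelope and months for the monarch butterfly are illustrative values, not real distribution data, and `is_outlier` is an assumed name.

```python
from datetime import date

# Hypothetical reference knowledge for illustration only (not real range data):
# a rough latitude/longitude envelope and the months when adults are usually recorded.
KNOWN_DISTRIBUTION = {
    "Danaus plexippus": {"lat": (15.0, 50.0), "lon": (-125.0, -65.0), "months": set(range(3, 11))},
}


def is_outlier(species, lat, lon, observed_on):
    """Domain rule: flag records that fall outside the known spatial or temporal pattern."""
    ref = KNOWN_DISTRIBUTION.get(species)
    if ref is None:
        return True  # no reference knowledge, so route the record to an expert
    in_range = (ref["lat"][0] <= lat <= ref["lat"][1]
                and ref["lon"][0] <= lon <= ref["lon"][1])
    in_season = observed_on.month in ref["months"]
    return not (in_range and in_season)


print(is_outlier("Danaus plexippus", 40.0, -90.0, date(2017, 7, 15)))  # False: expected place and time
print(is_outlier("Danaus plexippus", 40.0, -90.0, date(2017, 1, 15)))  # True: unusual month
```

Records that fit the expected pattern pass quietly, while outliers stand out for checking, which is the behaviour described for the monarch example.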

[Slide 11] The ‘instrumental observation’ approach removes some of the subjective aspects of data collection by a human who might make an error, and relies instead on the equipment that the person is using. Because of the increased availability of accurate-enough equipment, such as the various sensors that are integrated in smartphones, many people now carry in their pockets mobile computers with the ability to collect location, direction, imagery and sound. For example, image files that are captured on smartphones include the GPS coordinates and time-stamp in the file, which the vast majority of people are unable to manipulate. Thus, the automatic instrumental recording of information provides evidence for the quality and accuracy of the information. This is where the metadata of the information becomes very valuable, as it provides the necessary evidence.
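One way to use such metadata as evidence is to compare what the participant reported with what the device recorded. This Python sketch assumes the photo’s coordinates and time have already been extracted from the image metadata (that extraction step is not shown), and the tolerances `max_km` and `max_delay` are hypothetical values; distances use the haversine formula on a spherical Earth.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, treating the Earth as a sphere."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def metadata_supports_record(reported, photo, max_km=1.0, max_delay=timedelta(hours=24)):
    """Compare what the participant typed in with what the camera recorded.

    reported, photo: dicts with 'lat', 'lon' (decimal degrees) and 'time' (datetime).
    The photo values are assumed to have been extracted from the image metadata already.
    """
    distance = haversine_km(reported["lat"], reported["lon"], photo["lat"], photo["lon"])
    delay = abs(reported["time"] - photo["time"])
    return distance <= max_km and delay <= max_delay


reported = {"lat": 51.5246, "lon": -0.1340, "time": datetime(2018, 1, 26, 14, 0)}
photo = {"lat": 51.5250, "lon": -0.1335, "time": datetime(2018, 1, 26, 13, 42)}
print(metadata_supports_record(reported, photo))  # True: same place, within a day
```

The check does not prove the identification is right, but it provides independent evidence that the observation was made where and when the participant says it was.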

[Slide 12] Finally, the ‘process oriented’ approach brings citizen science closer to traditional industrial processes. Under this approach, the participants go through some training before collecting information, and the process of data collection or analysis is highly structured to ensure that the resulting information is of suitable quality. This can include the provision of standardised equipment, online training or instruction sheets, and a structured data recording process. For example, volunteers who participate in the US Community Collaborative Rain, Hail & Snow network (CoCoRaHS) receive a standardised rain gauge, instructions on how to install it, and online resources to learn about data collection and reporting.

[Slide 13] What is important to be aware of is that these methods are not used alone but in combination. The analysis by Wiggins et al. (2011) includes a framework of 17 different mechanisms for ensuring data quality. It is therefore not surprising that, with appropriate design, citizen science projects can provide high-quality data.


Citizen Science for Observing and Understanding the Earth

Since the end of 2015, I’ve been using the following mapping of citizen science activities in a range of talks:

Range of citizen science activities
Explaining citizen science

The purpose of this presentation is to guide my audience through the landscape of citizen science (see examples on SlideShare). I came up with it because I have been giving talks about citizen science since 2011. It started with the understanding that I can’t explain extreme citizen science when my audience doesn’t understand what citizen science is, and that turned into general talks on citizen science.

Like Caren Cooper, I have an inclusive approach to citizen science activities, so in talks I cover everything – from bird watching to DIY science. I felt that this was too much information, so this “hierarchy” provides a map for going through the overview (you can look at our online course to see why it’s not a great typology). It is a very useful way to go through the different aspects of citizen science, while also being flexible enough to adapt – I can switch the “long-running citizen science” fields according to the audience (e.g. marine projects for marine students).

An invitation from Pierre-Philippe Mathieu (European Space Agency) in 2015 was an opportunity to turn this mapping and presentation into a book chapter. The book is dedicated to “Earth Observation Open Science and Innovation” and was edited by Pierre-Philippe and Christoph Aubrecht.

When I got to writing the chapter, I contacted two researchers with further knowledge of citizen science and Earth Observation – Suvodeep Mazumdar and Jessica Wardlaw. I was pleased that they were happy to join me in the effort.

Personally, I’m very pleased that we could include in the chapter the story of the International Geophysical Year (thanks to Alice Bell for this gem), with Moonwatch and Sputnik monitoring.

The book is finally out, it is open access, and you can read our chapter, “Citizen Science for Observing and Understanding the Earth” for free (as well as all the other chapters). The abstract of the chapter is provided below:

Citizen Science, or the participation of non-professional scientists in a scientific project, has a long history—in many ways, the modern scientific revolution is thanks to the effort of citizen scientists. Like science itself, citizen science is influenced by technological and societal advances, such as the rapid increase in levels of education during the latter part of the twentieth century, or the very recent growth of the bidirectional social web (Web 2.0), cloud services and smartphones. These transitions have ushered in, over the past decade, a rapid growth in the involvement of many millions of people in data collection and analysis of information as part of scientific projects. This chapter provides an overview of the field of citizen science and its contribution to the observation of the Earth, often not through remote sensing but a much closer relationship with the local environment. The chapter suggests that, together with remote Earth Observations, citizen science can play a critical role in understanding and addressing local and global challenges.


Citizen Science & Scientific Crowdsourcing – week 3 – Participation inequality

One of the aspects of citizen science and crowdsourcing that fascinates me is the nature of participation, and in particular participation inequality. As I noted last week, when you look at large-scale systems, you expect to see it in them (so Google Local Guides exhibits a 95:5:0.005 ratio).

I knew that this phenomenon has been observed many times in Massive Open Online Courses (MOOCs), so I expected it to happen in the course. I’m particularly interested in the dynamic aspect of participation inequality: for example, at the beginning of the “introduction to citizen science and scientific crowdsourcing” course, every single person is at exactly the same level of participation – 0. However, within three weeks, we are starting to see the pattern emerge. Here are some of the numbers:

At this point in time, there are 497 people who went to the trouble of accessing UCLeXtend and creating a profile. They are a small group of the people who saw the blog post (about 1,100) or the tweet about it (about 600 likes, retweets or clicks on the link). A further 400 people filled in the online form that I set up before the course opened, stating their interest in it.

The course is structured as a set of lectures, each of them broken into segments of 10 minutes each, and although the annotated slides are available and it is likely that many people prefer them over listening to a PowerPoint video (it’s better in class!), the rate of viewing of the videos gives an indication of engagement.

Here are our viewing statistics for now:

[Figure: video viewing statistics for the course]

We can start seeing how the sub-tasks (viewing a series of videos) are already creating the inequality – lots of people watch part of the first video, and either give up (maybe switching to the notes) or leave it for another time. By part 4 of the first lecture, we are already at very few views (the “Lecture 3 Part 2” video is the one that I’ve integrated in the previous blog post).

What is interesting is how fast participation inequality emerges within the online course: there is now a core of about 5-10 people (about 1% to 2%) who are following the course at the same rate as the 9 students in the face-to-face class. I expect people to also follow the course over a longer period of time, so I wouldn’t read too much into the pattern, and will wait until the end of the course and a bit after it to do a full analysis.
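A simple way to track how participation inequality emerges over the weeks is to compute, at each snapshot, the share of activity produced by the most active participants. The Python sketch below is hypothetical: `top_share` is an illustrative name, and the view counts are invented to resemble the shape described above, not the course statistics.

```python
import math


def top_share(contributions, top_fraction=0.01):
    """Share of all contributions made by the most active `top_fraction` of participants."""
    counts = sorted(contributions, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0  # at the very start everyone is at zero, so there is no inequality yet
    k = max(1, math.ceil(len(counts) * top_fraction))  # at least one participant
    return sum(counts[:k]) / total


# Invented video-view counts for 497 registered participants.
views = [20] * 5 + [4] * 45 + [1] * 447
print(f"top 1%: {top_share(views, 0.01):.1%}, top 10%: {top_share(views, 0.10):.1%}")
```

Running the same calculation on weekly snapshots would show the dynamic aspect: the share starts undefined (everyone at 0) and then concentrates as a small core keeps up with the course.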

When I was considering setting up the course as a hybrid online/offline one, I was expecting this, since the time required to follow the course is about 4-5 hours a week – something reasonable for an MSc student during a course, but tough for a distance learner (I have huge appreciation for these 10 people who are following!).


Citizen Science & Scientific Crowdsourcing – week 2 – Google Local Guides

The first week of the “Introduction to Citizen Science and Scientific Crowdsourcing” course was dedicated to an introduction to the field of citizen science, using the history, examples and typologies to demonstrate the breadth of the field. The second week was dedicated to the second half of the course name – crowdsourcing in general, and its utilisation in scientific contexts. In the lecture, after a brief introduction to the concepts, I wanted to use a concrete example that shows maturity in the implementation of commercial crowdsourcing. I also wanted something that is relevant to citizen science and from which many parallels can be drawn, so that lessons can be learned. This gave me the opportunity to use Google Local Guides as a demonstration.

My interest in Google Local Guides (GLG) comes from two core aspects of it. As I pointed out in OpenStreetMap studies, I’m increasingly annoyed by claims that OpenStreetMap is the largest Volunteered Geographical Information (VGI) project in the world. It’s not. I guessed that GLG was, and by digging into it, I’m fairly confident that with 50,000,000 contributors (most of whom are, as usual, one-timers), Google has created the largest VGI project around. The contributions fall within my “distributed intelligence” category and are voluntary. The second aspect that makes the project fascinating for me is linked to a talk from 2007, at one of the early OSM conferences, about the usability barriers that OSM (or VGI more generally) needs to cross to reach a wide group of contributors – basically about user-centred design. The design of GLG is outstanding and shows how much has been learned by the Google Maps team, and more generally by Google, about crowdsourcing. I had very little information from Google about the project (Ed Parsons gave me several helpful comments on the final slide set), but experiencing it as a participant who can notice the design decisions and implementation, I find it hugely impressive to see how VGI is being implemented professionally.

As a demonstration project, it provides examples of recruitment, nudging participants to contribute, intrinsic and extrinsic motivation, participation inequality, micro-tasks and longer tasks, incentives, basic principles of crowdsourcing such as the “open call”, which supports flexibility, location- and context-aware alerts, and much more. Below is the segment from the lecture that focuses on Google Local Guides, and I hope to provide a more detailed analysis in a future post.

The rest of the lecture is available on UCLeXtend.

Launching a citizen science course – week 1

Today, I gave the opening lectures of the new UCL course ‘Introduction to Citizen Science and Scientific Crowdsourcing’. In a way, it was more work than I originally thought, but I also suspected that I was underestimating the effort – so it’s not completely unexpected.

Although I was responsible for the first installation of Moodle, the virtual learning environment, at UCL in 2003, I had not used it in the context of an online course for remote learners. I have experienced the development of the Esri Survey123 module with Patrick Rickles and the excellent team at Esri, who did most of the work. It’s actually quite a challenge. Luckily, the e-learning support team at UCL was happy to guide us and set us on an appropriate path for developing the material for the course.

Seeing the course materialise also closes a part of the original ExCiteS proposal that was left open. Here is what the proposal for Challenging Engineering said: “In the fourth year, the research group will begin to consolidate the technology (with the first PhD students completing their studies) and will develop a further focused research proposal utilising the lessons from Adventure 2… In this year, a module on Citizen Science will be offered for MSc and PhD students at UCL.” The project officially started in September 2011, so the fourth year was 2016 – so launching it in early 2018, within the 2017/2018 academic year, should be considered to be on time in academic proposal terms!

Compared to things that I’ve done in the past, I have to note that the evolution of what is considered boring technology – e.g. Microsoft PowerPoint (MSPP) – is instrumental to the ability to put this course together. Below you’ll see the opening segment. In practice, the extra effort to turn it into online teaching material was not huge – record a voice-over in MSPP, save it as a video, upload it to YouTube, and link to it from Moodle (or here). I do hope that we’re getting it right with the course, but I’ll see as we develop it.

The rest of the lecture is available on UCLeXtend.

Citizen Science Inquiry event and book launch at the Open University

Citizen Inquiry is a new book, edited by Christothea (Thea) Herodotou, Mike Sharples and Eileen Scanlon – all educational technology experts at the Open University. To celebrate the book, the Institute of Educational Technology organised a citizen science impact symposium. These are my notes from the day.

The day opened with Eileen Scanlon covering Citizen Science at the Open University. Eileen provided context about the role of the Open University in providing an alternative way of learning science, including concepts about teaching science and how to understand the experience of the learner. There is a series of Innovating Pedagogy reports – the 2017 report will come out soon. Eileen examined how the introduction of technology changes science learning and teaching. Technology should be understood more widely, including the development of experimental kits that were created to allow students to explore science at home, with thousands of students joining in the 1970s. The OU has used television as a way of linking learning to the courses that it leads, and today it links to other popular programmes, with a lot of interaction on the web and using online technology. The OU ran the SO2 pollution national experiment from 1971-1979, with acknowledgement of the contribution of the volunteers in a paper by Rose and Peare 1972 (p378). The work involved teaching science through a social experiment and was carried out with first-year students. Further work was carried out by Peggy Varley, using drosophila that were captured in matchboxes. Later versions of the introductory course included moth traps. The aim was to engage students with science. In 2007-2009, another activity at the OU was iSpot, which focused on geographical aspects of species distribution and was developed by Jonathan Silvertown. The OpenScienceLab aims to open science to people across the spectrum of learning. There is a journey between informal and formal learning, and learners can travel in both directions (e.g. iSpot evolved into supporting a MOOC in ecology). There are massive challenges for new learning – informal to formal, passive to active, solitary to sharing, and from learner to teacher.

I was asked to provide a keynote, and provided a talk about learning in contributory, collegial and co-created citizen science, drawing especially on the experience of the ExCiteS group.

The next presentation was by Thea Herodotou about LEARN CitSci: a project that involves UC Davis, OU, Oxford, NHM, CalAcademy and LA County. The project looks at citizen science with a focus on youth participants (5-19) and the learning outcomes – what they learn through participation. There are multiple overlapping settings, and the project examines how the goals help and hinder their learning. The project looks specifically at NHMs and the citizen science projects that they run. They look at Basu and Barton’s Citizen Science Agency, which was adapted by Heidi Ballard. The objective of the project, and of the OU in particular, is to describe the learning settings where citizen science takes place – the physical or digital space where it happens, the roles of young people in projects, and also social interaction, family communication, staff, scientists etc. They look at relevant activities, such as one-day events. They examine the iNaturalist application in a bioblitz and the way it is used. They also examine Zooniverse, looking at an NHM project involving miniature fossils. In year 1 the focus is on describing settings, then moving on to capturing learning, then redesigning new citizen science programmes, and then data analysis. The intended impacts include knowing how to design online and offline citizen science programmes that scaffold learning and participation for young people.

The final morning talk was by Liz FitzGerald, about Situ8, a tool that lets people annotate physical places with digital information; it is now a web platform. It is a hub for geolocated media, originally created as a generic platform. Situ8 was developed with limited resources: the initial prototype was a smartphone app, and it became a web portal. Anyone can register and use it. It was used in an OU field course, and in the S288 Practical Science module, with measurements of water quality. The platform supports data, images, text and video, and also allows exploring the data that was collected. It supports both qualitative data collection (poems or recordings of information) and scientific data. They are addressing the copyright of the data and control over downloading permissions. They use MOs (Media Objects), and the platform is very generic.

Mike Sharples talked about nQuire. The original version provided a tool for schools to develop and get involved in inquiry-based learning. Open learning allows for sophisticated exploration, including the virtual microscope at the OU that allows the exploration of moon rocks; the system no longer works due to changes in technology. The OU approach starts from mobile and inquiry-based learning, and asks how to engage citizens and a wider range of participants. The successes include “citizen inquiry” as a proposal which became a reality (originally mentioned in an ERC Synergy proposal that wasn’t successful). Citizen inquiry is becoming a recognised framework that combines citizen science and inquiry-based ideas. They also developed tools – the nQuire platform, supported by Nominet Trust. The nQuire-it platform is a more open activity which includes spot-it, sense-it and win-it missions. They have 1106 users and 187 projects. The nQuire-it platform is supported by an app that unlocks the sensors on the mobile phone and makes them available to the user. The first challenge is how to reach a mass scale that goes beyond surveying. There are issues of recruitment and engagement, such as offering a low barrier to entry while not intimidating newcomers. The introductory screen of many websites assumes existing interest. There is also the question of how participants gain value from contributing: positive feedback, and joining a community of practice (in FutureLearn). The next issue is sustainability – how to keep a community going: identity (we’re rock hunters/cloud spotters), development (is the sequence of forming, storming, norming and performing relevant to citizen science?), and what guidance, curation and mentoring are needed. Finally, maturity, including considering the maturity of a community and its mitosis (breaking up into new groups). We need to think of places for people to interact with and support each other.

The third challenge is how to do good science with valuable outcomes that is appropriate, reliable, robust and ethical.

Good citizen inquiry needs to offer valuable learning linked to teaching, and to have a large-scale data set, a good element of engagement and serendipity, the involvement of trained scientists, and accurate data collection and analysis.


Some of the book chapters:

Maria Aristeidou provided the analysis of the nQuire-it platform, identifying the design requirements and then evaluating the implementation. Participants’ self-reports did not refer to the inquiry process, and she suggested recommendations and guidelines.

Gill Clough discussed the use of geocaching then and now – she carried out a study in 2007. She conducted a detailed mixed survey of closed and open questions, and discovered a lot of learning – 84% learned something online. Geocaching has become a subscription app, not an expensive one, and the commercialisation led to debate in the community. GPS is also available on phones, and geocaching now relies on them.

Stuart Dunn and Mark Hedges look at citizen humanities and the transmission of knowledge. They looked at crowdsourcing in humanities projects (http://www.ahrc.ac.uk/documents/project-reports-and-reviews/connected-communities/crowd-sourcing-in-the-humanities/) and noticed different types of projects that are close to classical crowdsourcing. The crowd gains methodological proficiency and domain expertise about the subject, but outside universities. They also identified collective knowledge and practical skills.