How many citizen scientists in the world?

Since the development of the proposal for the Doing It Together Science project (DITOs), I have been using the “DITOs escalator” model to express the different levels of engagement in science, while also demonstrating that the higher levels have fewer participants, which means that there is a potential for people to move between levels of engagement – sometimes towards deeper engagement, and sometimes towards lighter engagement, according to life stages, family commitments, etc. This is what the escalator, after several revisions, looks like:

The DITOs escalator

I have an ongoing interest in participation inequality (the observation that very few participants are doing most of the work) and the way it plays out and influences citizen science projects. When you start attaching numbers to the different levels of public engagement in science, participation inequality appears in this area, too. Since writing the proposal in 2015, I have been looking for indications that will support the estimation of the number of participants. During the process of working on a paper that uses the escalator, I’ve done the research to identify sources of information to support these estimations. While the paper is starting its peer review journey, I am putting out the part that relates to these numbers so this part can get open peer review here. I have decided to use 2017 as a recent year for which we can carry out the analysis. As for geographical scale, I’m using the United Kingdom, a country with a very active citizen science community, as my starting point.

At the bottom of the escalator, Level 1 considers the whole population, about 65 million people. Because of the impact of science across society, the vast majority, if not all, will have some exposure to science – even if this is only in the form of medical encounters.

However, the bare minimum of engagement is to passively consume information about science through newspapers, websites, and TV and radio programmes (Level 2). We can gauge the number of people at this level from the BBC programmes Blue Planet II and Planet Earth II, both focusing on natural history, with viewing figures of 14 million and about 10 million, respectively. We can, therefore, estimate these “passive consumers” at about 25% of the population.

At the next level is active consumption of science – such as visits to London’s Science Museum (UK visitors in 2017 – about 1.3m), or the Natural History Museum (UK visitors in 2017 – about 2.1m), so an estimation of participation at 10% of the population seems justified.

Next, we can look at active engagement in citizen science but to a limited degree. Here, the Royal Society for the Protection of Birds (RSPB) annual Big Garden Birdwatch requires the participants to dedicate a single hour in the year. The project attracted about 500,000 participants in 2017, and we can, therefore, estimate participation at this level at about 1% of the population. This should also include about 170,000 people who carried out a single task on Zooniverse and other online projects.

At the fifth level, there are projects that require remote engagement, such as volunteer thinking on the Zooniverse platform, or volunteer computing on the IBM World Community Grid (WCG), in which participants download software onto their computer to allow processing that assists scientific research. The number of participants in WCG from the UK in 2017 was about 18,000. In Zooniverse, about 74,000 people carried out more than a single task in 2017, which puts participation at this level at about 0.1% of the population (thanks to Grant Miller, Zooniverse and Caitlin Larkin, IBM for these details).

The sixth level requires regular data collection, such as participation in the British Trust for Ornithology Garden Birdwatch, which had about 6,500 active participants in 2017 (BTO 2018), while about 5,000 people contributed to the biodiversity recording system iRecord (thanks to Tom August, CEH). It is therefore reasonable to estimate that participation at this level is about 0.01% of the population.

The most engaged level includes those who are engaged in DIY Science, such as exploring DIY Bio, or developing their own sensors, etc. We can estimate that it represents 0.001% of the UK population at most (thanks to Philippe Boeing & Ilia Levantis).

We can see that as the level of engagement increases, the demands on participants increase and the number of participants drops. Not that this is earth-shattering, though what is interesting is that the difference between levels is about an order of magnitude. We also know that the UK enjoys all the possible benefits that are needed to foster citizen science: a long history of citizen science activities, established NGOs and academic institutions that support citizen science, good technological infrastructure (broadband, mobile phone use), a well-educated population (39.1% with tertiary education), etc. So we’re talking about a best-case scenario.

It is also important, already at this point, to note that UNESCO’s estimate of the percentage of the UK population who are active scientists (working in research jobs) is 0.4%, which is bigger than the 0.111% for levels 5, 6 and 7.
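The order-of-magnitude pattern can be checked with a few lines of arithmetic. The sketch below is only illustrative: it takes the rounded percentages estimated above and a population of about 65 million, converts them into approximate headcounts, and compares the combined share of levels 5 to 7 with the UNESCO researcher share. The names `LEVEL_SHARE_PERCENT` and `headcount` are labels made up for this example.

```python
from decimal import Decimal

UK_POPULATION = 65_000_000  # "about 65 million" in the text

# Rounded share of the UK population at each escalator level, in percent,
# as estimated in the text (these are rough estimates, not measurements).
LEVEL_SHARE_PERCENT = {
    1: Decimal("100"),
    2: Decimal("25"),
    3: Decimal("10"),
    4: Decimal("1"),
    5: Decimal("0.1"),
    6: Decimal("0.01"),
    7: Decimal("0.001"),
}

RESEARCHER_SHARE_PERCENT = Decimal("0.4")  # UNESCO figure quoted in the text


def headcount(level, population=UK_POPULATION):
    """Approximate number of people at a level, from its rounded share."""
    return int(population * LEVEL_SHARE_PERCENT[level] / 100)


# Combined share of the most engaged levels (5, 6 and 7).
deep_share_percent = sum(LEVEL_SHARE_PERCENT[level] for level in (5, 6, 7))

if __name__ == "__main__":
    for level in sorted(LEVEL_SHARE_PERCENT):
        share = LEVEL_SHARE_PERCENT[level]
        print(f"Level {level}: {share}% ~ {headcount(level):,} people")
    print(f"Levels 5 to 7 combined: {deep_share_percent}%")
    print(f"Active researchers: {RESEARCHER_SHARE_PERCENT}%")
```

Decimal arithmetic is used so that 0.1 + 0.01 + 0.001 comes out as exactly 0.111 rather than a floating-point approximation.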

Let’s try to extrapolate from the UK to the world.

First, how many people can we estimate to have the potential of being citizen scientists? We want them to be connected and educated, with a middle-class lifestyle that gives them leisure time for hobbies and volunteering.

Connectivity gives us a large number – according to ITU, 3.5 billion people are using the Internet. The estimated size of the middle class is a bit smaller, at 3.2 billion people. However, we know that participants in most citizen science projects which use passive inclusiveness, where everyone is welcome without an active effort in outreach to under-represented groups, tend to be people with higher education (a.k.a. tertiary education). There is actually data about it – here is the information from Wikipedia about tertiary educational attainment. According to UNESCO’s statistics, there were over 672 million people with a form of tertiary education in 2017. Let’s say that not everyone in citizen science has tertiary education (which is true), so our potential starting number is 1 billion.

I’ll assume the same proportions as in the UK, ignoring the fact that it represents the best case. So about 250 million of these are passive consumers of science (L2), and 100 million are active consumers (e.g. going to science museums) (L3). We can then have 10 million people who participate in once-a-year events (L4); 1 million who are active in online citizen science (this is more than a one-off visit or trial) (L5); about 100,000 who are the committed participants (mostly nature observers) and about 10,000 DIY bio, makers, and DIY science people (L6 and L7).
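The extrapolation above is a simple scaling of the UK proportions to the pool of 1 billion potential participants. As a rough sketch (the dictionary keys and the function name `extrapolate` are invented for illustration), it can be written as:

```python
from fractions import Fraction

POTENTIAL_POOL = 1_000_000_000  # the starting number of 1 billion from the text

# UK proportions per level, taken from the rounded estimates in the text.
UK_SHARES = {
    "L2 passive consumers": Fraction(25, 100),
    "L3 active consumers": Fraction(10, 100),
    "L4 once-a-year participants": Fraction(1, 100),
    "L5 online citizen science": Fraction(1, 1_000),
    "L6 committed participants": Fraction(1, 10_000),
    "L7 DIY science": Fraction(1, 100_000),
}


def extrapolate(pool, shares):
    """Scale each share to the given pool and return whole-person counts."""
    return {level: int(pool * share) for level, share in shares.items()}


WORLD_ESTIMATE = extrapolate(POTENTIAL_POOL, UK_SHARES)

if __name__ == "__main__":
    for level, people in WORLD_ESTIMATE.items():
        print(f"{level}: {people:,}")
```

This only reproduces the first-pass numbers; it carries over the best-case UK proportions without any correction.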

Do these numbers make sense? Looking at the visits to science/natural history museums on Wikipedia, level 3 seems about right. Level 4 looks very optimistic – in addition to Big Garden Birdwatch, there were about 17,000 people participating in City Nature Challenge, 73,000 participants in the Christmas Bird Count, and about 888,000 who did a single task on Zooniverse – so a more realistic number looks like 3 million or 4 million. Level 5 is an underestimate – IBM World Community Grid has 753,000 members, and there are other volunteer computing projects which will make it about 1 million; then there were about 163,000 global Zooniverse contributors (thanks to the information from Grant Miller), 130,000 Wikipedians, 50,000 active contributors in OpenStreetMap, and other online projects such as EyeWire etc. So let’s say that it’s about 1.5 million. At level 6, again the number is about right – e.g. eBird reports 20,000 birders on their peak day. For the sake of the argument, let’s say that it’s double the number – 200,000. Level 7 also seems right, based on estimations of biohacker numbers in Europe.

Now let’s look at the number of scientists globally: in 2013 there were 7.3 million researchers worldwide. With the estimation of “serious” citizen scientists (levels 5,6 and 7) at about 1.7 million, we can see the issue of crowdsourcing here: the potential crowdsourcer community is, at the moment, much bigger than the volunteers.
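To make the comparison concrete, here is a minimal sketch that adds up the revised estimates for levels 5, 6 and 7 and sets them against the number of researchers. The figures are the rounded ones from the text; the variable names are made up for this example.

```python
# Revised global estimates per level, as discussed in the text.
REVISED_ESTIMATE = {
    5: 1_500_000,  # online citizen science and volunteer computing
    6: 200_000,    # committed, regular data collectors
    7: 10_000,     # DIY bio, makers, DIY science
}

RESEARCHERS_WORLDWIDE = 7_300_000  # 2013 figure quoted in the text

serious_citizen_scientists = sum(REVISED_ESTIMATE.values())
researchers_per_volunteer = RESEARCHERS_WORLDWIDE / serious_citizen_scientists

if __name__ == "__main__":
    print(f"Serious citizen scientists: {serious_citizen_scientists:,}")
    print(f"Researchers per serious volunteer: {researchers_per_volunteer:.1f}")
```

Under these estimates there are roughly four researchers for every "serious" citizen scientist.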

Something that is important to highlight here is the amazing productivity of citizen scientists in terms of their ability to analyse, collect information, or invent tools – we know from participation inequality that this tiny group of participants is doing a huge amount of work – the 50,000 OSM volunteers are mapping the world, the 73,000 Christmas Bird Count participants provided 56,000,000 observations, and the Open Insulin Project has attracted a great deal of attention. So numbers are not the only thing that we need to think about.

Moreover, this is not a reason to give up on increasing the number of citizen scientists. Look at the numbers of Google Local Guides – out of 1 billion users, a passive crowdsourcing approach reached 50 million single-time contributors, and 465,000 in the equivalent of levels 5 to 7. Therefore, citizen science has the potential of reaching much larger numbers. At the minimum, there is the large cohort of people with tertiary education, with at least 98 million people with Masters degrees and PhDs in the world.
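The Google Local Guides figures can be turned into percentages in the same way. This is only a back-of-the-envelope sketch using the three numbers quoted above; the helper `percent` is a name invented for the example.

```python
USERS = 1_000_000_000                  # total users quoted in the text
SINGLE_TIME_CONTRIBUTORS = 50_000_000  # contributed once
COMMITTED_CONTRIBUTORS = 465_000       # equivalent of levels 5 to 7


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


single_time_percent = percent(SINGLE_TIME_CONTRIBUTORS, USERS)
committed_percent = percent(COMMITTED_CONTRIBUTORS, USERS)

if __name__ == "__main__":
    print(f"Single-time contributors: {single_time_percent:.1f}% of users")
    print(f"Committed contributors: {committed_percent:.4f}% of users")
```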

Therefore, to enable a wider and deeper public engagement with science, apart from the obvious point of providing funding, institutional support, and frameworks to scale up citizen science, we can think of an “escalator”-like process, which makes people aware of the various levels and assists them in moving up or down the engagement levels. For example, due to a change in care responsibilities or life stages, people can become less active for a period of time and then choose to become more active later. With appropriate funding, support, and attention, growing global citizen science should be possible.


Citizen Science & Scientific Crowdsourcing – week 3 – Participation inequality

One of the aspects that fascinates me about citizen science and crowdsourcing is the nature of participation and in particular participation inequality. As I noted last week, when you look at large-scale systems, you expect to see it in them (so Google Local Guides is exhibiting 95:5:0.005 ratio).

I knew that this phenomenon has been observed many times in Massive Open Online Courses (MOOCs) so I expected it to happen in the course. I’m particularly interested in the question of the dynamic aspect of participation inequality: for example, at the point of the beginning of the “introduction to citizen science and scientific crowdsourcing” course, every single person is at exactly the same level of participation – 0. However, within three weeks, we are starting to see the pattern emerge. Here are some of the numbers:

At this point in time, there are 497 people who went through the trouble of accessing UCLeXtend and creating a profile. They are a small group of the people who saw the blog post (about 1,100) or the tweet about it (about 600 likes, retweets or clicks on the link). There are a further 400 people who filled in the online form that I set up before the course opened and stated their interest in it.

The course is structured as a set of lectures, each of them broken into segments of 10 minutes each, and although the annotated slides are available and it is likely that many people prefer them over listening to a PowerPoint video (it’s better in class!), the rate of viewing of the videos gives an indication of engagement.

Here are our viewing statistics for now:

Video viewing statistics

We can start seeing how the sub-tasks (viewing a series of videos) are already creating the inequality – lots of people watch part of the first video, and either give up (maybe switching to the notes) or leave it for another time. By part 4 of the first lecture, we are already at very few views (the “Lecture 3 Part 2” video is the one that I’ve integrated in the previous blog post).

What is interesting to see is how fast participation inequality emerges within the online course, and notice that there is now a core of about 5-10 people (about 1% to 2%) that are following the course at the same rate as the 9 students who are in the face to face class. I expect people to also follow the course over a longer period of time, so I wouldn’t read too much into the pattern and wait until the end of the course and a bit after it to do a full analysis.

When I was considering setting up the course as a hybrid online/offline one, I was expecting this, since the amount of time that is required to follow the course is nearly 4-5 hours a week – something reasonable for an MSc student during a course, but tough for a distance learner (I have a huge appreciation for these 10 people who are following!).
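The "about 1% to 2%" figure follows directly from the numbers above. A small sketch, using only the counts quoted in this post (the variable names are illustrative):

```python
REGISTERED = 497        # people who created a UCLeXtend profile
CORE_FOLLOWERS_LOW = 5  # lower bound of the core group
CORE_FOLLOWERS_HIGH = 10  # upper bound of the core group

core_share_low = 100 * CORE_FOLLOWERS_LOW / REGISTERED
core_share_high = 100 * CORE_FOLLOWERS_HIGH / REGISTERED

if __name__ == "__main__":
    print(f"Core group: {core_share_low:.1f}% to {core_share_high:.1f}% of registered participants")
```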

 

 

New paper – Exploring Engagement Characteristics and Behaviours of Environmental Volunteers

Engagement in environmental volunteering

A new paper that is based on the PhD work of Valentine Seymour is out. Valentine has been researching the patterns of volunteering in environmental projects at the organisation The Conservation Volunteers. In the paper, we draw parallels between the activities of environmental volunteers and citizen science participants. The analysis demonstrates that the patterns of participation are similar.

The paper is open access and available here

The summary of the paper is:

Environmental volunteering and environmental citizen science projects both have a pivotal role in civic participation. However, one of the common challenges is recruiting and retaining an adequate level of participant engagement to ensure the sustainability of these projects. Thus, understanding patterns of participation is fundamental to both types of projects. This study uses and builds on existing quantitative approaches used to characterise the nature of volunteer engagement in online citizen science projects, to see whether similar participatory patterns exist in offline environmental volunteering projects. The study uses activity records of environmental volunteers from a UK environmental charity “The Conservation Volunteers,” and focuses on three characteristics linked to engagement: longevity, frequency, and distance travelled. Findings show differences in engagement patterns and contributor activity between the three UK regions of Greater London, Greater Manchester, and Yorkshire. Cluster analysis revealed three main types of volunteer engagement profiles which are similar in scale across all regions, namely participants can be grouped into “One-Session,” “Short-Term,” and “Long-Term” volunteer. Of these, the “One-Session” volunteer accounted for the largest group of volunteers.

Published: Why is Participation Inequality Important?

I’ve mentioned the European Handbook for Crowdsourced Geographic Information in the last post, and explained how it came about. My contribution to the book is a chapter titled ‘Why is Participation Inequality Important?’. The issue of participation inequality, also known as the 90:9:1 rule, or skewed contribution, has captured my interest for a while now. I have also explored it in my talk at the ECSA conference on ‘participatory [citizen] science’ and elsewhere.

In this fairly short chapter what I am trying to communicate is that while we know that participation inequality is happening and is part of crowdsourced information, we need to consider how it influences issues such as data quality, and think about how it comes about. I am trying to suggest how we end up with skewed contributions – after all, at the beginning of most projects, everyone is at the same level – zero contribution – and then participation inequality emerges.

I have used the iconic graph of contribution to OpenStreetMap that Harry Wood created, but the chapter discusses other projects and activities where you can come across this phenomenon.

Here is a direct link to the chapter, and I’ll be very happy to hear comments about it!

 

Participatory [Citizen] Science

‘Citizen Science as Participatory Science’ is one of the most popular posts that I have published here. The post is the core section of a chapter that was published in 2013 (the post itself was written in 2011). For the first European Citizen Science Association conference I was asked to give a keynote on the second day of the conference, which I have titled ‘Participatory Citizen Science’, to match the overall theme of the conference, which is ‘Citizen Science – Innovation in Open Science, Society and Policy’. The abstract of the talk:

In the inaugural ECSA conference, we are exploring the intersection of innovation, open science, policy and society and the ways in which we can establish new collaborations for a common good. The terms participation and inclusion are especially important if we want to fulfil the high expectations from citizen science, as a harbinger of open science. In the talk, the conditions for participatory citizen science will be explored – the potential audience of different areas and activities of citizen science, and the theoretical frameworks, methodologies and techniques that can be used to make citizen science more participatory. The challenges of participation include designing projects and activities that fit with participants’ daily life and practices, their interests, skills, as well as the resources that they have, self-beliefs and more. Using lessons from EU FP7 projects such as EveryAware, Citizen Cyberlab, and UK EPSRC projects Extreme Citizen Science, and Street Mobility, the boundaries of participatory citizen science will be charted.

As always, there is a gap between the abstract and the talk itself – as I started exploring the issues of participatory citizen science, some questions about the nature of participation came up, and I was trying to discuss them. Here are the slides:

After opening with an acknowledgement of the people who work with us (and funded us), the talk turns to the core issue – the term participation.

https://www.google.co.uk/search?q=sherry+and+george+arnstein
Sherry Arnstein with Harry S Truman (image by George Arnstein)

Type ‘participation’ into Google Scholar, and the top paper, with over 11,000 citations, is Sherry Rubin Arnstein’s ‘A ladder of citizen participation’. In her ladder, Sherry offered 8 levels of participation – from manipulation to citizen control. Her focus was on political power and the ability of the people who are impacted by the decisions to participate and influence them. Knowingly simplified, the ladder focuses on political power relationships, and it might be this simple presentation and structure that explains its lasting influence.

Since its emergence, other researchers have developed versions of participation ladders – for example Wiedemann and Femers (1993), here from a talk I gave in 2011:

These ladders come with baggage: a strong value judgement that the top is good, and the bottom is minimal (in the version above) or worse (in Arnstein’s version). The WeGovNow! Project is part of the range of ongoing activities that use digital tools to increase participation and move between rungs in these concepts of participation, with an inherent assumption about the importance of high engagement.

Levels of Citizen Science 2011

At the beginning of 2011, I found myself creating a ladder of my own. Influenced by the ladders that I learned from, the ‘levels of citizen science’ make an implicit value judgement in which ‘extreme’ at the top is better than crowdsourcing. However, the more I’ve learned about citizen science, and the more time I’ve had to reflect on what participation means and who should participate and how, the more I feel that this strong value judgement is wrong and that a simple ladder can’t capture the nature of participation in citizen science.

There are two characteristics that demonstrate the complexity of participation particularly well: the levels of education of participants in citizen science activities, and the way participation inequality (AKA the 90-9-1 rule) shapes the time and effort investment of participants in citizen science activities.

We can look at them in turn, by examining citizen science projects against the general population. We start with levels of education – across the EU28 countries, we are now approaching 27% of the population with tertiary education (university). There is wide variability, with the UK at 37.6%, France at 30.4%, Germany at 23.8%, Italy at 15.5%, and Romania at 15%. This is part of a global trend – about 200 million students are in tertiary education across the world, of whom about 2.5 million (about 1.25%) are studying at doctoral level.

However, if we look at citizen science projects, we see a different picture: in OpenStreetMap, 78% of participants hold tertiary education, with 8% holding doctoral level degrees. In Galaxy Zoo, 65% of participants have tertiary education and 10% have doctoral level degrees. In Transcribe Bentham (TB), 97% of participants have tertiary education and 24% hold doctoral level degrees. What we see here is much more participation by people with higher degrees – well above their expected rate in the general population.
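To get a feel for the size of this gap, the sketch below divides each project's share of participants with tertiary education by the EU28 figure of about 27%. Using the EU28 average as the baseline for projects with a global participant base is an assumption made purely for illustration, and the function name `over_representation` is invented for the example.

```python
EU28_TERTIARY_PERCENT = 27.0  # approximate share quoted in the text

# Share of participants with tertiary education, as quoted in the text.
PROJECT_TERTIARY_PERCENT = {
    "OpenStreetMap": 78.0,
    "Galaxy Zoo": 65.0,
    "Transcribe Bentham": 97.0,
}


def over_representation(project_percent, baseline_percent=EU28_TERTIARY_PERCENT):
    """How many times higher the project share is than the baseline share."""
    return project_percent / baseline_percent


RATIOS = {
    project: over_representation(share)
    for project, share in PROJECT_TERTIARY_PERCENT.items()
}

if __name__ == "__main__":
    for project, ratio in RATIOS.items():
        print(f"{project}: {ratio:.1f} times the EU28 share")
```

With this (assumed) baseline, people with tertiary education appear at roughly two and a half to three and a half times their share of the general population.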

The second aspect, participation inequality, has been observed in OpenStreetMap volunteer mapping activities, in iSpot – in both the community of those who capture information and those who help classify the species – and even in the offline conservation volunteering activities of the Trust for Conservation Volunteers. In short, it is a very persistent aspect of citizen science activities.

For the sake of the analysis, let’s look at citizen science projects that require high skills from participants and significant engagement (like TB), those that require high skills but not necessarily a demanding participation (as many Zooniverse projects do), then the low skills/high engagement projects (e.g. our work with non-literate groups), and finally low skills/low engagement projects. There are clear benefits for participation in each and every block of this classification:

high skills/high engagement: These provide a way to include highly valuable effort with the participants acting as virtual research assistants. There is a significant time investment by them, and opportunities for deeper engagement (writing papers, analysis)

high skills/low engagement: The high skills might contribute to data quality, and allow the use of disciplinary jargon, with opportunities for lighter or deeper engagement to match time/effort constraints

low skills/high engagement: Such activities are providing an opportunity for education, awareness raising, increased science capital, and other skills. They require support and facilitation but can show high potential for inclusiveness.

low skills/low engagement: Here we have an opportunity for active engagement with science with limited effort. There is also a potential for family/cross-generational activities, and outreach to marginalised groups (as OPen Air Laboratories has done)

In short – in each type of project, there are important societal benefits for participation, and it’s not only the ‘full inclusion at the deep level’ that we should focus on.

Interestingly, across these projects and levels, people are motivated by science as a joint human activity of creating knowledge that is shared.

So what can we say about participation in citizen science? Well, it’s complex. There are cases where the effort is exploited, and we should guard against that, but outside these cases, the rest is a much more complex picture.

The talk moves on to suggest a model of allowing people to adjust their participation in citizen science through an ‘escalator’ that we are aiming to conceptually develop in DITOs.

Finally, with this understanding of participation, we can understand better the link to open science, open access and the need of participants to potentially analyse the information.

Assertions on crowdsourced geographic information & citizen science #3

Following the two previous assertions, namely that:

‘you can be supported by a huge crowd for a very short time, or by few for a long time, but you can’t have a huge crowd all of the time (unless data collection is passive)’ (original post here)

And

‘All information sources are heterogeneous, but some are more honest about it than others’  (original post here)

The third assertion is about patterns of participation. It is one that I’ve mentioned before and in some way it is a corollary of the two assertions above.

‘When looking at crowdsourced information, always keep participation inequality in mind’ 

Because crowdsourced information, either Volunteered Geographic Information or Citizen Science, is created through a socio-technical process, all too often it is easy to forget the social side – especially when you are looking at the information without the metadata of who collected it and when. So when working with OpenStreetMap data, or viewing the distribution of bird species in eBird (below), even though the data source is expected to be heterogeneous, each observation is treated as similar to other observations and assumed to be produced in a similar way.

Distribution of House Sparrow

Yet, data is not only heterogeneous in terms of consistency and coverage, it is also highly heterogeneous in terms of contribution. One of the most persistent findings from studies of various systems – for example in Wikipedia, OpenStreetMap and even in volunteer computing – is that there is a very distinctive heterogeneity in contribution. The phenomenon was termed Participation Inequality by Jakob Nielsen in 2006 and it is summarised succinctly in the diagram below (from Visual Liberation blog) – a very small number of contributors add most of the content, while most of the people who are involved in using the information will not contribute at all. Even when examining only those who actually contribute, in some projects over 70% contribute only once, with a tiny minority contributing most of the information.

Participation Inequality

Therefore, when looking at sources of information that were created through such a process, it is critical to remember the nature of contribution. This has far-reaching implications for quality, as it is dependent on the expertise of the heavy contributors, on their spatial and temporal engagement, and even on their social interaction and practices (e.g. abrasive behaviour towards other participants).

Because of these factors, it is critical to remember the impact and implications of participation inequality on the analysis of the information. There will be some analyses on which it will have less impact and some on which it will have a major one. In either case, it needs to be taken into account.
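One practical way to keep participation inequality in mind is to summarise it before analysing a dataset. The sketch below is a minimal example, assuming you have a list with the number of contributions made by each contributor; the function name `participation_summary`, the 10% cut-off and the example counts are all hypothetical and not taken from any real project.

```python
def participation_summary(counts, top_fraction=0.1):
    """Summarise how unevenly contributions are spread across contributors.

    counts: number of contributions per person (zeros are ignored).
    top_fraction: share of contributors treated as the "heavy" group.
    """
    active = sorted((c for c in counts if c > 0), reverse=True)
    if not active:
        raise ValueError("no active contributors in the data")
    total = sum(active)
    top_n = max(1, round(len(active) * top_fraction))
    return {
        "contributors": len(active),
        # Share of contributors who contributed exactly once.
        "one_time_share": sum(1 for c in active if c == 1) / len(active),
        # Share of all contributions made by the heaviest contributors.
        "top_share": sum(active[:top_n]) / total,
    }


# Synthetic counts, invented purely to illustrate the calculation.
EXAMPLE_COUNTS = [500, 120, 40, 10, 5, 3, 2] + [1] * 13

SUMMARY = participation_summary(EXAMPLE_COUNTS)

if __name__ == "__main__":
    print(f"Contributors: {SUMMARY['contributors']}")
    print(f"Contributed only once: {SUMMARY['one_time_share']:.0%}")
    print(f"Share of work by the top 10%: {SUMMARY['top_share']:.0%}")
```

If the top group accounts for most of the contributions, the quality and coverage of the dataset largely reflect the habits and expertise of those few people, which is exactly what the analysis needs to take into account.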