Notes from ICCB/ECCB 2015 (Day 2) – Citizen Science data quality

Poster session at ICCB/ECCB 2015

The second day of ICCB/ECCB 2015 started with a session that focused on the use and interpretation of citizen science data. The symposium ‘Citizen Science in Conservation Science: the new paths, from data collection to data interpretation’ was organised by Karine Princé and included the following talks:

Bias, information, signal and noise in citizen science data – Nick Isaac. The information content of a dataset depends on the question being asked, on what was captured and how, and on survey effort. Data come in different ways from a range of people who collect them for different purposes. Biological records are unstructured: they don’t address a specific question, so we need to know how they came about. Information about the data collection protocols is important for making sense of the data. If you are collecting data through citizen science, remember that the data will outlive the project, so good metadata and data standards are needed to ensure that others can use it. There are powerful statistical tools, and we should use them to model the bias rather than try to avoid it. A little bit of metadata goes a long way, so it is worth recording.

Conservation management prioritization with citizen science data and species abundance models – Alison Johnston (BTO/Cornell Lab of Ornithology). The distributions of species are dynamic and change with the seasons. This is especially important for migratory birds, where conservation is needed at specific times (wintering, breeding or migrating). The BirdReturns programme in California floods rice fields to provide habitat for waterbirds, and is effective and not hugely costly. However, dynamic conservation needs precise information. Citizen science data can support occurrence models, but they also wanted to estimate abundance, as this helps to prioritise the activities. They used eBird data. In California there are 230,000 checklists, but there are biases in the data: variable effort and expertise, and bias in sites, seasons and time. There are also different relationships with habitat, and it is difficult to identify extreme abundance. They used Spatio-Temporal Exploratory Models (STEM), which allow modelling with random grids, averaging across cells that have different origins (Fink et al 2010 Ecological Applications). Using the models, they identified areas of high activity, especially with the abundance model. Of the two models, the abundance model seems more suitable for using citizen science data in dynamic conservation. The results were used with a reverse auction to maximise the use of the available funds to provide large areas of temporary wetland.
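
The idea of averaging across randomly placed grids can be illustrated with a minimal sketch. This is not the STEM implementation: STEM fits a local model in each cell, whereas here a simple cell mean stands in for that model, and the function names, the square cells and the `(x, y, count)` observation format are all assumptions made for illustration.

```python
import random


def cell_index(x, y, cell_size, offset_x, offset_y):
    """Return the (column, row) of the grid cell containing point (x, y)."""
    return (int((x - offset_x) // cell_size), int((y - offset_y) // cell_size))


def ensemble_estimate(observations, query, cell_size, n_grids=50, seed=0):
    """Average local estimates over grids with random origins.

    observations: list of (x, y, count) tuples (hypothetical format).
    query: (x, y) location where an estimate is wanted.
    Returns None if no grid places any observation in the query's cell.
    """
    rng = random.Random(seed)
    qx, qy = query
    estimates = []
    for _ in range(n_grids):
        # Each ensemble member uses the same cell size but a different origin.
        ox = rng.uniform(0, cell_size)
        oy = rng.uniform(0, cell_size)
        target = cell_index(qx, qy, cell_size, ox, oy)
        counts = [c for (x, y, c) in observations
                  if cell_index(x, y, cell_size, ox, oy) == target]
        if counts:
            # Stand-in for a local model: the mean count within the cell.
            estimates.append(sum(counts) / len(counts))
    if not estimates:
        return None
    return sum(estimates) / len(estimates)


checklists = [(0.5, 0.5, 5), (0.6, 0.4, 5), (3.0, 3.0, 5)]
print(ensemble_estimate(checklists, (0.5, 0.5), cell_size=1.0))
```

Because every grid has a different origin, no single arbitrary cell boundary determines which observations inform the estimate at a given location.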

Citizen sciences for monitoring biodiversity in habitat structured spaces – Camille Coron (Paris Sud) described a model that estimates the abundances of several species. They wanted to use several datasets from citizen science projects that follow different types of protocols, some strong and some without any protocol. They assume that space is covered with different types of habitat, but the habitat itself is not known. They looked at bird species in Aquitaine, 34 species in total. Two datasets come from precise protocols and the third dataset is opportunistic. They developed a statistical model for estimation from these data, using detection probability, abundance, and the intensity of the observation activity. In the opportunistic dataset the effort is not known. The model gives important gains when species are rare, when the species considered is hardly detected in the data, and when there are many species. By combining the data with those from robust-protocol projects, the estimation of species distribution is improved.

Can opportunistic occurrence records improve the large-scale estimation of abundance trends? – Joern Pagel. There is a lack of comprehensive data on large-scale variation in abundance, and he described a model that deals with this. The model is based on the assumption that population density is a main driver of variation in species detectability. They tested the model using UK butterfly data, combining the very detailed local transects (140 with weekly monitoring) with opportunistic presence recording (over 500K records) on a 10×10 km grid. The transects were used to estimate the abundance (described in a paper in Methods in Ecology and Evolution). They found that opportunistic occurrence records can carry a signal of population density, but one needs to be careful about the assumptions, and high uncertainties are associated with the estimates.
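
To make the assumption that density drives detectability concrete, here is a minimal sketch of one possible link between the two. The functional form (detection probability rising as 1 - exp(-c × density × effort)) and the constant `c` are assumptions chosen for illustration, not the model presented in the talk.

```python
import math


def detection_probability(density, effort, c=1.0):
    """Probability of at least one record, assuming records arrive at a
    rate proportional to population density and recording effort.
    The constant c is a hypothetical per-individual detection rate."""
    return 1.0 - math.exp(-c * density * effort)


def implied_density(detection_prob, effort, c=1.0):
    """Invert the relationship above: the density that would produce a
    given detection probability under the same assumptions."""
    if not 0.0 <= detection_prob < 1.0:
        raise ValueError("detection_prob must be in [0, 1)")
    if effort <= 0:
        raise ValueError("effort must be positive")
    return -math.log(1.0 - detection_prob) / (c * effort)


# Round trip: a density of 2.0 at effort 0.5 gives a detection probability
# that maps back to (approximately) the same density.
p = detection_probability(2.0, 0.5)
print(p, implied_density(p, 0.5))
```

The inversion also shows why the uncertainties are high: as the detection probability approaches 1, small changes in it correspond to very large changes in the implied density.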

When do occupancy models produce reliable inferences from opportunistic data? – Arco van Strien (Statistics Netherlands). Statistics Netherlands is involved in monitoring butterflies and dragonflies, from transects and also from opportunistic data. Opportunistic data are unstandardised, and can show artificial trends if effort varies over time, so the idea was to account for changes in recorder effort using occupancy models. They coupled two logistic regression models, one for the ecological process and one for the observation process. They wanted to explore the usefulness of opportunistic data and occupancy models, and used a Bayesian model, evaluating the results against standardised data. They looked at four kinds of inference: phenology (trying to find the peak date in detection), national trend in distribution, species richness per site, and local trends in distribution. For the peak date, they found a 0.9 correlation between opportunistic data and standardised data. For national trends there is also a strong correlation of 0.8 to 0.9. For species richness the correlation is also over 0.9, but for local trends the correlation drops to 0.4-0.5 for both butterflies and dragonflies. The conclusion: opportunistic data is great, but we need to be careful about the inferences drawn from it.
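
The coupling of an ecological process (is the site occupied?) with an observation process (was the species detected on a visit?) can be sketched with a basic site-occupancy likelihood. This is a simplified maximum-likelihood illustration with constant occupancy and detection probabilities, not the Bayesian model from the talk; the function names and the grid-search fit are assumptions. In a full model, `psi` and `p` would each come from a logistic regression on covariates, which is what `logistic` hints at.

```python
import math


def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def site_likelihood(history, psi, p):
    """Probability of one site's detection history (list of 0/1 per visit).

    psi: probability that the site is occupied (ecological process).
    p: probability of detection per visit if occupied (observation process).
    """
    detections = sum(history)
    visits = len(history)
    occupied = psi * (p ** detections) * ((1 - p) ** (visits - detections))
    if detections > 0:
        return occupied
    # All-zero history: occupied but always missed, or truly unoccupied.
    return occupied + (1 - psi)


def log_likelihood(histories, psi, p):
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


def fit_grid(histories, steps=99):
    """Crude maximum-likelihood fit by grid search over psi and p."""
    best = None
    for i in range(1, steps + 1):
        psi = i / (steps + 1)
        for j in range(1, steps + 1):
            p = j / (steps + 1)
            ll = log_likelihood(histories, psi, p)
            if best is None or ll > best[0]:
                best = (ll, psi, p)
    return best[1], best[2]


histories = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
psi_hat, p_hat = fit_grid(histories)
print(psi_hat, p_hat)
```

In this toy example the species was recorded at half of the sites, but the fitted occupancy is higher than one half, because the model allows for occupied sites where the species was missed on every visit.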

Making sense of citizen science data: A review of methods – Olivier Gimenez (CNRS). His interest is in large terrestrial and marine mammals, which are difficult to monitor in the field, and he is considering whether citizen science data can be used for that. He looked at all the papers on citizen science, and specifically at those that deal with the data, in order to build a taxonomy of the methods that are used to handle citizen science data. He identified five methods. The first is the filtering and correction approach: you know, or assume you know, the bias and try to correct it, e.g. list length analysis. These methods are highly sensitive to specific biases. The second category is the simulation approach: simulate the bias and check how your favourite method behaves given this bias. The third is the regression approach: use relevant variables to account for biases, e.g. ecological variables that are used to build and predict models, together with observer bias variables such as distance from cities. The fourth is the combination approach: combine citizen science data with data from a standard protocol in order to understand and correct the data. The last is the occupancy approach: correction for false negatives and for temporal and spatial variation in detection, which can also be extended to deal with false positives and with multiple species. Conclusion: we should focus more on the citizens when describing the models. We need to understand more about them (e.g. the recorded data and the people who collected it), and social science has a major role to play.
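
The simulation approach can be sketched as follows: hold the true occupancy constant, let recording effort grow over the years, and see what a naive summary reports. All parameter values, the function name and the use of visits per site as the measure of effort are assumptions made for illustration.

```python
import random


def naive_occupancy_by_year(visits_per_year, psi=0.5, p=0.3,
                            n_sites=2000, seed=1):
    """Simulate the share of sites with at least one record in each year.

    visits_per_year: number of visits per site in each year (the effort).
    psi: true probability that a site is occupied, constant over time.
    p: per-visit detection probability at an occupied site.
    """
    rng = random.Random(seed)
    # True occupancy state is fixed, so any trend in the output is artificial.
    occupied = [rng.random() < psi for _ in range(n_sites)]
    rates = []
    for visits in visits_per_year:
        sites_with_record = 0
        for is_occupied in occupied:
            detected = is_occupied and any(
                rng.random() < p for _ in range(visits))
            if detected:
                sites_with_record += 1
        rates.append(sites_with_record / n_sites)
    return rates


# Effort rises from one to five visits per site over five years.
print(naive_occupancy_by_year([1, 2, 3, 4, 5]))
```

The share of sites with a record rises from year to year even though nothing has changed on the ground, which is the kind of artificial trend a candidate method can then be tested against.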


In the session ‘Paths for the future: building conservation leadership capacity’, Krithi Karanth (Wildlife Conservation Society) looked at ‘Citizen Scientists as agents for conservation‘. In the 1980s WCS started monitoring tigers, and some people who were not trained scientists wanted to join in. What drew people in was an interest in tigers, and that was the start of their citizen science journey. 5000 km were walked along 229 transects in the forest. It started with ecological surveys across entire regions, from charismatic species to rare species. Current projects have 40-50 volunteers in amphibian and bird surveys outside protected areas. The volunteers identify rare species. As the projects grew, so did the challenges, e.g. around human-wildlife conflicts, and the volunteers helped to survey over 5000 villages and 7000 households in their area. Through the fieldwork, people understand conservation better. Another project recruited 75 volunteers to document the impact of tourism, and the results were used in a Supreme Court decision on how to regulate tourism. They have over 5000 citizen scientists, with an active group of 1000 at any moment. The impact over 30 years: over 10,00 surveys in 15 states in India, with over 250 academic publications and 300 popular articles. Many of the volunteers evolved into educators, film-makers, conservationists, activists and academics, and also share information through blogs, articles and films. Recognition has also increased in graduate programmes, with professional master’s programmes. Some of the volunteers, about 10%, become fully committed to conservation, but the other 90% are critical to wider engagement in society.


Published by


Professor of GIScience, University College London
