Lessons learned from Volunteers' Interactions with Geographic Citizen Science – Morning session

On 27th April, UCL hosted a workshop on the "Lessons learned from Volunteers' Interactions with Geographic Citizen Science". The workshop description was as follows:

“A decade ago, in 2007, Michael Goodchild defined volunteered geographic information (VGI) as ‘the widespread engagement of large numbers of private citizens, often with little in the way of formal qualifications, in the creation of geo­graphic information, a function that for centuries has been reserved to official agencies.’ (p.2). The collection and use of this type of crowdsourced geographic data have grown rapidly with amateurs mapping the earth’s surface for all kind of purposes (e.g. collecting and disseminating information about accessibility in urban centres, for crisis and emergency response purposes, mapping illegal logging in remote areas and so on). A subset of these activities has been described as ‘geographic citizen science’ and includes scientific activities in which amateur scientists (volunteers) participate in geographic data collection, analysis and dissemination within the context of a scientific project (Haklay, 2013) or simply by using scientific methods and equipment. Although, there is an extensive discussion in the VGI and geographic citizen science literature about opportunities as well as implications (e.g. data coverage, data quality and trust issues, motivation and retainment of volunteers and so on), examples from the actual interaction are not so widely discussed, neither has evidence been collected from a broad spectrum of case studies to demonstrate how volunteers interact with those technologies and applications, what they are looking for and what it is that they need/try to accomplish (at a scientific, project and personal level) and what are the common design mistakes that influence interaction.” The following is a summary of the talks and presentations:

Welcome & Instructions – Artemis Skarlatidou: the workshop is linked to our ERC-funded project Intelligent Maps (ECSAnVis), the EU-funded Doing It Together Science (DITOs), and the COST Action – our work deals with geographical applications of citizen science and data collection. COST Action CA15212 has 243 members in 39 countries – all exploring aspects of citizen science – Working Group 1 (WG1) on scientific quality, WG2 on education, WG3 on society-science policy, WG4 on the role of volunteers in citizen science, WG5 on data and interoperability, and synergies in WG6. In WG4, which Artemis leads, we're looking at stakeholder mapping, motivation, needs and interaction issues, and mapping citizen science across Europe. Other relevant groups are the ICA Commission on Use, User and Usability Issues, and the International Society for Photogrammetry and Remote Sensing (ISPRS), which has a WG V/3 that looks at citizen science and crowdsourced information. Sultan Kocaman explained the ISPRS link – WG V/3 focuses on promoting regional collaboration in citizen science and geospatial technologies within ISPRS's focus area of education and outreach.

Louis Liebenberg presents Smartphone Icon User Interface design for Oralate Trackers – Louis Liebenberg has been developing software for three decades to allow hunter-gatherers to protect their knowledge of tracking. One of the challenges that Louis addresses is understanding how our scientific thinking evolved. Louis suggests that tracking by hunter-gatherers is an example of hypothesis testing and rational thinking. He has worked with !Nate of the San people since 1985 – there is a long history of technology use by the San. Already 100 years ago, hunters discovered that arrow points can be made from fence wire and started using them. This is an example of how hunter-gatherers adapt the technologies around them. Hunter-gatherers are not isolated: they have always interacted and traded. Developing software for a smartphone (you can get an Android phone for $10 in South Africa today) is similar to adopting fence wire for arrows 100 years ago. He learned from master trackers – the level of sophistication of trackers has astonished him since the mid-1980s. In the Kalahari, dogs were introduced in the 1960s, and the knowledge of tracking and the practices of hunting changed as a result. He used tracking, and certification in it, to secure employment for trackers. In an egalitarian society, master trackers are expected to show humility, so it is possible to miss them if you go and ask "who's the best tracker here?" – certification is a way to provide recognition and work. Tracking provided employment in the 1990s in surveying the movement of animals in the Kalahari. The persistence hunt – running animals down without any equipment until they die from exhaustion – is an adaptation that humans evolved. Karoha was one of the persistence hunters but was also able to use CyberTracker. In parallel to the software, Louis developed the tracker certification, to know whether the data is reliable. As master trackers die, their knowledge is lost, so the certification provides an opportunity to encourage the younger generation to develop the knowledge and benefit from it. The level of detail in animal tracks is very high. There is much ambiguity in tracking, and by learning about claw marks and knowing what the possibilities are, it becomes possible to identify with high certainty which animal it was. Trackers also develop hypotheses on why the shape of hooves is the way it is, and interpret the activities of animals from their tracks – for example, producing new interpretations of animal behaviour that had not been observed before, such as inferring that caracals jump upright in an attempt to catch birds. CyberTracker started on the early Apple Newton with a GPS module, then evolved onto the Palm Pilot, and continues to evolve. The interface was very limited for drawing icons – some icons are phonetic symbols (e.g. using a wheelbarrow to describe an item that sounds similar to the word in Afrikaans). The details can be very extensive – species, age, number, male/female and so on. The data can provide information on abundance and potential work for the communities. In a project in the Congo, they followed the tracks of different animals and could show how Ebola impacted chimpanzees and gorillas, but also other animals – this was important for understanding that you can identify Ebola in wildlife before it spreads into the human population.
There is also wide use of CyberTracker in citizen science for monitoring endangered species, and in different projects by indigenous communities in Australia. These can also show results that differ from what ecologists identify. A paper from 1999 about rhinos was co-authored by a tracker, demonstrating different models of publishing with citizen scientists. The first high-impact paper co-authored by trackers was published recently in Biological Conservation. Questions: how to communicate hypotheses from hunter-gatherers to the scientific sphere? The answer is collaboration: data is collected and organised by the trackers, and then the scientists write the report – but producing a report is challenging. The reality is co-authoring, as there is always a need for mentoring and a reciprocal approach between scientists and trackers. Louis also circulates papers among experienced scientists to improve them – we all need peer-review support. In terms of consent and engagement: there is a need to develop a relationship of trust and understanding – the first people who were involved in CyberTracker had worked with Louis for 5 years, and Louis engaged as a tracker before they were willing to work with him. Some of the early papers in the Kalahari used trackers without mentioning their names even though the trackers carried out the research. Scientific institutions are among the last authoritarian institutions, and scientific elitism is intransigent – this is what makes citizen science exciting.

Lessons from supporting non-literate forest communities in the Congo-Basin to record their Traditional Ecological Knowledge – Michalis Vitos & Julia Altenbuchner: the context is the Congo Basin, the second largest rainforest. It is home to 29 million people, including at least 500,000 members of nomadic communities who rely on its resources. The forest is divided into concessions which are then used for resource extraction – how can local groups be heard? Local communities are excluded from protected areas. In the last few years, some legislation is changing – e.g. the EU's FLEGT, which controls timber imports and requires social payback and responsibility. ExCiteS collaborated with communities to support such processes with technology. The challenge is working with non-literate groups who are also not technologically literate. We use pictures as a way to communicate: the application works in a simple fashion – showing categories of things that people want to map, each category leading to more specific options – the information can be captured, the user decides whether to save it, and geotagged video and audio can also be collected. In 3 simple steps, information can be captured. The process starts with a dialogue about what is important to the communities, and then agreement on what will be collected. We explored the usability of the application: about 70% of participants can use it, but 30% have a problem with categories – mapping banana, avocado and cacao requires following a path through categories, and some participants found that confusing. Adding more icons makes a category more complex. One approach was to test audio feedback in a local language – explaining the icons and what they mean. The experiments with audio feedback helped a bit, but not a lot. The next step was to go directly to the final icons, printing each one on a card with an NFC chip: the participant finds the specific icon and then touches the card with the phone. With Tap&Map the success rate gets close to 100%.
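To make the two interaction styles concrete, here is a minimal sketch in Python – for illustration only, not the actual Sapelli or Tap&Map code (which are Android apps); the category tree and NFC tag IDs below are invented:

```python
# Two interaction styles for pictorial data collection, side by side.
# The category tree and NFC tag IDs are invented for illustration.

ICON_TREE = {
    "crops": {"banana": None, "avocado": None, "cacao": None},
    "trees": {"sapelli": None, "moabi": None},
}

def navigate_hierarchy(taps):
    """Hierarchical mode: one icon tap per level until a leaf is reached."""
    node = ICON_TREE
    for icon in taps:
        if icon not in node:
            raise KeyError(f"icon '{icon}' is not in this category")
        node = node[icon]
    if node is not None:
        raise ValueError("stopped before reaching a final icon")
    return taps[-1]

# Tap&Map mode: every final icon is printed on a card with an NFC chip,
# so a single tap of the card replaces the whole navigation path.
NFC_CARDS = {"tag-0001": "banana", "tag-0002": "avocado", "tag-0003": "cacao"}

def tap_and_map(tag_id):
    return NFC_CARDS[tag_id]

print(navigate_hierarchy(["crops", "banana"]))  # two taps through the tree
print(tap_and_map("tag-0001"))                  # one tap on a physical card
```

The design point is that the hierarchy demands an abstraction (banana "is a" crop) that confused about 30% of participants, while the NFC card removes the abstraction entirely.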

Julia – the next issue is making sure that communities can manage their data. The vision of intelligent maps is: data collection, then a local data repository and management, and then visualisation. But there is the challenge of base mapping, and this was addressed by using UAVs to create high-resolution imagery within a short time. However, people don't need maps to know their area – the maps are for communication. Studies checked how the map is used – people felt under a lot of pressure when using it, so the next experiment avoided putting them under pressure and instead used a treasure hunt: going and looking for data by trying to find German Christmas decorations. From the tracks of the people who participated in the study, we can see how they looked for information. What we know is that people can use and understand reference maps. Now we want the thematic information – so that people take ownership and correct issues: this was done using the icons as a resource layer which participants then corrected. People did well in correcting information using the Tap&Map approach – feature corrections succeed over 90% of the time. This is an ad-hoc approach that works even without much exposure – we need to allow people to be the sensors and the brains behind them.

Forest hunter-gatherers and Extreme Citizen Science: Reporting wildlife crime in collaboration with local and indigenous communities in Cameroon through community-led co-design – Simon Hoyte has worked in Cameroon for the last year and a half with Baka hunter-gatherers, in the south-east corner of the country, around the Dja reserve, working with ZSL and 5 communities. In Cameroon, there are many issues with conservation – gorillas, chimpanzees, parrots, pangolins and elephants. Indigenous communities are often forgotten – these groups are familiar with the forest, with knowledge going back some 50,000 years, yet colonial approaches exclude them. The technologies being used are the Sapelli data collection tool, the GeoKey data management tool, and Community Maps from Mapping for Change. The process starts with the community's free prior informed consent – first starting with the concerns of the community, and building trust by staying overnight in the village and connecting on a personal level. That is an important recommendation. Icons are drawn first in the sand, then on paper, and then turned into the app. Functional actions changed through co-design – e.g. from a tick to a thumbs-up, and the recording icon changed. The XML layout of the project allows changes in the field. The second recommendation is co-design, which increases motivation. Audio and video allow information to be shared, including tracks – this allows verification, and audio provides more information, describing what people found. Indicators on the device are important – when recording is active, a red icon lets you see that something is working. The phone checks for a connection every 4 minutes. An ID screen is used to recognise reporters anonymously – an approach that can be used elsewhere. The community protocol also addresses who will manage the phone and look after it. Reports are uploaded and shared with the authorities – diverse outcomes are needed. So in summary: trust building, co-design, media, feedback, simple tools, anonymous IDs, community leadership, and diverse outcomes. The map provides further information.

Community based monitoring of tropical forests using information and communication technology (ICT) – Søren Brofeldt: an example of a study that relies on Sapelli and expands the software to create the Prey Lang App. Working in Cambodia, in Prey Lang – 200,000 people rely on the forest, which is under huge pressure of deforestation; much of the logging is illegal, as the forest is supposed to be protected. The Prey Lang Community Network (PLCN) was created around 2005-2007 and is now a group of 600 people who have been working over the last 10 years, patrolling the area, confiscating chainsaws and seizing wood and logs, trying to address logging in the area. In 2013 they tried to communicate the problem to international society – to do what they wanted, they set up a forest monitoring programme and created a system to document illegal logging and provide evidence-based advocacy. The issue is to compile information and document breaches. The data is captured by Sapelli, and the information is validated by PLCN and scientists, which then helped in compiling reports locally and globally, and this in turn fed back into improving the platform. The platform was tweaked to include information through a decision tree covering different aspects. They developed unique functions – choosing icons or recording activities. The first version had only basic activities and was seen as too simple. They started with 9 basic functions with 614 end-points of activities. By the third version, they had 9 functions and 1,663 options: types of trees, types of information, species and so on. They now have 10 functions (e.g. dropdown, word completion). Complexity does not lead to incorrect use (if training is adequate and added functionality is co-designed). When people are experienced – those who have used the app for 2 years – they can get into more complex functionality over time. Some of the issues with the data: poor documentation and double counting. Over time, human errors decrease, and so do technical issues. Poor connectivity and technical issues are a bigger problem than local ability to use the app. High quality is possible, but active data management is needed.
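To illustrate what "functions" and "end-points" mean in such a decision-tree interface, here is a small sketch; the miniature tree is invented, and only the version counts come from the talk:

```python
# "Functions" are top-level branches; "end-points" are the leaves a user
# can finally select. The miniature tree below is invented; the Prey Lang
# app grew from 9 functions / 614 end-points in its first version to
# 1,663 options by the third, still under 9 functions.

def count_endpoints(node):
    """Recursively count the leaves of a nested-dict decision tree."""
    if not node:          # None (or an empty dict) marks a final option
        return 1
    return sum(count_endpoints(child) for child in node.values())

version_sketch = {
    "logging": {"chainsaw": None, "truck": None},
    "wildlife": {"mammal": {"deer": None, "wild pig": None}, "bird": None},
}

print(len(version_sketch))              # 2 functions
print(count_endpoints(version_sketch))  # 5 end-points
```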

Designing Human-Computer Interaction for Citizen Science Initiatives in Rural Developing Regions – Veljko Pejovic & Artemis Skarlatidou: we need to understand how we move citizen science initiatives from developed to developing regions. ICT4D points to environmental constraints: roads, electricity; these areas may also lack skills in the workforce, and there are cultural constraints and clashes with assumptions. In the Extreme Citizen Science context, we need to identify solution adaptation through participatory design; there is a need for holistic implementation, and we need to think about the whole process – from data collection to policy – which is challenging. Finally, we also need to consider champions and how to engage them (the book "Geek Heresy" by Toyama talks about this). The aim is to identify guidelines – this was done by looking at participatory studies that are similar across the rural developing world, through 9 interviews with researchers with extensive field experience (two hour-long interviews each). The questions explored different aspects, including interactions. The findings: there is a need to mobilise the community by taking into account societal organisation (e.g. egalitarian aspects), and to find local champions. We need to consider the ecosystem of the technology: chargers, cables. Also, consider how technology that was built for a different context works here: rough fingertips, reflections on screens and so on. There is also the issue of hierarchical icon organisation, which is quite intuitive for educated users but challenging for these participants, as are navigation buttons. This matches evidence from Medhi et al. (CHI 2013). Juxtaposing this with illiterate users in urban Brazil, who managed to deal with hierarchical organisation and navigation – it might be that exposure to smartphones helped in developing these hierarchies. Icon design is different: realistic icons shown in context are more usable than a bare object. There are issues with actions and how to represent them. Getting honest feedback on the spot is a challenge – users don't criticise to your face (Dell et al., CHI 2012 – "yours is better"). A long trust relationship helps in getting honest feedback. Participants lack the vocabulary to discuss HCI issues. To maintain motivation, there is a need to make data collection visible and ensure the real-world impact of data collection. Recommendations: develop context-specific apps – not generic ones – consider an application interface that matches users' skills, and remember that geographical information is key.

Introducing user issues of the Global Forest Watch application – Jamie Gibson – developing, with Vizzuality, better maps and visualisations. Trying to think of citizen-focused GIS, interacting with citizens in the design. Global Forest Watch (GFW) was developed over the last 3 years, and it allows anyone to see the world's forests and how they change. They wanted to tell a simple story: where forest is gained and lost. With a few clicks, you can see the impact of conservation. GFW allows seeing how deforestation happens and how it is stopped. There is a need for global engagement – opening it to a whole crowd of people. Forests don't have a connection to the web, so Forest Watcher takes the data offline to the field: users walk to an area, investigate recent forest loss and report new areas – 4,000-5,000 users. They aim to integrate citizens into the design process. Forest Watcher is being used in important areas of the world, not where the most connected people are. They analyse where people use the app – when there are forest fires in Spain, people open GFW and explore. They use the analytics to find the places where they want more people to look and explore. This is integrated with interviews and usability testing, working with experts who have been active for a long time – including the Jane Goodall Institute, Amazon Conservation Team, CAGDF, and BirdLife. As people use the application they build ownership, and they provide better feedback and richer information. Among the things they learned: using personas to think about monitors; the app has to sync a lot of data after 14 days offline, and the internet is slow, so they changed the app and the back end to make it faster. They use this understanding of frustrations to find ways to create "wow" moments. A face, name and story improve the quality of the thinking and of understanding users' frustrations.
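The sync problem described here is a classic offline-first pattern: reports queue up locally while a monitor is in the forest, then upload in small batches when a connection reappears. A minimal sketch, with an invented endpoint and queue format – not the Forest Watcher code:

```python
# Offline-first sync sketch: queued reports are uploaded in batches with
# exponential backoff on a slow or flaky link. Endpoint and file names
# are invented for illustration.
import json, time, urllib.request

QUEUE_FILE = "pending_reports.json"
UPLOAD_URL = "https://example.org/api/reports"  # hypothetical endpoint

def load_queue():
    try:
        with open(QUEUE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def sync(batch_size=10, max_retries=3):
    """Upload queued reports in batches, keeping unsent data safe."""
    queue = load_queue()
    while queue:
        batch, queue = queue[:batch_size], queue[batch_size:]
        body = json.dumps(batch).encode()
        req = urllib.request.Request(
            UPLOAD_URL, data=body,
            headers={"Content-Type": "application/json"})
        for attempt in range(max_retries):
            try:
                urllib.request.urlopen(req, timeout=30)
                break  # batch delivered, move to the next one
            except OSError:
                time.sleep(2 ** attempt)  # back off on a slow link
        else:
            queue = batch + queue  # give up for now, keep the data
            break
    with open(QUEUE_FILE, "w") as f:
        json.dump(queue, f)
```

Small batches and backoff are what make two weeks of accumulated reports survive a slow connection without data loss.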

Lessons learned from Missing Maps – Jorieke Vyncke. Her personal background is an interest in work that links to humanitarian purposes, and since 2017 she has been the Missing Maps coordinator, with a humanitarian organisation focus – more than 34,000 staff in MSF and about 470 locations around the world. In many parts of the world there are empty maps and no geographical data. They discovered OpenStreetMap and work with the American and British Red Cross, HOT and over 40 partners. They have principles, drawing on Ostrom's work, for working with groups. They compare rural and urban areas. In Idjwi in DRC, in the east of Congo, they work amid a multitude of problems: violence, refugees and more. Due to a measles outbreak, they needed population and mapping data: 250 remote volunteers mapped 28,000 buildings in about a week. This helped in creating a population estimate – critical for the logistical planning – and they managed to identify 94% of the population. Another example comes from the Hazaribagh informal settlement in Bangladesh. The area was mapped with both local and remote mapping – including factories and tanneries – locating the workers that they wanted to reach, and combining students from the university with workers who were reached through the union. The mapping was done by technically skilled local students to make things happen, using smartphones and the Field Papers process. Paper is still effective; editing the data in pairs taught people how to do the mapping, and the end result supported an occupational health survey. The process motivated the community, and they continue to use it. In different areas they use remote mapping, but the most important thing is to create a local mapping community – a decision between empowerment and remote mapping, weighed against the importance of saving lives.
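As an illustration of how a building count turns into a population estimate – the occupancy rate below is an invented assumption, not MSF's figure; real estimates calibrate occupancy with field sampling:

```python
# Illustrative arithmetic for a building-count population estimate.
buildings_mapped = 28_000        # from the remote mapping campaign
avg_people_per_building = 5.0    # hypothetical occupancy rate

estimated_population = buildings_mapped * avg_people_per_building
print(f"Estimated population: {estimated_population:,.0f}")  # 140,000
```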

Keynote: Approximated Reality: the use of digital tools by traditional communities in the Amazon – Vasco van Roosmalen has been working at Ecam – Equipe Conservacao Amazonia – in Brazil since 1999. The big challenge is how to reconcile different visions of what the world is. In the Xingu area in Brazil, there was a need to create an ethno-map of the region. The community discussed what they wanted to map and how they wanted to represent it, but the map also needed to be cartographically accurate, as this is how you communicate with external bodies. The whole map is created for the community: to use resources, to remember the dead and to defend their land (using patterns of body paint). We can see the protected areas in the Xingu. Another area that he was involved in mapping is near Suriname – in an area the size of Holland with 2,000 people, the community recorded information about their region. This helped in justifying the resources and the protection of the area. The area is very rough to access, and the local survey by the community managed to map it in 6 maps. The community collected much more data than the maps can show – over the coming years, they mapped millions of hectares with different groups, and they developed a process for creating the maps. The collaboration with Google Earth Outreach led to the interaction with Chief Almir of the Surui. The link and commitment with Rebecca Moore helped in filling in missing areas and attaching video and audio to the map. They then wanted to record illegal logging using mapping tools, and this was done with OpenDataKit (ODK) – the data collection challenges are accuracy, ease of use, speed, etc. In 2008 they started to understand REDD and developed the Surui Carbon Project – this needs a tremendous amount of data from the air and from the ground. Information such as the circumference of trees was collected with ODK. They used Garmin devices, but they weren't scratch resistant; now they use Samsung smartphones, which are cheap and can be replaced easily. GPS is challenging in the rainforest, so they use barcodes on the trees. They used ODK Build but discovered that it is not an easy interface: they rely on a programmer on the staff, and that is a limitation in terms of allowing forms to be built easily. The project managed to demonstrate that indigenous people can collect data, but the REDD credits were more challenging – they got them in 2013. Cultural maps were created in other indigenous lands in Brazil. The importance is not just to demarcate the land but to collect data and help communities manage the area. Today there are many challenges – indigenous lands cover 13% of the Brazilian territory. In the Brazilian Amazon, there are many communities – 25 million people, of which only 350,000 are indigenous – for example, Quilombola groups and many other groups. There was no information on these other groups, and some of them are disadvantaged – e.g. the Quilombola, descendants of West African slaves, required the mapping of 7,000 communities. They were persecuted and faced a lot of violence; when slavery was abolished they were forgotten, and although they have been recognised in the constitution since the 1980s, they are not sufficiently recognised officially. His team was involved in creating a new map of the 7,000 communities, which only a team of 40 at government level in Brasilia looks after. They used approaches similar to the indigenous mapping in order to record information and manage the land.
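A minimal sketch of the barcode workaround mentioned above – each tree's position is registered once, and later ground measurements reference the barcode rather than an unreliable GPS fix under dense canopy (all names and values are invented for illustration):

```python
# Sketch: field measurements keyed to a barcode instead of a fresh GPS fix.
TREE_REGISTRY = {  # barcode -> location captured once, in good conditions
    "BR-0001": {"lat": -10.852, "lon": -61.951, "species": "castanheira"},
}

def record_measurement(barcode, circumference_cm):
    """Attach a ground measurement to a known tree by its barcode."""
    tree = TREE_REGISTRY[barcode]  # fails loudly on an unknown code
    return {
        "barcode": barcode,
        "lat": tree["lat"],
        "lon": tree["lon"],
        "circumference_cm": circumference_cm,
    }

print(record_measurement("BR-0001", 312.0))
```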
They had people who became experts in mapping, demonstrating how to map the land using Google Earth and demonstrating data collection. The communities also collect socio-economic data using ODK, understanding their community and developing a life plan for the area (a plan for the next 10-30 years). The question is who is listening to the information. A social network analysis of interactions on Facebook (which 83% of internet users in Brazil use) shows that local associations are not linked to environmental or human rights organisations; links to health are missing, and a specific campaign on the Belo Monte power plant is not linked to the community. Communities care about health, education and income – the environment comes only fifth. We need to talk about what matters to communities, make conversations about these concerns central to the discussion, and move beyond confining communities to the corner of the environment. We need to engage with people and their communities in a way that makes sense to them.


GSF-NESTI Open Science & Scientific Excellence workshop – researcher, participants, and institutional aspects

The Global Science Forum – National Experts on Science and Technology Indicators (GSF-NESTI) Workshop on “Reconciling Scientific Excellence and Open Science” (for which you can see the full report here) asked the question “What do we want out of science and how can we incentivise and monitor these outputs?”. In particular, the objective of the workshop was “to explore what we want out of public investment in science in the new era of Open Science and what might be done from a policy perspective to incentivise the production of desired outputs”, with an aim to explore the overarching questions of:
1. What are the desirable (shorter-term) outputs and (longer-term) impacts that we expect from Open Science and what are potential downsides?
2. How can scientists and institutions be incentivised to produce these desirable outcomes and manage the downsides?
3. What are the implications for science monitoring and assessment mechanisms?

The session that I was asked to contribute to focused on Societal Engagement: “The third pillar of Open Science is societal engagement. Ensuring open access to scientific information and data, as considered in the previous sessions, is one way of enabling societal engagement in science. Greater access to the outputs of public research for firms is expected to promote innovation. However, engaging with civil society more broadly to co-design and co-produce research, which is seen as essential to addressing many societal challenges, will almost certainly require more pro-active approaches.
Incentivising and measuring science’s engagement with society is a complex area that ranges across the different stages of the scientific process, from co-design of science agendas and citizen science through to education and outreach. There are many different ways in which scientists and scientific institutions engage with different societal actors to informing decision-making and policy development at multiple scales. Assessing the impact of such engagement is difficult and is highly context and time-dependent“.

For this session, the key questions were:

  • “What do we desire in terms of short and long-term outputs and impacts from societal engagement?
  • How can various aspect of scientific engagement be incentivised and monitored?
  • What are the necessary skills and competencies for ‘citizen scientists’ and how can they be developed and rewarded?
  • How does open science contribute to accountability and trust?
  • Can altmetrics help in assessing societal engagement?”

In my talk, I decided to address the first three questions by reflecting on my personal experience (the story of a researcher trying to balance the "excellence" concepts and "societal engagement"), then considering the experience of participants in citizen science projects, and finally the institutional perspective.


I started my presentation [Slide 3] with my early experiences in public engagement with environmental information (and participants' interest in creating environmental information) during my PhD research, 20 years ago. This was a piece of research that set me on the path of societal engagement and open science – for example, the data that we were showing was not accessible to the general public at the time, and I was investigating how the processes that follow the Aarhus Convention and the use of digital mapping information in GIS can increase public engagement in decision making. This research received a small amount of funding from UCL, and later from the ESRC, but nothing significant.

I then secured an academic position in 2001, and it took until 2006 [Slide 4] to develop new systems – for example, this London Green Map was developed shortly after the Google Maps API became available, and while it is one of the first participatory GIS applications on top of this novel API, it was inherently unfunded (and was done as an MSc project). Most of my funded work at this early stage of my career had no link to participatory mapping and citizen science. This was also true for the research into OpenStreetMap [Slide 5], which started around 2005 and, apart from a small grant from the Royal Geographical Society, was not part of the main funding that I secured during the period.

The first significant funding specifically for my work came in 2007-8, about 6 years into my academic career [Slide 6]. Importantly, it came because the people who organised a bid for the Higher Education Innovation Fund (HEIF) realised that they were weak in the area of community engagement, and the work that I was doing in participatory mapping fitted into their plans. This became a pattern, where people approach me with a "community engagement problem" – so there is a signal here that awareness of societal engagement was starting to grow, but in terms of budget and place in the projects, it was at the edge of the planning process. By 2009, the investment led to the development of a community mapping system [Slide 7] and the creation of Mapping for Change, a social enterprise that is dedicated to this area.

Fast forward to today [Slides 8-10], and I'm involved in creating software for participatory mapping with non-literate participants, which supports the concept of extreme citizen science. In terms of "scientific excellence", this development towards creating a mapping system that anyone, regardless of literacy, can use [Slide 11] is funded as "challenging engineering" by the EPSRC and as "frontier research" by the ERC, showing that it is possible to completely integrate scientific excellence and societal engagement – answering the "reconciling" issue in the workshop. A prototype is being used with ZSL to monitor illegal poaching in Cameroon [Slide 12], demonstrating the potential impact of such research.

It is important to demonstrate the challenges of developing societal impact by looking at the development of Mapping for Change [Slide 13]. Because it was one of the first knowledge-based social enterprises that UCL established, setting it up was not simple – despite sympathy from senior management, it didn't easily fit within the spin-off mechanisms of the university. But by engaging in efforts to secure further funding – for example through a cross-university social enterprise initiative – it was possible to support the cultural transformation at UCL.

There are also issues with reporting the impact of societal engagement [Slide 14]: Mapping for Change was reported among the REF 2014 impact case studies. From the university's perspective, using these cases is attractive; however, if you recall that this research is mostly done with limited funding and resources, the reporting is an additional burden that does not come with appropriate resources. This lack of resources is demonstrated by Horizon 2020, which, for all its declarations on the importance of citizen science and societal engagement, dedicated only 0.60% of its budget to Science with and for Society [Slide 15].

Participant experience

We now move to look at the experience of participants in citizen science projects, pointing out that we need to be careful about indicators and measurements.

We start by pointing to the wide range of activities that include public engagement in science [Slides 17-18] and the need to provide people with the ability to move into deeper or lighter engagement at different life stages and with different interests. We also see that as engagement deepens, the number of people who participate drops (this is part of participation inequality).

For specific participants, we need to remember that citizen science projects try to achieve multiple goals – from increasing awareness, to having fun, to getting good scientific data [Slide 19] – and this complicates what we are assessing in each project and the ability to have generic indicators that hold true for all projects. There are also multiple kinds of learning that participants can gain from citizen science [Slide 20], including personal development, and also attraction and rejection factors that influence engagement and enquiry [Slide 21]. This can also be demonstrated in a personal journey – in this example, Alice Sheppard's journey from someone with an interest in science to a citizen science researcher [Slide 22].

However, we should not look only at the individual participant, but also at the communal level. An example of that is provided by the noise monitoring app in the EveryAware project [Slide 23] (importantly, EveryAware was part of Future and Emerging Technologies – part of the top excellence programme of EU funding). The application was used by communities around Heathrow to signal their experience and to influence future developments [Slide 24]. Another example of communal-level impact is in Putney, where the work with Mapping for Change led to a change in the type of buses in the area [Slide 25].

In summary [Slide 26], we need to pay attention to the multiplicity of goals, objectives, and outcomes of citizen science activities. We also need to be realistic – not everyone will become an expert, and we shouldn't expect mass transformation. At the same time, we shouldn't expect it not to happen and give up. It won't happen without funding (including for participants and people who dedicate significant time).

Institutional aspects

The linkage of citizen science to other aspects of open science comes through participants' right to see the outcome of work that they have volunteered to contribute to [Slide 28]. Participants are often highly educated and can also access open data and analyse it. They are motivated by contributing to science, so a commitment to open access publication is necessary. This and other aspects of open science and citizen science are covered in the DITOs policy brief [Slide 29]. A very important recommendation from the brief is the recognition that "Targeted actions are required. Existing systems (funding, rewards, impact assessment and evaluation) need to be assessed and adapted to become fit for Citizen Science and Open Science."

We should also pay attention to recommendations such as those in the League of European Research Universities (LERU) report from 2016 [Slide 30]. In particular, there are recommendations to universities (such as setting up a single contact point) and to funders (such as setting criteria to evaluate citizen science properly). There are various mechanisms that allow universities to provide an entry point to communities that need support. One such mechanism is the "science shop", a place where people can approach the university with an issue that concerns them and identify researchers who can work with them. Science shops require coordination and funding for the students who do their internships with community groups. Science shops and centres for citizen science are a critical part of opening up universities and making them more accessible [Slide 31].

Universities can also contribute to open science, open access, and citizen science through learning – for example, with the MOOC that we run at UCL, designed to train researchers in citizen science and crowdsourcing [Slide 32].

In summary, we can see that citizen science is an area that is expanding rapidly. It has multifaceted aspects for researchers, participants and institutions, and care should be taken when considering how to evaluate them and how to provide indicators about them – mixed methods are needed for evaluation and monitoring.

There are significant challenges of recognition: as valid, excellent research; as deserving sustainable institutional support; and in the most critical indicator – funding. The current models, in which such activities are hardly funded (<1% in NERC, for example), show that funders still have a journey to make between what they state and what they do.


Reflection on the discussion: from attending the workshop and hearing about open access, open data, and citizen science, I left realising that "societal engagement" is a very challenging aspect of the open science agenda – and citizen science practitioners should be aware of that. My impression is that with open access, as long as the payment is covered (by the funder or the institution), and as long as the outlet is perceived as high quality, scientists will be happy to comply. The same can be said about open data – as long as funders are willing to cover the costs and provide mechanisms and support for skills, for example through libraries, then we can potentially have progress there, too (although over-protection of data by individual scientists and groups is an issue).

However, citizen science opens up challenges and fears about expertise, and perceptions that it risks current practices, societal status, etc. – especially given the very hierarchical nature of scientific work, at the very local level through different academic job rankings, and within disciplines where specific big names set the agenda of a field. These cultural aspects are more challenging.

In addition, there seems to be a misunderstanding of what citizen science is, mixing it up with more traditional public engagement, plus some views that it can do fine by being integrated into existing research programmes. I would not expect to see major change without a clear signal, through significant funding over a period of time, indicating to scientists that the only way to unlock such funding is through societal engagement. This is not exactly "moonshot" funding – pursue any science that you want, but open it up. This might lead to the necessary cultural change.

OECD Open Science and Scientific Excellence Workshop – Paris

The OECD organised and hosted a Global Science Forum (GSF) and National Experts on Science and Technology Indicators (NESTI) Workshop on "Reconciling Scientific Excellence and Open Science: What do we want out of science and how can we incentivise and monitor these outputs?" (9 April 2018, OECD). In agreement with the OECD Secretariat, the information here is not attributed to anyone specific (here is the blog post about my own presentation).

The workshop opened with the point that speaking about reconciling open science and scientific excellence seems contradictory. Scientific excellence was based on the value of publications, but the digital transformation and the web have changed things – from elite access to a library where outputs are held, to availability to everyone over the web, with citizens accessing the data. We also need to look to the future – opening up even more, which is positive, though there are challenges in measuring it, in the impact of different bibliometrics, and in other indicators.

This opening up is happening quickly, and we need to understand the transformation and then think about the statistical aspects of this information. There is an effort to develop a roadmap for the integration of open science across science policy initiatives.

The area is fairly complex: excellence, how science is changing, and incentivising and measuring science are all tightly related to each other. Some fundamental questions: what do we want from science? Only excellence, or other things? How can we incentivise the academic community to move in the direction of open science – and what does the science policy community need to do about it? The national statistical communities and the Global Science Forum are two important groups that can influence this, in terms of policy and of measuring the impacts and processes.

The meeting is looking at open science, publishing, open data, and engagement with society, as well as indicators and measurement.

The slides from all the talks are available here. 

Session 1. Scientific excellence through open science or vice versa? What is excellence and how can it be operationalised in the evidence and policy debate?

Paula Stephan (Georgia State University, USA) addressed the challenges of science – the lack of risk-taking, and the lack of career opportunities for early career scientists in their research; the factors behind this – especially short-term bibliometrics – and how open science can help in dealing with these issues.

The original rationale for government support of science is the high risk associated with basic research. Competitive selection procedures reduce risk and lead to safer options being funded (including at the NIH or the ERC). James Rothman, who won the Nobel Prize in Physiology or Medicine, pointed out that in the 1970s there was a much higher tolerance of risk, which allowed him to explore things for 5 years before he started being productive. Concerns about these aspects were raised by the AAAS in its 2008 ARISE report, and NASA and DARPA have become much more risk-averse.

In addition, there is a lack of career opportunities for ECRs – the number of PhDs is growing, but the number of research positions is declining, both in industry and academia. Positions are scarce, and working in universities becomes an alternative career. Because scarce jobs and research applications are assessed over short citation windows, a high-impact journal paper is critical for career development – postdocs are desperate to get a Nature or Science paper. An assessment of novel papers (papers that combine references never before made together) showed that only 11% of papers are novel, and high novelty is associated with risk: a disproportionate concentration at the top and bottom of the citation distribution, and citations from outside the field. The more novel a paper is, the less likely it is to appear in a high-ranking journal. Bibliometrics therefore discourage researchers from taking risks with novel papers.
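A sketch of how such a novelty measure can be computed – the share of a paper's reference pairs that have never appeared together before. The tiny corpus below is invented; the actual studies behind these figures use large citation databases:

```python
# Novelty as the share of never-before-seen reference pairings.
from itertools import combinations

# Reference lists of previously published papers (sources cited together)
corpus = [
    {"J.Cell Biol", "Nature", "Science"},
    {"Nature", "Science"},
]

# Every pair of sources that has been cited together before
seen_pairs = set()
for refs in corpus:
    seen_pairs.update(combinations(sorted(refs), 2))

def novelty(refs):
    """Share of a paper's reference pairs never combined before."""
    pairs = set(combinations(sorted(refs), 2))
    return len(pairs - seen_pairs) / len(pairs)

new_paper = {"Nature", "J.Cell Biol", "Phys.Rev"}  # mixes fields
print(f"{novelty(new_paper):.0%} of its reference pairs are new")  # 67%
```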

Open science gives an opportunity here – citizen science offers new ways of addressing some of these issues, e.g. crowdfunding can accommodate risky research. In addition, open access publication can support these novel-paper strategies.

Richard Gold (McGill University, Montreal, Canada) looked at why institutions choose open science – the costs of research are increasing exponentially, yet this is not enough and there are requests for more funding. Productivity is declining – measured by the number of papers per investment – and firms are narrowing their focus of research.

We can, therefore, consider open science partnerships – OA publications, open data, and no patents on co-created outputs – as a potential way to address these challenges. These can be centred on academic and not-for-profit research centres, generally around a basic understanding of scientific issues, with data at the centre. Institutions look at it as a partial solution – decreasing duplication, since there is no need to replicate; providing quality through many eyes; and providing synergies because there is a more diverse set of partners. It can increase productivity because data can be used in different fields, drawing on wider networks of ideas and the ability to search through a pool of ideas. Across fields we see more researchers but fewer outputs. In patent applications, we see that the 1950s were the recent peak in novelty in terms of linking unrelated fields, and novelty has been dropping since.

An alternative to this is a system like the Structural Genomics Consortium – attracting philanthropic and industrial funding. There is also a citizen science aspect – the ability to shape the research agenda in addition to providing data. Secondly, the data can be used with the relevant communities – patients and indigenous groups are more willing to be involved. Open science better engages and empowers patients in the process – it is easier to get consent.

Discussion: during the selection of projects, bibliometric indications need to be removed from applications and from funding decisions. We need people to read the research ideas, and we need to move away from funding only a single person as first author – we need to incentivise teams and support. We also need to think about how to deal with the impact of research beyond the original work (someone might use a dataset that was produced in open science for a publication, rather than the person who did the work).

There is a sense that the "lack of risk-taking" is an issue, but there is a need to measure it and show whether it is happening. Many scientists censor their own work, and there is a need to document this happening. The global redistribution of people concerns which areas people concentrate on – e.g. between physics and agriculture.

Session 2 – Open access publication and dissemination of scientific information

Rebecca Lawrence (Faculty of 1000) described how F1000 aims to develop a different model of publication – separating publication from evaluation. Publication exists because of funders, and researchers evaluate others by where they publish. There are all sorts of manipulations: overselling, p-value fishing, creative outliers, plagiarism, non-publication by journals that don't want low-impact papers, and more. There is a growing call for a move towards open access publication – e.g. the Open Science Policy Platform, the European Open Science Cloud, principles such as DORA and FAIR (Findable, Accessible, Interoperable, Reusable), and an increase in pre-print sources. There is also a new range of ways in which science is being organised – how to make it sustainable in areas that aren't receiving much funding – the use of pre-print services, and exploring the funding of peer review. F1000 is about the speed of sharing findings. The model was developed with Wellcome and the Gates Foundation, creating a platform that is controlled by funders, or institutions, and by researchers. In this model, publishers are service providers. F1000 supports a wide range of outputs: research articles, data, software, methods, case studies. They check the paper technically: is the data behind it accessible, and has it been published before? Publication is followed by completely open peer review – you can see who is reviewing and what was done by the author. Within the article, you can see the stage in the research – even before peer review – making the paper a living document. There are usually 14 days between submission and publication, and usually a month including review. The peer review here is transparent and the reviewers are cited. This is good for ECRs to gain experience.

Indicators need to take into account career levels and culture (technical and reflective), not only fields, and to consider different structures – individual, group, institution. We need open metrics, badges that tell you what you are looking for, and also qualitative measures – traditional publications can curate articles.

Vincent Tunru (Flockademic, Netherlands) explored the issue of incentivising open science. Making science more inclusive means making it possible for more people to contribute to the scientific process. Open access can become the goal instead of the means to become more inclusive. If the information is free, people can read the results of publicly funded research, but there is a barrier to publishing research within the OA model – publication costs should be much lower: other areas (music, news) have seen costs fall because of the internet. In some disciplines, there is a culture of sharing pre-prints and getting feedback before submission to journals – places like arXiv are doing the work. The primary value of submission to a journal is the credentialing; high-level journals can create scarcity to justify the demand. Nature Scientific Reports is overtaking PLOS ONE because of that. We need to decouple credentialing from specific journals. Different measures of excellence are possible, but we need to consider how we do it today – assuming that reviewers and editors are the ones who decide what excellence means. We need to focus on inclusivity and affordability. [See Vincent's blog post here]

Kim Holmberg (University of Turku, Finland) focused on altmetrics. Robert Merton pointed out already in the 1950s that the referencing system is about finding work that wasn't known before, but also about recognition of other researchers. That leads to how the journal impact factor and the h-index became part of research assessment. These have been used more and more in research evaluation, especially in the past 15 years. Earlier research has pointed out many flaws in them. In addition, they fail to take into account the complexity of scientific activities, nor do they tell you anything about the societal impact of research. One way to look at the complexity is the Open Science Career Assessment Matrix (OS-CAM).

We can think about the traces that people leave online as they go through the research process – discussing research ideas, collecting data, analysing, disseminating results. These traces can become altmetrics – another view of research activities. It is not just social media: the aim is to expand the view of what impact is about. With altmetrics we can analyse the networks that a researcher is involved in, which can give insights into new ways of interaction between the researcher and society. Citations show that a paper has been used by another researcher, while altmetrics can indicate how it has been disseminated and discussed among a wider audience. But there are still lots of questions about the meaning and applicability of altmetrics.

There are reports from the Mutual Learning Exercise (europa.eu/!bj48Xg) looking at altmetrics, incentives and rewards for open science activities. For instance, in the area of career and research evaluation, researchers need specific training and education about open science; in the area of evolving authorship, ways of identifying and rewarding peer review and the publishing of negative results need to be developed. Implementation of open science needs to guarantee long-term sustainability and to reward role models who can demonstrate this new approach to doing science. The roadmap from the MLE suggests a process for this implementation.

Discussion: there is the issue of finding a good researcher within a group of researchers; publications are a way to see the ideas, but the link to open science and how it can help with that is unclear. Moreover, finding a good researcher does not happen through all these metrics – it's a human problem and not only a metric one. Will originality be captured by these systems? Publication is only a small part of research activity – in every domain there is a need to change and to reduce publication volume, not to assume that someone will read the same paper again and again (after each revision). Attention is the scarce resource that needs to be managed and organised – we cannot assume that readers will find a way to filter ever more information.

The response to this pointed out that because research funding is public, we should encourage publishing as much as possible so others can find the information – but then we need good tools for searching and evaluating research so you can actually find it.

Another confusion: the wish to see the link between open access publication and open science. Open access can exist within the publish-or-perish structure. What is it in OA that offers an alternative to the closed publishing structure? How can it lead us to different insights into researchers' activities? In response, it was pointed out that it is important to understand the difference between Open Access and Open Science (OA = openly available research publications; OS = all activities and efforts that open up the whole research process, including the publishing of research results).

There is growing pressure on people to become media savvy, and that means taking time away from research.

Altmetrics were originally thought of as a tool to help researchers find interesting and relevant research, not necessarily for evaluation (http://altmetrics.org/manifesto/).

Session 3. Open research data: good data management and data access

Simon Hodson (CODATA) – Open Science and FAIR data. The reconciling element: the case for open science is the light that it shines on the data, making it useful. It allows reuse, reproducibility, and replicability – these match each other very well. CODATA is part of the International Council for Science – focusing on capacity building, policy, and coordination. The case for open science: good scientific practice depends on communicating the evidence. In the past, a table or a graph that summarised some data was an easy way of sharing information, but as data and analysis have grown, we need to change the practice of sharing results. The publication of "Science as an open enterprise" (2012) pointed out that failure to report the data underlying the science is seen as malpractice. Secondly, open data practices transform certain areas of research – genomics, remote sensing in earth systems science. Can we replicate this in other research areas? Finally, can we foster innovation and the reuse of data and findings within and outside the academic system – making it available to the public at large?

Open science has multiple elements – open science is not only open access and open data. We need data to be interoperable and reusable; it should be available for machine learning, and there should be open discussion. There are perceptions about the reproducibility of research, but also changing attitudes. We need to think about culture – how scientific communities establish their practices. Different research areas take very different approaches – e.g. biomedical research is open, while in social science there is little experience of data sharing and reuse, and researchers don't see the benefits. There is a need for a sociology-of-science analysis of these changes. Some of the major changes – the meetings about genome research in Bermuda and the Fort Lauderdale agreement – came about because of specific pressures. There is significant investment in creating data that should not be used only once – e.g. Hubble. Why is data from small experiments not open to reuse? We need to find ways of making this happen.

The FAIR principles allow data to be reusable. FAIR came from OECD work, the Royal Society report of 2012, and the G8 statement. What we need to address: skills, the limits of sharing, and the need to clarify guidelines for openness. We need standards and skills, and we need to reward data stewardship. We need to see citation of data. There is a need for new incentives – cultural change happened when prominent people in a field set up such agreements.

Fiona Murphy (Fiona Murphy Mitchell Consulting, UK) works in the area of data publishing and provided the perspective of someone who is exploring how to practise open science. There are cultural issues: why share, with whom, what are the rewards, and what are the risks. There are technical issues: how it is done, and what the workflows, tools, capacity, and time investment are. There are also issues of roles and responsibilities, and whose problem it is to organise the data.

Examples of projects – SHARC, a Research Data Alliance group – international and multi-stakeholder, aiming to grow the capacity to share data. The group is working on a White Paper of recommendations. The main issues are standards for metrics: the need for transparency, concerns about reputation, and impact on a wider area. Also, what would be the costs of not sharing? There are different standards in terms of policies; persistent identifiers and the ability to reproduce are also needed. Equality of access to services is needed – how to manage peer-to-peer review and how it is integrated into promotion and rewards. The way to explore this is by carrying out pilot projects to understand side effects. There is also a need to develop ethical standards.

The Belmont Forum Data Publishing Policy – looking at making data accessibility part of a digital publication. Developing consistency of message so researchers will know what they are facing. There are lots of issues – some standard wording is emerging, as well as capturing multiple datasets, clarifying licensing, etc.

We can also think about what would have emerged if the current system had been in place from the start – scholarlycommons.org suggests principles for how "born digital" scientific practice should evolve. As part of this approach to thinking about the commons, they have created decision trees to help with the project. Working as an open scientist is a challenge today – for example, the need to develop decision-tree software and other things are proving challenging for acting as a completely open scientist. It's a busy space, and there is a gulf between high-level policy and principles and their delivery.

Jeff Spies (Centre for Open Science, Virginia) [via video-link] Jeff covered open research data, urgent problems, and incremental solutions, looking at the strategies that are the most impactful (a personal take, not necessarily that of the Center for Open Science). We need to broaden the definition of data – we need context: more than just the data itself or the metadata – this is critical for assessment and for metascience work. We can think of a knowledge graph – more than the semantic information of the published text: the relationships between people, places, data, methods, software… But the situation with incentives is that, from a psychological perspective, the reward for specific publications is so strong that it focuses attention on what is publishable. Retraction rates go up as impact factor goes up. There is urgency, and lock-in: publishers are trying to capture the life-cycle of research. The problem is that culture change is very slow, and we need to protect the data – it is funders and policymakers who can make a difference. Researchers don't have the capacity to curate data – but libraries have the people who can be a resource for that and provide focus. One potential lever: asking researchers to link to their university's promotion policies, which would force universities to share them and to state whether the policy mentions data sharing (as a way of pushing universities to change).

Discussion: there is concern about the ability of researchers to deal with data. There is a problem of basic data literacy.

The problem with making data FAIR is that it costs about 10% of project costs – and where is it useful, where is it not enough, and where is it too much? Just organising the data with librarians is not enough, as data curation requires a lot of domain knowledge. There are significant costs. However, in the same way that the total costs of science include the effort of peer review and of getting to publication (either subscription or publication fees), we should also pay for data curation. There is a need for appraisal and decisions about how data and process will be handled.

We need to think about the future use of data – just as with natural history specimens, we can never know what it will be used for. Questions about the meaning of data are very important – it's not only specimens but also photographs, and not necessarily digital ones.

Libraries can adapt and can gain respect – they are experts in curation and archiving.

Session 4. Societal engagement 

Kazuhiro Hayashi (NISTEP, Tokyo, Japan) – Open science as social engagement in Japan. His work is in science and technology – he has been involved in open access journals and is keen on altmetrics, and is now involved in open science policy. He sees himself in multiple roles – top-down and bottom-up – from working in the G7 science expert group on open science to creating software and journals. He is involved in citizen science through a NISTEP journal and lectures, and in altmetrics, multi-stakeholder workshops, and Future Earth. He showcased several studies:

Citizen science – the funding system for science in Japan comes mainly from the state, and scientists have a difficult time doing public engagement – hence the spontaneous, "wild researchers". He suggests a more symmetrical system – also creating independent researchers who get their budget from business and publish in online journals. Wild researchers are based on crowdfunding and rely on the engagement of citizens. From his experience, he recognises a new relationship between citizens and scientists: new research styles, career paths, and funding. Negative aspects of citizen science include populism in crowdfunding – research needs to be popular, but some of it is not suitable for the crowd. There is also a need for a new scheme for early career researchers, who should be included. Finally, there is potential for misuse and plagiarism because of a lack of data and science literacy.

Altmetrics – he contributed to the NISO Altmetrics Initiative working group – altmetrics are difficult to define, and current altmetrics scores in the Japanese literature are closely related to Maslow's hierarchy of needs. There are plenty of institutional repositories – access to journal articles on repositories is more social: the readers are non-researchers, unlike those who go to journal websites. There is a need to look at social impact – looking at mentions and network analysis – but it is difficult to analyse. There is a need to look at the flow of data across the web.

Multi-stakeholder workshop – considering the future of open science and society, with environmental sciences and informatics. One outcome is to think about erasing the influence of different socio-economic statuses on participants. Co-development of data infrastructure and action for social transformation. Capacity building is important. There is a need to see how open science and transdisciplinary work co-evolve. Social engagement is very time-consuming and needs to be funded, and it needs to be open to creative activities by citizens and scientists. Think about new relationships between science and society. There is a need to use tentative indicators to transform society and culture – creating a future of open science and society – moving from "publish or perish" to "share or perish". Japan will have two citizen science sessions at the Japan Open Science Summit on 18-19 June 2018.

Muki Haklay (UCL, London, UK) [see my separate blog post]

Cecilia Cabello Valdes (Foundation for Science and Technology, Madrid, Spain) – Societal engagement in open science. The foundation aims to promote the link between science and society – originally with the goal of increasing the interest of Spanish citizens in science. They manage calls and fund different activities (about €3,250K), with more than 200 projects. They run activities such as Famelab – events that promote science and technology in an open way. The science news agency, SiNC, addresses the lack of awareness of scientific research – its stories are taken up by the general media, with over 1,000 journalists using the information. They run summer science camps: 1,920 students funded in 16 universities. They also manage the national museum of science and technology (Muncyt), where they share the history of science and technology in Spain. It's a unique type of science museum.

In citizen science, they have done a lot of work on public awareness of science and technology, and on keeping public support for science investment. More recently they created a council of foundations for science – there was little awareness among social foundations, which had invested only in cultural activities and not in science. Three foundations are involved with the council, and they have direct contact with the minister to develop this area of funding. The second initiative is crowdfunding for science – they help carry out campaigns that support activities – and it is also a tool of engagement.

Outreach is difficult – the council supports policymakers and helps make the general public aware of the issues. So there are challenges – things need to be transformed, and how do we measure that? Part of the council's role is to incentivise policymakers to understand what they want to achieve, and then to have indicators that help in seeing whether the goals are achieved. They participated in the process of policy recommendations about open science, and then translated these into action – for policymakers and society. FECYT also provides resources: access to WoS/Scopus, evaluation of journals, a standardised CV for researchers, and open science. Finally, they participate in studies that look at the measurement of science and its results.

Discussion: Science shops – are there examples that link to maker spaces? Yes, there are examples of activities such as Public Lab, but also the Living Knowledge network.

Many societal engagement activities are not open science – they treat society as a separate entity: there is a struggle in making citizen science into open science, as data remains closed. What are the aspects that lend themselves to both open science and citizen science? There are many definitions and different ways to define the two but, for example, the need to access publications, participation in the analysis of open data, or the production of open data are all examples of overlap.

Part of the discussion was about sharing knowledge, and the suggestion that a researcher is like anyone else – or is there a big difference between the scientific community and everyone else? The effort is not recognised in society – and if you remove the prestige, would anyone still want to participate in science?

On public interest – why do citizens want to participate in research? Citizens want the results of public research to help people improve their quality of life. Science should address social problems.

How many people participate? Precipita is a new project; funds are not matched, but they provide technical help, and promotion is through campaigns across different institutions.

Should citizen science democratise science? This is controversial – when information became more accessible, as with Gutenberg, ability increased. There is a need to make citizen science a way to increase access to science.

How do we get integrated science into people's pockets? There is a need to find a way to integrate these things together: there is a package that needs to be supported as a whole – access, data, and public engagement – and the focus needs to be on all of them.

Citizen science needs to be integrated into all of science, and it needs to produce results.

Session 5. Scientific Excellence re-visited

David Carr (Wellcome Trust, London, UK) – Wellcome is committed to making its research outputs available, seeing this as part of good research practice. As a funder, they have had a long-standing policy on open access publications (since 2005) and other research outputs. The costs of carrying out public engagement and of open access publication should be part of the funding framework. They also ask reviewers to recognise and value a wide range of research outputs. There is still a need to think about reward and assessment structures, about sustaining the infrastructures that are needed, and about creating data specialists and managing the process to increase their number. There are concerns within the research community about open access. Wellcome established an open research team – looking at funder-led and community-led activities, and also at policy leadership. They now have the WellcomeOpenResearch.org publishing platform, which uses the F1000 platform, and they also ran the Open Science Prize. On policy leadership, they support, for example, the San Francisco DORA (Declaration on Research Assessment). They are also looking at changes to application forms to encourage other forms of outputs, and then providing guidance to staff, reviewers, and panel members. They celebrate with applicants when they do open research, and inform them about the criteria and options. They also carry out efforts to evaluate whether open science indeed delivers on its promises, through projects in different places – e.g. the McGill project.

Citizen Science & Scientific Crowdsourcing – week 5 – Data quality

This week, in the “Introduction to Citizen Science & Scientific Crowdsourcing“, our focus was on data management, to complete the first part of the course (the second part starts in a week’s time since we have a mid-term “Reading Week” at UCL).

The part that I’ve enjoyed most developing was the segment that addresses the data quality concerns that are frequently raised about citizen science and geographic crowdsourcing. Here are the slides from this segment, and below them the rationale for the content and detailed notes.

I’ve written a lot on this blog about data quality, and in many talks that I gave about citizen science and crowdsourced geographic information, the question about data quality is the first one to come up. It is a valid question, and it has led to useful research – for example on OpenStreetMap; I recall early conversations, 10 years ago, during a journey to the Association for Geographic Information (AGI) conference, about the quality and the longevity potential of OSM.

However, when you are being asked the same question again, and again, and again, at some point, you start considering “why am I being asked this question?”. Especially when you know that it’s been over 10 years since it was demonstrated that the quality is beyond “good enough”, and that there are over 50 papers on citizen science quality. So why is the problem so persistent?

Therefore, the purpose of the segment was to explain the concerns about citizen science data quality and their origin, then to explain a core misunderstanding (that the same quality assessment methods that are used under “scarcity” conditions work under “abundance” conditions), and then to cover the main approaches to ensuring quality (based on my article for the International Encyclopedia of Geography). The aim is to equip the students with a suitable explanation of why you need to approach citizen science projects differently, and then to inform them of the available methods. Quite a lot for 10 minutes!

So here are the notes from the slides:

[Slide 1] When it comes to citizen science, it is very common to hear suggestions that the data is not good enough and that volunteers cannot collect data of good quality because, unlike with trained researchers, we don’t know who they are – a perception that we know little about the people who are involved and therefore cannot judge their abilities. There are also perceptions that, like Wikipedia, it is all very loosely coordinated and that therefore there are no strict data quality procedures. However, we know that even in the Wikipedia case, the scientific journal Nature showed over a decade ago (2005) that Wikipedia achieves similar quality to Encyclopaedia Britannica, and we will see that OpenStreetMap produces data of a similar quality to professional services.
In citizen science where sensing and data collection from instruments are included, there are also concerns over the quality of the instruments and their calibration – the ability to compare the results with high-end instruments.
The opening of the Hunter et al. paper (which offers some solutions) summarises the concerns that are raised over the data.

[Slide 2] Based on conversations with scientists and concerns that appear in the literature, there is also a cultural aspect at play, which is expressed in many ways – with data quality being used as an outlet. This can be similar to the concerns that were raised in The Cult of the Amateur (which we’ve seen in week 2 regarding the critique of crowdsourcing): protecting the position of professional scientists and avoiding the need to change practices. There are also special concerns when citizen science is connected to activism, as this seems to “politicise” science or make the data suspect – we will see in the next lecture that the story is more complex. Finally, and more kindly, we can also notice that because scientists are used to top-down mechanisms, they find alternative ways of doing data collection and ensuring quality unfamiliar and untested.

[Slide 3] Against this background, it is not surprising to see that checking data quality in citizen science is a popular research topic. Caren Cooper has identified over 50 papers that compare citizen science data with data collected by professionals – as she points out: “To satisfy those who want some nitty gritty about how citizen science projects actually address data quality, here is my medium-length answer, a brief review of the technical aspects of designing and implementing citizen science to ensure the data are fit for intended uses. When it comes to crowd-driven citizen science, it makes sense to assess how those data are handled and used appropriately. Rather than question whether citizen science data quality is low or high, ask whether it is fit or unfit for a given purpose. For example, in studies of species distributions, data on presence-only will fit fewer purposes (like invasive species monitoring) than data on presence and absence, which are more powerful. Designing protocols so that citizen scientists report what they do not see can be challenging which is why some projects place special emphasize on the importance of “zero data.”
It is a misnomer that the quality of each individual data point can be assessed without context. Yet one of the most common way to examine citizen science data quality has been to compare volunteer data to those collected by trained technicians and scientists. Even a few years ago I’d noticed over 50 papers making these types of comparisons and the overwhelming evidence suggested that volunteer data are fine. And in those few instances when volunteer observations did not match those of professionals, that was evidence of poor project design. While these studies can be reassuring, they are not always necessary nor would they ever be sufficient.” (http://blogs.plos.org/citizensci/2016/12/21/quality-and-quantity-with-citizen-science/)

[Slide 4] One way to examine the issue of data quality is to think of the clash between two concepts and systems of thinking about how to address quality – we can consider the conditions of standard scientific research as ones of scarcity: limited funding, a limited number of people with the necessary skills, limited laboratory space, and expensive instruments that need to be used in a very specific way – sometimes unique instruments.
The conditions of citizen science, on the other hand, are of abundance – a large number of participants with multiple skills, a low cost per participant; they bring their own instruments, use their own time, and are also distributed in places that we usually don’t get to (backyards, across the country – we talked about this in week 2). Conditions of abundance are different and require different thinking about quality assurance.

[Slide 5] Here are some of the differences. Under conditions of scarcity, it is worth investing in long training to ensure that the data collection is as good as possible the first time it is attempted, since time is scarce. We would also try to maximise the output of each activity that our researcher carries out, and we put procedures and standards in place to ensure “once & good” or even “once & best” optimisation. We can also require all the people in the study to use the same equipment and software, as this streamlines the process.
In abundance conditions, on the other hand, we need to assume that people come with a whole range of skills and that training will be variable – some people will get trained in the activity over a long period, while, to get the process started, we want people to have light training and join in. We also think of activities differently – e.g. conceiving the data collection as micro-tasks. We might also have multiple procedures and even different ways to record information, to cater for different audiences. We will also need to expect a whole range of instrumentation, sometimes with limited information about the instruments’ characteristics.
Once we understand the new conditions, we can come up with appropriate data collection procedures that ensure data quality suitable for this context.

[Slide 6] There are multiple ways of ensuring data quality in citizen science data. Let’s briefly look at each one of them. The first 3 methods were suggested by Mike Goodchild and Linna Li in a paper from 2012.

[Slide 7] The first method for quality assurance is crowdsourcing – using multiple people who carry out the same work, in effect doing peer review or replication of the analysis, which is desirable across the sciences. As Watson and Floridi argued, using the example of Zooniverse, the approaches that are used in crowdsourcing give these methods a stronger claim on accuracy and scientifically correct identification, because they compare multiple observers who work independently.
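
To make this concrete, here is a minimal sketch of the replication idea in Python – the function, thresholds, and labels are my own invented illustration, not the mechanism of any specific platform: multiple independent classifications of the same item are aggregated, and the item is accepted only when enough volunteers agree.

```python
from collections import Counter

def consensus_label(classifications, min_votes=3, min_agreement=0.8):
    """Aggregate independent volunteer classifications of one item.

    The majority label is returned only when enough volunteers have
    seen the item and a sufficient share of them agree; otherwise the
    item is flagged for further classification or expert review.
    """
    if len(classifications) < min_votes:
        return None, "needs more classifications"
    label, votes = Counter(classifications).most_common(1)[0]
    if votes / len(classifications) >= min_agreement:
        return label, "accepted"
    return None, "disputed - send to expert review"

# Five volunteers independently classify the same camera-trap image
print(consensus_label(["zebra", "zebra", "zebra", "horse", "zebra"]))
# -> ('zebra', 'accepted')
```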

[Slide 8] The social form of quality assurance uses more and less experienced participants as a way to check the information and ensure that the data is correct. This is fairly common in many areas of biodiversity observation and is integrated into iSpot, but it also exists in other areas, such as mapping, where some information gets moderated (we’ve seen this in Google Local Guides, when a place is deleted).
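
As a toy illustration of this social hierarchy (not how iSpot is actually implemented – the reputation score and the names are invented), a new observation can be routed to the most experienced available participant for confirmation:

```python
def route_for_review(observation, reviewers):
    """Send a new observation to the most experienced participant;
    a single numeric 'reputation' stands in for the richer expertise
    models that real platforms use."""
    expert = max(reviewers, key=lambda r: r["reputation"])
    return f"observation {observation['id']} routed to {expert['name']}"

reviewers = [{"name": "ana", "reputation": 120},
             {"name": "ben", "reputation": 45}]
print(route_for_review({"id": "obs-17", "species": "bumblebee"}, reviewers))
# -> observation obs-17 routed to ana
```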

[Slide 9] Geographical rules are especially relevant to information about mapping and locations. Because we know things about the nature of geography – the most obvious being land and sea in this example – we can use this knowledge to check that the information that is provided makes sense, such as this example of two bumblebees recorded in OPAL in the middle of the sea. While it might be the case that someone saw them while sailing or on some other vessel, we can integrate a rule into our data management system and ask for more details when we get observations in such a location. There are many other such rules – about streams, lakes, slopes, and more.
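
A minimal sketch of such a rule, assuming the Shapely library and a crude stand-in polygon for the landmass (a real system would test against proper coastline data, e.g. from OpenStreetMap):

```python
from shapely.geometry import Point, Polygon

# Crude rectangular stand-in for a landmass (lon, lat); in practice
# this would be a detailed coastline polygon.
land = Polygon([(-6.0, 49.5), (2.0, 49.5), (2.0, 59.0), (-6.0, 59.0)])

def check_on_land(lon, lat):
    """Flag observations of terrestrial species reported at sea."""
    if land.contains(Point(lon, lat)):
        return "ok"
    return "flag: location at sea - ask the contributor for more details"

print(check_on_land(-0.13, 51.51))  # central London -> ok
print(check_on_land(-10.0, 55.0))   # open Atlantic  -> flagged
```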

[Slide 10] The ‘domain’ approach is an extension of the geographic one; in addition to geographical knowledge, it uses specific knowledge that is relevant to the domain in which the information is collected. For example, in many citizen science projects that involve collecting biological observations, there will be some body of information about species distribution, both spatial and temporal. Therefore, a new observation can be tested against this knowledge, again algorithmically, helping to ensure that new observations are accurate. If we see a monarch butterfly within the marked area, we can assume that it will not harm the dataset even if the identification was mistaken, while an outlier (temporal, geographical, or in other characteristics) should stand out.
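
In code, such a domain rule might look like the following sketch – the species, regions, and season are invented for illustration; in practice the reference data would come from an authoritative atlas or a historical observation database:

```python
from datetime import date

# Invented domain knowledge: months and regions in which a species
# has previously been recorded.
KNOWN_RANGE = {
    "monarch butterfly": {"months": {6, 7, 8, 9},
                          "regions": {"Devon", "Kent"}},
}

def domain_check(species, region, when):
    """Test a new observation against the known spatial/temporal range."""
    known = KNOWN_RANGE.get(species)
    if known is None:
        return "unknown species - send to expert review"
    issues = []
    if when.month not in known["months"]:
        issues.append("outside known season")
    if region not in known["regions"]:
        issues.append("outside known range")
    return "ok" if not issues else "flag: " + "; ".join(issues)

print(domain_check("monarch butterfly", "Kent", date(2018, 7, 12)))   # ok
print(domain_check("monarch butterfly", "Orkney", date(2018, 1, 3)))  # flagged
```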

[Slide 11] The ‘instrumental observation’ approach removes some of the subjective aspects of data collection by a human who might make an error, relying instead on the equipment that the person is using. Because of the increased availability of accurate-enough equipment, such as the various sensors that are integrated into smartphones, many people carry in their pockets mobile computers with the ability to record location, direction, imagery, and sound. For example, image files captured on smartphones include the GPS coordinates and a timestamp, which the vast majority of people would not be able to manipulate. Thus, the automatic instrumental recording of information provides evidence for the quality and accuracy of the information. This is where the metadata of the information becomes very valuable, as it provides the necessary evidence.
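
This embedded metadata can also be read programmatically – here is a sketch using the Pillow imaging library (the file name is hypothetical, and a photo without EXIF data simply yields None values):

```python
from PIL import Image  # Pillow
from PIL.ExifTags import TAGS, GPSTAGS

def photo_evidence(path):
    """Read the timestamp and GPS tags that a smartphone camera
    typically embeds in a JPEG, as supporting evidence for an
    observation record."""
    exif = Image.open(path)._getexif() or {}
    named = {TAGS.get(tag, tag): value for tag, value in exif.items()}
    gps = {GPSTAGS.get(tag, tag): value
           for tag, value in named.get("GPSInfo", {}).items()}
    return {
        "taken": named.get("DateTimeOriginal"),
        "latitude": gps.get("GPSLatitude"),    # degrees/minutes/seconds
        "longitude": gps.get("GPSLongitude"),
    }

print(photo_evidence("observation.jpg"))  # hypothetical file path
```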

[Slide 12] Finally, the ‘process-oriented’ approach brings citizen science closer to traditional industrial processes. Under this approach, the participants go through some training before collecting information, and the process of data collection or analysis is highly structured to ensure that the resulting information is of suitable quality. This can include the provision of standardised equipment, online training or instruction sheets, and a structured data recording process. For example, volunteers who participate in the US Community Collaborative Rain, Hail & Snow network (CoCoRaHS) receive a standardised rain gauge, instructions on how to install it, and online resources to learn about data collection and reporting.
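
The structured process can also be enforced in software at the point of data entry – a sketch below, with field names and plausibility bounds invented for illustration (this is not CoCoRaHS’s actual schema). Note the explicit zero: reporting “no rain” as 0.0 rather than leaving the field empty is part of such protocols.

```python
REQUIRED_FIELDS = {"observer_id", "gauge_id", "date", "rainfall_mm"}

def validate_report(report):
    """Enforce the structured recording protocol: every report must
    carry the required fields, and 'no rain' must arrive as an
    explicit zero rather than a missing value."""
    missing = REQUIRED_FIELDS - report.keys()
    if missing:
        return f"reject: missing fields {sorted(missing)}"
    if not 0 <= report["rainfall_mm"] <= 500:  # crude daily bound in mm
        return "flag: implausible rainfall value"
    return "accepted"

print(validate_report({"observer_id": "v42", "gauge_id": "g7",
                       "date": "2018-02-10", "rainfall_mm": 0.0}))
# -> accepted
```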

[Slide 13] What is important to be aware of is that these methods are not used alone but in combination. The analysis by Wiggins et al. in 2011 includes a framework of 17 different mechanisms for ensuring data quality. It is therefore not surprising that, with appropriate design, citizen science projects can provide high-quality data.

Citizen Science & Scientific Crowdsourcing – week 2 – Google Local Guides

The first week of the “Introduction to Citizen Science and Scientific Crowdsourcing” course was dedicated to an introduction to the field of citizen science, using its history, examples, and typologies to demonstrate the breadth of the field. The second week was dedicated to the second half of the course name – crowdsourcing in general, and its utilisation in scientific contexts. In the lecture, after a brief introduction to the concepts, I wanted to use a concrete example that shows maturity in the implementation of commercial crowdsourcing. I also wanted something that is relevant to citizen science and from which many parallels can be drawn, so as to learn lessons. This gave me the opportunity to use Google Local Guides as a demonstration.

My interest in Google Local Guides (GLG) comes from two core aspects of it. As I pointed out in OpenStreetMap studies, I’m increasingly annoyed by claims that OpenStreetMap is the largest Volunteered Geographic Information (VGI) project in the world. It’s not. I guessed that GLG was, and by digging into it, I’m fairly confident that with 50,000,000 contributors (most of whom are, as usual, one-timers), Google has created the largest VGI project around. The contributions fall within my “distributed intelligence” category and are voluntary. The second aspect that makes the project fascinating for me is linked to a talk from 2007, at one of the early OSM conferences, about the usability barriers that OSM (or, more generally, VGI) needs to cross to reach a wide group of contributors – basically about user-centred design. The design of GLG is outstanding and shows how much has been learned by the Google Maps team, and more generally by Google, about crowdsourcing. I had very little information from Google about the project (Ed Parsons gave me several helpful comments on the final slide set), but by experiencing it as a participant who can notice the design decisions and implementation, it is hugely impressive to see how VGI is implemented professionally.

As a demonstration project, it provides examples of recruitment, nudging participants to contribute, intrinsic and extrinsic motivation, participation inequality, micro-tasks and longer tasks, incentives, basic principles of crowdsourcing such as an “open call” that supports flexibility, location- and context-aware alerts, and much more. Below is the segment from the lecture that focuses on Google Local Guides, and I hope to provide a more detailed analysis in a future post.

The rest of the lecture is available on UCLeXtend.

Caren Cooper’s Citizen Science: How Ordinary People are Changing the Face of Discovery

Today, Caren Cooper’s new book Citizen Science: How Ordinary People are Changing the Face of Discovery goes on sale in the UK. The book has been out in the USA for about a year, so this is a good point at which to review it.

The library of citizen science books is growing – there are the more literary books, such as Diary of a Citizen Scientist or Citizen Scientist, and a growing set of edited collections, such as Dickinson and Bonney’s Citizen Science: Public Participation in Environmental Research, or the accessible The Rightful Place of Science: Citizen Science.

Caren Cooper’s book adds something important to this collection – a popular science book that provides an overview of the field, the phenomena, and the movement of citizen science. As I was reading the book, I recognised the major challenge that she faced. Introducing citizen science is a complex undertaking: it happens in many areas of science that don’t always relate to each other, it has different structures and relationships between the scientists and the participants, and it can be close and personal, or remote and involving many thousands of people in online activities. In addition, citizen science can lead to many outcomes: improving education, contributing to a scientific project, self-empowerment and learning, addressing a local environmental problem, and community cohesion, to name but a few. Packing it all into an accessible and engaging book is quite a feat.

Cooper has experience in communicating citizen science through the various blog posts that she has published over the past 5 years, some of which set the ground for this excellent book. The way she balances the different aspects of citizen science is by taking scientific fields as the main classification for the chapters, with 10 chapters covering different areas where citizen science has been used – from meteorology to public health. Each chapter provides the factual information about the type of citizen science used in that area, as well as engaging stories and descriptions of the participants, so we get a real and concrete image of how citizen science is practised.

Through the chapters, the reader becomes familiar with the different modes and mechanisms that are used in citizen science. For example, she uses the Brony@Home project as a way to introduce volunteer computing, showing how the interactions around it can be meaningful and engaging, and thus not marginalising this form of citizen science. Another example is the discussion, in a later chapter, of the use of PatientsLikeMe as a platform for citizen science, and the way that some of its experiments challenge common medical practices, as in the ALS study on the impact of lithium.

One fantastic aspect of the book is the way it respects and values all the forms of citizen science and all the participants, giving the reader an opportunity to understand that it can come in many shapes. She describes the difficulties and triumphs that came out of different studies, different forms of engagement, and different disciplines. She provides a clear thread linking all these cases, through the progression that she makes throughout the book from scientist-led projects (opening with Whewell’s tide study) towards community-led studies at the end, with examples from environmental justice campaigns. All these cases are described with warmth and humour, which makes the material accessible and enjoyable to read.

Throughout the book, Cooper makes it clear that she sees citizen science as a critical part of the current landscape of the science and society relationship, and she addresses some of the issues that are argued about citizen science – for example, data quality – head on. The book makes a strong case for scientists and people who are involved in science communication to use citizen science as a way to improve the link between society and science.

The book focuses mostly on American projects, case studies, and practices – including social and cultural ones – but not to a degree that makes it difficult for a reader from outside the US to understand. Only in a handful of cases did I have to check on Wikipedia what a term or a phrase meant.

Overall, the book is engaging, enjoyable and informative. If you want an up-to-date introduction to citizen science, this book will open up the field to you. If you are working in a citizen science project or involved in developing one, you will learn new things – I did! 


Chapter in ‘Understanding Spatial Media’ on VGI & Citizen Science

The book ‘Understanding Spatial Media‘ came out earlier this year. The project is the result of the joint effort of the editors Rob Kitchin (NUI Maynooth, Ireland), Tracey P. Lauriault (Carleton University, Canada), and Matthew W. Wilson (University of Kentucky, USA).

The book fills the need to review and explain what has happened over the past 20 years, with the increased use of digital geographic information, which then became widespread and can be considered a medium – something that Daniel Sui and Mike Goodchild noted in 2001. The book chapters cover the underlying technologies, the sources of the data and media that are part of this area, and the implications – from smart cities to surveillance and privacy.

My contribution to this book is a chapter that belongs to the middle section – spatial data and spatial media – and provides an introduction to Volunteered Geographic Information and Citizen Science. If you’re interested, you can read the chapter here.