Citizen Science Data & Service Infrastructure

Following the ECSA meeting, the Data & tools working group workshop was dedicated to progressing the agenda on data & infrastructure.

Jaume Piera (chair of ECSA's Data and Tools working group) covered the area of citizen science data, moving from ideas, to particular solutions, to global proposals. Data started out in separate platforms (iNaturalist, iSpot, GBIF, eBird), but the creation of different citizen science associations and the evolution of ideas for interoperability allow us to consider the ‘Internet of People’, which is about participatory sharing of data. We can work in a similar way to standards development for the internet, and start to consider the layers: interoperability, privacy/security, data reliability, infrastructure sustainability, data management, intellectual property rights, engagement, human-computer interaction, and reference models and testing. By considering these multiple layers, we can develop a roadmap and consider a range of solutions at different ‘layers’. The idea is to open this to other communities and aim for solutions that are discussed globally.

Arne Berra explained the CITI-SENSE platform; a paper on the project site explains its architecture. He proposed that we use the European Interoperability Framework, with its legal, organisational, semantic and technical levels. In the technical area, we can use ISO 19119 and OGC standards, with six areas including boundary, processing/analytics, data/model management, communication and systems. We can use reference models, and he also suggested considering the INSPIRE life cycle model. Adapting standards to the context of citizen science is a challenge, so in many ways we need to treat them as a conceptual framework for identifying the different issues and discussing them. In CITI-SENSE they developed a life cycle that looked at human sensor data services, as well as the hardware sensor application platform.


Ingo Simonis (OGC) talked about a standardised encoding to exchange citizen science data. He described the work that OGC is doing on the sensor web for citizen science, for which they collected data from different projects. In citizen science, information comes from different surveys, in different forms and structures. The requirements are to capture the citizen, the environment and the sensor. Who made a particular measurement? We want to know about the environment (e.g. that it was raining while the data was collected), and then to know about the sensor. The OGC O&M citizen observatories model is conceptual. It is an observation model, assigning a value to a property, and they also look at standards for sensors (OGC SensorML). He used the ISO 19100 series of standards. The observation model tries to address observations that happen offline and are shared later. The model also deals with stationary and mobile sensing activities and allows for flexibility, for example an ad-hoc record that does not follow a specific process.
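To make the ingredients of such an observation concrete (observer, observed property and value, environmental context, sensor, offline capture with later sharing, stationary versus mobile sensing, ad-hoc records), here is a minimal Python sketch. It is an illustration under my own assumptions, not the OGC O&M schema or any of its encodings: a few names loosely echo O&M terms (phenomenon time, result time, observed property), but the classes, fields and example values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Sensor:
    # Hypothetical, minimal stand-in for a sensor description (SensorML is far richer)
    sensor_id: str
    description: str


@dataclass
class Observation:
    # Illustrative fields only; this is not the structure defined by the standard
    observer: str                          # who made the measurement (the citizen)
    observed_property: str                 # the property the value is assigned to
    result: float                          # the value assigned to that property
    unit: str
    phenomenon_time: datetime              # when the measurement happened in the field
    result_time: datetime                  # when it was shared (later if captured offline)
    locations: list[tuple[float, float]]   # one (lat, lon) point = stationary, several = mobile
    sensor: Optional[Sensor] = None        # None marks an ad-hoc record with no defined procedure
    context: dict[str, str] = field(default_factory=dict)  # environment, e.g. {"weather": "rain"}

    def __post_init__(self) -> None:
        # Basic sanity checks on the record
        if not self.locations:
            raise ValueError("an observation needs at least one location")
        if self.result_time < self.phenomenon_time:
            raise ValueError("a result cannot be shared before it was observed")

    @property
    def is_mobile(self) -> bool:
        return len(self.locations) > 1

    @property
    def is_ad_hoc(self) -> bool:
        return self.sensor is None

    @property
    def shared_later(self) -> bool:
        # True when result_time is after phenomenon_time, e.g. captured offline, uploaded later
        return self.result_time > self.phenomenon_time


# Example: an ad-hoc temperature reading taken in the rain and uploaded hours later
obs = Observation(
    observer="volunteer-42",
    observed_property="air_temperature",
    result=11.5,
    unit="degC",
    phenomenon_time=datetime(2015, 10, 6, 9, 0, tzinfo=timezone.utc),
    result_time=datetime(2015, 10, 6, 12, 30, tzinfo=timezone.utc),
    locations=[(51.52, -0.13)],
    context={"weather": "rain"},
)
print(obs.is_ad_hoc, obs.is_mobile, obs.shared_later)  # True False True
```

Making `sensor` optional is one way to accommodate the ad-hoc records mentioned in the talk, while keeping separate phenomenon and result times lets offline collection be recorded without losing when the measurement was actually made.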


Alex Steblin: the Citclops project includes applications such as Eye on Water. Citclops faces the challenge of maintaining the project’s data once the project has finished.

Veljo Runnel covered the work of EU BON: mobilising biodiversity data is challenging. They want a registry of online tools for citizen science projects, a tool that will allow people who work with citizen science to record information about their project as it relates to biodiversity, such as links to GBIF, DNA recording, or the use of a mobile app. Finding the person who runs a given tool is difficult. EU BON has a ‘data mobilization helpdesk’. The elements of the standard were discussed within the EU BON consortium, which is going to explore how to provide further input.

JRC is exploring the possibility of providing infrastructure for citizen science data – both metadata and the data itself.

Translating technical information into accessible language is valuable for the people who will be using it. We need to find ways to make information more accessible and digestible. The aim is to start developing reference material, building on existing experiences, and to subdivide the working group into specific areas. Many sub-communities are not represented within the data group (and in ECSA), and we need to reach out to different communities and include more groups. There are also issues about linking to US activities, and about linking activities at the small scale (neighbourhoods) with large organisations. As we work through the information, we need to be careful about technical language so that it can be shared in an accessible way.

Eye on Earth (Day 1 – afternoon) – policy making demand for data and knowledge for healthy living

The afternoon of the first day of Eye on Earth (see the previous post for the opening ceremony and the morning sessions) had multiple tracks. I chose to attend ‘Addressing policy making demand for data; dialogue between decision makers and providers’.

The speakers were asked to address four points about data quality control and assurance, and to identify the major challenges facing data quality for decision-making in the context of crowdsourcing and citizen science. Felix Dodds, who chaired the session, noted that the process of deciding on indicators for the SDGs is managed through the UN Inter-agency group, and these indicators and measurement standards need to last for 15 years. There is now also a ‘World Forum on Sustainable Development Data’, and the review of the World Summit on the Information Society (WSIS) is also coming. The speakers were asked to think about coordination mechanisms and QA: how do we ensure good quality data? How accessible is the data? Finally, what is the role of citizen science within government information? We need to address the requirements of the data at international, regional and national levels.

Nawal Alhosany (Masdar Institute): data is a very important ingredient when you try to base policy on facts and hard evidence. Masdar is active throughout the sustainability chain, with a focus on energy. The question is how to ensure that data is of good quality; Masdar recognised a gap in the availability of data 10 years ago. For example, some prediction tools for solar power did not take local conditions into account, nor use quality assurance suited to local needs. Therefore, they developed local measurement and modelling tools (ReCREMA). In terms of capacity building, they see issues in human capacity across the region and try to address them (e.g. the lack of an open source culture). Masdar sees a role for citizen science, and is taking steps towards it through STEM initiatives such as Young Future Energy Leaders and other activities.

David Rhind (Nuffield Foundation): many of the datasets that we want cross national boundaries, e.g. the radioactive plume from Chernobyl. When we want to mix population and environment data, we need to deal with mismatched boundaries and complex problems of data integrity. There are also serious problems with validity: 21 sub-Saharan countries haven’t carried out a household survey since 2006, so how can we know about levels of poverty today? There is a fundamental question of what quality is, and how we can define it in any meaningful sense. Mixing data from different sources creates a problem of what quality means. Some cases can rely on international agreements, e.g. the UN principles, or on the UK regulatory authority that checks statistics. Maybe we should think of international standards like those in accountancy. In terms of gaps in capacity, things are changing quickly: because of the need for analysis, data scientists are becoming available in the UK, but there is an issue with policy makers who do not have the skills to understand the information. Accessible data is becoming common with the open data approach, but many countries make official data less open for security reasons. However, data needs certain characteristics: it needs to be reusable, easy to distribute, public and openly licensed. As for citizen science, there are reasons to see it as an opportunity (e.g. OpenStreetMap), but many factors make its integration challenging. There is also a need for proper communication, as the miscommunication in L’Aquila showed.

Kathrine Brekke (ICLEI) gave the perspective of local government. Local governments need data for decision-making. Data also makes a city suitable for investment and insurance, and improves transparency and accountability. There are capacity issues in collecting the data and sharing it, down to language skills (if data is not available in English, international comparison is difficult). There are initiatives to allow sharing of city data, and there are 100 sustainability indicators that are common across cities and can be shared. In terms of data quality, we can also include crowdsourcing, but then we need to ensure that the data will be systematic and comparable. Standards and consistency are key: e.g. the greenhouse gas registry is important, and therefore there is a global protocol for collecting the data.

Ingrid Dillo (DANS, Netherlands): there is a data deluge with a lot of potential, but there are challenges around the quality of the data and trust. Quality is about fitness for use. DANS aims to ensure the archiving of data from research projects in the Netherlands. Data quality in science is made up of scientific quality but also technical quality. Scientific integrity is about the values of science, the standards of conduct within science. There are issues with fraud in science that require better conduct. Data management in small projects lacks checks and balances, with peer pressure as the major driver of quality, so open science is one way to deal with that. There are also technical issues such as metadata and data management, so that data can be used and stored in a certified trustworthy digital repository.

Robert Gurney (University of Reading): in environmental science there is the Belmont Forum e-Infrastructures & Data Management initiative. The Belmont Forum is an association of environmental science funders from across the world. The initiative deals with the huge increase in data. Scientists are early adopters of technology, and others in the environmental sector can use some of the lessons from what scientists are doing. The aim is to deliver the knowledge that is needed for action. The infrastructure is needed to meet global environmental challenges. This requires working with many supercomputers, and the problems are volume, variety, veracity and velocity (Big Data): we are getting many petabytes, and could reach 100 petabytes by 2020. The problem is that data is in deep silos, even between Earth Observation archives. There is a need to make data open and sharable. 10% of funding will go towards e-infrastructure. They created data principles and want to adopt the principle of open by default.

Marcos Silva (CITES): CITES is about the trade in endangered species. Since the mid-1970s, CITES has regulated trade in a multi-billion dollar business, with 850,000 permits a year. Each permit states that it is OK to export a specimen without harming the population. It is data driven. CITES data can help in understanding outliers and noticing trends. There are issues of ontologies, schema, quality etc. between the signatories, similar to environmental information. They would like to track what happens to the species across the world. They are thinking about a standard for all transactions with specimens, which will create a huge amount of data. Even in dealing with illegal poaching and the protection of animals, there is a need for interoperable data.

Discussion: Data Shift, on citizen-generated data for the SDGs. Is there data that is already used? How are we going to integrate it with other types of data? We risk filtering citizen science data out because it follows a different framework. Rhind: statisticians are concerned about citizen science data, will take a traditional view, and will not use the data. Quality assurance is needed throughout, not just at the end. The management of indicators and their standards will require the inclusion of suitable data. Marcos asked what is considered citizen science data: e.g. reporting of data by citizens is used in CITES, and there are things to learn about how the quality of such data can be integrated with the traditional processes that enforcement agencies use. Science is not just data collection and analysis, and multiple people can analyse information. Kathrine talked about crowdsourcing, e.g. reporting of trees in certain cities, which also opens a dialogue about deciding which trees to plant. Ingrid disagreed with the view that data collection on its own is not science. Nawal: they are running projects with schools about energy, which open up participation in science. Rhind raised the need for huge data repositories and asked whether governments are ready to invest in them. Gurney: there is a need to coordinate the multiple groups and organisations that deal with data. There is a huge shortage of people in environmental science with advanced computing skills.

The second session that I attended, ‘Building knowledge for healthy lives’, was opened by Jacqueline McGlade: the data agenda needs to focus on the SDGs, and health underpins more goals than environmental issues do. UNEP Live aims to provide access to UN data, from country data to citizen science data, so that it can be found. The panel will explore many relations to health: climate change and its impact on people’s lives and health, heatwaves, and vulnerability to extreme events. Over 40 countries want to use the new air quality monitoring that UNEP developed, including the community in Kibera.

Hayat Sindi is the CEO of i2Institute, which explores social innovation. Our ignorance about the world is profound. We teach children foundational theories without questioning science heroes and theories, as if things are static. We elevate ideas from the past and don’t question them. We ignore the evidence. The fuel for science is observation. We need to keep creating technology to improve life. Social innovation is important, and she learnt this from Diagnostics For All (DFA), from MIT. DFA’s diagnostics are low cost, portable, easy to use and safely disposable. The full potential of social innovation is not yet fulfilled. True scientists need to talk with people, understand their needs, and work with them.

Maria Neira (WHO): all the SDGs are linked to health. A core question is what the environmental determinants of health are. Climate change and air quality are all part of addressing health and wellbeing. There is a need to provide evidence-based guidelines, and the WHO also promotes health impact assessment for major development projects. Different sectors are involved: housing, access to water, electricity (some healthcare facilities lack a reliable source of energy). Air pollution is a major issue that the WHO recognises as a challenge, killing 7 million people a year. With air quality, unlike tobacco, we don’t get a warning that lets us choose. The WHO is offering indicators, proposing that measuring access to energy should include measuring exposure to air pollution. There is a call for strong collaboration with other organisations. A global platform on air quality and health is being developed, aiming to improve estimates of the impacts of air quality.

Joni Seager (GGEO coordinating lead author) talked about gender and the Global Environmental Outlook. She looks at how gender is not included in health and environmental data. First example: gender data is collected and then hidden. Gender analysis can provide better information that helps in decision making and policy formation. Second, data is collected at the level of households, but households don’t have agency in education, access to a car or food security. In reality there is no evidence that food security is a household-level attribute: men and women have different experiences of coping strategies, with significant differences between them. Household data reflects the view of the men, not the real situation, and makes women especially invisible. There are also cases where data is not collected at all; in some areas, e.g. sanitation, information is not collected. If we are building knowledge for healthy lives, we should ask: whose knowledge and whose lives?

Parrys Raines (Climate Girl) grew up in Australia and wants to protect the environment. She heard about climate change at 6 years old and then sought to research and learn about the data, but the information is not accessible to young girls. She has built close relationships with UNEP. Young people experience different impacts. She also shares information about air quality and pollution so that youth are included in the discussion and solutions. Youth need to be seen as a resource across different levels: a sharing generation with global thinking. Intergenerational communication is critical, and knowledge of data is critical for the 21st century. Organisations need to go out and support youth, from mentoring to monetary support.

Iman Nuwayhid talked about health and ecological sustainability in the Arab world. Many Millennium Development Goals (MDGs) have been achieved, but most countries in the region fell short of achieving them. In ecological sustainability, the picture is gloomy in the Arab world: many countries don’t have access to water. Demand for food is beyond the capacity of the region to produce. The population is expected to double in the next 30 years. Poorer countries have high fertility, and there is a lot of displacement: war, economic and environmental. In development, there are striking inequities in the region, which includes some of the wealthiest and some of the poorest countries in the world. The distribution of water needs to consider which sector should use it. Comparing health and military expenditure, the Arab world spends much more on the military than on health. There is an interaction between environment, population and development. The region’s ecological footprint is the highest and is increasing. There are also issues of political instability that can be caused by environmental stresses. Displacement of people between countries creates new stresses and questions the value of state-based analysis. Uncertainty is a major context for the region and for science in general.

Discussion: on the air quality issue, monitoring is not enough without understanding toxicity and dispersion. Air pollution is also affected by activities such as stone quarries. We need to balance monitoring efforts against accuracy and the costs of acting, and to develop models and methods for thinking about how the monitoring is used. In urban areas, light and noise also have impacts, not just on mortality but on quality of life and mental health.

Two side events of interest run in parallel:

The European Environmental Bureau presented a side event on collaborative research and activist knowledge on environmental justice. Pressure on resources means that extractive industries operate in the South while the outcomes are used in the North. There is an increased level of conflict in the South. The EJOLT project is a network of 23 partners in 23 countries. It is collaborative research of scientists, grassroots organisations, NGOs and legal organisations. They produced a whole set of results, the most visible of which is the Atlas of Environmental Justice. There is plenty to say about citizen science and how important it is that information comes from people who are close to the ground. They worked with a team in ecological economics that created a moderated process for collecting and sharing information. The atlas allows information to be viewed according to different categories, and links to stories about each conflict and its history, as well as further details about it. The atlas is a tool to map conflicts but also to try and resolve them. The EEB sees the atlas as ongoing work and wants to continue developing sources of information and reporting. Updating and maintaining the tool is a challenge that the organisation faces.

At the same time, the best practice guidelines ‘Putting Principle 10 into action’ were launched. Building on the experience of the Aarhus guide, they include plenty of case studies and information, and will be available on the UNEP website.

The gala dinner included an award to the Senseable City Lab project in Singapore, which demonstrated the development of personalised travel plans that help people avoid pollution, based on data that 30-40 participants collected using cheap sensors.

Spatial Data Infrastructures, Crowdsourcing and VGI

The Spatial Data Infrastructure Magazine is a relatively new e-zine dedicated to the development of spatial data infrastructures around the world. Roger Longhorn, the editor of the magazine, conducted an email interview with me, which is now published.

In the interview, we cover the problematic terminology used to describe a wide range of activities; the need to consider social and technical aspects as well as the goals of the participants; and, of course, the role within spatial data infrastructures of the information produced through crowdsourcing, citizen science and VGI.

The full interview can be found here.