A CRI-Muséum national d’Histoire naturelle workshop, created by Anshu (CRI long-term fellow) and Simon (MNHN) following a meeting of the Galaxy community in Freiburg. I joined the design process. The workshop was structured so that the museum and the CRI could present the systems they are developing, with scope for a discussion about lessons and collaboration. Here are the details of the workshop on the CRI website. These are the rough notes from the workshop.
Ariel Lindner – since the first major grant of the CRI (Citizen Cyberlab), there has been an interest at the CRI in digital platforms for engagement. At the same time, they received a grant to innovate in education, and since then the CRI has become a centre for learning sciences and research, linking learning, digital sciences, and life sciences. The principles are mentorship/empowerment, the right to err, and sharing. For the CRI, open science means transparency and collaboration. A few of the important themes for the day: gaps. There are distances between the public and research, which can grow and lead to distrust, yet on the other hand, kids are going out into the street over an issue that is scientific. There are also digital gaps: lab instrumentation is becoming more complex and is not available across the scientific community. We need to consider how we address these gaps and how a collaborative approach can help us progress.
I covered the ExCiteS platforms and some of my experience with the different collaborative platforms that we developed in ExCiteS. The slides are provided below.
Romain Julliard: citizen science: [Big] quality data and [Artificial] collective intelligence. The museum has over 15 years of experience, over 15 projects, and over 15,000 active participants a year. All of this is part of the National Museum of Natural History's role in monitoring biodiversity through citizen science. They see the projects as involving volunteers, scientific experts, and NGO facilitators. There are projects such as Spipoll, which involves photographing pollinating insects; taking a usable picture is quite challenging. There is a positive correlation between longevity of engagement and data quality. The aim is to find the zone of flow, as in computer games: identify the skills that are required from the participants and communicate these to them. The second lesson is the importance of the social platform and of communication among project participants for data quality control: participants "police" each other and guide the improvement of data quality. A comparative study demonstrated that the visibility of data and the ability of participants to learn from each other are critical for following the protocol and producing relevant data. They learned that making data visible to all allows imitation and produces more homogeneous data. Comments and discussion allow advice, help, and quality control. Contributors also improve their own data through versioning. These findings differ from textbook statements, e.g. that observations should be independent of each other, or that participants need to be trained in advance. Instead, they recommend allowing imitation, letting participants engage with each other, and giving them a shared part in the QA. 65 Millions d’observateurs is a project with major funding through which they are creating a common system for data collection. They have a common approach across projects and are currently working on shared infrastructure for citizen science projects.
One project is an open observatory for all species with over 146 different sub-projects. They are creating a new service unit, MOSaic, with Sorbonne to provide ongoing technology skills for citizen science, with over 15 people covering a range of skills.
Simon Bénateau / Galaxy-Bricks: Toward collaborative data analysis – creating tools for analysing the data. The tool aims to allow sharing and making errors, and to create communities. Citizen science is diverse, from high school students to experts, working on environmental issues and on organisms; participants range from people with very little knowledge to those with quite a high level of expertise. The process includes the Museum's network, which works with participants through protocols and data. There is also a role for researchers and partners in the scientific community. It allows new ways to participate and to ask questions of the data. They also want to help in teaching the scientific approach and data literacy. Choosing Galaxy means that there is an existing development community; it supports sharing of the methodology, it is FAIR and open-source, and it even provides access to high-performance computing. Their aim is to simplify the UI and the process of constructing an analysis workflow, using Scratch to build analysis processes in a way that is suitable for learning. The process follows the structure of scientific research: setting your research question, importing data, processing data, visualising, carrying out statistical tests, and reaching conclusions.
Eric Cherel: The Learning Planet – the team at the CRI is trying to build tools for a model campus digital infrastructure, from tactile information screens to other tools that can be used elsewhere. There are learning tools intended to empower the community. The CRI's project system is used to present projects: who you are working with, what you are working on, and links to different tools. The tools are used to create descriptions of projects, from small to large, and help to relate projects to each other. The global project WeLearn aims to catalyse learning. It is currently a browser extension: when you come across a useful resource, you mark it as a learning source. The system tries to extract concepts from the page, also using crowdsourcing, and creates a global map (currently in French and English). It also creates a profile of the learner, so it might be able to match learners with material. There is a lot of potential to map learning resources on a massive scale. They use cartography of concepts as a way to present people with their topics and learning. They use Wikipedia to train an ML model and are analysing ways to extract concepts. They work with people from data4good, who helped. They are linking with EdTech companies to share ontologies and abilities to manage concepts. Integrating smartphones can allow the capture of books and other offline learning resources and events. They aim to add more information to support reflexivity, recommendation, and self-documented learning, and hope to reach out to EdTech, Wikipedia, and open science platforms.
Anshu Bhardwaj: Collaborative Tools to Accelerate Infectious Disease Research. The projects that she works on are aimed at researchers, undergraduates, and industry; participants will have some knowledge in the area before joining the project. In particular, she works on drug discovery. TB is an example of the issue of antibiotic resistance. Drug discovery is a complex, risky process with a high attrition rate. It takes 12-15 years from idea to drug and it is very expensive. There is a need for a wide range of skills. Within the pharma industry, failures are not shared. In open-source drug discovery, information and failures are shared, allowing learning. The open innovation model allows for creating a collaborative platform. Sysborg 2.0 is a point of contact for ideas, data, and results, and a peer-review platform that allows for improvement. It provides a project management system and a social network to find peers. There are 13 functions and a social network type page. There is also a need to manage micro-attribution, to recognise small contributions. They created the portal from a range of open-source tools (Galaxy, DoProject, Moodle, etc.). It includes collaboration with Infosys because of the technical complexity of developing such a project. For each project, they have developed metadata that is recorded in the system, with a flat hierarchy that allows anyone to update information, and version control in case people change information that the project manager wants to restore. They also have OSDDChem, an open chemistry initiative, created because of the complexity of following compounds as they go through the process. The system also helps in recording structures, molecules, and different diagrams, and in presenting diagrams in the style of chemistry communication. They have seen self-organisation of groups of students and have also been able to analyse 45,000 publications.
So far, they have integrated 84 PIs with 88 projects and identified 11 compounds that could lead to drugs.
Marc Santolini & Thomas Landrain: Just One Giant Lab – learning and solving together. JOGL is about opening up involvement in research, and the design of projects, to people outside academia. It also links itself to the SDGs. The background is Thomas's experience with an open laboratory in Paris (La Paillasse), and the wish to move collaboration beyond the physical space. The next stage was to create online collaboration in epidemiological research (with support from Roche). An open science platform can bring people together on a level playing field: specialists, data scientists, patients, etc. There are many problems that are not suitable for business problem-solving, and many people don't have an opportunity to work on them. We need to consider the agile space of communities that don't sustain their involvement but need to document and pass their experience forward. The challenge is that we have about 10 million active contributors to science, but 1 billion people with higher education. We need people to be able to do research without being within the formal research system. The existing collaborative research systems (Academia.edu, ResearchGate…) lock in data and outputs and work by exploiting the vanity of contributors, not by fostering collaboration. The idea of JOGL is that researchers/entrepreneurs/civil servants/activists might have their own problems that they need to solve, while on the other hand, there are students, patients, and citizens who can contribute and build experience by participating in real projects.
Marc – science is growing: collaboration and publication are increasing. No one can be in control of an area, so we need designed serendipity (from Michael Nielsen). They look at team success, science innovation, open-source communities, and collaborative learning. iGEM is a synthetic biology competition with over 300 teams, where everything is documented in a network of wiki lab books. The analysis looked at features that can help in understanding the competition, for example, team size, experience, and mentorship, but also used network analysis. They found a collaboration core that can predict success.
Bastian Greshake Tzovaras: OpenHumans – sharing very personal data for research in a way that protects our privacy. The idea is that there is one system that stores the data safely and securely: GPS location, DNA data, Google search history, and tweets. The first thing it allows is analysing the data with research notebooks (e.g. predicting eye colour from 23andMe data): you can run a shared open notebook on your own data without sharing the data. The notebooks share only the analysis, not the data. There are also projects that use the data. An example is Dana Lewis's work on insulin pumps, which uses information from continuous glucose monitoring (Nightscout), with patients controlling their data. Another example is nobism, which works on cluster headaches: they share data with a code academy whose students know how to analyse the data. Some of the students' reports are shared, along with patient-led experiments. There are big issues of governance and trust. The OpenHumans foundation is a not-for-profit. The community participates in approving projects that are proposed on the system, and discussed this for a long time. Anyone in the community can also participate in nominating the board, and there is a mechanism to deal with the community seats.
Valerie Lerouyer: BioLab, a future collaborative and experimental space at the Cité des Sciences et de l'Industrie. BioLab should link people to biology and the environment. They are aiming for a partnership with INRA on research into soil and fermentation. The aim is to help with understanding the ecological transition. They are aiming for different audiences, both children and adults. They want people to discover the microscopic world, to collaborate on the ecological transition, and to set up participatory projects. The aim is to create a dynamic process, and communication is an issue here: the central aspect of the plan is to offer an entry point to dialogue, sharing results, doing research, finding out about things, and creating. They are going to explore living organisms in different ecosystems in the park and the canal, and ask the public to take samples from their gardens and local areas. The focus is on microbiology and biotechnologies, and on developing partnerships with secondary schools. They are thinking about DIY, e.g. fermentation that is impossible to do in a lab (such as kefir), to collect observations from different places. The exhibit will open in April.
Anirudh Krishnakumar: Dynamic Digital Drivers for Open Collaborative Science – MindLogger is a data collection platform for building citizen science apps without any programming. It allows different kinds of data collection: surveys in which people can create different response options, different types of information (audio, video), and sensor features. It provides different elements: markdown text, slider, date, time range, table counter. This lets people give information in different ways, e.g. through a set of fields for data entry. There is an option for geolocation, but it must be actively chosen by participants. They want to support a wider library of citizen science projects, so if someone creates a survey, someone else can pick it up. They are considering integrating MindLogger with the ETH Zurich/Citizen Cyberlab SDG toolkit. They would like to see different use cases and experimentation with the tool.
Joel Chevrier: Look at your hand when you write. He recently started research on the neuromotor aspects of handwriting in children. Joel uses sensors, with an interest in how movement can be measured with accelerometers, and showed some examples of assessing and understanding movements. You can teach the system different gestures, and it learns the link between colour and letter. The system is linked to the Centre Pompidou. Working with such devices can also make the assessment of how people move more accurate (e.g. for patients with motor issues). Research questions include the degree to which movement and the monitoring of grasping actions can help us understand children's handwriting.
Some general insights: using open-source libraries is valuable, and special attention is needed for software packages that are used outside your discipline, but you also need to consider where the knowledge on how to use them will come from. There is a clear need for a community manager, someone who will keep encouraging activity on the system. OpenHumans is a good example of a system based on minimal development. Using APIs is a good way to interact, rather than relying on integration and complex connections.
The workshop was supported by my short-term fellowship at the CRI in Paris.