New paper: The epistemology(s) of volunteered geographic information: a critique

Considering how long Renée Sieber (McGill University) and I have known each other, and that we work in similar areas (participatory GIS, participatory geoweb, open data, socio-technical aspects of GIS, environmental information), I'm very pleased that a collaborative paper that we developed together is finally published.

The paper ‘The epistemology(s) of volunteered geographic information: a critique’ took some time to evolve. We started jotting down ideas in late 2011, and slowly developed the paper until it was ready, after several rounds of peer review, for publication in early 2014, but various delays led to its publication only now. What is pleasing is that the long development time did not reduce the paper's relevance – we hope! (we kept updating it as we went along). Because the paper looks at philosophical aspects of GIScience, we needed periods of reflection and re-reading to make sure that the whole paper comes together, and I'm pleased with the way ideas are presented and discussed in it. Now that it's out, we will need to wait and see how it is received.

The abstract of the paper is:

Numerous exegeses have been written about the epistemologies of volunteered geographic information (VGI). We contend that VGI is itself a socially constructed epistemology crafted in the discipline of geography, which when re-examined, does not sit comfortably with either GIScience or critical GIS scholarship. Using insights from Albert Borgmann’s philosophy of technology we offer a critique that, rather than appreciating the contours of this new form of data, truth appears to derive from traditional analytic views of information found within GIScience. This is assisted by structures that enable VGI to be treated as independent of the process that led to its creation. Allusions to individual emancipation further hamper VGI and problematise participatory practices in mapping/geospatial technologies (e.g. public participation geographic information systems). The paper concludes with implications of this epistemological turn and prescriptions for designing systems and advancing the field to ensure nuanced views of participation within the core conceptualisation of VGI.

The paper is open access (so anyone can download it) and it is available on the Geo website.

Environmental Citizen Science overview and interview with Tom Wakeford

This short video (6 minutes) gives an introduction to the findings from a recent report on environmental citizen science, and a discussion with Tom Wakeford (Coventry University) about core aspects of citizen science and its potential in terms of policy, especially as it relates to environmental issues. The report can be found on the European Commission website, and is part of the work of the Science Communication Unit at the University of the West of England.
I was very pleased to see that my classification of levels of engagement in citizen science appears in this video (and in the report).

Citizen Cyberlab Summit (day 2)

The second day of the Citizen Cyberlab Summit followed the same pattern as the first day: two half-day sessions, each opening with short presentations from guest speakers from outside the project consortium, followed by two demonstrations of a specific platform, tool, pilot or learning activity, and ending with discussions in groups, which were then shared back.

The first session started with History of Citizen Sciences – Bruno Strasser (Uni Geneva) – looking at both practical citizen science and the way it is integrated into the history of science. The Bioscope is a place in Geneva that allows different public-facing activities in the medical and life sciences: biodiversity, genetic research, etc. They are developing new ways of doing microscopy – a microscope that shares its imagery with the whole room, so it is seen on everyone's devices, turning the microscope from a solitary experience into a shared one. They are involved in biodiversity research aimed at DNA bar-coding of different insects and animals. People collect data, extract DNA and sequence it, and then share it in a national database. Another device that they are using is a simple add-on that turns a smartphone into a powerful macro camera, so children can share images on Instagram with a Bioscope hashtag. They also run a ‘Sushi night’ where they tell people which fish they actually ate, if at all…
This links to a European Research Council (ERC) project – the rise of the citizen sciences – on the history of the movement. Is there something like ‘citizen sciences’? From a history of science perspective, in the early 20th century the amateur scientist was passing and professionals were replacing them. He uses a definition of citizen science as amateurs producing scientific knowledge – he is not interested in doing science without the production of knowledge. He noted that there are a lot of names that are used in citizen science research. In particular, the project focuses on the experimental sciences – and that is because of the laboratory revolution of the 1930s, which dominated the 20th century. Lab science created the divide between the sciences and the public (Frankenstein as a pivotal image is relevant here). Science popularisation was trying to bridge the gap to the public, but the rise of the experimental sciences was coupled with a decline in public participation. His classification looks at everything from DIYbio to volunteer computing – identifying observers, analysers, etc., and how they become authors of scientific papers. Citizen science is caught up in the shift in science policy towards science with and for society. He is interested in the promises that are attached to it: scientific, educational (learning more about science) and political (more democratic). It is interesting because it is an answer to ‘big data’, to the contract between science and society, expertise, participation and democratisation. The difference is demonstrated in the French response following Chernobyl in 1986, with a presentation by a leading scientist in France claiming that the particles would stop at the French border, compared to Deepwater Horizon in 2010, with participatory mapping through Public Lab activities that ‘tell a different story’. In the project, there are four core research questions: how do the citizen sciences transform the relationship between science and society? Who are the participants in the ‘citizen sciences’? We have some demographic data, but no big picture – a collective biography of the people who are involved in it. Next, what are the ‘moral economies’ that sustain the citizen sciences? Such as the give and take that people get out of a project and what they want – motivations and rewards. Finally, how do the citizen sciences impact the production of knowledge? What is possible and what is not. He plans to use approaches from the digital humanities, building up a database about the area of citizen science and looking at Europe, the US and Asia. He is considering how to run it as a participatory project. Issues of moral economies are demonstrated in the use of BOINC in commercial projects.

Lifelong learning & DIY AFM – En-Te Hwu (Edwin) from Academia Sinica, Taiwan). There are different ways of doing microscopy at different scales – in the past 100 years, we have the concept of seeing is believing, but what about things that we can’t see because of the focused light of the microscope – e.g. under 1 micron. This is possible with scanning electron microscope which costs 500K to 2M USD, and can use only conductive samples, which require manipulation of the sample. The Atomic Force Microscope (AFM) is more affordable 50K to 500K USD but still out of reach to many. This can be used to examine nanofeatures – e.g. carbon nanotubes – we are starting to have higher time and spatial resolution with the more advanced systems. Since 2013, the LEGO2NANO project started – using the DVD head to monitor the prob and other parts to make the AFM affordable. They put an instructable prototype that was mentioned by the press and they called it DIY AFM. They created an augmented reality tool to guide people how to put the device together, and it can be assembled by early high school students – moving from the clean room to the class room.  The tool is being used to look at leafs, CDs – area of 8×8 microns and more. The AFM data can be used with 3D printing – they run a summer school in 2015 and now they have a link to LEGO foundation. They are going through a process of reinventing the DIY AFM, because of patenting and intellectual property rights (IPR) – there is a need to rethink how to do it. They started to rethink the scanner, the control and other parts. They share the development process (using building process platform of MIT media lab). There is a specific application of using the AFM for measuring air pollution at PM2.5. using a DVD – exposing the DVD by removing the protection layer, exposing it for a period of time and then bringing it and measuring the results. They combined the measurements to crowdcrafting for analysis. The concept behind the AFM is done by using LEGO parts, and scanning the Lego points as a demonstration, so students can understand the process. 

The morning session included two demonstrations. First, Creativity in Citizen Cyberscience – Charlene Jennett (UCLIC, UCL). Charlene is interested in psychological aspects of HCI. Creativity is a challenge in the field of psychology, and there are different ideas of what it is – one view is that it is about the eureka moment, as demonstrated in the Foldit breakthrough. An alternative, however, is to notice everyday creativity: doing things that are different, or not originally thought of. In Cyberlab, the team looked at different projects that use technologies in different contexts. In the first year, they ran interviews with BOINC, EyeWire, Transcribe Bentham, Bat Detective, Zooniverse and Mapping for Change – a wide range of citizen science projects. They found many examples – volunteers drawing pictures of the ships whose logs they were transcribing in Old Weather, or the identification of the Green Peas in Galaxy Zoo, which turned out to be a new type of galaxy. Volunteers also created chatbots about their work – e.g. in EyeWire, to answer questions – as well as visualisations of information, dictionaries and further information. The findings showed that motivation leads to creativity that helps the community or the project, and the team created a model linking motivation, learning through participation, and volunteer identity that leads to creativity. The tips for projects include: feedback on project progress at the individual and project levels; regular communication through forums and social media; community events – e.g. competitions in BOINC; and role management – if you can see that someone is doing well, encourage them to take more responsibility. They then looked at the different pilots of Cyberlab – GeoTag-X, Virtual Atom Smasher, synthetic biology through iGEM, and Extreme Citizen Science – and interviewed 100 volunteers. Preliminary results: in GeoTag-X, the design of the app is seen as the creative part, while for the analysts some of the harder tasks – e.g. the georeferencing of images – lead to creative solutions and the sharing of techniques. In the iGEM case they have seen people develop games and video. In the ExCiteS cases, there is DIY, the writing of blog posts, and participants being expressive about their own work. There are examples of people creating t-shirts, or creating maps that are appropriate for their needs. They are asking questions about other projects and how to design for creativity. It is interesting to compare the results of the project to the definition of creativity in the original call for the project. The Cyberlab project is opening up questions about creativity more than answering them.

Preliminary Results from the creativity and learning survey – Laure Kloetzer (University of Geneva). One of the aims of Citizen Cyberlab was to look at different aspects of creativity, and the project provided a lot of information from a questionnaire about learning and creativity in citizen science. The general design of the questionnaire was aimed at capturing learning outcomes. It should be remembered that, out of the whole population, only a small group participates in citizen science – and within each project there is a tiny group of people who do most of the work (down to 16 in Transcribe Bentham); the question of how people turn from the majority, who do very little work, into highly active participants is, as yet, unknown. In Citizen Cyberlab we carried out interviews with participants in citizen science projects, which led to a typology of learning outcomes – a lot wider than those that are usually expected or discussed in the literature – but these did not establish what people actually learn. The hypothesis is that people who engage with the community can learn more than those who don't – the final questionnaire of the project tries to quantify learning outcomes (the Informal Learning in Citizen Science – ILICS – survey). The questionnaire was tested in a partial pilot, then sent to people in volunteer computing, volunteer thinking and other types of projects. They had about 700 responses, and the analysis has only just started. Results: the age of participants is diverse, from 20 to 70, but this needs further analysis by project. Gender: two-thirds male, a third female. 20% of people have only high-school education, while 40% have a master's degree or more – a large minority of people have a university degree. They got people from 64 countries – the US, UK, Germany and France are the main ones (the survey was translated into French). Science is important to most participants, a passion for half of them, and integrated into the profession of 25%. Time per week: a third of people spend less than 1 hour, and 70% spend 1-5 hours – so the questionnaire captured mostly active people. The questions on learning explored feelings, what people learn, how they learn, and confidence (based on the typology from previous stages of the project). The results show that most people feel they learn something to a lot: on-topic knowledge (about the domain itself – 88%), scientific skills (80%), technological skills (61%), technical skills (58%), with political, collaboration and communication skills in about 50% of cases. On the how question, people learn most from project documentation (75%) but also from external resources (70%). Regarding social engagement, about 11% take part in the community, and for 61% of those it is the first time in their lives that they have taken such a role. There are different roles – translation, moderating forums – along with other things in the community that were not captured in the questionnaire. 25% said that they met people online to share scientific interests – an opportunity to share and to meet new people. On learning dimensions and types of learners: some people feel that they learn quite a lot about various things, while others focus on specific types of learning. Principal Component Analysis shows that learner types correlate with different forms of engagement – more time spent correlates with a specific type of learner. There are different dimensions of learning that are not necessarily correlated.
The cluster analysis shows about 10 groups: people who learn a lot on-topic and about science, with increased self-confidence; a second group who learn on-topic but without much confidence; group 3, like group 2 but with less perception of learning; group 4, who don't seem to learn much but prefer looking at resources; group 5, who learn somewhat, especially about computers; group 6, who learn through other means; group 7, who learn by writing and communicating, collaborating, and some science; group 8, who learn only about tools but have a general feeling of learning; group 9, who learn on-topic but not transferable skills; and group 10, who learn a lot about collaboration and communication. More work is needed on this, but these are the emerging results, and the raw data will be shared in December.
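As a rough illustration of the kind of analysis described above – Principal Component Analysis over the learning dimensions, followed by clustering of participants into learner types – here is a minimal sketch in Python. The file name, column names and the choice of 10 clusters are illustrative assumptions based on the description above, not the actual ILICS schema.

```python
# Rough sketch of the PCA + cluster analysis described above.
# File name, column names and cluster count are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical responses: one row per participant, Likert-style ratings
# of how much they feel they learned on each dimension.
df = pd.read_csv("ilics_responses.csv")  # assumed file
dimensions = ["on_topic", "scientific_skills", "technological_skills",
              "technical_skills", "collaboration", "communication",
              "self_confidence"]

X = StandardScaler().fit_transform(df[dimensions])

# Principal components: which learning dimensions vary together.
pca = PCA(n_components=3)
df[["pc1", "pc2", "pc3"]] = pca.fit_transform(X)
print("explained variance:", pca.explained_variance_ratio_)

# Cluster participants into learner types (the talk reported ~10 groups).
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
df["learner_type"] = kmeans.fit_predict(X)
print(df.groupby("learner_type")[dimensions].mean().round(2))
```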

Following the presentation, the group discussion first explored examples of creativity from a range of projects. In Crowdcrafting, when people are not active for a month, they get an email telling them that their account will be deleted – one participant created activities that link to the project, e.g. tweeting transcriptions from WWI exactly 100 years after the events happened. In the Cornell Lab of Ornithology, volunteers suggest new protocols and tasks for projects – new ways of modifying things. The games of ScienceAtHome are targeted specifically at exploring when problem solving becomes creative – using the tools and explaining to the researchers how they solve issues. In WCG, one volunteer created graphics from the API that other volunteers use and now expect to see as part of the project. There is a challenge for project coordinators in what to do with such volunteers – should their work be part of the core project?
Next, there were questions about roles – giving the end users enough possibilities is one option, while another way is to construct modular choices, allowing people to combine them in different ways. In ScienceAtHome they have decided to put people into specific modes, so changing activities is done consciously. There is a wide variety of participants – some want fairly passive, low-involvement participation, while others might want to do much more. Creativity can also express itself in different forms, which do not always seem linked to the project. The learning from Citizen Cyberlab is that there isn't a simple way of linking creativity to, and capturing it in, computer software; you need an organisational structure and, most importantly, the awareness to look out for creativity and foster it to help it develop. Having complementarity – e.g. bringing games people and science people to interact together – is important for creativity. Another point to consider is the degree to which people progress across citizen science projects and types of activities – as in the example where, without the hackspace, it was not possible to make things happen. So it is volunteers plus infrastructure and support that allow creativity to happen. There are also risks in creating something that you didn't know before – ignorance: in music there isn't much risk, but in medical or synthetic biology applications there can be, and we need to ask whether people stop their creativity when they perceive risks.

The final session of the summit was dedicated to Evaluation and Sustainability, starting with The DEVISE project – Tina Phillips (Cornell Lab of Ornithology). Tina is involved in the public engagement part of the Cornell Lab of Ornithology. She started from the work on the 2009 Public Participation in Scientific Research (PPSR) report – the findings from the CAISE project included a scarcity of evaluations, indications that higher engagement suggests deeper learning, the need for more sensitive measures, and a lack of overall findings that apply across many projects. The DEVISE project (Developing, Validating, and Implementing Situated Evaluation Instruments) focused on evaluation in citizen science overall – identifying goals and outcomes, building professional opportunities for people in the field of informal learning, and creating a community of practice around this area. Evaluation is about improving the overall effectiveness of programmes and projects. It is different from research in that it tries to understand the strengths and weaknesses of a specific case and is less about universal rules – it is the localised learning that matters. In DEVISE, they particularly focused on individual learning outcomes. The project used a literature review and interviews with participants, project leaders and practitioners to understand their experience, and looked at a set of different theories of learning. This led to a framework for evaluating PPSR learning outcomes, which includes aspects such as interest in science & the environment, self-efficacy, motivation, knowledge of the nature of science, skills of science inquiry, and behaviour & stewardship. They also developed scales – short surveys that allow specific constructs to be examined, e.g. a survey about interest in science and nature, or about self-efficacy for science. There is a user guide for project evaluators that provides guidance on planning, implementing and sharing evaluations, and a logic model for evaluation that includes inputs, activities, outputs, and short-term and long-term impacts. It is important to note that, of these, the short- and long-term outcomes are usually the ones not being evaluated. Tina's research looked at citizen science engagement and at how participants construct a science identity. Together with Heidi Ballard, they looked at contributory, collaborative and co-created projects – including NestWatch, CoCoRaHS, and Global Community Monitor. They had 83 interviews with low, medium and high contributors, plus information from project leaders. The data analysis uses qualitative analysis methods and tools (e.g. NVivo). The interviews asked about engagement, what keeps participants involved, and memorable aspects of their research involvement. There are all sorts of extra activities that people bring into the interviews – in GCM people say ‘it completely changes the way that they respond to us and actually how much time they even give us because previously without that data, without something tangible’ – powerful experiences through science. The coded interviews show that data collection, communicating with others and learning protocols are very common learning outcomes. About two-thirds of interviewees are also involved in exploring the data, but a smaller group analyse and interpret it. The majority of people came with a high interest in science, apart from the people who are focused on local environmental issues of water or air quality.
Lower engagers tend to feel less connected to the project – and some crave more social outlets. The participants have a strong understanding of citizen science and their role in it. Data transparency is both a barrier and a facilitator – participants want to know what is done with their data. QA/QC is important both personally and organisationally. Participants are engaged in a wide range of activities beyond the project itself. Group projects may have more impact than individual projects.
Following the presentation, the discussion explored the issue of data – people are concerned about how the data is used and what is done with it, even if they won't analyse it themselves. In eBird, you can get your raw data, but looking at the people who used the data raises the issue of the degree to which those who download it understand how to use it in an appropriate way.

The final guest presentation was Agroecology as citizen science – Peter Hanappe (Sony Computer Science Lab, Paris). Peter is interested in sustainability; in previous projects he was involved in working on accessibility issues for people who use wheelchairs, the development of NoiseTube, porting the ClimatePrediction BOINC framework to the PlayStation, and reducing energy consumption in volunteer computing. In his current work he looks at sustainability in food systems. Agroecology is the science of sustainable agriculture, reducing reliance on external inputs by trying to design productive ecosystems that produce food. Core issues include soil health and biodiversity, with different ways of implementing systems that will keep them productive. The standard methods of agriculture don't apply; one needs to understand local conditions, and the practice of agroecology is very knowledge-intensive. Best practices are not always studied scientifically – many farms in the world are small (below 2 hectares; there are 475 million farms across the world), and there are more than 100 million households around the world that grow food. This provides an opportunity for citizen science – each season can be seen as an experiment, engaging more people and asking them to share information, so the knowledge slowly develops to provide all the needed details. Part of his aim is to develop new, free tools and instruments to facilitate the study of agroecology. This can be a basic set with information about temperature and humidity, or something more complex. The idea is to have a local community and a remote community that share information on a wiki to learn how to improve. Together with a group of enthusiasts that he recruited in Paris, he ran CitizenSeeds, where they tried different seeds in a systematic way – for example, with a fixed calendar of planting and capturing information. People took images and shared information online, including how much sunlight the plants got and how much humidity the soil had, and they can see the information in calendar form. They had 80 participants this year. For this opportunity for citizen science, the challenges include community building, and figuring out how much of it is documentation of what worked compared to experimentation – what are the right ways to carry out simple, relevant, reproducible experiments? Also, if there is a focus on soil health, multi-year experiments are needed.

I opened the last two Demonstrations of the session with a description of the 
Extreme Citizen Science pilots – starting, similarly to the first presentation of the day, by noticing the three major periods in science (with regard to public participation). First, the early period of science, when you needed to be wealthy to participate – although there are examples like Mary Anning, who, for gender, religion and class reasons, was not accepted within the emerging scientific establishment as an equal; it is justified to describe her as a citizen scientist, albeit one working in a full-time capacity. However, she is the exception that proves the rule. More generally, not only was science understood by few, but the general population also had very limited literacy, so it was difficult to engage them in joint projects. During the period of professional science, there is a whole host of examples of volunteer data collection – from phenology to meteorology and more. As science became more professional, the role of volunteers diminished, and scientists looked to automatic sensors as a more reliable means of collecting information. At the same time, until the late 20th century, most of the population had limited education – mostly up to high school – so the tasks that they were asked to perform were limited to data collection. In the last ten years, there are many more people with higher education – especially in industrialised societies – and that is part of the opening up of citizen science that we see now. They can participate much more deeply in projects.
Yet, with all these advances, citizen science is still mostly about data collection and basic analysis, and is also targeted at the more highly educated parts of the population. Therefore, Extreme Citizen Science is about the extremities of citizen science practice – engaging people in the whole scientific process, allowing them to shape data collection protocols, collect and analyse the data, and use it in ways that suit their goals. It is also important to engage people at all levels of literacy, and to extend the practice geographically across the world.
The Extreme Citizen Science (ExCiteS) group is developing methodologies aimed at facilitating this vision. Tools like GeoKey, which is part of the Cyberlab project, facilitate community control over the data and over decisions about what information is shared and with whom. Community Maps, which is built on GeoKey, is a way to allow community data collection and visualisation; there is also a link to EpiCollect, so mobile data collection is possible, with GeoKey managing the information.
These tools can be used for community air quality monitoring, using affordable and accessible methods (diffusion tubes and borrowed black carbon monitors), but there is also the potential of creating a system that will be suitable for people with low levels of literacy. Another pilot project that was carried out in Cyberlab included playshops and the exploration of scientific concepts through engagement and play. This also includes techniques from Public Lab, such as kite and balloon mapping, with the potential of linking the outputs to Community Maps through GeoKey.

Finally, CCL Tracker was presented by Jose Luis Fernandez-Marquez (CERN). The motivation for creating CCL Tracker is the need to understand more about participants in citizen cyberscience projects and what they learn. Usual web analytics provide information about who is visiting the site, how they are visiting it and what they are doing, but tools like Google Analytics do not measure what people actually do on websites. We want to understand how the 20% of users who do 80% of the work in citizen cyberscience projects operate, and that requires much more information. Using an example of Google Analytics from a volunteer computing project, we can see about 16K sessions and 8,000 users from 108 countries, with 400 sessions per day. We can see that most are male, which route they took to arrive at the website, etc. CCL Tracker helps to understand the actions performed on the site and to measure participants' contributions. There is a need to be able to make the analytics data public and to create advanced data aggregation – clustering it so it does not disclose unwanted details about participants. The CCL Tracker library works together with Google Tag Manager and Google Analytics, with Google Super Proxy used to share the information.
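CCL Tracker itself is wired in through Google Tag Manager in the browser; purely as an illustration of the kind of action-level event data involved, here is a sketch that sends a custom event to Google Analytics via the Universal Analytics Measurement Protocol (the analytics of that era). The tracking ID, event category and action names are placeholders – this is not the CCL Tracker API.

```python
# Illustrative sketch only: sending a custom "contribution" event to
# Google Analytics via the Universal Analytics Measurement Protocol.
# It mimics the kind of action-level data CCL Tracker collects through
# Google Tag Manager; it is not the CCL Tracker library itself.
import uuid
import requests

GA_ENDPOINT = "https://www.google-analytics.com/collect"
TRACKING_ID = "UA-XXXXXXX-Y"  # placeholder property ID

def track_action(client_id: str, category: str, action: str,
                 label: str = "", value: int = 0) -> None:
    payload = {
        "v": "1",            # protocol version
        "tid": TRACKING_ID,  # GA property
        "cid": client_id,    # anonymous client identifier
        "t": "event",        # hit type
        "ec": category,      # event category, e.g. "simulation"
        "ea": action,        # event action, e.g. "job_validated"
        "el": label,         # event label
        "ev": str(value),    # event value
    }
    requests.post(GA_ENDPOINT, data=payload, timeout=5)

# e.g. record that a volunteer validated a simulation run
track_action(str(uuid.uuid4()), "simulation", "job_validated", "vas", 1)
```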

Citizen Cyberlab Summit (day 1)

The Citizen Cyberlab Summit is the final event of the Citizen Cyberlab project. The name might sound grand, but the event itself was fairly intimate and focused, with about 40 participants from across the world. The aim of the event was to share the learning from the project and compare it with similar activities around the world. It also provided an opportunity to consider, with experts from different areas, the directions in which the project partners should progress beyond the specific ‘deliverables’ (outcomes and outputs) of the project. The meeting was held in the Confucius Institute of the University of Geneva, which has a mission to improve scientific diplomacy and international links between researchers, so it was a suitable venue for such an international scientific meeting.

Introduction to Citizen Cyberlab was provided by Ariel Lindner (UPD), who is the main project leader. He noted that the starting point of Citizen Cyberlab is that we know that people learn better by doing, and that working with the public is also beneficial for scientists – both for becoming aware of public concerns and because of the moral obligation to share the results of research with those who fund it. The Citizen Cyberlab project, which is in its final months, was based on three parts – platforms, pilots, and tools. The platforms are aimed at lowering the barriers to participation for scientists and citizens (computation and participation platforms); they are tested through pilot projects, which are then evaluated for creativity and learning – exploring learning behaviour, creativity and community engagement. We aim to share the successful experiences but also the challenges that emerged through the various activities. Among the platforms, CitizenGrid is aimed at allowing cloud-based projects to run; RedWire offers a new way to think about game design – an open-source game engine with open game analytics (the ability to measure what people do with the games), used for example in the development of science games; GeoKey is the final platform, and it allows people to share their concerns and control information. The project pilots included Virtual Atom Smasher, which is about learning particle physics and helping scientists; GeoTag-X at UNITAR, helping in disaster response; and SynBio4All, which opens up synthetic biology to a wider audience – with games such as Hero Coli and a MOOC on DIY synthetic biology (through iGEM), and with activities around ‘the smell of us’, about the odour that people emit and identifying the bacteria that influence it (L'Oréal is interested in developing this research further). There are several Extreme Citizen Science pilots, too. The tools developed in the project included creativity tools to explore and develop ideas, learning monitoring (CCL Tracker), and the EpiCollect+ system to allow data collection for a wide range of projects.
Understanding aspects of creativity and what people learn are both complex tasks – understanding the learning also had to be done with other communities in citizen science. Finally, there is a specific effort on community engagement through social media and media outlets (YouTube and audio).

The rest of the event was structured as follows: two short presentations from guest speakers from outside the project consortium were followed by two demonstrations of a specific platform, tool, pilot or learning activity, and each session ended with discussions in groups, which were then shared back. In all, the summit had four such sessions.

Following this introduction, two guests gave Short Talks, the first about World Community Grid (WCG) – Juan Hindo (IBM). Juan provided details of WCG, which is part of IBM's corporate citizenship group. WCG is a philanthropic programme that supports participation in science through distributed computing, allowing scientists to access large-scale computing by using unused processing capacity in computers and mobile devices. The projects can be ‘the biggest and most fundamentally important activities in labs’, according to researchers who participate in the programme. Examples of success include new solar materials from Harvard University researchers, with thousands of candidate materials. Other breakthroughs happened in childhood cancer research and in computing for clean water, led by Tsinghua University in China – exploring the use of nanotubes for water filtration. WCG promote Open Science – asking researchers to make the data publicly available, focusing on humanitarian research and real, tangible science, with IBM support. Using their corporate capability, they get lots of attention in the media. They try to engage volunteers as much as possible, and carried out an extensive volunteer study two years ago. Demographics: mostly men with a technical background, aged 20-40, who usually volunteer for five years; people join because they want to help science, and learning about the science is a reason to stay. People want to understand the impact of the computations that they perform – beyond just statistics – and ask for the information to be understandable. WCG are now trying to build a more diverse volunteer base, more approachable scientific content, and a better articulation of the value of contribution. They see an opportunity to reach out to young people and women, and they try to engage people through the story about the science, reassuring them that the process is safe and designing the joining experience to take a short time. They also want to leverage existing volunteers – they set up a recruitment competition for existing volunteers, but that led to very few new people joining. They also use social media on Twitter, YouTube and Facebook; there is growing engagement with social media, but not enough conversion to volunteering. They also deal with the layering of information with researchers, asking for consistent and regular updates on the research, and give volunteers control over the communications that they receive. Articulating the value of contributions means highlighting research stories – not just computations and numbers of volunteers – and celebrating and promoting scientific success; they lean on networks in IBM to spread the word. The campaign helped in doubling the registration rate to the system. They want to reach more volunteers, and they follow the conversion rate; they are still missing stories from volunteers that would provide a volunteer voice, and ways to remove the barriers to entry that the recruitment drive did not overcome. They also want to expand the research portfolio into other areas that WCG can support.

In the discussion that followed, the importance of IP and of treating volunteers as individuals came up as topics worth exploring with volunteer computing projects.

The next presentation was Science@home – Jacob Sherson (University of Aarhus, Denmark). Jacob noted that in citizen science there are different difficulty levels and opportunities for user innovation. In Science@home they are trying to extend the range of citizen science involvement with students. Regarding creativity research, they are trying to evaluate creativity within a positivist empirical framework – controlling different variables and evaluating the creativity of outputs accordingly. They run projects with 3,000 people participating, with experiments ranging from cognitive science to quantum physics and business administration – and they have an interdisciplinary team from different areas of research to support the development of the system. An example of the type of project that they deal with is quantum computing – manipulations of electrons, which slosh around between states when moved with laser beams. Using analogies to the high-school curriculum was a useful way to engage participants and make it relevant to their studies. They have discovered that students can understand quantum physics in a phenomenological way through a game interface, and that gamers find good regions of the solution space – the players localise areas of the big parameter space faster than computer simulations. They are also studying the formation of strategies in people's minds – Quantum Minds. With this programme, they study the process of learning and mastering the task, looking at the way people learn how to solve problems – to see if early performance helps to predict the ability to learn the topic. Other games include trying to understand innovation in the Alien Game, and they also have a behavioural economics game about the forming of groups. The educational part is about creativity – thinking of motivations for the curriculum and fun with different resources. Game-based education is assumed to improve the curriculum and can increase the motivation to learn. The general approach is to provide personalised online learning trajectories – identifying types of students and learners, then correlating them and creating a personalised learning experience. They also want to train researchers to help them explore.

The next part of the morning session was the two Demonstrations, starting with EpiCollect – David Aanensen (Imperial College). EpiCollect was created to deal with infectious disease – who, what, where and when – getting information about the genetic make-up of diseases. They realised that there is a generic issue of metadata gathering, and the tool evolved into a generic forms-collection and visualisation tool. Current use of EpiCollect includes a lot of veterinary projects, as GPS monitoring of animals is easier in terms of ethics. It was also used by the Food and Agriculture Organization (FAO) to monitor the provision of food to communities in different parts of the world, as well as in education projects at Bath University in field courses (building on the Evolution MegaLab project to collect information about snails), with students building questionnaires based on the information sheets of the project. They are starting to build longitudinal data. There are projects that link EpiCollect to other systems – such as GeoKey and CartoDB for visualisation.

RedWire was presented by Jesse Himmelstein (University Paris Descartes). RedWire is a platform that aims to reduce the barrier to creating games for citizen science through a mash-up approach – code and games are open access to encourage reuse. It uses a functional programming approach within a visual programming environment, taking metaphors from electronics. There are examples of games that students developed during recent summer schools and other activities.

CitizenGrid was discussed by John Darlington (Imperial College, London). CitizenGrid is a platform that enables the replication of projects on cloud computing, specifically for volunteer computing projects. It allows unified support for volunteer computing – support for the scientists who are setting up a project, but also for the volunteers who want to link to it. Scientists can map their resources through the creation of both client and server virtual machines and register the application. It was demonstrated with projects that also use games – allowing the application to be installed on local machines or on cloud computing.

In the breakout groups, participants discussed the complexity of the platforms and the next steps to make them more accessible. For EpiCollect, there is a challenge in identifying who the users are – they are both the coordinators and the data collectors – and helping them set up useful projects is demanding, especially given the need for usability and user-experience expertise. Dealing with usability and user experience is a challenge common to all such projects. For RedWire, there is a need to help people who do not have any programming experience – scientists and teachers – to develop games; one could maybe even gamify the game engine, with credits to successful game designers who create components that can be remixed. For CitizenGrid, there is a need for examples of use cases, with Virtual Atom Smasher currently the main demonstrator.

The afternoon session explored Pilot Projects. CERN@school – Becky Parker (Langton Star Centre) described how she developed, with her students and in collaboration with scientists, the ability to do science at school. The project is a demonstration of how students and teachers can become part of the science community. It started years ago with students contributing to astrophysics research. The school is involved in fundamental research, with a 17-year-old student publishing a scientific paper based on a theoretical physics research problem that was presented to the students by professional scientists. Her students also put an instrument to detect cosmic rays on the satellite TDS-1, and they can see where their experiment is through a visualisation over Google Maps that the students developed themselves. Students also created analysis tools for the data, and can contribute to NASA research on the impact of cosmic rays on International Space Station staff. CERN@school also includes an experiment in collecting radiation readings, which helps to map background radiation in the UK (by students aged 14-15). Through their work, they discovered that there aren't many radiation readings in the ocean, and they plan to address that by mounting a radiation sensor on a marine UAV. All this helps students to learn to be scientists. They created the Monopole Quest project within the Zooniverse. It is possible to get young people involved in large-scale science projects; it also helps to encourage science teachers and to ensure their job satisfaction. The involvement of girls in the project has also led to more participation in science and engineering after school, with the school having a disproportionate share of the young women who go on to study such topics in the UK. – From Volunteers to Scientists – Michael Weber (Uni Marburg). Michael described how volunteers turned into scientists in the area of volunteer computing. Rechenkraft started in 2005 with a forum dedicated to all the distributed computing projects around the world, sharing information about them among German-speaking volunteers; projects are now being translated into other languages, too. This led to the creation of an organisation, which is now involved in many projects. Volunteers also created monitoring programmes that indicate progress and provide statistics about contributions. They also have a yearly face-to-face gathering of volunteers from across Germany and beyond, which has resulted in the creation of their own data-processing racks and other initiatives. They started in an electronic sports league, but then realised that there are opportunities to assist scientists in developing new projects, and that led to Yoyo@home, which allows the community to help scientists develop BOINC projects. They regularly participate in conferences and exhibitions to promote the opportunity to other people interested in technology, and they became part of the Quake-Catcher Network. They receive significant press coverage – eventually the city of Marburg (Germany) offered the organisation a physical space that became the city's hackspace. Once there was a steady place, they created more sophisticated cluster computers. They also set up the WLAN in the local refugee camp. Finally, they developed their own scientific project – RNA World, which is a completely internal project.
They encountered problems with very large output files from simulations, so they are learning about running distributed computing projects as scientists who use the results and not only as volunteers. They are also starting to run different projects about tree health, with data recording of location, photos and plant material. Similarly, they map protected flowers – all on a volunteer basis. They participate in the effort of developing the Citizen Science Strategy 2020 for Germany, and they would like funding to be available to the average person so they can participate in projects. There is a risk that citizen science will be co-opted by scientists – there is a need to leave space for grass-roots initiatives. There are also barriers to publication. The need for lab results in addition to the simulations encouraged the creation of a wet lab.

The last short guest talk came from Bernard Revaz, who suggested creating Massively Multiplayer Online Science – using game environments like WoW (World of Warcraft) to do science. His aim is to inject science into games such as EVE Online – at a given time there are 40,000 users, with a median age of 35, and 50% hold a degree in science. In EVE Online they designed an element from the Human Protein Atlas that the gamers will help to classify. The stakeholders in their discussions include scientists, the gaming company and players, and all are very positive about the prospect. In EVE Online there are many communities – they are creating a new community of scientists that people join voluntarily, and they are working on matching the science tasks to the game narrative and to the game reward system.

After these two guest talks, there were two Demos. 

First, Virtual Atom Smasher (VAS) – Ioannis Charalampidis (CERN). VAS is about the way CERN develops the science cycle: observing the situation leads to theories by theoretical physicists, and then experiments are carried out to test them. The process includes computer simulations that are explored against experimental data, adjusting the models until they reflect the results. VAS evolved from a project by a 15-year-old student in 2010, who managed to create the best-fitting results of a simulation. VAS is about real cutting-edge science, but it is also very challenging, so they created a game (though they don't use the word game – it's a simulation). VAS uses CitizenGrid and RedWire for the game, and CCL Tracker to understand the way people use the platform. The analytics show the impact of training on the desired flow of the game. VAS combines exploration with opportunities for learning.

GeoTag-X – Eleanor Rusack (UNITAR). This is a platform to crowdsource the analysis of images in humanitarian crises. They usually use satellite imagery to deal with crises, but there are limitations to some images – roofs, clouds, etc. – and there is a need to know what is going on on the ground. The idea is to harvest photos coming from a disaster, then analyse them and share the knowledge. A lot of the information in photos can be very useful – it is possible to extract structural information and other details from the image. They have a workflow: experts set up projects, then develop the structure of the processing and the tutorials, plus tools for photo collection (from Flickr, Twitter, EpiCollect and a Chrome extension). The photos are added to the analysis pool. They have created a project to allow people to deal with Yemeni cultural heritage at risk as a result of the war that is happening there. The system is mostly based on self-learning. Geotagging photos is a challenging task, and especially an area that needs more work. The experts are professionals or academics in a specific domain who can help people to design the process, while participants come from different backgrounds. They are recruiting people through SciStarter, Mozilla Science, etc., and keep in touch with online volunteer groups – people who come from SciStarter tend to stay. Digital volunteers also help a lot, and they encourage volunteering through presentations, but most important are data sprints. They use evaluation of agreement between analysts – high agreement indicates tasks that are easy to agree on. Plotting agreement against its standard deviation, they identify three groups: easy (high agreement, low standard deviation), mid (median agreement, high standard deviation) and complex (low agreement, low standard deviation). Analysing images against these agreement levels helps to improve designs. They want to move the questions up the curve, and ask how to train large numbers of analysts when project leaders have limited time.
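A minimal sketch of this kind of agreement analysis is shown below, assuming a table with one row per (task, photo, analyst) answer; the input file, schema, and the easy/mid/complex thresholds are all illustrative assumptions, not GeoTag-X's actual implementation.

```python
# Sketch of the analyst-agreement analysis described above. The input
# format and the thresholds for easy/mid/complex are assumptions.
import pandas as pd

# One row per (task, photo, analyst) with the answer the analyst gave.
df = pd.read_csv("geotagx_answers.csv")  # assumed file and schema

def modal_share(answers: pd.Series) -> float:
    """Fraction of analysts who chose the most common answer for a photo."""
    return answers.value_counts(normalize=True).iloc[0]

# Agreement per photo, keeping the task each photo belongs to.
per_photo = df.groupby("photo_id").agg(
    task_id=("task_id", "first"),
    agreement=("answer", modal_share),
)

# Mean and spread of agreement across the photos of each task.
per_task = per_photo.groupby("task_id")["agreement"].agg(["mean", "std"])

def classify(row: pd.Series) -> str:
    # Illustrative thresholds for the three groups named in the talk.
    if row["mean"] > 0.8 and row["std"] < 0.1:
        return "easy"      # high agreement, low standard deviation
    if row["std"] >= 0.1:
        return "mid"       # median agreement, high standard deviation
    return "complex"       # low agreement, low standard deviation

per_task["group"] = per_task.apply(classify, axis=1)
print(per_task)
```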

The follow-up discussion explored improvements to VAS – such as integrating arts, or linking a BOINC project that will contribute computing resources to VAS. For GeoTag-X, the discussion explored the issue of training – with ideas about involving volunteers in getting the training right, running virtual focus groups, or exploring design aspects and collaborations between volunteers.

Data and the City workshop (day 2)

The second day of the Data and City Workshop (here are the notes from day 1) started with the session Data Models and the City.

Pouria Amirian started with Service Oriented Design and Polyglot Binding for Efficient Sharing and Analysing of Data in Cities. The starting point is that the management of the city needs data, and therefore technologies to handle data are necessary. In the traditional pipeline, we start from sources, then use tools to move the data to a data warehouse, and then do the analytics. The problems with the traditional approach are the size of the data – the management of the data warehouse is very difficult; the need to deal with real-time data that must be answered very fast; and, finally, new data types – from sensors, social media and cloud-born data that originate outside the organisation. Therefore, it is imperative to stop moving data around and instead analyse the data where it is. Big Data technologies aim to resolve these issues – e.g. from the development of the Google distributed file system, which led to Hadoop, to similar technologies. Big Data relates to the technologies that are used to manage and analyse it; the stack for managing Big Data now includes over 40 projects to support different aspects of governance, data management, analysis, etc. Data science includes many areas – statistics, machine learning, visualisation and so on – and no one expert can know all these areas (such experts exist as much as unicorns exist). There is interaction between data science researchers and domain experts, and that is necessary for ensuring reasonable analysis. In the city context, these technologies can be used for different purposes – for example, deciding on the allocation of bikes in the city using real-time information that includes social media (Barcelona). We can think of data scientists as active actors, but there are also opportunities for citizen data scientists using tools and technologies to perform the analysis. Citizen data scientists need data and tools – such as visual analysis languages (AzureML) that allow them to create models graphically and set a process in motion. Access to data is required, with facilities for finding the data and accessing it – interoperability is important. Service-oriented architecture (which uses web services) is an enabling technology for this, and the current Open Geospatial Consortium (OGC) standards require some further development and changes to make them relevant to this environment. Different services can be provided to different users with different needs [comment: but that increases maintenance and complexity]. No single stack provides for all the needs.
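The ‘ship the code to the data’ idea can be made concrete with a toy map-reduce sketch: each data partition stays where it is, the analysis function runs next to it, and only the small per-partition summaries travel back to be merged. The partition files and record fields here are, of course, illustrative assumptions.

```python
# Toy illustration of "move the code, not the data": each worker runs
# the analysis function over its own partition, and only small summaries
# travel back to be merged. Partition files and fields are assumptions.
import json
from collections import Counter
from multiprocessing import Pool

PARTITIONS = ["sensors_part1.jsonl", "sensors_part2.jsonl",
              "sensors_part3.jsonl"]  # the data stays in place

def summarise(path: str) -> Counter:
    """The 'map' step, executed next to the data partition."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            counts[record["station"]] += 1  # e.g. readings per station
    return counts

if __name__ == "__main__":
    with Pool(len(PARTITIONS)) as pool:
        partials = pool.map(summarise, PARTITIONS)
    # The 'reduce' step: merge small summaries instead of raw data.
    total = sum(partials, Counter())
    print(total.most_common(10))
```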

Next, Mike Batty talked about Data about Cities: Redefining Big, Recasting Small (his paper is available here) – exploring how Big Data was always there: locations can be seen as bundles of interactions – flows in systems. However, visualisation of flows is very difficult, which makes it challenging to understand the results and to check them. The core issue is that among N locations there are N^2 interactions, and this quadratic growth as N grows is a continuing challenge in understanding and managing cities. In 1964, Brian Berry suggested a framework based on location, attributes and time – but the temporal dimension was suppressed for a long time. With Big Data, the temporal dimension is becoming very important. An example of how difficult understanding such data can be is travel flows – the more regions are included, the bigger the interaction matrix, and it becomes difficult to show and make sense of all the interactions. Even trying to create scatter plots is complex and does not reveal much.
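The N^2 point is easy to make concrete: an origin-destination flow matrix between N zones has N × N cells, so its size grows quadratically with the number of zones. The zone counts in this small sketch are arbitrary examples.

```python
# A flow matrix between N zones has N * N cells, so its size grows
# quadratically with N. The zone counts below are arbitrary examples.
for n_zones in (33, 633, 8_436):
    cells = n_zones ** 2               # origin-destination pairs
    megabytes = cells * 8 / 1e6        # stored as float64 values
    print(f"N={n_zones:5d}: {cells:12,d} interactions, ~{megabytes:8.1f} MB")
```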

The final talk was from Jo Walsh, titled Putting Out Data Fires: life with the OpenStreetMap Data Working Group (DWG). Jo noted that she was talking from the position of a volunteer in OSM, and recalled that 10 years ago she gave a talk presenting a technologically determinist, though not completely utopian, picture of cities, in which OpenStreetMap (OSM) was part of the picture. Now, in order to review the current state of OSM activities relevant to her talk, she asked on the OSM mailing list for examples. She also highlighted that OSM is big, but it is not Big Data – it can still fit into one PostgreSQL installation. There is no anonymity in the system – you can find out quite a lot about people from their activity, and that is built into the system. There are all sorts of projects that demonstrate how OSM data is relevant to cities – such as OSM Buildings, which creates 3D buildings from the database, or the use of OSM with 3D modelling data such as DTMs. OSM provides support for editing in the browser or with an offline editor (JOSM). Importantly, it is not only a map: OSM is also a database (like the new OSi database) – as can be shown by running searches on the database from a web interface. There are unexpected projects, such as custom clothing from maps, or Dressmap. More serious surprises are projects like the Humanitarian OSM Team and the Missing Maps project – there are issues with the quality of the data, but also with the fact that mapping is imposed from the outside on an area that is not mapped, with some elements of colonial thinking in it (see Gwilym Eades' critique). The InaSAFE project is an example of disaster modelling with OSM. In Poland, they extended the model to mark details of road areas and other features. All these demonstrate that OSM is getting close to the next level of using geographic information, and there are current experimentations with it. Projects such as the UTC of Mappa Mercia are linking OSM to transport simulations. Another activity is the use of historical maps.
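The point that OSM is a database, not just a map, is easy to demonstrate: the Overpass API accepts structured queries against the live data. A minimal sketch follows; the tag and bounding box are arbitrary examples.

```python
# Minimal sketch of querying OSM as a database via the Overpass API.
# The tag and bounding box are arbitrary examples.
import requests

query = """
[out:json][timeout:25];
node["amenity"="drinking_water"](51.50, -0.15, 51.56, -0.05);
out;
"""

resp = requests.post("https://overpass-api.de/api/interpreter",
                     data={"data": query}, timeout=30)
resp.raise_for_status()

# Each element is a database record with id, coordinates and tags.
for node in resp.json()["elements"][:10]:
    print(node["id"], node["lat"], node["lon"], node.get("tags", {}))
```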
One of the roles that Jo plays in OSM is as part of the Data Working Group, which she joined following a discussion about diversity in OSM within the community. The DWG needs some help, and its role is part geodata thought police, part janitorial/judicial service, part social work arm of the volunteer fire force. The DWG cleans up messy imports and deals with vandalism, but also handles dispute resolution. They are similar to a volunteer fire service: when something happens, you can see the sysadmins spring into action to deal with the emerging issue. For example, someone from Uzbekistan claimed to have found corruption in some new information, so you need to find the changeset and ask people to annotate more, saying what they are changing and why. OSM is self-policing and self-regulating – but different people have different ideas about what they are doing, and different groups have different views of what they want to do. There are also clashes between armchair mappers and surveying mappers – for example, a discussion between someone who is doing things remotely and a local person who says that they know the road and asks for the classification to be changed back. The DWG doesn't have a legal basis, and some issues come up because of global cases – for example, translated names that do not reflect local practices. There are tensions between commercial actors that do work on OSM and normal volunteer mappers. The DWG doesn't have privileges over other users – it is recognised by the community and gathers authority through consensus.

The discussion that followed this session explored examples from OSM, including conflicted areas such as Crimea and other contested territories. Pouria explained that in current models of distributed computing there are data nodes: the data is kept static, and the code is transferred to it rather than moving the data. There is a growing bottleneck in network latency due to the amount of data. There is a hierarchy of packaging systems that you need to use in order to work with distributed web systems, so tightening up code is an issue.
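
A toy sketch of the 'move the code to the data' principle mentioned here – the 'nodes' are just in-process dictionaries, purely for illustration:

```python
# Each "node" holds a data partition. Instead of shipping partitions
# across the network, we ship a small function to each node and only
# the small per-node results travel back.
partitions = {
    "node-1": [3, 1, 4, 1, 5],
    "node-2": [9, 2, 6, 5, 3],
    "node-3": [5, 8, 9, 7, 9],
}

def job(partition):
    # The "code" that travels to the data: a local aggregation.
    return sum(partition), len(partition)

partials = [job(data) for data in partitions.values()]
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
print("global mean:", total / count)
```
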
Rob – there are limits to Big Data, such as hardware and software, as well as the analytics of the information; there are also limits to how far you can foster community when the size is very large and the organisation is managed by volunteers. Mike – the quality problems of big data are rather different from those of traditional data, so while collection is automated, making sense of it is difficult – e.g. a tap-in without a tap-out in the Oyster data. The bigger the dataset, the bigger its issues may be. The knowledge that we get is heterogeneous in time, and it transfers the focus to the routine; but evidence is important for policy making and for making cases. Martijn – how can the technical systems be moved to enable a shift towards focal community practice? Mike – transport modelling is based on the funders' promotion of digital technology use, and it can be done for a specific place; the question is, who are the users? There is no clear view of who they are, and there is a wide variety, with different users playing different roles: 'policy analysts' are the first users of models – domain experts who advise policy people – with less thought given to informed citizens. As for how people react to big infrastructure projects, the articulation of the policy is different from what comes out of the models; there are projects with open and with closed mandates. Jo – OSM has a tradition of mapping parties that bring people together, but they need a critical mass that is already there – the question is how to bootstrap this process, for example how to support a single mapper in Houston, Texas. There are cases of companies using the data while local people use historical information, which creates conflict in the ways people use them; sometimes the tension runs very high, and it does need negotiation. Rob – there are issues around the concepts of data citizens and digital citizenship. Jo – in terms of community governance, the OSM Foundation is very hands-off, and there isn't a detailed process for dealing with corporate employees who map as part of their job. Evelyn – the conventions are matters of dispute and negotiation between participants, and they are being challenged all the time. One of the challenges of dealing with citizenship is to challenge the boundaries and protocols that go beyond the state – retaining the term while separating it from the subject.

The last session in the workshop focused on Data Issues: surveillance and crime 

David Wood talked about Smart City, Surveillance City: human flourishing in a data-driven urban world. He considered smart cities as an archetype of the surveillance society: because the smart city is part of the surveillance society, one way to deal with it is to consider resisting or abolishing it to allow human flourishing. His interest is in rights – beyond privacy. What is it that we really want for human beings in this data-driven environment? We want all to flourish, and that means starting from the most marginalised, at the bottom of the social order. The idea of flourishing comes from Spinoza and also from Luciano Floridi – his anti-entropic information principle. Starting with smart cities: business and government are dependent on large quantities of data, and this increases surveillance. Social science ignores the fact that these technologies provide the ground for social life. The smart city concept includes multiple visions. For example, a European vision is about government first – how to make good government in cities, with technology as part of a wider whole. The US approach asks how information management can be used for complex urban systems; this relies on other technologies – pervasive computing, the IoT, and things that are woven into the fabric of life. The third is the smart security vision – technology used to control urban terrain, applying military techniques in cities (techniques also used in war zones), for example biometric systems for refugees in Afghanistan, which serve both control and the provision of services. The history goes back to cybernetics and to policing initiatives from the colonial era. The visions overlap – security is not overtly stated (apart from by military actors). Smart cities are inevitably surveillance cities – the collection of data for the purposeful control of a population. Specific concerns for researchers are the targeting of people who fit a certain profile, and the aggregation of private data for profit at the expense of those involved. The critique of surveillance raises the issues of social sorting, unfair treatment of people, etc. Beyond that – as discussed in the special issue on surveillance and empowerment – there are positive potentials, and many of these systems have a role for the common good. We need to think about the city within neoliberal capitalism, which separates people in space along specific lines and areas, from borders to buildings, trying to make the city into a tamed zone – but the dangerous parts of city life are also a source of opportunities and creativity. The smart city fits well with this aspect – stopping the city from being disorderly. There is a paper from 1995 critiquing pervasive computing as surveillance: the more the distance between us and things is reduced, the more the world becomes a surveillance device and stops us from acting on it politically. In many of the visions of pervasive computing the human is actually marginalised, and this is still the case. There are opportunities for social empowerment – say, allowing elderly people to return to areas that they have stopped exploring, or using it to overcome disability. Participation, however, is flawed – who can participate, in what, where and how? Additional issues are that participation in highly technical projects is limited to a very small group, and participation can also become instrumental – 'sensors on legs'. The smart city could enable us to discover the beach under the pavement (a concept from the Situationists) – though some pavements are being hardened.
The problem is corporate 'walled garden' systems, and we need to remember that we might need to bring them down.

Next, Francisco Klauser talked about Michel Foucault and the smart city: power dynamics inherent in contemporary governing through code. He is interested in the power dynamics of governing through data, taking from Foucault the concept of understanding how power is put into action. He also thinks about different modes of power: referentiality – how does security relate to governing? Normativity – what is the norm, and where did it come from? Spatiality – how are discipline and security spread across space? Discipline is about imposing a model of behaviour on others (the panopticon). Security works in another way – it frees things up, within limits – so the two modes work together. Power starts from the study of a given reality. Data is about the management of flows. He shows the specific relevance to data in cities by looking at refrigerated warehouses that are used within the framework of the smart grid to balance energy consumption – storing and releasing the energy that is preserved in them. The whole warehouse has been objectified and quantified – down to specific products and the opening and closing of doors. He sees the core of the control in connections, processes and flows – think of liquid surveillance, beyond the human.
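
To make the warehouse-as-energy-store idea more tangible, here is a toy control loop (entirely my own invention – thresholds, values and logic are illustrative, not from Klauser's study):

```python
# A refrigerated warehouse as an energy store: pre-cool when grid load
# is low, coast on thermal inertia when it is high, but always cool if
# the temperature drifts too far from the set point.
def run_compressors(grid_load, temp_c, set_point=-20.0, band=2.0):
    if temp_c >= set_point + band:
        return True   # too warm: must cool regardless of the grid
    if grid_load < 0.4 and temp_c > set_point - band:
        return True   # cheap, plentiful power: pre-cool (store energy)
    return False      # high load: coast on stored "cold" (release)

for load, temp in [(0.2, -20.5), (0.9, -21.0), (0.9, -17.5)]:
    state = "ON" if run_compressors(load, temp) else "OFF"
    print(f"grid load {load:.1f}, {temp}C -> compressors {state}")
```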

Finally, Teresa Scassa explored Crime Data and Analytics: Accounting for Crime in the City. Crime data is used in planning, the allocation of resources and public policy making – a broad range of uses. It is part of oppositional social justice narratives, and it is an artefact of the interaction of citizen and state, as understood and recorded by the agents of the state operating within particular institutional cultures. She looked at crime statistics that are provided to the public as open data – derived from police files under some guidelines – and also at emergency call data, which is made from calls to the police and used to provide crime maps. The data used in visualisations about the city is not the same data that is used for official crime statistics. There are limits to the data, arising from institutional factors: it measures the performance of the police, not crime. It reflects how the police are doing their job – and there are plenty of acts of 'massaging' the data by those who are being observed; the stats are manipulated to produce the results that are requested. The police are the sensors, and there is under-reporting of crime according to the judgement of the police officer – e.g. sexual assault – and also the privatisation of policing, with private actors who don't report. Crime maps are offered by private sector companies that sell analytics and then provide a public-facing option – the narrative is controlled: what will be shared, and how. Crime maps are framed as serving 'public awareness or civic engagement', not transparency or accountability, and they focus on property offences rather than white-collar ones. There are 'alternalytics' – using other sources, such as victimisation surveys, legislation, data from hospitals and sexual assault crisis centres, and crowdsourcing. An example of bottom-up reporting is HarassMap, which started in Egypt to report cases of harassment. The legal questions are how the relationship between private and public sector data affects ownership, access and control; how the structure of the state affects data comparability and interoperability; and how law prescribes and limits what data points can be collected or reported.

The session closed with a discussion that explored examples of solutionism, such as crowdsourcing that asks the most vulnerable people in society to contribute data about assaults against them, which is highly problematic. Crime data is popular in portals such as the London one, but it is mixed into multiple concerns, such as property prices. David – the utopian concept of platform independence, and the assumption that platforms are value-free, is inherently wrong.

The workshop closed with a discussion of the main ideas and lessons that emerged from it, and of how all these things are playing out. Some questions that started to emerge concern how crowdsourcing can be bottom-up (OSM) and sometimes top-down, with issues about data cultures in citizen science, for example, and the degree to which the political aspects of citizenship and subjectivity play out in citizen science. Re-engineering information in new ways and the rural/urban divide are issues that bodies such as the Ordnance Survey need to face; the conflicts within data are an interesting piece, as is ensuring that the data is useful. 'Sensors on legs' is a concept that can be relevant to bodies such as the Ordnance Survey. The concept of the stack is also relevant to where we position our research and to what different researchers do, from the technical aspects to how people engage – the workshop gave a slicing through these layers. An issue that was left out is the business aspect – who will use it, and how it is paid for. We need public libraries with the information, but also the skills to do things with these data. The data economy is important, and some data will only be produced by the state, but there are issues with the data practices within the state's data agencies – and the data is not ready to get out. If data is garbage, you can't do much with it – no economy can be based on it. Open questions include: when does data produce software? When does it fail? Can we produce data with and without a connection to software? There is also the physical presence of data and its environmental impacts. Citizen engagement with infrastructure is lacking, and we need to tease out how things can be opened up for people to get involved. There is also a need to be as nuanced about the city as we were about data, thinking about the ways the city is framed: as a site of activities, subjectivity and practices; as a source of data to be mined; as a political jurisdiction; as aspiration – the city of tomorrow; as a concentration of flows; as a socio-cultural system; as a scale for analysis/a laboratory. The workshop title, 'Data and the City' – is the data for a city? Back to environmental issues – data is not ephemeral and does have tangible impacts (e.g. energy use in blockchains, inefficient algorithms, and electronic waste (WEEE) that is left in the city). There are also issues of access and control over huge volumes of data – issues covered in papers such as 'Device Democracy'. Wider issues link the technology to broader systems of thought and consideration.

Data and the City workshop (day 1)

The workshop, which is part of the Programmable City project (funded by the European Research Council), is being held in Maynooth today and tomorrow. The papers and discussions touched on multiple current aspects of technology and the city: Big Data, open data, crowdsourcing, and critical studies of data and software. The notes below focus on aspects that are relevant to Volunteered Geographic Information (VGI), citizen science and participatory sensing – aspects of Big Data/open data are noted more briefly.

Rob Kitchin opened with a talk to frame the workshop, highlighting the history of city data (see his paper, on which the talk is based). We are witnessing a transformation from data-informed cities to data-driven cities. Within these data streams we can include Big Data, official data, sensors, drones and other sources. The sources also include volunteered information such as social media, mapping, and citizen science. Cities are becoming instrumented and networked, and the data is assembled through urban informatics (focusing on interaction and visualisation) and urban science (which focuses on modelling and analysis). There is a lot of critique: in relation to data, there are questions about the politics of urban data, the corporatisation of governance, the use of buggy, brittle and hackable urban systems, and social and ethical aspects. Examples of these issues include politics: accepting that data is not value-free or objective, and is influenced by organisations with specific interests and goals. Another issue is the corporatisation of data, with questions about data ownership and data control. There are further issues of data security and data integrity when systems are buggy and brittle – there have already been cases of hacking into city systems. Social, political and ethical aspects include data protection and privacy, dataveillance/surveillance, social sorting through algorithms, control creep, dynamic pricing and anticipatory governance (expecting someone to be a criminal). There are also technical questions: coverage, integration between systems, data quality and governance (and the communication of information about quality), and the skills and organisational capabilities to deal with the data.
The aim of the workshop is to think critically about data, asking how these data are constructed and operated.

The talk by Jim Thatcher & Craig Dalton explored provenance models of data. A core question is how to demonstrate that data is what it says it is, and where it came from. In particular, they consider how provenance applies to urban data. There is an epistemological leap from an individual (a person) to data points – there can be up to 1,500 data attributes per person in corporate databases. City governance requires more provenance in information than commercial imperatives do. They suggest that data users and producers need to be aware of the data and how it is used.
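
As an illustration of what a provenance record for a single urban data point might carry, here is a minimal sketch loosely inspired by the W3C PROV entity/activity/agent split (the field names and values are my own, not from the talk):

```python
from dataclasses import dataclass, field
from typing import List

# A provenance record: what the data point is, what produced it, who is
# responsible, and what it was derived from.
@dataclass
class ProvenanceRecord:
    entity: str                              # the data point described
    generated_by: str                        # activity that produced it
    attributed_to: str                       # responsible agent
    derived_from: List[str] = field(default_factory=list)

reading = ProvenanceRecord(
    entity="air-quality-reading-42",
    generated_by="sensor-poll-2015-08-31T10:00Z",
    attributed_to="city-sensor-network",
    derived_from=["raw-voltage-42"],
)
print(reading)
```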

Evelyn Ruppert asked: where are the data citizens? She discussed the politics in data, thinking about people as subjects in data – seeing people as actors who are intentional and political in their acts of creating data. Being digital mediates between people, technology and what they do. There are myriad forms of subjectivation – there are issues of rights and how people exercise these rights. Being a digital citizen is not just about being a recipient of rights, but also about the ability to take and assert rights. She used the concept of cyberspace, as it is useful for understanding the rights of the people who use it, while being careful about what it means: cyberspace is conflated with the Internet, and there are failures to see it as a separate space. She sees cyberspace as the set of relations and engagements that happen over the Internet. She referred to her recent book 'Being Digital Citizens'. Cyberspace has relationships to real space – in relation to Lefebvre's concepts of space. She uses speech-act theory, which explores the ability to act through saying things – the theoretical possibility of performativity in speech. We are not in command of what will happen with speech and what the act will be. We can assert acts through the things we do, and not only through the things we say – and that is what happens in how people use the Internet and construct cyberspace.

Jo Bates talked about data cultures and power in the city, starting from the hierarchy of data and information. Data can be thought of as 'alleged evidence' (Buckland), and as material – data are specific things, with dimensionality, weight and texture; they exist as something. Cox, in 1981, viewed the relationship between ideas, institutions and material capabilities – and the tensions between them – with institutions seen as a stabilising force compared to ideas and material capabilities, although the institutions may be outdated. She noted that sites of data cultures are historically constituted but also dynamic and porous – and that we need to look at who participates and how data move.

The session was followed by a discussion; some of the issues: I raised the point of the impact of methodological individualism on Evelyn's and Jim's analyses – for Evelyn, digital citizenship is about collectives, and for Jim, the provenance and use of devices happen as part of collectives and data cultures. Jo explored the idea of a 'progressive data culture' and suggested that we don't yet understand the conditions for it – the inclusive, participatory culture is not there. For Evelyn, data is only possible through the action of the people who are involved in its making, and the private ownership of this data does not necessarily make sense in the long run. Regarding a hybrid view of cyberspace/urban space: they overlap, and it is not helpful to try to separate them. Progressive data cultures require organisational change in government and other organisations. Tracey asked about work on indigenous data and the way it is owned by the collective, noting that there are examples in the Arctic, with a whole set-up for changing practices towards traditional and local knowledge. The provenance goes all the way to the community; in the Arctic Spatial Data Infrastructure there are lots of issues with integrating indigenous knowledge into the general data culture of the system. The discussion ended with an exploration of the special case of urban/rural, noting the code/space nature of agricultural spaces: the remote control of John Deere tractors, the use of precision agriculture, control over space (so people can't get into it), tagged livestock, as well as variable access to the Internet, broadband speeds, etc.

The second session looked at data infrastructures and platforms, starting with Till Straube, who looked at Situating Data Infrastructure. He highlighted that Git (GitHub) blurs the lines between code and data – as does functional programming: code is data and data is code. He also looked at software and conceptual technology stacks, with hardware at the bottom, and used the concept of topology from Science and Technology Studies and Actor-Network Theory to understand the interactions.
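
A tiny illustration of the 'code is data' point (my own example, not Straube's): in Python, a program can be parsed into a data structure, inspected like any other data, and then run.

```python
import ast

# 'Code is data': parse source text into a syntax tree (a data
# structure), inspect it, then execute it.
source = "total = sum(n * n for n in range(4))"
tree = ast.parse(source)
print(ast.dump(tree.body[0], indent=2))               # the code, viewed as data

namespace = {}
exec(compile(tree, "<example>", "exec"), namespace)   # the data, run as code
print(namespace["total"])                             # 14
```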

Tracey Lauriault talked about ontologizing the city. Her research looked at the transition of Ordnance Survey Ireland (OSi) with its core GIS – the move towards an object-oriented, rules-based database. How is the city translated into data, and how does the code influence the city? She looked at OSi and the way it produces the data for the island, providing infrastructure for other bodies. OSi started as a colonial project, and has moved from cartographic maps and a digital data model to a fully object-oriented structure. The change is about understanding and conceptualising the mapping process. The ontology is the set of things that are important for OSi to record and encode, and the way in which the new model allows space to be reconceptualised. She had access to a lot of information about the engineering, tendering and implementation process, and also followed some specific places in Dublin. She explored her analysis methods and the problems of trying to understand how the process works, even when you have access to the information.
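
To give a flavour of what 'object-oriented and rules-based' can mean for map features, here is an invented sketch – emphatically not OSi's actual schema – in which features are objects and rules validate them before they enter the database:

```python
from dataclasses import dataclass

# Features as objects with attributes; rules check consistency before
# a feature is accepted into the database. Entirely illustrative.
@dataclass
class Building:
    feature_id: str
    footprint_m2: float
    height_m: float
    use: str  # e.g. "residential", "commercial", "industrial"

def validate(b: Building) -> list:
    """Apply simple consistency rules; return a list of violations."""
    problems = []
    if b.footprint_m2 <= 0:
        problems.append("footprint must be positive")
    if b.height_m > 200:
        problems.append("height implausible for this city")
    if b.use not in {"residential", "commercial", "industrial"}:
        problems.append(f"unknown use class: {b.use}")
    return problems

print(validate(Building("B1", 120.0, 9.5, "residential")))  # []
print(validate(Building("B2", -5.0, 300.0, "castle")))      # three violations
```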

The discussion that followed explored the concept of the 'stack', including the idea of considering the stack at a planetary scale. The stack is pervading other ways of thinking – it is more than a metaphor: it's a way of thinking about IT development, but it can be flattened. It gets people to think about the inter-relations between different parts. Tracey: it is difficult to separate the different parts of the system because there is so much interconnection. Evelyn suggested that we can think about the way maps were assembled and for what purpose, and understand how the new system aims to give certain outcomes; to which Tracey responded that as the system moved from a map to a database, Ian Hacking's approach to classification systems needs to be tweaked to make it relevant and effective for understanding systems like the one she is exploring. The discussion expanded to questions about how large systems are developed and what methodologies can be used to create systems that can deal with urban data, including discussion of software engineering approaches, organisational and personnel change over time, 'war stories' of building and implementing different systems, etc.

The third and last session was about data analytics and the city – although the content wasn’t exactly that!

Gavin McArdle covered his and Rob Kitchin's paper on the veracity of open and real-time urban data. He highlighted the value of open data – from claims of transparency and enlightened citizens to very large estimates of its business value. Yet, while data portals are opening in many cities, there are issues with the veracity of the data – metadata is not provided alongside the data. He covered spatial data quality indicators from ISO, the ICA and transport systems, but questioned whether the typical data standards are relevant in the context of urban data; we may need to reconsider how quality is recorded. By looking at two case studies, he demonstrated that the data is problematic (e.g. indicating travel of 6 km across the city in 30 seconds). Communicating changes in the data to other users is an issue, as is getting information from the data providers – it may be possible to have a metadata catalogue that adds information about a dataset and explains how to report veracity issues. There are such facilities in Paris and Washington DC, but they are not used extensively.
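
The 6 km in 30 seconds example implies a speed of 720 km/h, which suggests a simple veracity check of the kind such a catalogue might encourage. A minimal sketch (the threshold and the sample records are invented):

```python
# Flag records whose implied speed is implausible for urban travel.
MAX_PLAUSIBLE_KMH = 120  # generous upper bound within a city

records = [
    {"trip": "a", "distance_km": 2.5, "seconds": 600},  # 15 km/h
    {"trip": "b", "distance_km": 6.0, "seconds": 30},   # 720 km/h
]

for r in records:
    speed_kmh = r["distance_km"] / (r["seconds"] / 3600)
    status = "OK" if speed_kmh <= MAX_PLAUSIBLE_KMH else "IMPLAUSIBLE"
    print(f"trip {r['trip']}: {speed_kmh:.0f} km/h -> {status}")
```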

Next, Chris Speed talked about the blockchain city – spatial, social and cognitive ledgers – exploring the potential of the distributed recording of information as a way to create all kinds of markets in information that can be controlled by different actors.
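
For readers unfamiliar with the mechanism behind such ledgers, here is a toy hash-chain sketch (my illustration, not Speed's work, and far from a real blockchain – there is no distribution or consensus, just the tamper-evident chain):

```python
import hashlib
import json

# Each entry commits to the previous one via its hash, so past records
# cannot be silently altered without breaking the chain.
def add_entry(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"payload": payload, "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    chain.append(body)

ledger = []
add_entry(ledger, {"parcel": "D08-123", "owner": "alice"})
add_entry(ledger, {"parcel": "D08-123", "owner": "bob"})

# Tampering with an early entry invalidates its stored hash.
ledger[0]["payload"]["owner"] = "mallory"
recomputed = hashlib.sha256(json.dumps(
    {"payload": ledger[0]["payload"], "prev": "0" * 64},
    sort_keys=True).encode()).hexdigest()
print("chain intact:", recomputed == ledger[0]["hash"])  # False
```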

I closed the session with a talk based on my paper for the workshop; the slides are available below.

The discussion that followed explored aspects of representation and noise (produced by people who are monitored, by instruments, or by 'dirty' open data), and some clarification of the link between the citizen science part and the philosophy of technology part of my talk – highlighting that Borgmann's use of 'natural', 'cultural' and 'technological' information should not be confused with the everyday use of these words.

‘Nature’ Editorial on Citizen Science

The journal Nature today published an editorial on citizen science, titled 'Rise of the citizen scientist'. It is a very good editorial that addresses, head-on, some of the concerns that are raised about citizen science, but it also has a problematic ending.

On the positive side, the editorial recognises that citizen scientists can do more than just data collection. The writer also demonstrates an inclusive understanding of citizen science that encompasses both online and offline forms of participation. The editorial also includes volunteered computing in the list (with a reference to SETI@Home) and does not dismiss it as outside the scope of citizen science.

It then shows that concerns about the ability of citizen scientists to produce high-quality data are not supported by research findings, and, as Caren Cooper noted, there are many other examples across multiple fields. My own minor contribution to this literature is demonstrating that this is true for OpenStreetMap mappers. The editorial also recognises the importance of one of the common data assurance methods – reliance on instrument readings as a reason to trust the data.

Finally, it recognises the need to credit citizen scientists properly, and the need to deal with their personal details (and location) carefully. So far, so good.

Then the article ends with a rather poor paragraph about 'conflicts of interest' and citizen science:

More troubling, perhaps, is the potential for conflicts of interest. One reason that some citizen scientists volunteer is to advance their political objectives. Opponents of fracking, for example, might help to track possible pollution because they want to gather evidence of harmful effects. When Australian scientists asked people who had volunteered to monitor koala populations how the animals should be managed, they found that the citizen scientists had strong views on protection that did not reflect broader public opinion.

I have already written here about the attitude of questioning activism and citizen science in specific local issues, but it seems that motivations especially irk scientists and science writers when they look at citizen science. So here are some of the reasons I think the claim above is contradictory.

There are two reasons for this: first, scientists themselves have complex sets of motivations and are under the same 'conflicts of interest'; and secondly, if motivations have such an impact on science in general, then this is true for all science, not just citizen science.

Let's start with the most obvious one – the whole point of the scientific method is that it investigates facts and conditions regardless of the motivation of the specific person carrying out the research. I get a reminder of that every day when I go to my office in UCL's Pearson Building. The building is named after Karl Pearson (known to any scientist because of the Pearson correlation), who was one of the leaders of eugenics, which was the motivation for parts of his work. While I don't like the motivation (to say the least), it doesn't change the factual observations and the analysis of the results, though it surely changes the interpretation of them, which we today reject. We therefore continue to use Pearson's methods and science, since they are useful despite the motivation. We have detached the motivations from the science.

More generally, scientists like to believe that they follow the Mertonian norms and that they are 'disinterested' in their research – but listen to some episodes of the BBC's The Life Scientific and you discover that what keeps them motivated to apply for research grants against the odds, and to carry out long stretches of boring work, are very deep personal motivations. They wouldn't do it otherwise! Therefore, according to the paragraph above, we should consider them conflicted.

Citizen scientists are, of course, motivated by specific interests – they wouldn't volunteer their free time otherwise. Look at the OED definition of citizen science and the sources of the term, and you discover that the first modern use of the term 'citizen scientists' was in a report about the Audubon effort to campaign about acid rain. The fact that it was activism did not influence the very careful data collection and analysis operation. Or take the Royal Society for the Protection of Birds (RSPB), for which 'Campaign with us' is the top option under 'what we do' – and yet it runs the valuable Big Garden Bird Watch, with results used in scientific papers and for policy. The activist origins, again, do not influence the outcomes or the quality of the science.

Is it some forms of activism that Nature has a problem with?

The value of using citizen science in cases such as fracking, air quality or noise is that the scientific method supports systematic, disinterested and objective data collection and analysis. It therefore makes it possible to evaluate concerns about a specific issue and check whether or not they are justified and supported by the evidence. In the same way that the environmental impact assessments and reports from fracking operators are created from a position of conflict of interest, so is the data that comes from the people who oppose fracking. As long as the data is collected in a rigorous way, with evidence to show that it was done so (e.g. timestamps from the smartphone, as the article noted), the scientific approach can provide evidence as to whether the level of pollution from the fracking site (or planned site) is acceptable or not. Arguably, the risk of falsified data, or of pressure to drop inconvenient observations, is actually greater, in my view, on the more powerful side of the equation.

My conclusion is that you can't have it both ways: either science works regardless of motivations, or motivations and conflicts of interest are central to every other piece of science that Nature reports on.