Following the two previous assertions, namely that:
‘you can be supported by a huge crowd for a very short time, or by few for a long time, but you can’t have a huge crowd all of the time (unless data collection is passive)’ (original post here)
‘All information sources are heterogeneous, but some are more honest about it than others’ (original post here)
The third assertion is about pattern of participation. It is one that I’ve mentioned before and in some way it is a corollary of the two assertions above.
‘When looking at crowdsourced information, always keep participation inequality in mind’
Because crowdsourced information, either Volunteered Geographic Information or Citizen Science, is created through a socio-technical process, all too often it is easy to forget the social side – especially when you are looking at the information without the metadata of who collected it and when. So when working with OpenStreetMap data, or viewing the distribution of bird species in eBird (below), even though the data source is expected to be heterogeneous, each observation is treated as similar to other observation and assumed to be produced in a similar way.
Yet, data is not only heterogeneous in terms of consistency and coverage, it is also highly heterogeneous in terms of contribution. One of the most persistence findings from studies of various systems – for example in Wikipedia , OpenStreetMap and even in volunteer computing is that there is a very distinctive heterogeneity in contribution. The phenomena was term ‘Participation Inequality‘ by Jakob Nielsn in 2006 and it is summarised succinctly in the diagram below (from Visual Liberation blog) – very small number of contributors add most of the content, while most of the people that are involved in using the information will not contribute at all. Even when examining only those that actually contribute, in some project over 70% contribute only once, with a tiny minority contributing most of the information.
Therefore, when looking at sources of information that were created through such process, it is critical to remember the nature of contribution. This has far reaching implications on quality as it is dependent on the expertise of the heavy contributors, on their spatial and temporal engagement, and even on their social interaction and practices (e.g. abrasive behaviour towards other participants).
Because of these factors, it is critical to remember the impact and implications of participation inequality on the analysis of the information. There will be some analysis to which it will have less impact and some where it will have major one. In either cases, it need to be taken into account.
Following the last post, which focused on an assertion about crowdsourced geographic information and citizen science I continue with another observation. As was noted in the previous post, these can be treated as ‘laws’ as they seem to emerge as common patterns from multiple projects in different areas of activity – from citizen science to crowdsourced geographic information. The first assertion was about the relationship between the number of volunteers who can participate in an activity and the amount of time and effort that they are expect to contribute.
This time, I look at one aspect of data quality, which is about consistency and coverage. Here the following assertion applies:
‘All information sources are heterogeneous, but some are more honest about it than others’
What I mean by that is the on-going argument about authoritative and crowdsourced information sources (Flanagin and Metzger 2008 frequently come up in this context), which was also at the root of the Wikipedia vs. Britannica debate, and the mistrust in citizen science observations and the constant questioning if they can do ‘real research’.
There are many aspects for these concerns, so the assertion deals with the aspects of comprehensiveness and consistency which are used as a reason to dismiss crowdsourced information when comparing them to authoritative data. However, at a closer look we can see that all these information sources are fundamentally heterogeneous. Despite of all the effort to define precisely standards for data collection in authoritative data, heterogeneity creeps in because of budget and time limitations, decisions about what is worthy to collect and how, and the clash between reality and the specifications. Here are two examples:
Take one of the Ordnance Survey Open Data sources – the map present themselves as consistent and covering the whole country in an orderly way. However, dig in to the details for the mapping, and you discover that the Ordnance Survey uses different standards for mapping urban, rural and remote areas. Yet, the derived products that are generalised and manipulated in various ways, such as Meridian or Vector Map District, do not provide a clear indication which parts originated from which scale – so the heterogeneity of the source disappeared in the final product.
The census is also heterogeneous, and it is a good case of specifications vs. reality. Not everyone fill in the forms and even with the best effort of enumerators it is impossible to collect all the data, and therefore statistical analysis and manipulation of the results are required to produce a well reasoned assessment of the population. This is expected, even though it is not always understood.
Therefore, even the best information sources that we accept as authoritative are heterogeneous, but as I’ve stated, they just not completely honest about it. The ONS doesn’t release the full original set of data before all the manipulations, nor completely disclose all the assumptions that went into reaching the final value. The Ordnance Survey doesn’t tag every line with metadata about the date of collection and scale.
Somewhat counter-intuitively, exactly because crowdsourced information is expected to be inconsistent, we approach it as such and ask questions about its fitness for use. So in that way it is more honest about the inherent heterogeneity.
Importantly, the assertion should not be taken to be dismissive of authoritative sources, or ignoring that the heterogeneity within crowdsources information sources is likely to be much higher than in authoritative ones. Of course all the investment in making things consistent and the effort to get universal coverage is indeed worth it, and it will be foolish and counterproductive to consider that such sources of information can be replaced as is suggest for the census or that it’s not worth investing in the Ordnance Survey to update the authoritative data sets.
Moreover, when commercial interests meet crowdsourced geographic information or citizen science, the ‘honesty’ disappear. For example, even though we know that Google Map Maker is now used in many part
s of the world (see the figure), even in cases when access to vector data is provided by Google, you cannot find out about who contribute, when and where. It is also presented as an authoritative source of information.
Despite the risk of misinterpretation, the assertion can be useful as a reminder that the differences between authoritative and crowdsourced information are not as big as it may seem.
Looking across the range of crowdsourced geographic information activities, some regular patterns are emerging and it might be useful to start notice them as a way to think about what is possible or not possible to do in this area. Since I don’t like the concept of ‘laws’ – as in Tobler’s first law of geography which is stated as ‘Everything is related to everything else, but near things are more related than distant things.’ – I would call them assertions. There is also something nice about using the word ‘assertion’ in the context of crowdsourced geographic information, as it echos Mike Goodchild’s differentiation between asserted and authoritative information. So not laws, just assertions or even observations.
The first one, is rephrasing a famous quote:
‘you can be supported by a huge crowd for a very short time, or by few for a long time, but you can’t have a huge crowd all of the time (unless data collection is passive)’
So the Christmas Bird Count can have tens of thousands of participants for a short time, while the number of people who operate weather observation stations will be much smaller. Same thing is true for OpenStreetMap – for crisis mapping, which is a short term task, you can get many contributors but for the regular updating of an area under usual conditions, there will be only few.
The exception for the assertion is the case for passive data collection, where information is collected automatically through the logging of information from a sensor – for example the recording of GPS track to improve navigation information.
During the symposium “The Future of PGIS: Learning from Practice?” which was held at ITC-University of Twente, 26 June 2013, I gave a talk titled ‘Keeping the spirit alive’ – preservations of participatory GIS values in the Geoweb, which explored what was are the important values in participatory GIS and how they translate to the Geoweb, Volunteered Geographic Information and current interests in crowdsourcing. You can watch the talk below.
To see the rest of the presentations during the day, see https://vimeo.com/album/2475389 and details of the event are available here http://www.itc.nl/Pub/Events-Conferences/2013/2013-June/Participatory-GIS-Symposium.html
8 July, 2013
The term ‘Citizen Science’ is clearly gaining more recognition and use. It is now get mentioned in radio and television broadcasts, social media channels as well as conferences and workshops. Some of the clearer signs for the growing attention include discussion of citizen science in policy oriented conferences such as UNESCO’s World Summit on Information Society (WSIS+10) review meeting discussion papers (see page ), or the Eye on Earth users conference (see the talks here) or the launch of the European Citizen Science Association in the recent EU Green Week conference.
Another aspect of the expanding world of citizen science is the emerging questions from those who are involved in such projects or study them about the efficacy of the term. As is very common with general terms, some reflections on the accuracy of the term are coming to the fore – so Rick Bonney and colleagues suggest to use ‘Public Participation in Scientific Research‘ (significantly, Bonney was the first to use ‘Citizen Science’ in 1995); Francois Grey coined Citizen Cyberscience to describe projects that are dependent on the Internet; recently Chris Lintott discussed some doubts about the term in the context of Zooniverse; and Katherine Mathieson asks if Citizen Science is just a passing fad. In our own group, there are also questions about the correct terminology, with Cindy Regalado suggestions to focus on ‘Publicly Initiated Scientific Research (PIScR)‘, and discussion on the meaning of ‘Extreme Citizen Science‘.
One way to explore what is going on is to consider the evolution of the ‘hype’ around citizen science through ‘Gartner’s Hype Cycle‘ which can be seen as a way to consider the way technologies are being adopted in a world of rapid communication and inflated expectations from technologies. leaving aside Gartner own hype, the story that the model is trying to tell is that once a new approach (technology) emerges because it is possible or someone reconfigured existing elements and claim that it’s a new thing (e.g. Web 2.0), it will go through a rapid growth in terms of attention and publicity. This will go on until it reaches the ‘peak of inflated expectations’ where the expectations from the technology are unrealistic (e.g. that it will revolutionize the way we use our fridges). This must follow by a slump, as more and more failures come to light and the promises are not fulfilled. At this stage, the disillusionment is so deep that even the useful aspects of the technology are forgotten. However, if it passes this stage, then after the realisation of what is possible, the technology is integrated into everyday life and practices and being used productively.
So does the hype cycle apply to citizen science?
If we look at Gartner cycle from last September, Crowdsourcing is near the ‘peak of inflated expectations’ and some descriptions of citizen science as scientific crowdsourcing clearly match the same mindset.
There is a growing evidence of academic researchers entering citizen science out of opportunism, without paying attention to the commitment and work that is require to carry out such projects. With some, it seems like that they decided that they can also join in because someone around know how to make an app for smartphones or a website that will work like Galaxy Zoo (failing to notice the need all the social aspects that Arfon Smith highlights in his talks). When you look around at the emerging projects, you can start guessing which projects will succeed or fail by looking at the expertise and approach that the people behind it take.
Another cause of concern are the expectations that I noticed in the more policy oriented events about the ability of citizen science to solve all sort of issues – from raising awareness to behaviour change with limited professional involvement, or that it will reduce the resources that are needed for activities such as environmental monitoring, but without an understanding that significant sustained investment is required – community coordinator, technical support and other aspects are needed here just as much. This concern is heightened by statements that promote citizen science as a mechanism to reduce the costs of research, creating a source of free labour etc.
On the other hand, it can be argued that the hype cycle doesn’t apply to citizen science because of history. Citizen science existed for many years, as Caren Cooper describe in her blog posts. Therefore, conceptualising it as a new technology is wrong as there are already mechanisms, practices and institutions to support it.
In addition, and unlike the technologies that are on Gartner chart, academic projects within which citizen science happen benefit from access to what is sometime termed patient capital without expectations for quick returns on investment. Even with the increasing expectations of research funding bodies for explanations on how the research will lead to an impact on wider society, they have no expectations that the impact will be immediate (5-10 years is usually fine) and funding come in chunks that cover 3-5 years, which provides the breathing space to overcome the ‘through of disillusionment’ that is likely to happen within the technology sector regarding crowdsourcing.
And yet, I would guess that citizen science will suffer some examples of disillusionment from badly designed and executed projects – to get these projects right you need to have a combination of domain knowledge in the specific scientific discipline, science communication to tell the story in an accessible way, technical ability to build mobile and web infrastructure, understanding of user interaction and user experience to to build an engaging interfaces, community management ability to nurture and develop your communities and we can add further skills to the list (e.g. if you want gamification elements, you need experts in games and not to do it amateurishly). In short, it need to be taken seriously, with careful considerations and design. This is not a call for gatekeepers , more a realisation that the successful projects and groups are stating similar things.
Which bring us back to the issue of the definition of citizen science and terminology. I have been following terminology arguments in my own discipline for over 20 years. I have seen people arguing about a data storage format for GIS and should it be raster or vector (answer: it doesn’t matter). Or arguing if GIS is tool or science. Or unhappy with Geographic Information Science and resolutely calling it geoinformation, geoinformatics etc. Even in the minute sub-discipline that deals with participation and computerised maps that are arguments about Public Participation GIS (PPGIS) or Participatory GIS (PGIS). Most recently, we are debating the right term for mass-contribution of geographic information as volunteered geographic information (VGI), Crowdsourced geographic information or user-generated geographic information.
It’s not that terminology and precision in definition is not useful, on the contrary. However, I’ve noticed that in most cases the more inclusive and, importantly, vague and broad church definition won the day. Broad terminologies, especially when they are evocative (such as citizen science), are especially powerful. They convey a good message and are therefore useful. As long as we don’t try to force a canonical definition and allow people to decide what they include in the term and express clearly why what they are doing is falling within citizen science, it should be fine. Some broad principles are useful and will help all those that are committed to working in this area to sail through the hype cycle safely.
18 March, 2013
The Consumers’ Association Which? magazine is probably not the first place to turn to when you look for usability studies. Especially not if you’re interested in computer technology – for that, there are sources such as PC Magazine on the consumer side, and professional magazines such as Interactions from Association for Computing Machinery (ACM) Special Interest Group on Computer-Human Interaction (SIGCHI).
Over the past few years, Which? is reviewing, testing and recommending Satnavs (also known Personal Navigation Devices – PNDs). Which? is an interesting case because it reaches over 600,000 households and because of the level of trust that it enjoys. If you look at their methodology for testing satnavs , you’ll find that it does resemble usability testing – click on the image to see the video from Which? about their methodology. The methodology is more about everyday use and the opinion of the assessors seems to play an important role.
Professionals in geographical information science or human-computer interaction might dismiss the study as unrepresentative, or not fitting their ways of evaluating technologies, but we need to remember that Which? is providing an insight into the experience of the people who are outside our usual professional and social context – people who go to a high street shop or download an app and start using it straightaway. Therefore, it’s worth understanding how they review the different systems and what the experience is like when you try to think like a consumer, with limited technical knowledge and understanding of maps.
There are also aspects that puncture the ‘filter bubble‘ of geoweb people – Google Maps are now probably the most used maps on the web, but the satnav application using Google Maps was described as ‘bad, useful for getting around on foot, but traffic information and audio instructions are limited and there’s no speed limit or speed camera data‘. Waze, the crowdsourced application received especially low marks and the magazine noted that it ‘lets users share traffic and road info, but we found its routes and maps are inaccurate and audio is poor‘ (both citations from Which? Nov 2012, p. 38). It is also worth reading their description of OpenStreetMap when discussing map updates, and also the opinions on the willingness to pay for map updates.
There are many ways to receive information about the usability and the nature of interaction with geographical technologies, and some of them, while not traditional, can provide useful insights.
20 October, 2012
The Spatial Data Infrastructure Magazine (SDIMag.com) is a relatively new e-zine dedicated to the development of spatial data infrastructures around the world. Roger Longhorn, the editor of the magazine, conducted an email interview with me, which is now published.
In the interview, we are covering the problematic terminology used to describe a wider range of activities; the need to consider social and technical aspects as well as goals of the participants; and, of course, the role of the information that is produced through crowdsourcing, citizen science, VGI with spatial data infrastructures.
19 December, 2011
As noted in the previous post, which focused on the linkage between GIS and Environmental Information Systems, the Eye on Earth Summit took place in Abu Dhabi on the 12 to 15 December 2011, and focused on ‘the crucial importance of environmental and societal information and networking to decision-making’. Throughout the summit, two aspects of public were discussed extensively. On the one hand, Principle 10 of the Rio declaration from 1992 which call for public access to information, participation in decision making and access to justice was frequently mentioned including the need to continue and extend its implementation across the world. On the other, the growing importance of citizen science and crowdsourced environmental information was highlighted as a way to engage the wider public in environmental issues and contribute to the monitoring and understanding of the environment. They were not presented or discussed as mutually exclusive approaches to public involvement in environmental decision making, and yet, they do not fit together without a snag – so it is worth minding the gap.
As I have noted in several talks over the past 3 years (e.g. at the Oxford Transport Research Unit from which the slides above were taken), it is now possible to define 3 eras of public access to environmental information. During the first era, between the first UN environmental conference, held in Stockholm in 1972, were the UN Environmental Programme (UNEP) was established, and the Earth conference in Rio in 1992, environmental information was collected by experts, to be analysed by experts, and to be accessed by experts. The public was expected to accept the authoritative conclusions of the experts. The second period, between 1990s and until the mid 2000s and the emergence of Web 2.0, the focus turned to the provision of access to the information that was collected and processed by experts. This is top-down delivery of information that is at the centre of Principle 10:
‘Environmental issues are best handled with participation of all concerned citizens, at the relevant level. At the national level, each individual shall have appropriate access to information concerning the environment that is held by public authorities, including information on hazardous materials and activities in their communities, and the opportunity to participate in decision-making processes. States shall facilitate and encourage public awareness and participation by making information widely available. Effective access to judicial and administrative proceedings, including redress and remedy, shall be provided’
Notice the two emphasised sections which focus on passive provision of information to the public – there is no expectation that the public will be involved in creating it.
With the growth of the interactive web (or Web 2.0), and the increase awareness to citizen or community science , new modes of data collection started to emerge, in which the information is being produced by the public. Air pollution monitoring, noise samples or traffic surveys – all been carried out independently by communities using available cheap sensors or in collaboration with scientists and experts. This is a third era of access to environmental information: produced by experts and the public, to be used by both.
Thus, we can identify 3 eras of access to environmental information: authoritative (1970s-1990s), top-down (1990s-2005) and collaborative (2005 onward).
The collaborative era presents new challenges. As in previous periods, the information needs to be at the required standards, reliable and valid. This can be challenging for citizen science information. It also need to be analysed, and many communities don’t have access to the required expertise (see my presentation from the Open Knowledge Foundation Conference in 2008 that deals with this issue). Merging information from citizen science studies with official information is challenging. These and other issues must be explored, and – as shown above – the language of Principle 10 might need revision to account for this new era of environmental information.
The previous post focused on citizen science as participatory science. This post is discussing the meaning of this differentiation. It is the final part of the chapter that will appear in the book:
The typology of participation can be used across the range of citizen science activities, and one project should not be classified only in one category. For example, in volunteer computing projects most of the participants will be at the bottom level, while participants that become committed to the project might move to the second level and assist other volunteers when they encounter technical problems. Highly committed participants might move to a higher level and communicate with the scientist who coordinates the project to discuss the results of the analysis and suggest new research directions.
This typology exposes how citizen science integrates and challenges the way in which science discovers and produces knowledge. Questions about the way in which knowledge is produced and truths are discovered are part of the epistemology of science. As noted above, throughout the 20th century, as science became more specialised, it also became professionalised. While certain people were employed as scientists in government, industry and research institutes, the rest of the population – even if they graduated from a top university with top marks in a scientific discipline – were not regarded as scientists or as participants in the scientific endeavour unless they were employed professionally to do so. In rare cases, and following the tradition of ‘gentlemen/women scientists’, wealthy individuals could participate in this work by becoming an ‘honorary fellow’ or affiliated to a research institute that, inherently, brought them into the fold. This separation of ‘scientists’ and ‘public’ was justified by the need to access specialist equipment, knowledge and other privileges such as a well-stocked library. It might be the case that the need to maintain this separation is a third reason that practising scientists shy away from explicitly mentioning the contribution of citizen scientists to their work in addition to those identified by Silvertown (2009).
However, similarly to other knowledge professionals who operate in the public sphere, such as medical experts or journalists, scientists need to adjust to a new environment that is fostered by the Web. Recent changes in communication technologies, combined with the increased availability of open access information and the factors that were noted above, mean that processes of knowledge production and dissemination are opening up in many areas of social and cultural activities (Shirky 2008). Therefore, some of the elitist aspects of scientific practice are being challenged by citizen science, such as the notion that only dedicated, full-time researchers can produce scientific knowledge. For example, surely it should be professional scientists who can solve complex scientific problems such as long-standing protein-structure prediction of viruses. Yet, this exact problem was recently solved through a collaboration of scientists working with amateurs who were playing the computer game Foldit (Khatib et al. 2011). Another aspect of the elitist view of science can be witnessed in interaction between scientists and the public, where the assumption is of unidirectional ‘transfer of knowledge’ from the expert to lay people. Of course, as in the other areas mentioned above, it is a grave mistake to argue that experts are unnecessary and can be replaced by amateurs, as Keen (2007) eloquently argued. Nor is it suggested that, because of citizen science, the need for professionalised science will diminish, as, in citizen science projects, the participants accept the difference in knowledge and expertise of the scientists who are involved in these projects. At the same time, the scientists need to develop respect towards those who help them beyond the realisation that they provide free labour, which was noted above.
Given this tension, the participation hierarchy can be seen to be moving from a ‘business as usual’ scientific epistemology at the bottom, to a more egalitarian approach to scientific knowledge production at the top. The bottom level, where the participants are contributing resources without cognitive engagement, keeps the hierarchical division of scientists and the public. The public is volunteering its time or resources to help scientists while the scientists explain the work that is to be done but without expectation that any participant will contribute intellectually to the project. Arguably, even at this level, the scientists will be challenged by questions and suggestions from the participants and, if they do not respond to them in a sensitive manner, they will risk alienating participants. Intermediaries such as the IBM World Community Grid, where a dedicated team is in touch with scientists who want to run projects and a community of volunteered computing providers, are cases of ‘outsourcing’ the community management and thus allowing, to an extent, the maintenance of the separation of scientists and the public.
As we move up the ladder to a higher level of participation, the need for direct engagement between the scientist and the public increases. At the highest level, the participants are assumed to be on equal footing with the scientists in terms of scientific knowledge production. This requires a different epistemological understanding of the process, in which it is accepted that the production of scientific insights is open to any participant while maintaining scientific standards and practices such as systematic observations or rigorous statistical analysis to verify that the results are significant. The belief that, given suitable tools, many lay people are capable of such endeavours is challenging to some scientists who view their skills as unique. As the case of the computer game that helped in the discovery of new protein formations (Khatib et al. 2011) demonstrated, such collaboration can be fruitful even in cutting-edge areas of science. However, it can be expected that the more mundane and applied areas of science will lend themselves more easily to the fuller sense of collaborative science in which participants and scientists identify problems and develop solutions together. This is because the level of knowledge required in cutting-edge areas of science is so demanding.
Another aspect in which the ‘extreme’ level challenges scientific culture is that it requires scientists to become citizen scientists in the sense that Irwin (1995), Wilsdon, Wynne and Stilgoe (2005) and Stilgoe (2009) advocated (Notice Stilgoe’s title: Citizen Scientists). In this interpretation of the phrase, the emphasis is not on the citizen as a scientist, but on the scientist as a citizen. It requires the scientists to engage with the social and ethical aspects of their work at a very deep level. Stilgoe (2009, p.7) suggested that, in some cases, it will not be possible to draw the line between the professional scientific activities, the responsibilities towards society and a fuller consideration of how a scientific project integrates with wider ethical and societal concerns. However, as all these authors noted, this way of conceptualising and practising science is not widely accepted in the current culture of science.
Therefore, we can conclude that this form of participatory and collaborative science will be challenging in many areas of science. This will not be because of technical or intellectual difficulties, but mostly because of the cultural aspects. This might end up being the most important outcome of citizen science as a whole, as it might eventually catalyse the education of scientists to engage more fully with society.