After a very full first day, the second day opened with a breakfast that provided an opportunity to meet the board of the Citizen Science Association (CSA), and to talk to and welcome people who got up early (starting at 7am) for another full day of citizen science. Around the breakfast tables, new connections were emerging. As in the registration queue on the first day, people were open and friendly, starting conversations with new acquaintances and sharing their interest in citizen science. An indication of the enthusiasm was that people continued talking as they departed for the morning sessions.

5A Symposium: Linking Citizen Science and Indigenous Knowledge: an avenue to sustainable development 

The session explored the use of different data collection tools to capture and share traditional knowledge. Dawn Wright, Esri's chief scientist, started with Emerging Citizen Science Initiatives at Esri. Dawn opened with Esri's view of science – beyond fundamental scientific understanding, it is important to see science as protecting life, enabling stewardship, and sharing information about how the Earth works, how it should look (geodesign) and how we should look at the Earth. As we capture data with various mobile devices – from mobile phones to watches and sensors – we are becoming more geoaware and geoenabled. The geotechnologies that enable this – apps and abilities such as storytelling – are very valuable. Esri views geoliteracy as a combination of understanding geography and scientific data – issues are more compelling when they are mapped and visualised. Collector for ArcGIS provides the ability to collect data in the field; it has been used by scouts, and in Malawi it is used by indigenous farmers to help manage local agriculture. There is also the ability to collect information in the browser with 'GeoForm', which supports such data collection. Maps were used to collect information about street-light coverage and to buffer the range that is covered. A third method is story maps, which allow telling information with a narrative: Snap2Map is an app that links data collection and puts it directly into story maps, and there is also a tool that allows collection of information directly from the browser.

Michalis Vitos, UCL – Sapelli, a data collection platform for non-literate citizen scientists in the rainforest. Michalis described the Extreme Citizen Science group, which was set up with the aim of providing tools for communities all over the world. In the Congo basin, communities face challenges from illegal logging and poaching; forest people are in direct competition for resources such as the trees that they use, and with the FLEGT obligations in the Republic of Congo, some protection is emerging. The team collaborates with local NGOs which work with local communities, and there are challenges including literacy, energy, and communication. The Sapelli collector is an application that structures data collection through a hierarchy of levels. The Sapelli launcher locks the interface of the phone and allows only specific functions to be exposed to the user. The issue of connectivity was addressed with communication procedures that use SMS. Electricity can be provided in different ways – including while cooking. There is a procedure for engaging with a community – starting with Free and Prior Informed Consent. The process starts with icons, using them in printed form and making sure that they are understood; after agreement on the icons, there is an introduction to the smartphones – how to touch, how to tap, and the rest of the basics. The next stage is to try it in the field. Sapelli is now available on Google Play – the next step is to ensure that participants can see what they collected, but as satellite images are difficult to obtain, the group is experimenting with drone imagery and mapping to provide the information back to the community. In terms of results for the community, the project is moving from development to deployment with a logging company. The development of the icons is based on working with anthropologists, who discuss the issues with the community and lead the icon design. Not all the icons work, and sometimes they need to be changed. The process involves compensating the community for the time and effort that they put in.

Sam Sudar, University of Washington – Collecting data with Open Data Kit (ODK). Sam gave background on the tool – the current version and the coming ODK 2.0. ODK is a set of information-management tools for collecting and storing data and making it usable, targeted at resource-constrained environments – anywhere with limited connectivity – without assuming smartphone literacy. It is used all over the world: in Kenya, by the Jane Goodall Institute (JGI) in Tanzania, by the Surui tribe in Brazil to gain carbon credits, by the Carter Center in Egypt for election monitoring, and by WWF in Rwanda. The technology is used in very diverse ways, and we need to consider how technology empowers data collection. The ODK workflow is: first build the form, then collect the data, and finally aggregate the results. ODK Build and XLSForm are ways to build the form (the latter in Excel), ODK Collect renders the forms, and ODK Aggregate can run locally or on Google App Engine. There is a strong community around ODK with much support for it. In ODK 1.0 there is no data update on the mobile device, as it replicates the paper process, and there are limitations in customising the interface or linking to sensors. ODK 2.0 provides better abilities and allows syncing of information through the cloud. ODK Survey replaces ODK Collect, and ODK Tables is a way to interact with data on the device. The intention is to make it possible to interact with the data in an easier way.
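The build–collect–aggregate workflow starts with a form definition. As a rough illustration of what an XLSForm contains, here is a minimal 'survey' sheet rendered as CSV (the question names and labels are invented for the example; real XLSForms are authored in a spreadsheet):

```python
import csv
import io

# A minimal sketch of an XLSForm "survey" sheet, written as CSV for
# illustration. The column names and question types (text, geopoint,
# select_one, image) follow the XLSForm convention used by ODK; the
# field names here are made up for the example.
survey_rows = [
    ("type", "name", "label"),                       # header row
    ("text", "observer", "Observer name"),
    ("geopoint", "location", "Record the location"),
    ("select_one species", "species", "Species observed"),
    ("image", "photo", "Take a photo"),
]

buf = io.StringIO()
csv.writer(buf).writerows(survey_rows)
form_csv = buf.getvalue()
print(form_csv)
```

A form like this is compiled and pushed to ODK Collect on the device; the same declarative style is what makes the 'replicate the paper process' workflow of ODK 1.0 straightforward.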

A question from the audience asked whether local communities worry about the data collected about them. ODK is used with a lot of medical information, but the team doesn't go on the ground, so it is left to whoever uses the system to ensure ethical guidelines are followed. Michalis noted that there are not only problems with external bodies, but also cultural sensitivities about what data should be seen by whom, and there is an effort to develop tools that are responsive to this.

Tanya Birch, Google Earth Outreach – Community-based field data collection and Google mapping tools. The video included Jane Goodall's work in Tanzania with chimpanzees; due to habitat loss, there are fewer than 300,000 chimpanzees left in the wild. In the video, Lillian Pintea (JGI) noted the importance of satellite images that show all the bare hills in that area of Tanzania. That led to efforts to improve the lives of the local villagers so they become partners in conservation. The local communities are essential – they share the status of the work with the people in the village. The forest monitors' role is to work across the area, collecting data with ODK and monitoring it. Recording location information is easier on a tablet; the data is then uploaded to Google and shared with the global effort to monitor forests. Gombe National Park is the laboratory for scaling up across the chimpanzee habitat, using Google's abilities and reach to share it widely.

Another question that came up was: how have you used the tools with youth, and what are the challenges of working with young people? Dawn noted that in engagement with youth the term 'digital native' rings true, and they end up teaching the teachers how to improve the apps. The presentations discussed simplicity in technology, so you don't need to know what is going on in the background. Another question: do people want to change the scale of analysis – say, standing at a point and taking a picture of a mountain – and how are different scales addressed? Dawn noted that having the map as part of the collection tool allows people to see it as they collect the data and, for example, to indicate the scale of what they viewed. Michalis noted that there is also an option in Sapelli to measure scale in football pitches, and Luis noted that in CyberTracker there is an option to indicate that the information was collected in a different place from where the observer is. Data sharing is important, but make sure that it can be exported in something as simple as…

6E Symposium: Human-Centred Technologies for Citizen Science 

Kevin Crowston (Syracuse U.) & Andrea Wiggins (U. Maryland & symposium convener): Project diversity and design implications described a survey; while most attention is usually paid to a small number of projects, by surveying a wider range of projects they discovered different practices. To evaluate the design implications, they suggested that we need to understand the goal of the project and the participation activities – from science and conservation to photography – the different things that people are doing, with observations being the most common type of contribution (see their First Monday paper). Data quality comes up in all the projects, and there are different strategies to deal with it. There is diversity of engagement – from conferences and meetings to social media. There are also rewards for participation – some projects offer no rewards at all, others provide volunteer appreciation, training and equipment, and another approach is competitive rewards in leaderboards. There is also socialisation – and even formal education. Funding is diverse – from grants and private contributions to sponsorship – and sustainability is an issue.

Mobile and Social Technologies
-Anne Bowser (U. Maryland): Gamifying phenology with the Floracaching app – geocaching for plants. The application focuses on phenology; an earlier version was developed for Project BudBurst. Traditional volunteers focus on contributing to science, while millennials might be more interested in a mobile app based on games. Embedded maps can be used to create a cache, and there is a leaderboard and points. Floracaching was created through paper prototyping and focus groups. They found that the perception of gamification was important to millennials, who also enjoyed competition, wanted to be told what to do, and wanted feedback on how they'd done ('I'm not going to drive an hour to see a plant bloom'). Missions can be added to the design and help people learn the application and the data collection.

-Michalis Vitos (UCL): Sapelli, a mobile data collection platform for non-literate indigenous communities. Michalis covered Sapelli and the importance of interface design (see previous session). The design of the icons is discussed through, effectively, paper prototyping.

-Muki Haklay (UCL): Geographical human-computer interaction for citizen science apps (I’ll blog it later!)

-Matt Germonprez, Alan Kolok (U. Nebraska Omaha), & Matt Levy (San Francisco State U.): Enacting citizen science through social media. Matt comes from a technology angle – he suggested that social media provides a different form of information, and asked whether it can be integrated into a citizen science project. The science project monitors atrazine and started in 2012, with a process similar to a litmus test. The project worked, but they wanted to use social media in the social setting in which they work. Facebook wasn't used beyond information sharing, but Twitter and Instagram were used to report observations publicly. The problem – no social conversations – so maintaining social conversation is the next goal. The project can be found by searching for Lil' Miss Atrazine.

Developing Infrastructures
-Jen Hammock (Smithsonian Institution): An infrastructure for data distribution and use. The motivating example is snails – a findability problem. A tool that they want to develop is data search – following different sources of information and merging taxa and location, as well as providing alerts about interests. Notifications will be provided to the researcher and to the contributor, and there can be knowledge about the person who contributed the information. There are technical and social barriers – will researchers and experienced naturalists be interested in sharing information?

-Yurong He (U. Maryland): Improving biodiversity data sharing among diverse communities. Looking at biodiversity – and the Encyclopedia of Life, with content partners who provide the data. She looked at 259 content partners and found six types of data providers. First, professional organisations that operate over time, such as IUCN and NHM. Second, repositories – professional databases that emerged in the 1990s. Third, citizen science initiatives and communities of interest, such as Xeno-canto for bird song. Fourth, social media platforms such as Wikipedia. Fifth, education communities who add information while focusing on education; and finally, subsidiaries. We need to know the practices of the providers better to support sharing of information.

-S. Andrew Sheppard (U. Minnesota & Houston Engineering, Inc.): Facilitating scalability and standardization. Andrew talked about the wq framework, focusing on collection, storage and exchange. Standards make it possible for projects to work together: there are devices, field notes, computers, phones – and it is challenging to coordinate and make them all work together. Web browsers are based on standards, which makes it possible to work across platforms, and JavaScript is also supported across platforms; together they provide the ability to collect information. Exchange requires sharing data from different sources, so software needs to be built to adapt to standards – wq is a platform that allows the creation of multiple links. Use standards, HTML5, and build adaptable tools for data exchange.
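As a small illustration of the standards point, here is a single field observation encoded as a GeoJSON Feature (RFC 7946), one widely used open format for exchanging such records between tools; the property names are hypothetical:

```python
import json

# One field observation as a GeoJSON Feature (RFC 7946). Any
# standards-aware tool can read this without custom glue code.
# The property names (parameter, value, units, observed) are
# invented for this sketch.
observation = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-93.27, 44.98]},  # lon, lat
    "properties": {
        "parameter": "nitrate",
        "value": 2.5,
        "units": "mg/L",
        "observed": "2015-02-12T10:30:00Z",
    },
}

encoded = json.dumps(observation)   # what would travel over the wire
decoded = json.loads(encoded)       # what a receiving tool would parse
print(decoded["properties"]["parameter"])
```

Because the format is standard, the same record can move between a phone app, a web map, and a server database without each pair of tools needing a bespoke exchange format.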

-Stuart Lynn (Adler Planetarium & Zooniverse): Developing tools for the next scientific data deluge. Stuart discussed their online community, which has 1.2m users. The challenge in the future is that there are going to be many projects and data sources producing huge amounts of data. The aim is to partner with machine-learning algorithm developers, but how do you keep the crowd interested and not just give them the most difficult cases, with no opportunity to learn or to progress slowly? Gamification can be stressful, so they try to give more information and learning. They also try to create a community and discuss the issues. There is a huge distribution of comments – and deepening engagement. There is no one-size-fits-all, and we need to model and understand participants better.

Contributors and Communities
-Jenny Preece (U. Maryland): Motivating and demotivating factors for long-term participation – what motivates people to come back again and again? Describing the different motivational aspects and the work of the late Dana Rotman, who collected information in the US, India and Costa Rica: 142 surveys from the US, 156 from India, and interviews in the three countries. She used a grounded theory approach and developed an initial framework; for long-term participation there are internal and external motivations. Demotivators – time, problems with technology, and long commitment to the task.

-Carsten Oesterlund, Gabriel Mugar, & Kevin Crowston (Syracuse U.): Technology features and participant motivations. Given the heterogeneity and variety of participants, how might we approach them, and how do people change over time? Looking at Zooniverse – specifically Planet Hunters – there are annotations, Talk, and other sources of information. On the Talk pages, newcomers are encouraged to annotate and comment on images and to look at what other people have done; they also find people who are more experienced. Use of Talk changes over time: people start putting in comments, then commenting goes down and stops, and later on they start putting in more information. There is also role discovery in terms of engagement and what people do in their community.

-Charlene Jennett (UCL): Identifying and promoting creativity. Creativity is a puzzling question, debated in psychology: some people look for breakthrough moments, while others look at everyday creativity. There are examples of projects that led to creativity – such as Foldit. For everyday creativity in citizen cyberscience, interviews with volunteers surfaced results including artwork from the Old Weather forum, the Galaxy Zoo 'Peas', and EyeWire chatbots that were created by members. People who are engaged in the project contribute more to it. Providing feedback on progress is important, as are regular communication and personal feedback in blogs and replies on Twitter. Events help, and there is also a need for role management.

-Carl Lagoze (U. Michigan): Inferring participant expertise and data quality – focusing on eBird (there is a paper in Big Data & Society). The standard way is to control the provenance of the data. The library is becoming a 'porous zone', so today there is less control over the whole area, and barriers between novices and experts break down. How can we tell experts from non-experts? This happens across areas; it is a sort of distributed sensor network with weak sensors. Are there signals in the data that help identify people and the quality of their information?

7C Panel: Citizen Science and Disasters: The Case of OpenStreetMap

Robert Soden (University of Colorado, Boulder) described the GFDRR Open Cities project, which collects data for resilience planning, and explained the reasons for selecting OpenStreetMap for it. Kathmandu is recognised as an at-risk place, and there was an aim to identify schools at risk, but the basic mapping had to be done first. There was a local partnership with universities in the area. There was a challenge in figuring out the data model – number of storeys, usage, roof type, wall type, age – and a need to enable students to collect information that would help in modelling the risk. They produced a lot of training material. The project was successful in collecting the data and enriching the information; the process helped create an OpenStreetMap community, and a local NGO (Kathmandu Living Labs) was launched out of it. Trust in the data was important, and there was a risk of the data being discredited – to deal with that, they involved targeted users early, spot-checked the data, and did a fuller assessment of it. They are launching similar projects in Jamaica, Vietnam and Madagascar. They want to engage people in more than just data collection, and consider how communities can be supported to grow.
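To give a sense of what such a data model looks like in OpenStreetMap terms, here is a hedged sketch that maps the surveyed attributes onto common OSM tag keys (building:levels, building:material, roof:material, start_date); the actual schema used in the Open Cities Kathmandu project may well have differed:

```python
# Sketch: survey attributes mapped onto OpenStreetMap-style tags.
# The keys follow common OSM tagging conventions; the Open Cities
# Kathmandu data model may have used different or additional keys,
# and the example building below is invented.
def building_tags(levels, usage, roof, wall, year):
    """Return an OSM-style tag dictionary for one surveyed building."""
    return {
        "building": usage,               # e.g. school, house, commercial
        "building:levels": str(levels),  # number of storeys
        "roof:material": roof,           # roof type from the survey
        "building:material": wall,       # wall type from the survey
        "start_date": str(year),         # approximate construction year
    }

school = building_tags(3, "school", "metal", "brick", 1998)
print(school)
```

Tags like these are attached to building footprints in OSM, which is what lets risk modellers query, say, all multi-storey brick schools in a district from the same shared database the community maintains.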

Mikel Maron (Humanitarian OpenStreetMap Team). Mikel covered what OpenStreetMap (OSM) is; the OSM Foundation is a different entity from Wikimedia, which is confusing. OSM is a very wide community of many thousands of people who continue to contribute. The Humanitarian OpenStreetMap Team (HOT) follows the 'cute cat theory for humanitarian maps' – use something that is alive and that people are used to contributing to when you need it in emergency situations. OSM is used in many organisations and projects in government; attempting to map all these organisations is challenging. In Bangladesh there are 6 OSM projects, which requires cooperation between agencies – at least all the projects contribute to the same database. Organisations find it challenging that they need to support something they can't control. Starting with Gaza in 2009, the OSM community began to map the area even though there was no specific request, and OSM was eventually used to create a local tourist map; the community in Gaza didn't continue, though – providing long-term support is difficult. Haiti in 2010 showed that OSM helped in producing the data but was difficult to coordinate, which led to the Tasking Manager. MapGive provides support to the crowd through imagery – a way to support OSM by utilising the DigitalGlobe database. There are developments linking OSM and citizen science: there is very rich data in OSM, and a need to understand social science and data research.

8E Symposium: Ethical Dimensions of Citizen Science Research
Caren Cooper opened with a list of issues: participation vs exploitation; beneficence, non-maleficence, autonomy and justice; incentives vs manipulation; IP and data ownership; data misuse, sharing and accessibility; openness vs privacy and security; cultural competence.

Holly Menninger talked about the project she is focusing on – the home microbiome. Volunteers share dust samples from their homes, and the team looks at the content. Volunteers want to understand their home, but also the science. There was the issue of reporting back to participants – they want to understand the information, and it was a challenge to translate the scientific information into something useful. People are interested in the information about their home, sometimes due to personal issues – e.g. a request to get the results because someone in the house is ill. There is a lag of 2 years between samples and results, which needs to be explained to the participants. There is also the issue that the science is exploratory, which means there are no specific answers that can be given to participants.

Madhusudan Katti explored the appropriation of citizens' knowledge. IP in traditional knowledge is discussed a lot: appropriating local knowledge, gathered through interviews, and then publishing it – the scientists get the fame. There is also collecting information about endangered species where the risk comes from the local community. He mentioned the film Living with Elephants, which focuses on the conflicts between humans and elephants but might also help poachers.

Janet Stemwedel discussed ethics in DIY and participant-led citizen science. DIY science is about self-efficacy and controlling the process, so if the participants are running the show, what can go wrong? Who better to protect my autonomy than me? The answer is that autonomy is tricky: you need good information about potential risks and benefits; your current choices can hurt your future prospects for choosing freely (don't use autonomy to get addicted, or be careless with your personal information); and our exercise of autonomy can impact others' prospects of free choice (DNA analysis has an impact on your wider family). An Institutional Review Board (IRB) is a mechanism to think this through – potential consequences (good and bad), who could be impacted, and strategies for answering these questions. Reasons to resist IRBs: they are not legally required here, academic scientists complain about them, and there is often no access to an IRB.

The reason to get over the resistance is that unintentional harm is not a good thing; getting feedback from more eyes also helps in learning about tools and approaches. Ethical objectivity means going beyond gut feeling and discussing with other people.

Anne Bowser discussed the ethics of gamification – the use of game design elements in non-game contexts (e.g. leaderboards). Old Weather had an element of games, and Floracaching is another example. There is a labour/exploitation question too – playing a game such as Civilization II is done for fun, while you learn about history, but online games use different approaches to extract more from their users. Does contribution to science cleanse the ethical issues because it's not done for profit motives? Crowdsourcing has been critiqued in different ways. There are also tracking and privacy issues: gamified apps reveal habits and all sorts of details about their users (e.g. in Foursquare), and Salesforce uses badges to encourage people to act in specific ways as employees. Ethical citizen science: treat participants as collaborators; don't waste volunteers' time; volunteers are not computers (Prestopnik & Crowston 2012). Ethical design allows participants to be aware of the implications and decide whether they want gamification or not.

Lea Shanley covered data privacy. Her awareness came from working with Native American tribes on participatory mapping, as tribes started to use participatory GIS. There were many things they wanted to map, and participants had different views about whether to share the data; some places were careful and some were not. In disaster response there is a lot of social media curation, and open-data evangelists started sharing the locations of first aiders, actually putting them at risk. In citizen science there is a lack of attention to location – the places where observations were recorded, and even real-time information that risks the physical security of participants. Face recognition is possible, and information collected by volunteers can reveal medical information that can harm people's prospects: sensitive information, sacred-site locations, endangered species. Toxic environments can put volunteers at risk. There are also issues with who interprets and manages the data, and with social norms and how they are reinforced. An emerging area is the security of social media – crowdsourcing teams were hacked in the DARPA Red Balloon Challenge, and there can be deliberate hacking of citizen science by people who don't like it.

Dianne Quigley – the Northeast Ethics Education Partnership, which came out of environmental and social justice issues, works to improve the ethical knowledge of researchers. When researchers start with a community, they begin with a discussion of risks/benefits and consider who is getting something out of it. They train graduate students to work with communities: avoiding harm (non-maleficence); informed consent when working with communities; protecting data. Justice is a way to think of linguistic diversity, respect for local knowledge, and recruitment that is fair in terms of representation. Data management and protocols matter. There is a need to learn humility – to respect the needs and practices of the community.

There are ideas to start an ethics group in the CSA and consider a code of ethics or a participant bill of rights. Do we need to extend IRB oversight? A co-created common rule? Is there value in a code of ethics, or will it be a dead letter? The discussion explored the need for bottom-up projects, which also need to consider impacts and outputs; communication with the public and promising only what the research will deliver; and the investment of time in citizen science by early-career researchers, which can impact their career prospects. These challenges are common in community participatory research.

9A Panel: The brave new world of citizen science: reflecting critically on notions of citizenship in citizen science

The panel reflected specifically on the citizenship aspects of citizen science. Citizen science is a significant phenomenon, and there is a feeling that a critical voice is needed within it. What is the place of the citizen in citizen science? There are questions about governance, practices and methodologies. How does it connect to the wider democratisation of knowledge?

Eugenia Rodrigues (University of Edinburgh, UK) asked: what model of citizenship does citizen science promote? One way is to look at the demographics, but we can also ask about the term – it is possible to use volunteer, amateur, or extended peer community (as in post-normal science). The term citizen includes autonomy, creativity, liberty, responsibility, having a stake, and other meanings. What are the citizens doing, and are we constructing a story that recognises the citizen scientist as a citizen? A story is appearing in work in the North-east of England dealing with water pollution in local woodland, where participants noted that the Environment Agency was not doing things in a satisfactory way, so the needs of their local habitat were overlooked. In this case, contextual/experiential knowledge and expert monitoring skills lead to a change. Citizen science can be seen as counter-expertise. We need to be inclusive – some classifications try to control the role of the citizens, and the drive to control levels of participation to improve quality does not give space for participants to exercise their citizenship fully.

Shannon Dosemagen (Public Lab) – in Public Lab there is specific attention to environmental monitoring, and there is a need to re-imagine roles. Public Lab prefers the terms civic science or community science rather than citizen science, because 'citizen' can be controversial or mean different things in different places. They also do not think of scientists and non-scientists in a supplicant way, but consider how to engage people in the whole process. Different roles play out in different ways – they want to be deliberate about it. There are different roles within the Public Lab community, but it is an egalitarian approach to roles.

Esther Turnhout (Wageningen University) looked at expertise and quality control in citizen science networks for biodiversity knowledge. Biodiversity knowledge exists among amateur naturalists, who have started using the term citizen science; to conceptualise this, there are complex relationships with mainstream science. Biodiversity recording has been around for a long time, and the data is in increasing demand for decision making. This has brought a demand to professionalise and to increase standards and quality. Validation happens in complex networks of amateurs, experts, professionals and decision makers – looking at the actors in the network, validation is done in different places with different motivations. There are hierarchical networks inside naturalist groups, enforced with novices. The digitised data is compared with existing observations, and there is reciprocity between the observer and the process of collecting and organising the data. There are lots of actors – butterflies, the community of observers, the field guide – and the process is circular. But increasingly, validation is imposed and procedural: validation ceases to be collective and the records no longer circulate. The main concern is to keep track of where the data goes, as it belongs to the observer. Citizenship depends on not just turning the data into probabilities; there is a need to maintain control over the data.

Rick Hall (Ignite!, UK) – there have been different learned societies around the country – the learned societies that emerged in the 18th century, while the acts of enclosure and the workhouses enslaved large groups in society. Today we can ask whether Internet barons are trying to do the same as the mill owners. There is a cultural entitlement in the declaration of human rights. For the current president of the Royal Society, finding things out for yourself is at the very heart of science. It matters where it takes place – for example, in a pop-up shop that hosts community curiosity labs where people explore questions that matter to them, or in spaces in schools where young people can take ownership of their investigations. Spaces like Lab_13 are places to learn how to become a scientist. The issue is asking young people what they want to know. We need spaces where citizens learn not just science but how to become scientists… We need more community and civic citizen scientists because the world needs more curious minds.

Erinma Ochu (University of Manchester, UK) – as a neuroscientist, she found that her research requires empathy, and that stories offer a way into a science that has evolved as powerful and controlling. What happens when you bring science into the public realm? How do you ensure that it is inclusive for women and minorities?

For me, the discussion highlighted that it was mostly about collective action and egalitarianism in the production of knowledge – expertise without hierarchy.

Another observer raised the issue of democratisation and what notion of political action we would like to see within citizen science.

The final keynote was from Amy Robinson of EyeWire: Why Do Gamers Enjoy Mapping the Brain? She demonstrated the game and how it works, and shared lessons from EyeWire, which has been running for 2 years and has learned a lot. The idea that 'if we build it, they will play' did not happen; in fact it was a carefully crafted, slowly built community – creating the tools, learning about how things are used. Media is crucial – 60% of EyeWire registrations came within 5 days of a major media event, whether on Facebook, Twitter or other social media – suddenly things are coming from the media. A Facebook page can convert viewers into participants. Media relations are active engagement, not just waiting for journalists – share all sorts of things, including funny things. Reaching out to media also requires being prepared – you need to cope with the attention and capture it. Create internal analytics to understand how the project works. Engagement is also a major issue – there is a huge drop-off after two months. Creating games and missions can provide a reason to capture people's interest. Prestige within the community can motivate – changing a user's handle colour can demonstrate recognition by the project. There are also specific challenges, and users set their own challenges. For accuracy and efficiency, the power players in the game are given a bigger role in the project – how do you recognise a potential power player in your game? The design of the entry page is critical – the page is minimalist and reduces the amount of information you need to enter the system. They have created all sorts of interesting collaborations, such as fascinating visualisations. There is also a need to take risks and see whether they work or not.

Abe Miller-Rushing closed the conference by asking people to share talks and links, and noted that the posters will come online. The aim is to create a community and serve its needs. The new board chair, Greg Newman, continued with some takeaways from the conference, which brought it to a close.

Another account of the conference is available at

As far as I can tell, Nelson et al. (2006) ‘Towards development of a high quality public domain global roads database‘ and Taylor & Caquard (2006) ‘Cybercartography: Maps and Mapping in the Information Era‘ are the first peer-reviewed papers that mention OpenStreetMap. Since then, OpenStreetMap has received plenty of academic attention. More ‘conservative’ search engines such as ScienceDirect or Scopus find 286 and 236 peer-reviewed papers (respectively) that mention the project. The ACM digital library finds 461 papers in areas relevant to computing and electronics, while Microsoft Academic Research finds only 112. Google Scholar lists over 9,000 (!). Even with the most conservative figure from Microsoft, we can see an impact on fields ranging from social science to engineering and physics. So there is a lot to be proud of as a major contribution to knowledge beyond producing maps.

Michael Goodchild, in his 2007 paper that started the research into Volunteered Geographic Information (VGI), mentioned OpenStreetMap (OSM), and since then there has been a lot of conflation of OSM and VGI. In some recent papers you can find statements such as ‘OpenStreetMap is considered as one of the most successful and popular VGI projects‘ or ‘the most prominent VGI project OpenStreetMap‘, so, at some level, the boundary between the two is being blurred. I’m part of the problem – for example, with the title of my 2010 paper ‘How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets‘. However, the more I think about it, the more uncomfortable I am with this equivalence. I feel that the recent line from Neis & Zielstra (2014) is more accurate: ‘One of the most utilized, analyzed and cited VGI-platforms, with an increasing popularity over the past few years, is OpenStreetMap (OSM)‘. I’ll explain why.

Let’s look at the whole area of OpenStreetMap studies. Over the past decade, several types of research paper have emerged.

First, there is a whole set of research projects that use OSM data because it’s easy to use and free to access (in computer vision or even string theory). These studies are not part of ‘OSM studies’ or VGI, as, for them, this is just data to be used.

Edward Betts. CC-By-SA 2.0 via Wikimedia Commons

Second, there are studies about OSM data: quality, the evolution of objects, and other aspects, from researchers such as Peter Mooney, Pascal Neis, Alex Zipf and many others.

Third, there are studies that also look at the interactions between the contribution and the data – for example, in trying to infer trustworthiness.

Fourth, there are studies that look at the wider societal aspects of OpenStreetMap, with people like Martin Dodge, Chris Perkins and Jo Gerlach contributing to interesting discussions.

Finally, there are studies of the social practices in OpenStreetMap as a project, with the work of Yu-Wei Lin, Nama Budhathoki, Manuela Schmidt and others.

[Unfortunately, due to academic practices and publication outlets, many of these papers are locked behind paywalls, but that is another issue… ]

In short, there is a significant body of knowledge regarding the nature of the project, the implications of what it produces, and ways to understand the information that emerges from it. Clearly, we now know that OSM produces good data, and we are aware of the patterns of contribution. What is also clear is that many of these patterns are specific to OSM. Because of the importance of OSM to so many application areas (including illustrative maps in string theory!), these insights are very important. Some of these insights are expected to also hold in other VGI projects (hence my suggestion for assertions about VGI), but this needs to be done carefully, only when there is evidence from other projects that this is the case. So we should avoid conflating VGI and OSM.

Today, OpenStreetMap celebrates 10 years of operation, as counted from the date of registration. I heard about the project when it was in its early stages, mostly because I knew Steve Coast while I was studying for my Ph.D. at UCL. As a result, I was also able to secure the first ever research grant that focused on OpenStreetMap (and hence Volunteered Geographic Information – VGI) from the Royal Geographical Society in 2005. A lot can be said about being in the right place at the right time!

OSM Interface, 2006 (source: Nick Black)


Having followed the project during this decade, there is much to reflect on – such as thinking about open research questions, things that the academic literature failed to notice about OSM, or the things that we do know about OSM and VGI because of the openness of the project. However, as I was preparing the talk for the INSPIRE conference, I started to think about the start dates of OSM (2004), TomTom Map Share (2007), Waze (2008) and Google Map Maker (2008). While there are conceptual and operational differences between these projects, in terms of ‘knowledge-based peer production systems’ they are fairly similar: all rely on a large number of contributors, all combine a large group of contributors who each contribute a little with a much smaller group of committed contributors who do the more complex work, and all are about mapping. Yet, although OSM started 3 years before these other crowdsourced mapping projects, all of them have more contributors than OSM.

Since OSM is described as the ‘Wikipedia of maps‘, the analogy I started to consider was that it’s a bit like a parallel history in which, in 2001, as Wikipedia starts, Encarta and Britannica look at the upstart and set up their own crowdsourcing operations, so within 3 years they are up and running. By 2011, Wikipedia continues as a copyright-free encyclopedia with a sizable community, but Encarta and Britannica have more contributors and more visibility.

Knowing OSM closely, I felt that this is not a fair analogy. While there are some organisational and contribution practices that can be used to claim ‘it’s the fault of the licence’ or ‘it’s because of the project’s culture’, and therefore to justify this unflattering analogy, I sensed that something else is needed to explain what is going on.

TripAdvisor Florence

Then, during my holiday in Italy, I was enjoying the offline TripAdvisor app for Florence, which uses OSM for navigation (in contrast to the online app, which uses Google Maps), and an answer emerged. Within the OSM community there was, from the start, some tension between the ‘map’ view and the ‘database’ view of the project. Is it about collecting data to create beautiful maps, or is it about building a database that can be used for many applications?

Saying that OSM is about the map means that the analogy is correct, as it is very similar to Wikipedia – you want to share knowledge, so you put it online with a system that allows you to display it quickly, with tools that support easy editing and information sharing. If, on the other hand, OSM is about a database, then OSM is something that is used at the back-end of other applications, much like a DBMS or an operating system. Although there are tools that help you do things easily and quickly and check the information that you’ve entered (e.g. displaying the information as a map), the main goal is building the back-end.

Maybe a better analogy is to think of OSM as the ‘Linux of maps’, which means that it is an infrastructure project that is expected to have a lot of visibility among the professionals who need it (system administrators in the case of Linux, GIS/Geoweb developers for OSM), with a strong community that supports and contributes to it. In the same way that some tech-savvy people know about Linux but most people don’t, I suspect that TripAdvisor offline users don’t notice that they use OSM – they are just happy to have a map.

The problem with the Linux analogy is that OSM is more than software – it is indeed a database of information about geography from all over the world (and therefore the Wikipedia analogy has its place). It is therefore somewhere in between. In a way, it provides a demonstration of the common claim in GIS circles that ‘spatial is special‘. Geographical information is infrastructure in the same way that operating systems or DBMSs are, but in this case it’s not enough to create an empty shell that can be filled in for a specific instance; a significant amount of base information is needed before you can start building your own application with additional information. This is also the philosophical difference that makes the licensing issues more complex!

In short, both the Linux and Wikipedia analogies are inadequate to capture what OSM is. It has been illuminating and fascinating to follow the project over its first decade, and may it continue successfully for more decades to come.

More or Less‘ is a good programme on BBC Radio 4, regularly exploring the numbers and the evidence behind news stories and other important things, and checking whether they stand up. However, the piece broadcast this week about golf courses and housing in the UK provides a nice demonstration of when not to use crowdsourced information. The issue discussed was how much space golf courses actually occupy, compared to the space used for housing. All was well until they announced the use of clever software (read: GIS) wielded by a statistical superhero to do the analysis. Interestingly, the data used for the analysis was OpenStreetMap – and because the news item was about Surrey, they started the analysis there.

For the analysis to be correct, you need to assume that all the building polygons and all the golf courses in OpenStreetMap have been identified and mapped. My own guess is that in Surrey this could be the case – especially with all the wonderful work that James Rutter catalysed. However, assuming that this is the case for the rest of the country is, well, a bit fanciful. I wouldn’t dare to state that OpenStreetMap is complete to such a level without lots of quality testing, which I haven’t seen. There is only the road length analysis by ITO World! and other bits of analysis, but we don’t know how complete OSM is.

While I like OpenStreetMap very much, it is utterly unsuitable for any sort of statistical analysis that works at the building level and then sums up to the country level, because of the heterogeneity of the data. For that sort of thing, you have to use a consistent dataset, or at least one that attempts to be consistent, and such data comes from the Ordnance Survey.

As with other statistical affairs, the core case made in the rest of the clip about the assertion as a whole is relevant here. First, we should question the unit of analysis (is it right to compare the footprint of a house to the area of golf courses? Probably not) and what is to be gained by adding up individual buildings’ footprints to the level of the UK while ignoring roads, gardens, and all the rest of the built environment. Just because it is possible to add up every building’s footprint doesn’t mean that you should. Second, this analysis is an example of the ‘Big Data’ fallacy, which goes: analyse first, then question (if at all) the relationship between the data and reality.
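To make the fallacy concrete, here is a toy sketch of the kind of footprint-summing the programme relied on. Everything below is invented for illustration – made-up polygons in projected metres, not real OSM buildings – and the point is only that the arithmetic is trivial once you (wrongly) assume the data is complete:

```python
# Toy sketch: summing "building footprint" areas with the shoelace formula.
# Coordinates are assumed to be in a projected system (metres), e.g. British
# National Grid -- lat/lon degrees would give meaningless "areas".

def polygon_area(coords):
    """Planar area of a simple polygon via the shoelace formula."""
    n = len(coords)
    s = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Hypothetical footprints (metres): two houses and one golf course outline.
houses = [
    [(0, 0), (10, 0), (10, 8), (0, 8)],      # 80 m^2
    [(20, 0), (29, 0), (29, 10), (20, 10)],  # 90 m^2
]
golf_course = [(0, 100), (500, 100), (500, 600), (0, 600)]  # 250,000 m^2

total_housing = sum(polygon_area(p) for p in houses)
print(total_housing)              # 170.0
print(polygon_area(golf_course))  # 250000.0
```

Whatever tool performs it, the calculation itself is this simple; the hard, unexamined part is whether every building and golf course is actually in the data.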

At the State of the Map (EU) 2011 conference, held in Vienna from 15-17 July, I gave a keynote talk on the relationships between the OpenStreetMap (OSM) community and the GIScience research community. Of course, these relationships are especially important for those researchers who are working on Volunteered Geographic Information (VGI), due to the major role of OSM in this area of research.

The talk included an overview of what researchers have discovered about OpenStreetMap over the 5 years since we started to pay attention to OSM. One striking result is that the issue of positional accuracy does not require much more work by researchers. Another important outcome of the research is the understanding that quality is influenced by the number of mappers, and that the data can be used with confidence for mainstream geographical applications when certain conditions are met. These results are both useful and of interest to a wide range of groups, but there remain key areas that require further research – for example, specific facets of quality, community characteristics, and how OSM data is used.

Reflecting on the body of research, we can start to form a ‘code of engagement’ for both academics and mappers who are engaged in researching or using OpenStreetMap. One such guideline would be that it is both prudent and productive for any researcher to do some mapping herself, and to understand the process of creating OSM data, if the research is to be relevant and accurate. Other aspects of the proposed ‘code’ are covered in the presentation.

The talk is also available as a video from the TU Wien Matterhorn server.



In March 2008, I started comparing OpenStreetMap in England to the Ordnance Survey Meridian 2, as a way to evaluate the completeness of OpenStreetMap coverage. The rationale behind the comparison is that Meridian 2 represents a generalised geographic dataset that is widely used in national-scale spatial analysis. At the time the study started, it was not clear that OpenStreetMap volunteers could create highly detailed maps of the kind that can now be seen on the ‘Best of OpenStreetMap‘ site. Yet even today, Meridian 2 provides a minimum threshold for OpenStreetMap when the question of completeness is asked.

So far, I have carried out 6 evaluations, comparing the two datasets in March 2008, March 2009, October 2009, March 2010, September 2010 and March 2011. While the work on the statistical analysis and verification of the results continues, Oliver O’Brien helped me take the results of the analysis for Britain and turn them into an interactive online map that can help in exploring the progression of the coverage over the various time periods.

Notice that the visualisation shows the total length of all road objects in OpenStreetMap, so it does not discriminate between roads, footpaths and other types of objects. This is the most basic level of completeness evaluation, and it is fairly coarse.
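For readers curious about the mechanics, this kind of grid-based comparison can be sketched roughly as follows. This is a toy illustration with invented coordinates and an assumed 1 km grid in projected metres – not the actual code or data behind the analysis:

```python
# Toy sketch of a grid-based completeness measure: total length of road
# objects per cell, compared between two datasets. Coordinates are assumed
# to be in metres; the line geometries below are invented, not real data.
import math
from collections import defaultdict

def add_length_per_cell(lines, cell_size, totals):
    """Accumulate polyline length into the grid cell of each segment midpoint."""
    for line in lines:
        for (x1, y1), (x2, y2) in zip(line, line[1:]):
            seg = math.hypot(x2 - x1, y2 - y1)
            mx, my = (x1 + x2) / 2, (y1 + y2) / 2
            cell = (int(mx // cell_size), int(my // cell_size))
            totals[cell] += seg

# Hypothetical 1 km cell: OSM has 1500 m of road objects, Meridian holds 1000 m.
osm_roads = [[(0, 0), (1000, 0)], [(0, 100), (500, 100)]]
meridian_roads = [[(0, 0), (1000, 0)]]

osm, meridian = defaultdict(float), defaultdict(float)
add_length_per_cell(osm_roads, 1000, osm)
add_length_per_cell(meridian_roads, 1000, meridian)

for cell in meridian:
    ratio = osm[cell] / meridian[cell]
    print(cell, round(ratio, 2))  # OSM length as a share of Meridian length
```

The per-cell ratio is what ends up on the map: cells where the OSM total meets or exceeds the Meridian 2 total are candidates for ‘complete’ coverage, at this admittedly coarse level.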

The application allows you to browse the results and zoom to a specific location, and, as Oliver integrated the Ordnance Survey Street View layer, it lets you see what information is missing from OpenStreetMap.

Finally, note that for the periods before September 2010, the coverage is for England only.

Some details on the development of the map are available on Oliver’s blog.

This post reviews the two books about OpenStreetMap that appeared late in 2010: OpenStreetMap: Using and Enhancing the Free Map of the World (by F. Ramm, J. Topf & S. Chilton, 386 pages, £25) and OpenStreetMap: Be Your Own Cartographer (by J. Bennett, 252 pages, £25). The review was written by Thomas Koukoletsos, with some edits by me. It first covers the Ramm et al. book, and then compares it to Bennett’s. It is fairly detailed, so if you want to see the recommendation, scroll all the way down.

OpenStreetMap: Using and Enhancing the Free Map of the World is a comprehensive guide to OpenStreetMap (OSM), aimed at a wide range of readers, from those unfamiliar with the project to those who want to use its information and tools and integrate them with other applications. It is written in accessible language, starting from the basics and presenting things in an appropriate order for the reader to be able to follow, slowly building the necessary knowledge.

Part I, the introduction, comprises three chapters. It presents the OSM project generally, pointing to other chapters wherever further details are provided later on. This includes how the project started, a short description of its main interface, how to export data, and some of its related services, such as OpenStreetBugs and OpenRouteService. It concludes with a reference to mapping parties and the OSM Foundation. This gives all the necessary information for someone new to OSM to get a general idea, without becoming too technical.

Part II, addressing OSM contributors, follows with chapter 4 focusing on how GPS technology is used for OSM. The balance between the technical detail and accessibility continues, so all the necessary information for mapping is presented in an easily digested way even for those not familiar with mapping science. The following chapter covers the whole mapping process using a very comprehensive case study, through which the reader understands how to work in the field, edit and finally upload the collected data. Based on this overview, the next chapter is slightly more technical, describing the data model followed by OSM. The information provided is necessary to understand how the OSM database is structured.

Chapter 7 moves on to details, describing what objects need to be mapped and how this can be done by using tags. The examples provided help the user to move from simpler to more complicated representations. The importance of this chapter, however, is in emphasising that, although the proposed tagging framework is not compulsory, it would be wise to do it as this will increase the consistency in the OSM database. The chapter ends with a suggestion of mapping priorities, from ‘very important’ objects and attributes to ‘luxury’ ones. Chapter 8 continues with map features, covering all other proposed mapping priorities. The split between the two chapters guides the user gradually from the most important features to those covered by expert OSM users, as otherwise mapping might have been far too difficult a task for new participants.

Chapter 9 describes Potlatch, the most popular online editor. The description is simple and complete, and by the end the user is ready to contribute to the OSM database. The next chapter refers to JOSM, an offline editor designed for advanced users, which is more powerful than Potlatch but more difficult to use – although the extensive instructions make the use of this tool almost as easy as Potlatch. Chapter 11 concludes the review of editors by providing basic information on 5 other editors, suitable for desktop or mobile use. Chapter 12 presents some of the tools for mappers, designed to handle the OSM data or perform quality assurance tests. Among the capabilities described are viewing data in layers, monitoring changes in an area, viewing roads with no names, etc. The second part ends, in Chapter 13, with a description of the OSM licensing framework, giving the reader a detailed view of which sources of data should be avoided when updating OSM, so as to keep it free of copyright violations.

Part III of Ramm et al. is far more technical, beginning with how to use OSM on web pages. After providing the necessary information on tiling used for the OSM map (Mapnik and Tiles@Home servers), chapter 14 moves on to the use of OSM with Google Maps or with OpenLayers. Code is provided to assist the learning process. Chapter 15 provides information on how to download data, including the ability to download only changes and update an already downloaded version, explained further in a following chapter.

The next three chapters dive into cartographic issues, with chapter 16 starting with Osmarender, which helps visualising OSM data. With the help of many examples, the reader is shown how this tool can be used to render maps, and how to customise visualisation rules to create a personal map style. Chapter 17 continues with Mapnik, a more efficient tool than Osmarender for large datasets. Its efficiency is the result of reading the data from a PostgreSQL database. A number of other tools are required to be installed for Mapnik; however, they are all listed with basic installation instructions. The chapter concludes with performance tips, with an example of layers used according to the zooming level so that rendering is faster. The final renderer, described in chapter 18, is Kosmos. It is a more user-friendly application than the previous two, and the only one with a Graphical User Interface (GUI). The rules used to transform OSM data into a map come from the wiki pages, so anyone in need of a personal map style will have to create a wiki page. There is a description of a tiling process using Kosmos, as well as of exporting and printing options. The chapter concludes by mentioning Maperitive, the successor to Kosmos to be released shortly.

Chapter 19 is devoted to mobile use of OSM. After explaining the basics of navigation and route planning, there is a detailed description of how to create and install OSM data on Garmin GPS receivers. Additional applications for various types of devices are briefly presented (iPhones, iPods, Android), as well as other routing applications. Chapter 20 closes the third part of the book with an extensive discussion on licence issues of OSM data and its derivatives. The chapter covers the CC-BY-SA licence framework, as well as a comprehensive presentation of the future licence, without forgetting to mention the difficulties of such a change.

Part IV is the most technical part, aimed at those who want to integrate OSM into their applications. Chapter 21 reveals how OSM works, beginning with the OSM subversion repository, where the software for OSM is managed. Chapter 22 explains how the OSM Application Programming Interface (API) works. Apart from the basic data handling modes (create, retrieve, update or delete objects and GPS tracks), other methods of access are described, as well as how to work with changesets. The chapter ends with OAuth, a method that allows OSM authentication through third-party applications while keeping the necessary user information safe. Chapter 23 continues with XAPI, a different API that, although it offers only read requests and its data may be a few minutes old, allows more complex queries, returns more data than the standard API (e.g. historic versions) and allows RSS feeds from selected objects. Next, the Name Finder and Nominatim search engines for gazetteer purposes are covered. Lastly, GeoNames is mentioned, which, although not an OSM relative, can be used in combination with other OSM tools.

Chapter 24 presents Osmosis, a tool to filter and convert OSM data. Apart from enabling reading and writing of XML files, the tool can also read from and write to PostgreSQL and MySQL databases. The chapter also describes how to create and process change files in order to continually update a local dataset or database from the OSM server. Chapter 25 moves deeper into more advanced editing, presenting the basics of large-scale or other automated changes. As such changes can affect many people and their contributions, the chapter begins with ‘a note of caution’, discussing that, although power editing is available to everyone, those whose data is to be changed should be contacted and the change discussed with them first.

Chapter 26 focuses on imports and exports, including some of the programs that are used for specific data types. The final chapter presents a rather more detailed overview of how to run an OSM server as well as a tile server, covering the requirements and installation. There is also a presentation of the API schema, and alternatives to the OSM API are mentioned.

The book ends with the appendix, consisting of two parts, covering geodesy basics, and specifically geographic coordinates, datum definition and projections; and information on local OSM communities for a few selected countries.

Overall, the book is accessible and comprehensive.

Now, we turn to the second book (Bennett), focusing on the differences between the two books.

OpenStreetMap – Bennett

Chapters 1 and 2 give a general description of the OSM project and correspond to the first three chapters of Ramm et al. The history of OSM is more detailed here. The main OSM web page description does not include related websites but, on the other hand, it does describe how to use the slippy map as well as how to interact with data. The chapters also focus on the social aspect of the project, briefly presenting more details on a user’s account (e.g. personalisation of the user’s profile by adding a user photo, home location to enable communication with other users in the area or notification of local events).

Chapter 3 corresponds to chapters 4 and 5 of the first book. There is a more detailed description of how GPS works, as well as of how to configure the receiver; however, the other ways of mapping are less detailed. A typical mapping example and a more comprehensive description of the types of GPS devices suitable for OSM contribution, which are provided in Ramm et al., are missing.

Chapter 4 corresponds to chapters 6, 7 and 8 of the first book. Some less important aspects are missing, such as the history of the data model. However, Ramm et al. is much more detailed on how to map objects, classifying them according to their importance and providing practical examples of how to do it, while in this chapter only a brief description of tags is provided. Both books succeed in communicating the significance of following the wiki suggestions when it comes to tagging, despite the ‘any tags you like’ freedom. An interesting point, which is missing from the first book, is the importance of avoiding tagging for the renderer, explained here with the use of a comprehensive example.

Chapter 5 describes the editors Potlatch, JOSM and Merkaartor, corresponding to chapters 9, 10 and 11 of Ramm et al. Having the three editors in one chapter allows for a comparison table between them, giving a much quicker overview. A practical example with a GPS trace file helps in understanding the basic operations with these editors. More attention is given to Potlatch, while the other two editors are described only briefly. No other editors are described or mentioned.

Chapter 6 provides a practical example of using the three editors and shows how to map objects, which was covered in chapters 6, 7 and 8 in the first book. While the first book is more detailed and includes a wider range of mapping cases, here the reader becomes more familiar with the editors and learns how to provide the corresponding information. In addition to the material in the first book, here we have an example of finding undocumented tags and using OSMdoc.

Chapter 7 corresponds to chapter 12 of the first book, with a detailed description of the four basic tools to check OSM data for errors. However, Ramm et al. offers a broader view by mentioning or briefly describing seven other error-checking tools.

Chapter 8 deals with map production, similar to chapters 2, 16 and 18 of Ramm et al. The Osmarender tool is described in detail in both books. The Kosmos renderer, however, is described in much more detail here, although it is no longer developed. The chapter’s summary is very useful, as it briefly presents the three rendering tools and compares them. What is missing from this book, however, is a description of Mapnik (chapter 17 of Ramm et al.) and of the use of tiling in web mapping.

Chapter 9 corresponds to chapters 15, 22 and 23 of Ramm et al. Regarding planet files, Bennett provides a description of a way to check a planet file’s integrity, which can be useful for automating data integration processes. Moving on to OSM’s API, this book confines itself to describing ways of retrieving data from OSM, unlike the first book, which also covers operations to create, update or delete data. XAPI, however, is covered in more detail in this book, including how to filter data. The brief description and comparison of the ways to access data in this chapter’s summary is helpful. On the other hand, Ramm et al. briefly describes additional APIs and web services that are not covered here.

Chapter 10 matches chapter 24 of the first book. In both cases Osmosis is described in detail, with examples of how to filter data. The first book includes a more complete description of command line options, classified according to the data streams (entity or change). This book, on the other hand, is more explanatory on how to access data based on a predefined polygon, and further explaining how to create and use a customised one. The first book mentions additional tasks, such as ‘log progress’, ‘report integrity’, ‘buffer’, ‘sort’, while here only the latter is used during an example. An advantage of Bennett’s book, however, is that the use of Osmosis with a PostgreSQL database, as well as how to update data and how to automate a database update procedure, is explained more comprehensively and extensively.

The last chapter talks about future aspects of OSM. The OSM licence and its future development is explained in a comprehensive way, corresponding to the end of chapter 20 of the first book, with the use of some good examples to show where the present OSM licence is problematic. However, throughout Bennett’s book, licence issues are not covered as well as in Ramm et al. (chapters 13, 20), and the reader needs to reach the end of the book to understand what is allowed and what is not with the OSM data. Moving on, MapCSS, a common stylesheet language for OSM, is explained in detail, while in the first book it is simply mentioned at the end of chapter 9 during a discussion of Potlatch 2. The book ends with Mapzen POI collector for iPhone, covered in chapter 11 of the first book.

When compared to the first book, what is missing here is the use of OSM for navigation in mobile devices (chapter 19), large-scale editing (chapter 25), writing or finding software for OSM (chapter 21) and how to run an OSM server (chapter 27). Another drawback is the lack of coloured images; in some cases (e.g. chapter 7 – the NoName layer) it is difficult to understand them.

So which book is for me?

Both books deal with more or less the same information, as the comparison and sequence of chapters shows.

Although there are areas where the two books are complementary, in most cases Ramm et al. provides a better understanding of the matters discussed, taking a broader and more extensive view. It addresses a wide range of readers, from those unfamiliar with OSM to advanced programmers who want to utilise it elsewhere, and is written with a progressive build-up of knowledge, which helps in the learning process. It also benefits from the dedicated website where updates are provided. Bennett’s book, on the other hand, would be comparatively more difficult to read for someone who has not heard of OSM, as well as for those who need to use it but are not programming experts. There is a hidden assumption that the reader is fairly technically literate. It suffers somewhat from not being introductory enough, while at the same time not being in-depth and detailed.

As the two books are sold at a similar price point, we liked the Ramm et al. book much more and would recommend it to our students.

