28 January, 2015
What do addresses have to do with economic theory and political dogma? It turns out, quite a lot. As I was looking at the latest press release from the Cabinet Office, proudly announcing that the government is investing in (yet another) UK address database, I realised that the handling of UK addresses, those deceptively simple ‘221b Baker St NW1 6XE’, provides a parable for the stupidity of neoliberalism.
To avoid doubt: this is not about Open Addresses UK. It’s about the systemic failures of the past 20 years.
Also for the avoidance of doubt, my views are similar to Richard Murphy’s about the joy of tax. I see collective action and common investment in national assets through taxation as a wonderful thing, and I don’t mind R&D investment being spent on infrastructure that might fail – it’s true for Beagle 2 as much as it’s true for a national address database. So you won’t see here ‘this is a waste of taxpayers’ money’. It’s the systemic issues that I question here.
Finally, if I got some specific details of the history of the development wrong – I’m happy to stand corrected!
The starting point must be to understand the point of an address database. The best explanation is from one of the top UK experts on this issue – Bob Barr (OBE). Bob identified ‘Core Reference Geographies’, which have the following characteristics: Definitive; Should be collected and maintained once and used many times; Are natural monopolies; Have variable value in different applications; and, Have highly elastic demand. We can also call these things ‘Commons’ because of the way we want people to be able to share them while protecting their future – and ideally avoid a ‘tragedy of the commons’.
Addresses are such a ‘core reference geography’. Think about all the applications for a single, definitive database of all UK addresses – it can be used to send the post, plan the census, dispatch emergency services, deliver a broadband link to the right property, check for fraud during purchase transactions, and much more. To make sense of the address above, you need to have the geographical location, street name, house number and postcode. An Ordnance Survey map can be used to set the location, the street name is set by the local authority and the postcode by the Royal Mail. Merge these sources with a few other bits of information and, in principle, you can have a definitive set. Do it for the whole country and you have this ‘core reference geography’, which sounds simple…
The story is a bit more complex – as long as information was not digitised and linked, mismatches between addresses from different sources were not a huge problem, but in the mid 1990s, because of the use of digital records and databases, it became important to have a common way to link them. By that time, the Post Office Postal Address File (PAF) had become the de facto definitive address database. Actually, it’s been around since the 1970s, used by the Post Office not as a definitive address database, but to serve the internal needs of mail delivery. However, in the absence of any other source, people started using it – for example, in statistical studies (e.g. this paper from 1988). While I can’t find a specific source for the history of PAF, I guess that at some point it became a product that is shared with other organisations and sold to direct marketing companies and other users. Naturally, it wouldn’t be what you would design as the definitive source if you started all over again, but it was there, and it was good enough, so people used it.
Without raising false nostalgia about the alternatives, imagine that the need for a definitive address database had arisen at a time when all the entities that are responsible for the elements of an address were part of the public sector. There would be plenty of power struggles, foot dragging, probably cross-departmental animosity and all sorts of other obstacles. However, as has been proven time and again – when it is all inside the sphere of government control, reorganisation is possible. So you could imagine that at the end of the day, you’d get an ‘address directorate’ that manages addresses as a national commons.
Now, we can get to the core of the story. Let’s look at the definition of neoliberalism that I want to use here. It comes from a very good article on the Daily Kos: ‘Neoliberalism is a free market economic philosophy that favors the deregulation of markets and industries, the diminution of taxes and tariffs, and the privatization of government functions, passing them over to private business.’ In terms of the political dogma that came with it, it is seeing market solutions as the only solution to societal issues. In the UK, this form of thinking started in the 1980s.
By the time that GIS proliferated and the need for a definitive address database became clear, the neoliberal approach was in full gear. The different entities that need to share information in order to create this common address database were pushed out of government and were asked to act in a quasi-commercial way, at which point the people who run them are instructed to maximise the self-interest of the entity and market their products at prices that ‘the market will bear’. However, with no alternatives and the necessity to use definitive information, pricing is tricky. In terms of sharing information and creating a common product, such entities started bickering over payments, intellectual property and control. The Ordnance Survey had Address-Point, the Post Office/Royal Mail had the PAF, and while these were still the de facto datasets, no satisfactory definitive database emerged. You couldn’t get beyond this point as the organisational structure requires each organisation to hold on to its ‘property’, so while the need became clearer, the solution was now more difficult.
In the second round, what looks like a good bottom-up approach was proposed. The idea was that local authorities are the best source of information to create a definitive address database (National Land and Property Gazetteer) because they are the closest to the changes on the ground and can manage them. However, we are under neoliberal dogma, so the whole thing needs to operate commercially, and you go for a public/private partnership for that. Guess what? It didn’t work.
Third round, you merge the company from the second round with an entity from the first round to create another commercial partnership. And you are still stuck, because fundamentally, there is still the demand to control assets in order to sell them in the market.
Fourth, and something that deserves to be called the most idiotic step in the story, is the privatisation of the Royal Mail, which needs to maintain ‘assets’ in order to be ‘attractive for investors’, so you sell the PAF with it. It all works within neoliberal logic, but the implication is that instead of just dealing with a network of publicly owned bodies to which it is possible to dictate what they should do, you now have it in the private sector, where intellectual property is sacred.
In the final stage, you think: oh, I’ve got a solution, let’s create a new entity that will crowdsource/reuse open data. However, you are a good neoliberal and you therefore ask it to come up with a business model. This time it will surely work, ignoring the huge effort to build business models and all the effort that was invested into trying to pay for a sustainable address database in the past 20 years. This time it’s going to work.
Let’s ask then: if we do believe in markets so much, should we expect to see a competitor address database to PAF/Address-Point/NLPG appearing by now? Here we can argue that it’s an example of ‘market failure’ – the most obvious kind is when you can see a lack of investment or interest from ‘participants in the market’ to even start trading.
If indeed it was all about free markets and private entrepreneurial spirit, you might expect to see several database providers competing with one another until, eventually, one or two become dominant (the ‘natural monopoly’ above) and everyone uses their services. Building such a database in the era of crowdsourcing should be possible. Just like in the early days of OpenStreetMap, you don’t want ‘contamination’ by copying information from a source that holds database rights or copyright over the information that you use. So we want cases of people voluntarily typing in their addresses, while the provider collates the raw data. Inherently, in the same way that Google crowdsources queries because people are typing them and giving the text to Google for use, so does anyone who types their delivery address into Amazon.co.uk. These are crowdsourced addresses – not copied from an external dataset, so even if, for the aim of error checking, the entry is tested against PAF, they are not derivatives. Take all these addresses, clean and organise them, and you should have a PAF competitor that was created by your clients.
So Amazon is already an obvious candidate for creating it from ‘passive crowdsourcing’ as a side effect of its day to day operations. Who else might have a database that came from people inputting addresses in the UK, to a degree that the body can create a fairly good address database? It doesn’t take a lot of thinking to realise that there are plenty. Companies that are operating at a scale like Amazon have probably got a very high percentage of addresses in the UK. I’d guess that Experian will also have it for their credit checks, and Landmark is in a very good place because of all the property searches. You can surely come up with many more. None of these companies is offering competition to PAF, so that tells you that commercially, no private sector company is willing to take the risk and innovate with a product. That’s understandable, as there is the litigation risk from all the messy group of quasi-public and private bodies that see addresses as their intellectual property. The end result: there is no private sector provision of an address database.
And all the while, nobody is daring to think about nationalising the database and forcing, by regulation and law, all these quasi-commercial bodies to work together regardless of their ways of thinking. And it’s not that nationalisation is impossible – just check how miraculously Circle Healthcare gets to ‘exit private contract’ (because the word nationalisation is prohibited in neoliberal dogma).
To avoid trolling from open data advocates: I wish the best to Open Addresses UK. I think that it’s a super tough task and it will be great to see how it evolves. If, like OSM, one of the companies that can crowdsource addresses can give them their dirty data, it is possible that they will build a database fast. This post is not a criticism of Open Addresses UK, but of all the neoliberal dogmatic people who can’t simply go for the most obvious solution: take the PAF out of Royal Mail and give it to Open Addresses. Considering the underselling of the shares, there is an absolute financial justification to do so, but that’s why I pointed to the sanctity of private companies’ assets…
So the end result: huge investment by government, failing again and again (and again) because they insist on neoliberal solutions instead of the obvious treatment of commons – hold them by government and fund them properly.
16 January, 2015
Thanks to invitations from UNIGIS and Edinburgh Earth Observatory / AGI Scotland, I had an opportunity to reflect on how Geographic Information Science (GIScience) can contribute to citizen science, and what citizen science can contribute to GIScience.
Despite the fact that it’s 8 years since the term Volunteered Geographic Information (VGI) was coined, I didn’t assume that the whole audience is aware of how it came about or the range of sources of VGI. I also didn’t assume knowledge of citizen science, which is a far less familiar term for a GIScience audience. Therefore, before going into a discussion about the relationship between the two areas, I opened with a short introduction to both, starting with VGI, and then moving to citizen science. After the introduction to the two areas, I suggest the relationships between them – there are types of citizen science that overlap with VGI – biological recording and environmental observations, as well as community (or civic) science, while other types, such as volunteer thinking, include many projects that are non-geographical (think EyeWire or Galaxy Zoo).
However, I don’t just list a catalogue of VGI and citizen science activities. Personally, I find trends a useful way to make sense of what happens. I’ve learned that from the writing of Thomas Friedman, who used it in several of his books to help the reader understand where the changes that he covers came from. Trends are, of course, speculative, as it is very difficult to demonstrate causality or to be certain about the contribution of each trend to the end result. With these caveats in mind, there are several technological and societal trends that I used in the talk to explain where VGI (and the VGI element of citizen science) came from.
Of all these trends, I keep coming back to one technical and one societal that I see as critical. The removal of selective availability of GPS in May 2000 is my top technical change, as the cascading effect from it led to the deluge of good enough location data which is behind VGI and citizen science. On the societal side, it is the Flynn effect as a signifier of the educational shift in the past 50 years that explains how the ability to participate in scientific projects has increased.
In terms of the reciprocal contributions between the fields, I suggest the following:
GIScience can support citizen science by considering data quality assurance methods that are emerging in VGI. There are also plenty of spatial analysis methods that take heterogeneity into account and are therefore useful for citizen science data. The areas of geovisualisation and human-computer interaction studies in GIS can assist in developing more effective and useful applications for citizen scientists and people who use their data. There is also plenty to do in considering semantics, ontologies, interoperability and standards. Finally, since critical GIScientists have been looking for a long time into the societal aspects of geographical technologies such as privacy, trust, inclusiveness, and empowerment, they have plenty to contribute to citizen science activities in how to do them in more participatory ways.
On the other hand, citizen science can contribute to GIScience, and especially VGI research, in several ways. First, citizen science can demonstrate the longevity of VGI data sources, with some projects going back hundreds of years. It provides challenging datasets in terms of their complexity, ontology, heterogeneity and size. It can bring questions about scale and how to deal with large, medium and local activities, while merging them into a coherent dataset. It also provides opportunities for GIScientists to contribute to critical societal issues such as climate change adaptation or biodiversity loss. It provides some of the most interesting usability challenges, such as tools for non-literate users, and finally, plenty of opportunities for interdisciplinary collaborations.
The slides from the talk are available below.
Notes from the second day of the BES/sfé annual meeting (see first day notes here)
Several talks in sessions that attracted my attention:
Daniel Richards (National University of Singapore) looked at cultural ecosystem services from social media sources. He mentioned a previous study by Casalegno et al. (2013) on social media and ecosystem services. In Singapore they carried out a study of the few green spaces that are used for leisure and nature reserves – the rest of the place is famously highly urbanised. There are patches of coastal habitat that are important locally. The analysis looked at Flickr photos to reveal interest. There are 4 study sites, with 760 photos that were returned, of which 683 related to coastal habitat. They used classification of content, with 8 people analysing the photos. Analysis of Flickr showed different aspects – landscape in one site, and wildlife in another site. In one site there are research photos due to the way it is used locally. Looking closely at one coastal site, focal points along the route where people stopped to take a picture stood out, as did landscape photos. All the photos follow the boardwalk in the area of Changi, which is the only route. Simulation showed that after 70 photos they can get a good indication of the nature of the place, with no need to look through all the images.
Barbara Smith explored the role of indigenous and local knowledge as part of a multiple evidence base for pollinator conservation. The context is an agricultural area in India – looking at places where there is more extensive agriculture and less. The project aim is to record pollinators and then explore the impact of landscape and crop productivity. In this study, the starting point was the belief that traditional knowledge has a lot of value, and that it is knowledge that can be integrated with scientific information. She mentioned the Tengo et al 2013 discussion paper in IPBES on the value of local knowledge, and also the Sutherland et al 2014 paper in Oryx about the need to integrate indigenous knowledge in ecological assessment. To collate knowledge of trends, they created a local peer-review process to validate local knowledge. This involves understanding factual data collection and separating it from inferences, which are sometimes wrong. They carried out small group discussions, in which they involved 5-7 farmers, and in each of the 3 study areas they had 3 groups. They asked questions that are evidence gathering (which crop do you grow?) and also verification (how do you know?). They also asked opinion scoping questions (perceptions) and then ‘why did you observe the change?’. They structured the discussions with the farmers around questions that can be explored together. After the first session, they created declarations, such as ‘yields have fallen by 25%’ or ‘crop yield declined because of the poor soil’. The statements were accepted or rejected through discussion with the farmers – local peer-review. Not all farmers can identify pollinators, and as the size goes down, there is less identification and also confusion about pests and pollinators. The farmers identified critical pollinators in their area and also made suggestions on why the decline happened.
In the workshop on ‘Ecosystem assessments – concepts, tools and governance’ there were various discussions on tools that are used for such purposes, but it became clear to me that GIS is playing a major role, and that many of the fundamental discussions in GIScience around the different types of modelling – from overlaying to process oriented modelling – can play a critical role in making sense of the way maps and GIS outputs travel through the decision making. It can be an interesting area to analyse critically: to what degree are the theoretical and philosophical aspects of the modelling taken into account in policy processes? The discussion in the workshop moved to issues of scientific uncertainty and communication with policy makers, the role of researchers in the process and the way they discuss uncertainty.
In the computational ecology session, Yoseph Araya presented a talk that was meant to be about the use of citizen science data, but instead he shared his experience and provided an interesting introduction to a researcher’s perspective on citizen science. He looked at the data that is coming from citizen science and the problem of getting good data. Citizen science is gaining attention – e.g. ash die-back and other environmental issues are drawing attention to it. Citizens are bridging science, governance and participation. Citizen science is needed for data at temporal, spatial and social scales, and we should not forget that it is also about social capital, and of course fun and enjoyment. There is an increase in citizen science awareness in the literature. He is building on experience from many projects that he participated in, including Evolution Megalab, world water monitoring day, floodplain meadows partnership, iSpot and OPAL, and CREW – Custodians of Rare and Endangered Windflowers (that’s a seriously impressive set of projects!). There are plenty of challenges – recruitment, motivation; costs and who pays; consideration of who runs it; data validation and analysis and others. Data issues include data accuracy, completeness, reliability, precision and currency. He identified sources of errors – personnel, technical and statistical. The personnel sources include skills, fitness, mistakes and others. Potential solutions include training with fully employed personnel, monitoring individuals and running an online quiz. Technically, there is the option of designing protocols, and statistically, it is possible to use recounts (15%), protocols that allow ‘no data’ and other methods.
The poster session included a poster from Valentine Seymour, about her work linking wellbeing and green volunteering.
19 September, 2014
The Association of American Geographers is coordinating an effort to create an International Encyclopedia of Geography. Plans started in 2010, with an aim to see the 15-volume project published in 2015 or 2016. Interestingly, this shows that publishers and scholars are still seeing the value in creating subject-specific encyclopedias. On the other hand, the weird decision by Wikipedians that Geographic Information Science doesn’t exist outside GIS shows that geographers need a place to define their practice by themselves. You can find more information about the AAG International Encyclopedia project in an interview with Doug Richardson from 2012.
As part of this effort, I was asked to write an entry on ‘Volunteered Geographic Information, Quality Assurance’ as a short piece of about 3000 words. To do this, I have looked around for mechanisms that are used in VGI and in citizen science. These are covered in OpenStreetMap studies and similar work in GIScience, and in the area of citizen science, there are reviews such as the one by Andrea Wiggins and colleagues of mechanisms to ensure data quality in citizen science projects, which clearly demonstrated that projects are using multiple methods to ensure data quality.
Below you’ll find an abridged version of the entry (but still long). The citation for this entry will be:
Haklay, M., Forthcoming. Volunteered geographic information, quality assurance. in D. Richardson, N. Castree, M. Goodchild, W. Liu, A. Kobayashi, & R. Marston (Eds.) The International Encyclopedia of Geography: People, the Earth, Environment, and Technology. Hoboken, NJ: Wiley/AAG
In the entry, I have identified 6 types of mechanisms that are used for quality assurance when the data has a geographical component, either VGI or citizen science. If I have missed a type of quality assurance mechanism, please let me know!
Here is the entry:
Volunteered geographic information, quality assurance
Volunteered Geographic Information (VGI) originates outside the realm of professional data collection by scientists, surveyors and geographers. Quality assurance of such information is important for people who want to use it, as they need to identify if it is fit-for-purpose. Goodchild and Li (2012) identified three approaches for VGI quality assurance: the ‘crowdsourcing’ approach, which relies on the number of people that edited the information; the ‘social’ approach, which is based on gatekeepers and moderators; and the ‘geographic’ approach, which uses broader geographic knowledge to verify that the information fits into existing understanding of the natural world. In addition to the approaches that Goodchild and Li identified, there are also the ‘domain’ approach, which relates to the understanding of the knowledge domain of the information, the ‘instrumental observation’ approach, which relies on technology, and the ‘process oriented’ approach, which brings VGI closer to industrialised procedures. First we need to understand the nature of VGI and the source of concern with quality assurance.
While the term volunteered geographic information (VGI) is relatively new (Goodchild 2007), the activities that this term describes are not. Another relatively recent term, citizen science (Bonney 1996), which describes the participation of volunteers in collecting, analysing and sharing scientific information, provides the historical context. While the term is relatively new, the collection of accurate information by non-professional participants turns out to have been an integral part of scientific activity since the 17th century and likely before (Bonney et al 2013). Therefore, when approaching the question of quality assurance of VGI, it is critical to see it within the wider context of scientific data collection and not to fall into the trap of novelty by considering it to be without precedent.
Yet, this integration needs to take into account the insights that emerged within geographic information science (GIScience) research over the past decades. Within GIScience, it is the body of research on spatial data quality that provides the framing for VGI quality assurance. Van Oort’s (2006) comprehensive synthesis of various quality standards identifies the following elements of spatial data quality discussions:
- Lineage – description of the history of the dataset.
- Positional accuracy – how well the coordinate value of an object in the database relates to the reality on the ground.
- Attribute accuracy – as objects in a geographical database are represented not only by their geometrical shape but also by additional attributes.
- Logical consistency – the internal consistency of the dataset.
- Completeness – how many objects are expected to be found in the database but are missing as well as an assessment of excess data that should not be included.
- Usage, purpose and constraints – this is a fitness-for-purpose declaration that should help potential users in deciding how the data should be used.
- Temporal quality – this is a measure of the validity of changes in the database in relation to real-world changes and also the rate of updates.
While some of these quality elements might seem independent of a specific application, in reality they can only be evaluated within a specific context of use. For example, when carrying out analysis of street-lighting in a specific part of town, the question of completeness becomes specifically about the recording of all street-light objects within the bounds of the area of interest. Whether the data set includes these features, or is complete, for another part of the settlement is irrelevant for the task at hand. The scrutiny of information quality within a specific application to ensure that it is good enough for the needs is termed ‘fitness for purpose’. As we shall see, fit-for-purpose is a central issue with respect to VGI.
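The street-lighting example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `completeness_in_area`, the use of a rectangular bounding box as the area of interest, and the idea that features in both datasets share identifiers are all hypothetical simplifications, not part of any standard.

```python
def completeness_in_area(reference, candidate, bbox):
    """Share of reference features inside bbox that also appear in candidate.

    reference, candidate: dicts mapping a feature id to an (x, y) tuple.
    bbox: (min_x, min_y, max_x, max_y) for the area of interest.
    Assumes (hypothetically) that both datasets share feature ids.
    """
    min_x, min_y, max_x, max_y = bbox

    def inside(point):
        x, y = point
        return min_x <= x <= max_x and min_y <= y <= max_y

    # Only features within the area of interest count towards completeness.
    expected = {fid for fid, pt in reference.items() if inside(pt)}
    if not expected:
        return None  # nothing to evaluate in this area
    found = {fid for fid, pt in candidate.items() if inside(pt)}
    return len(expected & found) / len(expected)


# Hypothetical street lights: 'c' lies outside the area of interest.
reference = {"a": (1, 1), "b": (2, 2), "c": (10, 10)}
candidate = {"a": (1, 1), "c": (10, 10)}
score = completeness_in_area(reference, candidate, (0, 0, 5, 5))
```

With this made-up data, the candidate dataset misses one of the two lights inside the box, so the score reflects only the area of interest, regardless of how well the rest of the settlement is covered.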
To understand the reason that geographers are concerned with quality assurance of VGI, we need to recall the historical development of geographic information, and especially the historical context of geographic information systems (GIS) and GIScience development since the 1960s. For most of the 20th century, geographic information production became professionalised and institutionalised. The creation, organisation and distribution of geographic information was done by official bodies such as national mapping agencies or national geological bodies who were funded by the state. As a result, the production of geographic information became an industrial scientific process in which the aim is to produce a standardised product – commonly a map. Due to financial, skills and process limitations, products were engineered carefully so they can be used for multiple purposes. Thus, a topographic map can be used for navigation but also for urban planning and for many other purposes. Because the products were standardised, detailed specifications could be drawn, against which the quality elements can be tested and quality assurance procedures could be developed. This was the backdrop to the development of GIS, and to the conceptualisation of spatial data quality.
The practices of centralised, scientific and industrialised geographic information production lend themselves to quality assurance procedures that are deployed through organisational or professional structures, and explain the perceived challenges with VGI. Centralised practices also supported employing people with a focus on quality assurance, such as going to the field with a map and testing that it complies with the specifications that were used to create it. In contrast, most of the collection of VGI is done outside organisational frameworks. The people who contribute the data are not employees and seemingly cannot be put into training programmes, asked to follow quality assurance procedures, or expected to use standardised equipment that can be calibrated. The lack of coordination and top-down forms of production raises questions about ensuring the quality of the information that emerges from VGI.
Considering quality assurance within VGI requires an understanding of some underlying principles that are common to VGI practices and differentiate it from organised and industrialised geographic information creation. For example, some VGI is collected under conditions of scarcity or abundance in terms of data sources, number of observations or the amount of data that is being used. As noted, the conceptualisation of geographic data collection before the emergence of VGI was one of scarcity, where data is expensive and complex to collect. In contrast, in many applications of VGI the situation is one of abundance. For example, in applications that are based on micro-volunteering, where the participant invests very little time in a fairly simple task, it is possible to give the same mapping task to several participants and statistically compare their independent outcomes as a way to ensure the quality of the data. Another form of considering abundance as a framework is in the development of software for data collection. While in previous eras there would inherently be one application that was used for data capture and editing, in VGI there is a need to consider multiple applications, as different designs and workflows can appeal to and be suitable for different groups of participants.
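One simple way to compare independent outcomes of the same micro-task is a majority vote with an agreement threshold. The sketch below is a minimal illustration, not a description of how any particular project does it; the function name `majority_label` and the default threshold of 0.6 are assumptions made for the example.

```python
from collections import Counter


def majority_label(labels, min_agreement=0.6):
    """Combine independent answers to the same task.

    Returns (label, agreement). The label is None when the most common
    answer does not reach min_agreement (a hypothetical threshold), which
    signals that the task should go to more participants or a reviewer.
    """
    if not labels:
        return None, 0.0
    label, count = Counter(labels).most_common(1)[0]
    agreement = count / len(labels)
    return (label if agreement >= min_agreement else None), agreement


# Three volunteers classified the same map tile.
accepted = majority_label(["building", "building", "road"])
# Two volunteers disagreed, so no label is accepted.
undecided = majority_label(["building", "road"])
```

Returning the agreement level alongside the label keeps the decision transparent, so a project can tune the threshold to how costly an error is for its intended use.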
Another underlying principle of VGI is that since the people who collect the information are not remunerated or in contractual relationships with the organisation that coordinates data collection, a more complex relationship between the two sides is required, with consideration of incentives, motivations to contribute and the tools that will be used for data collection. Overall, VGI systems need to be understood as socio-technical systems in which the social aspect is as important as the technical part.
In addition, VGI is inherently heterogeneous. In large scale data collection activities such as the census of population, there is a clear attempt to capture all the information about the population over relatively short time and in every part of the country. In contrast, because of its distributed nature, VGI will vary across space and time, with some areas and times receiving more attention than others. An interesting example has been shown in temporal scales, where some citizen science activities exhibit ‘weekend bias’ as these are the days when volunteers are free to collect more information.
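A quick check for temporal heterogeneity of this kind is to measure what share of observations fall on a weekend and compare it with the 2/7 (about 29%) expected if recording were spread evenly across the week. The sketch below assumes each observation carries a calendar date; the function name `weekend_share` and the sample dates are made up for illustration.

```python
from datetime import date


def weekend_share(dates):
    """Fraction of observations recorded on a Saturday or Sunday."""
    if not dates:
        return 0.0
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekend = sum(1 for d in dates if d.weekday() >= 5)
    return weekend / len(dates)


# Hypothetical observation dates: Friday, Saturday, Sunday, Monday.
observations = [
    date(2015, 1, 16),
    date(2015, 1, 17),
    date(2015, 1, 18),
    date(2015, 1, 19),
]
share = weekend_share(observations)
evenly_spread = 2 / 7  # expected share with no weekend bias
```

A share well above 2/7 suggests that analysis of the data should account for uneven sampling effort across the week.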
Because of the difference in the organisational settings of VGI, a different approaches to quality assurance is required, although as noted, in general such approaches have been used in many citizen science projects. Over the years, several approaches emerged and these include ‘crowdsourcing ‘, ‘social’, ‘geographic’, ‘domain’, ‘instrumental observation’ and ‘process oriented’. We now turn to describe each of these approaches.
The ‘crowdsourcing’ approach is building on the principle of abundance. Since there are is a large number of contributors, quality assurance can emerge from repeated verification by multiple participants. Even in projects where the participants actively collect data in uncoordinated way, such as the OpenStreetMap project, it has been shown that with enough participants actively collecting data in a given area, the quality of the data can be as good as authoritative sources. The limitation of this approach is when local knowledge or verification on the ground (‘ground truth’) is required. In such situations, the ‘crowdsourcing’ approach will work well in central, highly populated or popular sites where there are many visitors and therefore the probability that several of them will be involved in data collection rise. Even so, it is possible to encourage participants to record less popular places through a range of suitable incentives.
The ‘social’ approach also builds on the principle of abundance in terms of the number of participants, but with a more detailed understanding of their knowledge, skills and experience. In this approach, some participants are asked to monitor and verify the information that was collected by less experienced participants. The social method is well established in citizen science programmes such as bird watching, where some participants who are more experienced in identifying bird species help to verify observations by other participants. To deploy the social approach, there is a need for a structured organisation in which some members are recognised as more experienced, and are given the appropriate tools to check and approve information.
The ‘geographic’ approach uses known geographical knowledge to evaluate the validity of the information that is received from volunteers. For example, by using existing knowledge about the distribution of streams from a river, it is possible to assess if mapping that was contributed by volunteers of a new river is comprehensive or not. A variation of this approach is the use of recorded information, even if it is out-of-date, to verify the information by comparing how much of the information that is already known also appears in a VGI source. Geographic knowledge can be potentially encoded in software algorithms.
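As an illustration of how such a comparison can be encoded in software, the sketch below counts how many features from an existing (possibly out-of-date) reference dataset have a volunteered feature within a tolerance distance. It assumes point features in a projected coordinate system measured in metres; the 20 m tolerance and the coordinates are invented for the example.

```python
import math


def matched_share(reference_points, vgi_points, tolerance_m=20.0):
    """Return the fraction of reference points with a VGI point within tolerance_m."""
    if not reference_points:
        raise ValueError("no reference points supplied")
    matched = 0
    for rx, ry in reference_points:
        # A reference point counts as found if any VGI point is close enough.
        if any(math.hypot(rx - vx, ry - vy) <= tolerance_m for vx, vy in vgi_points):
            matched += 1
    return matched / len(reference_points)


# Hypothetical coordinates in metres.
reference = [(0.0, 0.0), (100.0, 0.0), (200.0, 50.0), (400.0, 400.0)]
volunteered = [(3.0, 4.0), (110.0, 5.0), (205.0, 48.0)]

share_found = matched_share(reference, volunteered)
print(f"{share_found:.0%} of the known features also appear in the VGI source")
```

A low share does not prove that the VGI is wrong, since the reference data may itself be outdated, but it points to areas that deserve a closer look.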
The ‘domain’ approach is an extension of the geographic one, and in addition to geographical knowledge uses specific knowledge that is relevant to the domain in which information is collected. For example, in many citizen science projects that involve collecting biological observations, there will be some body of information about species distribution both spatially and temporally. Therefore, a new observation can be tested against this knowledge, again algorithmically, and help in ensuring that new observations are accurate.
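A minimal sketch of such an algorithmic test, in Python, might flag observations that fall outside the known range or season of a species so that someone can review them. The class, the bounding box and the months are hypothetical placeholders; real projects would rely on much richer distribution data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SpeciesKnowledge:
    """Hypothetical summary of where and when a species is expected."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    months: frozenset  # months (1-12) in which the species is expected


def plausibility_flags(knowledge, lat, lon, observed_on):
    """Return a list of reasons why an observation needs review (empty if none)."""
    flags = []
    in_range = (knowledge.min_lat <= lat <= knowledge.max_lat
                and knowledge.min_lon <= lon <= knowledge.max_lon)
    if not in_range:
        flags.append("outside known range")
    if observed_on.month not in knowledge.months:
        flags.append("outside known season")
    return flags


# Invented values for an example species, for illustration only.
example_species = SpeciesKnowledge(50.0, 56.0, -6.0, 2.0, frozenset({5, 6, 7, 8}))

ok = plausibility_flags(example_species, 51.5, -0.1, date(2014, 6, 15))
suspect = plausibility_flags(example_species, 60.0, -0.1, date(2014, 12, 1))
print(ok, suspect)
```

A flagged record is not necessarily an error, since an unusual sighting may be the most interesting one, so the flags are better used to trigger verification than to reject data.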
The ‘instrumental observation’ approach removes some of the subjective aspects of data collection by a human who might make an error, and relies instead on the availability of equipment that the person is using. Because of the increased availability of accurate-enough equipment, such as the various sensors that are integrated in smartphones, many people keep in their pockets mobile computers with the ability to collect location, direction, imagery and sound. For example, image files that are captured in smartphones include in the file the GPS coordinates and time-stamp, which for a vast majority of people are beyond their ability to manipulate. Thus, the automatic instrumental recording of information provides evidence for the quality and accuracy of the information.
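For readers who want to use the coordinates stored in such image files: EXIF metadata records latitude and longitude as degrees, minutes and seconds (each a rational number) plus a hemisphere reference, so they need converting to decimal degrees before they can be compared with other data. The sketch below shows only that conversion, using the Python standard library; reading the tags from a file needs an imaging library, which is left out here, and the sample values are made up.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style ((num, den), (num, den), (num, den)) DMS to decimal degrees.

    ref is the hemisphere reference: 'N', 'S', 'E' or 'W'.
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value  # southern and western hemispheres are negative
    return float(value)


# Made-up coordinates in the form they would be stored in an image file.
lat = dms_to_decimal(((51, 1), (31, 1), (2892, 100)), "N")
lon = dms_to_decimal(((0, 1), (8, 1), (24, 1)), "W")
print(round(lat, 4), round(lon, 4))
```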
Finally, the ‘process oriented’ approach brings VGI closer to traditional industrial processes. Under this approach, the participants go through some training before collecting information, and the process of data collection or analysis is highly structured to ensure that the resulting information is of suitable quality. This can include provision of standardised equipment, online training or instruction sheets and a structured data recording process. For example, volunteers who participate in the US Community Collaborative Rain, Hail & Snow network (CoCoRaHS) receive a standardised rain gauge, instructions on how to install it and online resources to learn about data collection and reporting.
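A structured data recording process can be imagined as a fixed form whose fields are checked before a report is accepted. The Python sketch below uses an entirely hypothetical rainfall report; the field names and rules are assumptions for illustration and are not the CoCoRaHS reporting format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RainReport:
    """Hypothetical structured report from a volunteer's rain gauge."""
    station_id: str
    observed_on: date
    rainfall_mm: float


def validate(report, today):
    """Return a list of problems with the report (empty if it can be accepted)."""
    problems = []
    if not report.station_id.strip():
        problems.append("missing station id")
    if report.observed_on > today:
        problems.append("observation date is in the future")
    if report.rainfall_mm < 0:
        problems.append("rainfall cannot be negative")
    return problems


today = date(2014, 9, 1)
good = validate(RainReport("station-001", date(2014, 8, 31), 4.2), today)
bad = validate(RainReport("", date(2014, 9, 2), -1.0), today)
print(good, bad)
```

Because every volunteer fills in the same fields under the same rules, the checks can run at the point of entry, before the data reaches anyone who will analyse it.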
Importantly, these approaches are not used in isolation and in any given project it is likely to see a combination of them in operation. Thus, an element of training and guidance to users can appear in a downloadable application that is distributed widely, and therefore the method that will be used in such a project will be a combination of the process oriented with the crowdsourcing approach. Another example is the OpenStreetMap project, which in general provides only limited guidance to volunteers in terms of the information that they collect or the location in which they collect it. Yet a subset of the information in the OpenStreetMap database, about wheelchair access, is collected through the highly structured process of the WheelMap application, in which the participant is required to select one of four possible settings that indicate accessibility. Another subset of the information, recorded for humanitarian efforts, follows the social model in which the tasks are divided between volunteers using the Humanitarian OpenStreetMap Team (H.O.T) task manager, and the data that is collected is verified by more experienced participants.
The final, and critical, point for quality assurance of VGI that was noted above is fitness-for-purpose. In some VGI activities the information has a direct and clear application, in which case it is possible to define specifications for the quality assurance elements that were listed above. However, one of the core aspects that was noted above is the heterogeneity of the information that is collected by volunteers. Therefore, before using VGI for a specific application there is a need to check for its fitness for this specific use. While this is true for all geographic information, and even so called ‘authoritative’ data sources can suffer from hidden biases (e.g. lack of updates of information in rural areas), the situation with VGI is that variability can change dramatically over short distances – so while the centre of a city will be mapped by many people, a deprived suburb near the centre will not be mapped and updated. There are also limitations that are caused by the instruments in use – for example, the GPS positional accuracy of the smartphones in use. Such aspects should also be taken into account, ensuring that the quality assurance is also fit-for-purpose.
References and Further Readings
Bonney, Rick. 1996. Citizen Science – a lab tradition, Living Bird, Autumn 1996.
Bonney, Rick, Shirk, Jennifer, Phillips, Tina B. 2013. Citizen Science, Encyclopaedia of science education. Berlin: Springer-Verlag.
Goodchild, Michael F. 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), 211–221.
Goodchild, Michael F., and Li, Linna. 2012. Assuring the quality of volunteered geographic information. Spatial Statistics, 1, 110–120.
Haklay, Mordechai. 2010. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design, 37(4), 682–703.
Sui, Daniel, Elwood, Sarah and Goodchild, Michael F. (eds), 2013. Crowdsourcing Geographic Knowledge, Berlin: Springer-Verlag.
Van Oort, Pepijn A.J. 2006. Spatial data quality: from description to application, PhD Thesis, Wageningen: Wageningen Universiteit, p. 125.
The 3 days of the Royal Geographical Society (with IBG) or RGS/IBG annual conference are always valuable, as they provide an opportunity to catch up with the current themes in (mostly human) Geography. While I spend most of my time in an engineering department, I also like to keep my ‘geographer identity’ up to date as this is the discipline that I feel most affiliated with.
Since last year’s announcement that the conference will focus on ‘Geographies of Co-Production‘ I was looking forward to it, as this topic relates to many themes of my research work. Indeed, the conference was excellent – from the opening session to the last one that I attended (a discussion about the co-production of co-production).
Just before the conference, the participatory geographies research group ran a training day, in which I ran a workshop on participatory mapping. It was good to see the range of people who came to the workshop, many of them in the early stages of their research career who want to use participatory methods in their research.
In the opening session on Tuesday night, Uma Kothari raised a very important point about the risk of institutions blaming the participants if a solution that was developed with them failed. There is a need to ensure that bodies like the World Bank or other funders don’t escape their responsibilities and support as a result of participatory approaches. Another excellent discussion came from Keri Facer who analysed the difficulties of interdisciplinary research based on her experience from the ‘connected communities‘ project. Noticing and negotiating the multiple dimensions of differences between research teams is critical for the co-production of knowledge.
By the end of this session, and as was demonstrated throughout the conference, it became clear that there are many different notions of ‘co-production of knowledge’ – sometimes it is about two researchers working together, for others it is about working with policy makers or civil servants, and yet for another group it means to have an inclusive knowledge production with all people that can be impacted by a policy or research recommendation. Moreover, there was even a tension between the types of inclusiveness – should it be based on simple openness (‘if you want to participate, join’), or representation of people within the group, or should it be an active effort for inclusiveness? The fuzziness of the concept proved to be very useful as it led to many discussions about ‘what co-production means?’, as well as ‘what co-production does?’.
Two GIS education sessions were very good (see Patrick’s summary on the ExCiteS blog) and I found Nick Tate and Claire Jarvis’s discussion about the potential of a virtual community of practice (CoP) for GIScience professionals especially interesting. An open question that was left at the end of the session was about the value of generic expertise (GIScience) or the way it is used in a specific area. In other words, do we need a CoP to share the way we use the tools and methods or is it about situated knowledge within a specific domain?
The Chair Early Career panel was, for me, the best session in the conference. Maria Escobar-Tello, Naomi Millner, Hilary Geoghegan and Saffron O’Neil discussed their experience in working with policy makers, participants, communities and universities. Maria explored the enjoyment of working at the speed of policy making in DEFRA, which also brings with it major challenges in formulating and doing research. Naomi discussed the Productive Margins project, which involved redesigning community engagement, and also noted what looks like very interesting reading: the e-book Problems of Participation: Reflections on Authority, Democracy, and the Struggle for Common Life. Hilary demonstrated how she has integrated her enthusiasm for enthusiasm into her work, while showing how knowledge is co-produced at the boundaries between amateurs and professionals, citizens and scientists. Hilary recommended another important resource – the review Towards co-production in research with communities (especially the diagram/table on page 9). Saffron completed the session with her work on climate change adaptation, and the co-production of knowledge with scientists and communities. Her research on community based climate change visualisation is noteworthy, and suggests ways of engaging people through photos that they take around their homes.
In another session which focused on mapping, the Connected Communities project appeared again, in the work of Chris Speed, Michelle Bastian & Alex Hale on participatory local food mapping in Liverpool and the lovely website that resulted from their project, Memories of Mr Seel’s Garden. It is interesting to see how methods travel across disciplines and to reflect on what insights should be integrated in future work (while also resisting a feeling of ‘this is naive, you should have done this or that’!).
On the last day of the conference, the sessions on ‘the co-production of data based living‘ included lots to contemplate. Rob Kitchin discussed and critiqued smart-city dashboards, highlighting that data is not neutral, and that it is sometimes used to decontextualise the city from its history and to exclude non-quantified and non-sensed forms of knowledge (his new book ‘the data revolution’ is just out). Agnieszka Leszczynski continued to develop her exploration of the mediation qualities of techno-social-spatial interfaces, which lead to the experience of being at a place becoming intermingled with the experience of the data that you consume and produce in it. Matt Wilson drew a parallel between the quantified self and the quantified city, suggesting the concept of ‘self-city-nation’ and the tensions between statements of collaboration and sharing within proprietary commercial systems that aim at extracting profit from these actions. Also interesting was Ewa Luger’s discussion of the meaning of ‘consent’ within the Internet of Things project ‘Hub of All Things‘ and the degree to which it is ignored by technology designers.
The highlight of the last day for me was the presentation by Rebecca Lave on ‘Critical Physical Geography‘. This is the idea that it is necessary to combine scientific understanding of hydrology and ecology with social theory. It is also useful in alerting geographers who are dealing with human geography to understand the physical conditions that influence life in specific places. This approach encourages people who are involved in research to ask questions about knowledge production, for example social justice aspects in access to models when corporations can have access to weather or flood models that are superior to what is available to the rest of society.
The co-production of knowledge isn’t entirely new and Wendy is quick to point out that themes like citizen science and participatory methods are well established within geography. “What we are now seeing is a sustained move towards the co-production of knowledge across our entire discipline.”
9 August, 2014
Today, OpenStreetMap celebrates 10 years of operation as counted from the date of registration. I heard about the project when it was in its early stages, mostly because I knew Steve Coast when I was studying for my Ph.D. at UCL. As a result, I was also able to secure the first ever research grant that focused on OpenStreetMap (and hence Volunteered Geographic Information – VGI) from the Royal Geographical Society in 2005. A lot can be said about being in the right place at the right time!
Having followed the project during this decade, there is much to reflect on – such as thinking about open research questions, things that the academic literature failed to notice about OSM or the things that we do know about OSM and VGI because of the openness of the project. However, as I was preparing the talk for the INSPIRE conference, I was starting to think about the start dates of OSM (2004), TomTom Map Share (2007), Waze (2008), Google Map Maker (2008). While there are conceptual and operational differences between these projects, in terms of ‘knowledge-based peer production systems’ they are fairly similar: all rely on a large number of contributors, all use both a large group of contributors who contribute little and a much smaller group of committed contributors who do the more complex work, and all are about mapping. Yet, OSM started 3 years before these other crowdsourced mapping projects, and all of them have more contributors than OSM.
Since OSM is described as ‘Wikipedia of maps‘, the analogy that I was starting to think of was that it’s a bit like a parallel history, in which in 2001, as Wikipedia starts, Encarta and Britannica look at the upstart and set up their own crowdsourcing operations so within 3 years they are up and running. By 2011, Wikipedia continues as a copyright free encyclopedia with a sizable community, but Encarta and Britannica have more contributors and more visibility.
Knowing OSM closely, I felt that this is not a fair analogy. While there are some organisational and contribution practices that can be used to claim that ‘it’s the fault of the licence’ or ‘it’s because of the project’s culture’ and therefore justify this unflattering analogy to OSM, I sensed that there is something else that should be used to explain what is going on.
Then, during my holiday in Italy, I was enjoying the offline TripAdvisor app for Florence, using OSM for navigation (in contrast to Google Maps, which is used in the online app) and an answer emerged. Within the OSM community, from the start, there was some tension between the ‘map’ and ‘database’ view of the project. Is it about collecting the data to make beautiful maps, or is it about building a database that can be used for many applications?
Saying that OSM is about the map means that the analogy is correct, as it is very similar to Wikipedia – you want to share knowledge, you put it online with a system that allows you to display it quickly, with tools that support easy editing and information sharing. If, on the other hand, OSM is about a database, then OSM is about something that is used at the back-end of other applications, a lot like a DBMS or an operating system. Although there are tools that help you to do things easily and quickly and check the information that you’ve entered (e.g. displaying the information as a map), the main goal is the building of the back-end.
Maybe a better analogy is to think of OSM as ‘Linux of maps’, which means that it is an infrastructure project which is expected to have a lot of visibility among the professionals who need it (system managers in the case of Linux, GIS/Geoweb developers for OSM), with a strong community that supports and contributes to it. The same way that some tech-savvy people know about Linux, but most people don’t, I suspect that TripAdvisor offline users don’t notice that they use OSM, they are just happy to have a map.
The problem with the Linux analogy is that OSM is more than software – it is indeed a database of information about geography from all over the world (and therefore the Wikipedia analogy has its place). Therefore, it is somewhere in between. In a way, it provides a demonstration for the common claim in GIS circles that ‘spatial is special‘. Geographical information is infrastructure in the same way that operating systems or DBMS are, but in this case it’s not enough to create an empty shell that can be filled in for the specific instance; there is a need for a significant amount of base information before you are able to start building your own application with additional information. This is also the philosophical difference that makes the licensing issues more complex!
In short, both the Linux and the Wikipedia analogies are inadequate to capture what OSM is. It has been illuminating and fascinating to follow the project over its first decade, and may it continue successfully for more decades to come.
12 July, 2014
The Vespucci initiative has been running for over a decade, bringing together participants from a wide range of academic backgrounds and experiences to explore, in a ‘slow learning’ way, various aspects of geographic information science research. The Vespucci Summer Institutes are week long summer schools, most frequently held at Fiesole, a small town overlooking Florence. This year, the focus of the first summer institute was on crowdsourced geographic information and citizen science.
The workshop was supported by COST ENERGIC (a network that links researchers in the area of crowdsourced geographic information, funded by the EU research programme), the EU Joint Research Centre (JRC), Esri and our Extreme Citizen Science research group. The summer school included about 30 participants and facilitators, ranging from master’s students who are about to start their PhD studies to established professors who came to learn and share knowledge. This is a common feature of Vespucci Institutes, and the funding from the COST network allowed more early career researchers to participate.
Apart from the pleasant surroundings, Vespucci Institutes are characterised by the relaxed, yet detailed discussions that can be carried over long lunches and coffee breaks, as well as team work in small groups on a task that each group presents at the end of the week. Moreover, the programme is very flexible, so changes and adaptation to the requests of the participants and responding to the general progression of the learning are part of the process.
This is the second time that I am participating in Vespucci Institutes as a facilitator, and in both cases it was clear that participants take the goals of the institute seriously, and make the most of the opportunities to learn about the topics that are explored, explore issues in depth with the facilitators, and work with their groups beyond the timetable.
The topics that were covered in the school were designed to provide a holistic overview of geographical crowdsourcing or citizen science projects, especially in the area where these two types of activities meet. This can be when a group of citizens want to collect and analyse data about local environmental concerns, or oceanographers want to work with divers to record water temperature, or when details that are emerging from social media are used to understand cultural differences in the understanding of border areas. These are all examples that were suggested by participants from projects that they are involved in. In addition, citizen participation in flood monitoring and water catchment management, sharing information about local food and exploring data quality of spatial information that can be used by wheelchair users also came up in the discussion. The crossover between the two areas provided a common ground for the participants to explore issues that are relevant to their research interests.
The holistic aspect that was mentioned before was a major goal for the school – to consider the tools that are used to collect information, engaging and working with the participants, managing the data that is provided by the participants and ensuring that it is useful for other purposes. To start the process, after introducing the topics of citizen science and volunteered geographic information (VGI), the participants learned about data collection activities, including noise mapping, OpenStreetMap contribution, bird watching and balloon and kite mapping. As can be expected, the balloon mapping raised a lot of interest and excitement, and this exercise in local mapping was linked to OpenStreetMap later in the week.
The experience with data collection provided the context for discussions about data management and interoperability and design aspects of citizen science applications, as well as more detailed presentations from the participants about their work and research interests. With all these details, the participants were ready to work on their group task: to suggest a research proposal in the area of VGI or Citizen Science. Each group of 5 participants explored the issues that they agreed on – 2 groups focused on citizen science projects, another 2 focused on data management and sustainability and finally another group explored the area of perception mapping and more social science oriented projects.
Some of the most interesting discussions were initiated at the request of the participants, such as the exploration of ethical aspects of crowdsourcing and citizen science. This is possible because of the flexibility in the programme.
Now that the institute is over, it is time to build on the connections that started during the wonderful week in Fiesole, and see how the network of Vespucci alumni develops the ideas that emerged this week.