A new book has just been published about OpenStreetMap and Geographic Information Science. The book, which was edited by Jamal Jokar Arsanjani, Alexander Zipf, Peter Mooney, Marco Helbich is “OpenStreetMap in GISciene : Experiences, Research, and applications” contains 16 chapters on different aspects of OpenStreetMap in GIScience including 1) Data Management and Quality, 2) Social Context, 3) Network Modeling and Routing, and 4) Land Management and Urban Form.
Thanks to invitations from UNIGIS and Edinburgh Earth Observatory / AGI Scotland, I had an opportunity to reflect on how Geographic Information Science (GIScience) can contribute to citizen science, and what citizen science can contribute to GIScience.
Despite the fact that it’s 8 years since the term Volunteers Geographic Information (VGI) was coined, I didn’t assume that all the audience is aware of how it came about or the range of sources of VGI. I also didn’t assume knowledge of citizen science, which is far less familiar term for a GIScience audience. Therefore, before going into a discussion about the relationship between the two areas, I opened with a short introduction to both, starting with VGI, and then moving to citizen science. After introduction to the two areas, I’m suggesting the relationships between them – there are types of citizen science that are overlapping VGI – biological recording and environmental observations, as well as community (or civic) science, while other types, such as volunteer thinking includes many projects that are non-geographical (think EyeWire or Galaxy Zoo).
However, I don’t just list a catalogue of VGI and citizen science activities. Personally, I found trends a useful way to make sense of what happen. I’ve learned that from the writing of Thomas Friedman, who used it in several of his books to help the reader understand where the changes that he covers came from. Trends are, of course, speculative, as it is very difficult to demonstrate causality or to be certain about the contribution of each trends to the end result. With these caveats in mind, there are several technological and societal trends that I used in the talk to explain how VGI (and the VGI element of citizen science) came from.
Of all these trends, I keep coming back to one technical and one societal that I see as critical. The removal of selective availability of GPS in May 2000 is my top technical change, as the cascading effect from it led to the deluge of good enough location data which is behind VGI and citizen science. On the societal side, it is the Flynn effect as a signifier of the educational shift in the past 50 years that explains how the ability to participate in scientific projects have increased.
In terms of the reciprocal contributions between the fields, I suggest the following:
GIScience can support citizen science by considering data quality assurance methods that are emerging in VGI, there are also plenty of Spatial Analysis methods that take into account heterogeneity and therefore useful for citizen science data. The areas of geovisualisation and human-computer interaction studies in GIS can assist in developing more effective and useful applications for citizen scientists and people who use their data. There is also plenty to do in considering semantics, ontologies, interoperability and standards. Finally, since critical GIScientists have been looking for a long time into the societal aspects of geographical technologies such as privacy, trust, inclusiveness, and empowerment, they have plenty to contribute to citizen science activities in how to do them in more participatory ways.
On the other hand, citizen science can contribute to GIScience, and especially VGI research, in several ways. First, citizen science can demonstrate longevity of VGI data sources with some projects going back hundreds of years. It provides challenging datasets in terms of their complexity, ontology, heterogeneity and size. It can bring questions about Scale and how to deal with large, medium and local activities, while merging them to a coherent dataset. It also provide opportunities for GIScientists to contribute to critical societal issues such as climate change adaptation or biodiversity loss. It provides some of the most interesting usability challenges such as tools for non-literate users, and finally, plenty of opportunities for interdisciplinary collaborations.
The slides from the talk are available below.
As far as I can tell, Nelson et al. (2006) ‘Towards development of a high quality public domain global roads database‘ and Taylor & Caquard (2006) Cybercartography: Maps and Mapping in the Information Era are the first peer-reviewed papers that mention OpenStreetMap. Since then, OpenStreetMap has received plenty of academic attention. More ‘conservative’ search engines such as ScienceDirect or Scopus find 286 and 236 peer reviewed papers (respectively) that mention the project. The ACM digital library finds 461 papers in the areas that are relevant to computing and electronics, while Microsoft Academic Research finds only 112. Google Scholar lists over 9000 (!). Even with the most conservative version from Microsoft, we can see an impact on fields ranging from social science to engineering and physics. So lots to be proud of as a major contribution to knowledge beyond producing maps.
Michael Goodchild, in his 2007 paper that started the research into Volunteered Geographic Information (VGI), mentioned OpenStreetMap (OSM), and since then there is a lot of conflation of OSM and VGI. In some recent papers you can find statements such as ‘OpenstreetMap is considered as one of the most successful and popular VGI projects‘ or ‘the most prominent VGI project OpenStreetMap‘ so, at some level, the boundary between the two is being blurred. I’m part of the problem – for example, with the title of my 2010 paper ‘How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets‘. However, the more I think about it, the more uncomfortable I am with this equivalence. I feel that the recent line from Neis & Zielstra (2014) is more accurate: ‘One of the most utilized, analyzed and cited VGI-platforms, with an increasing popularity over the past few years, is OpenStreetMap (OSM)‘. I’ll explain why.
Let’s look at the whole area of OpenStreetMap studies. Over the past decade, several types of research paper have emerged.
First, there is a whole set of research projects that use OSM data because it’s easy to use and free to access (in computer vision or even string theory). These studies are not part of ‘OSM studies’ or VGI, as, for them, this is just data to be used.
Third, there are studies that also look at the interactions between the contribution and the data – for example, in trying to infer trustworthiness.
[Unfortunately, due to academic practices and publication outlets, many of these papers are locked behind paywalls, but thatis another issue… ]
In short, there is a significant body of knowledge regarding the nature of the project, the implications of what it produces, and ways to understand the information that emerges from it. Clearly, we now know that OSM produces good data and are ware of the patterns of contribution. What is also clear is that many of these patterns are specific to OSM. Because of the importance of OSM to so many application areas (including illustrative maps in string theory!) these insights are very important. Some of these insights are expected to also be present in other VGI projects (hence my suggestions for assertions about VGI) but this needs to be done carefully, only when there is evidence from other projects that this is the case. In short, we should avoid conflating VGI and OSM.
Opening geodata is an interesting issue for INSPIRE directive. INSPIRE was set before the hype of Government 2.0 was growing and pressure on opening data became apparent, so it was not designed with these aspects in mind explicitly. Therefore the way in which the organisations that are implementing INSPIRE are dealing with the provision of open and linked data is bound to bring up interesting challenges.
Dealing with open and linked data was the topic that I followed on the second day of INSPIRE 2014 conference. The notes below are my interpretation of some of the talks.
Tina Svan Colding discussed the Danish attempt to estimate the value (mostly economically) of open geographic data. The study was done in collaboration with Deloitte, and they started with a change theory – expectations that they will see increase demands from existing customers and new ones. The next assumption is that there will be new products, companies and lower prices and then that will lead to efficiency and better decision making across the public and private sector, but also increase transparency to citizens. In short, trying to capture the monetary value with a bit on the side. They have used statistics, interviews with key people in the public and private sector and follow that with a wider survey – all with existing users of data. The number of users of their data increased from 800 users to over 10,000 within a year. The Danish system require users to register to get the data, so this are balk numbers, but they could also contacted them to ask further questions. The new users – many are citizens (66%) and NGO (3%). There are further 6% in the public sector which had access in principle in the past but the accessibility to the data made it more usable to new people in this sector. In the private sector, construction, utilities and many other companies are using the data. The environmental bodies are aiming to use data in new ways to make environmental consultation more engaging to audience (is this is another Deficit Model? assumption that people don’t engage because it’s difficult to access data?). Issues that people experienced are accessibility to users who don’t know that they need to use GIS and other datasets. They also identified requests for further data release. In the public sector, 80% identified potential for saving with the data (though that is the type of expectation that they live within!).
Panagiotis Tziachris was exploring the clash between ‘heavy duty’ and complex INSPIRE standards and the usual light weight approaches that are common in Open Data portal (I think that he intended in the commercial sector that allow some reuse of data). This is a project of 13 Mediterranean regions in Spain, Italy, Slovenia, Montenegro, Greece, Cyprus and Malta. The HOMER project (website http://homerproject.eu/) used different mechanism, including using hackathons to share knowledge and experience between more experienced players and those that are new to the area. They found them to be a good way to share practical knowledge between partners. This is an interesting side of purposeful hackathon within a known people in a project and I think that it can be useful for other cases. Interestingly, from the legal side, they had to go beyond the usual documents that are provided in an EU consortium, and in order to allow partners to share information they created a memorandum of understanding for the partners as this is needed to deal with IP and similar issues. Also practices of open data – such as CKAN API which is a common one for open data websites were used. They noticed separation between central administration and local or regional administration – the competency of the more local organisations (municipality or region) is sometimes limited because knowledge is elsewhere (in central government) or they are in different stages of implementation and disagreements on releasing the data can arise. Antoehr issue is that open data is sometime provided at regional portals while another organisation at the national level (environment ministry or cadastre body) is the responsible to INSPIRE. The lack of capabilities at different governmental levels is adding to the challenges of setting open data systems. Sometime Open Data legislation are only about the final stage of the process and not abour how to get there, while INPIRE is all about the preparation, and not about the release of data – this also creates mismatching.
Adam Iwaniak discussed how “over-engineering” make the INSPIRE directive inoperable or relevant to users, on the basis of his experience in Poland. He asks “what are the user needs?” and demonstrated it by pointing that after half term of teaching students about the importance of metadata, when it came to actively searching for metadata in an assignment, the students didn’t used any of the specialist portals but just Google. Based on this and similar experiences, he suggested the creation of a thesaurus that describe keywords and features in the products so it allows searching according to user needs. Of course, the implementation is more complex and therefore he suggests an approach that is working within the semantic web and use RDF definitions. By making the data searchable and index-able in search engines so they can be found. The core message was to adapt the delivery of information to the way the user is most likely to search it – so metadata is relevant when the producer make sure that a search in Google find it.
Jesus Estrada Vilegas from the SmartOpenData project http://www.smartopendata.eu/ discussed the implementation of some ideas that can work within INSPIRE context while providing open data. In particular, he discussed a Spanish and Portuguese data sharing. Within the project, they are providing access to the data by harmonizing the data and then making it linked data. Not all the data is open, and the focus of their pilot is in agroforestry land management. They are testing delivery of the data in both INSPIRE compliant formats and the internal organisation format to see which is more efficient and useful. INSPIRE is a good point to start developing linked data, but there is also a need to compare it to other ways of linked the data
Massimo Zotti talked about linked open data from earth observations in the context of business activities, since he’s working in a company that provide software for data portals. He explored the business model of open data, INSPIRE and the Copernicus programme. From the data that come from earth observation, we can turn it into information – for example, identifying the part of the soil that get sealed and doesn’t allow the water to be absorbed, or information about forest fires or floods etc. These are the bits of useful information that are needed for decision making. Once there is the information, it is possible to identify increase in land use or other aspects that can inform policy. However, we need to notice that when dealing with open data mean that a lot of work is put into bringing datasets together. The standarisation of data transfer and development of approaches that helps in machine-to-machine analysis are important for this aim. By fusing data they are becoming more useful and relevant to knowledge production process. A dashboard approach to display the information and the processing can help end users to access the linked data ‘cloud’. Standarisation of data is very important to facilitate such automatic analysis, and also having standard ontologies is necessary. From my view, this is not a business model, but a typical one to the operations in the earth observations area where there is a lot of energy spend on justification that it can be useful and important to decision making – but lacking quantification of the effort that is required to go through the process and also the speed in which these can be achieved (will the answer come in time for the decision?). A member of the audience also raised the point that assumption of machine to machine automatic models that will produce valuable information all by themselves is questionable.
Maria Jose Vale talked about the Portuguese experience in delivering open data. The organisation that she works in deal with cadastre and land use information. She was also discussing on activities of the SmartOpenData project. She describe the principles of open data that they considered which are: data must be complete, primary, timely, accessible, processable; data formats must be well known, should be permanence and addressing properly usage costs. For good governance need to know the quality of the data and the reliability of delivery over time. So to have automatic ways for the data that will propagate to users is within these principles. The benefits of open data that she identified are mostly technical but also the economic values (and are mentioned many times – but you need evidence similar to the Danish case to prove it!). The issues or challenges of open data is how to deal with fuzzy data when releasing (my view: tell people that it need cleaning), safety is also important as there are both national and personal issues, financial sustainability for the producers of the data, rates of updates and addressing user and government needs properly. In a case study that she described, they looked at land use and land cover changes to assess changes in river use in a river watershed. They needed about 15 datasets for the analysis, and have used different information from CORINE land cover from different years. For example, they have seen change from forest that change to woodland because of fire. It does influence water quality too. Data interoperability and linking data allow the integrated modelling of the evolution of the watershed.
Francisco Lopez-Pelicer covered the Spanish experience and the PlanetData project http://www.planet-data.eu/ which look at large scale public data management. Specifically looking in a pilot on VGI and Linked data with a background on SDI and INSPIRE. There is big potential, but many GI producers don’t do it yet. The issue is legacy GIS approaches such as WMS and WFS which are standards that are endorsed in INSPIRE, but not necessarily fit into linked data framework. In the work that he was involved in, they try to address complex GI problem with linked data . To do that, they try to convert WMS to a linked data server and do that by adding URI and POST/PUT/DELETE resources. The semantic client see this as a linked data server even through it can be compliant with other standards. To try it they use the open national map as authoritative source and OpenStreetMap as VGI source and release them as linked data. They are exploring how to convert large authoritative GI dataset into linked data and also link it to other sources. They are also using it as an experiment in crowdsourcing platform development – creating a tool that help to assess the quality of each data set. The aim is to do quality experiments and measure data quality trade-offs associated with use of authoritative or crowdsourced information. Their service can behave as both WMS and “Linked Map Server”. The LinkedMap, which is the name of this service, provide the ability to edit the data and explore OpenStreetMap and thegovernment data – they aim to run the experiment in the summer so this can be found at http://linkedmap.unizar.es/. The reason to choose WMS as a delivery standard is due to previous crawl over the web which showed that WMS is the most widely available service, so it assumed to be relevant to users or one that most users can capture.
Paul van Genuchten talked about the GeoCat experience in a range of projects which include support to Environment Canada and other activities. INSPIRE meeting open data can be a clash of cultures and he was highlighting neogeography as the term that he use to describe the open data culture (going back to the neogeo and paleogeo debate which I thought is over and done – but clearly it is relevant in this context). INSPIRE recommend to publish data open and this is important to ensure that it get big potential audience, as well as ‘innovation energy’ that exist among the ‘neogeo’/’open data’ people. The common things within this culture are expectations that APIs are easy to use, clean interfaces etc. But under the hood there are similarities in the way things work. There is a perceived complexity by the community of open data users towards INSPIRE datasets. Many of Open Data people are focused and interested in OpenStreetMap, and also look at companies such as MapBox as a role model, but also formats such as GeoJSON and TopoJSON. Data is versions and managed in git like process. The projection that is very common is web mercator. There are now not only raster tiles, but also vector tiles. So these characteristics of the audience can be used by data providers to provide help in using their data, but also there are intermediaries that deliver the data and convert it to more ‘digestible’ forms. He noted CitySDK by Waag.org which they grab from INSPIRE and then deliver it to users in ways that suite open data practices.He demonstrated the case of Environment Canada where they created a set of files that are suitable for human and machine use.
Ed Parsons finished the set of talks of the day (talk link goo.gl/9uOy5N) , with a talk about multi-channel approach to maximise the benefits of INSPIRE. He highlighted that it’s not about linked data, although linked data it is part of the solution to make data accessibility. Accessibility always wins online – and people make compromises (e.g. sound quality in CD and Spotify). Google Earth can be seen as a new channel that make things accessible, and while the back-end is not new in technology the ease of access made a big difference. The example of Denmark use of minecraft to release GI is an example of another channel. Notice the change over the past 10 years in video delivery, for example, so the early days of the video delivery was complex and require many steps and expensive software and infrastructure, and this is somewhat comparable to current practice within geographic information. Making things accessible through channels like YouTube and the whole ability around it changed the way video is used, uploaded and consumed, and of course changes in devices (e.g. recording on the phone) made it even easier. Focusing on the aspects of maps themselves, people might want different things that are maps and not only the latest searchable map that Google provide – e.g. the administrative map of medieval Denmark, or maps of flood, or something that is specific and not part of general web mapping. In some cases people that are searching for something and you want to give them maps for some queries, and sometime images (as in searching Yosemite trails vs. Yosemite). There are plenty of maps that people find useful, and for that Google now promoting Google Maps Gallery – with tools to upload, manage and display maps. It is also important to consider that mapping information need to be accessible to people who are using mobile devices. The web infrastructure of Google (or ArcGIS Online) provide the scalability to deal with many users and the ability to deliver to different platforms such as mobile. The gallery allows people to brand their maps. Google want to identify authoritative data that comes from official bodies, and then to have additional information that is displayed differently. But separation of facts and authoritative information from commentary is difficult and that where semantics play an important role. He also noted that Google Maps Engine is just maps – just a visual representation without an aim to provide GIS analysis tools.
The Journal of Spatial Information Science (JOSIS) is a new open access journal in GIScience, edited by Matt Duckham, Jörg-Rüdiger Sack, and Michael Worboys. In addition, the journal adopted an open peer review process, so readers are invited to comment on a paper while it goes through the formal peer review process. So this seem to be the most natural outlet for a new paper that analyses the completeness of OpenStreetMap over 18 months – March 2008 to October 2009. The paper was written in collaboration with Claire Ellul. The abstract of the paper provided below, and you are very welcome to comment on the paper on JOSIS forum that is dedicated to it, where you can also download it.
Abstract: The ability of lay people to collect and share geographical information has increased markedly over the past 5 years as results of the maturation of web and location technologies. This ability has led to a rapid growth in Volunteered Geographical Information (VGI) applications. One of the leading examples of this phenomenon is the OpenStreetMap project, which started in the summer of 2004 in London, England. This paper reports on the development of the project over the period March 2008 to October 2009 by focusing on the completeness of coverage in England. The methodology that is used to evaluate the completeness is comparison of the OpenStreetMap dataset to the Ordnance Survey dataset Meridian 2. The analysis evaluates the coverage in terms of physical coverage (how much area is covered), followed by estimation of the percentage of England population which is covered by completed OpenStreetMap data and finally by using the Index of Deprivation 2007 to gauge socio-economic aspects of OpenStreetMap activity. The analysis shows that within 5 years of project initiation, OpenStreetMap already covers 65% of the area of England, although when details such as street names are taken into consideration, the coverage is closer to 25%. Significantly, this 25% of England’s area covers 45% of its population. There is also a clear bias in data collection practices – more affluent areas and urban locations are better covered than deprived or rural locations. The implications of these outcomes to studies of volunteered geographical information are discussed towards the end of the paper.
This is call for papers for a workshop on methods and research techniques that are suitable for geospatial technologies. The workshop is planned for the day before GISRUK 2010, and we are aware of the clashes with the AAG 2010 annual meeting, CHI 2010 and the Ergonomics Society Annual Conference. However, if you would like to contribute to the book that the commission is developing but can’t attend the workshop, please send an abstract and inform us that you can’t attend.
In the near future I’ll publish information about another workshop in March 2010 about the usability and Human-Computer Interaction aspects of geographical information itself – see the report from the Ordnance Survey workshop earlier in 2009.
So here is the full call:
Workshop on Methods and Techniques of Use, User and Usability Research in Geo-information Processing and Dissemination
Tuesday 13 April 2010 at University College London
The Commission on Use and User Issues of the International Cartographic Association (ICA) is currently working on a new handbook specifically addressing the application of user research methods and techniques in the geodomain.
In order to share experiences and interesting case studies a workshop is organized by the Commission, in collaboration with UCL, on the day preceding GISRUK 2010.
CALL FOR PAPERS
While there is growing awareness within the research community on the need to develop usability engineering and use and user research methods that are suitable for geographical and spatial information and systems, to date there is a lack of organized and documented experience in this area.
We therefore invite researchers with recent experience with use, user and usability research in the broad geodomain (cartography, GIS, geovisualization, Location Based Services, geographical information, GeoWeb etc.) to present a paper specifically focusing on the research methods and techniques applied, with an aim to develop the body of knowledge for the domain.
To participate, please send an abstract of 1 page A4 at maximum containing:
- A description of the research method(s) and technique(s) applied
- A short description of the case in which they have been applied
- The overall research framework
- Contact details and affiliation of the author(s)
We are also encouraging PhD researchers to submit paper proposals and share experiences from their research. At the workshop there will be ample time for discussing the application of user research methods and techniques. Good papers may be the basis for contributions to the handbook that is planned for publication in 2011.
Abstracts should be submitted on or before 1 December 2009 to the Chairman of the Commission Corné van Elzakker ( firstname.lastname@example.org )
the website of the ICA Commission on Use and User Issues and the GISRUK2010 website
Trying to track down the source of a term is one of the more interesting academic tasks. For example, finding out when people started researching Human-Computer Interaction and GIS is a bit like following the thread. First of all, the term Human-Computer Interaction is sometimes presented as Computer-Human Interaction, especially in the early 1980s, when it emerged – the ACM Special Interest Group still uses CHI and not HCI. Before that, the common term used was Man-Machine Interaction which was actually a term that came out of studies in the 1940s. The way to uncover this terminology chain is to find papers that mention both terms and follow it through. Quite quickly you develop an understanding of the chain…
Then there is the issue of GIS – after all, the term was invented only around the mid 1960s: surely many people outside the small circle of researchers that became familiar with the term used other terminology. So you need to look for other terms, such as geographic information (as well as geographical information), maps, etc.
Following this approach, I have found a paper from 1963 by Malcolm Pivar, Ed Fredkin and Henry Stommel about ‘Computer-Compiled Oceanographic Atlas: an Experiment in Man-Machine Interaction’. The paper is as interesting as its writers – with Pivar and Fredkin among the Artificial Intelligence group at MIT, and Stommel a leading oceanographer. The data came from surveys that were part of the International Geophysical Year (1957/8 ) – and the paper shows that information overload is nothing new.
For me, the most interesting passage in the paper is:
‘[I]n preparing a printed atlas certain irrevocable choices of scale, of map projections, of contour interval, and of type of map (shall we plot temperature at standard depths, or on density surfaces, etc.?) must be made from the vast infinitude of all possible mappings. An atlas-like representation, generated by digital computer and displayed upon a cathode-ray screen, enables the oceanographer to modify these choices at will. Only a high-speed computer has the capacity and speed to follow the quickly shifting demands and questions of a human mind exploring a large field of numbers. The ideal computer-compiled oceanographic atlas will be immediately responsive to any demand of the user, and will provide the precise detailed information requested without any extraneous information. The user will be able to interrogate the display to evoke further information; it will help him track down errors and will offer alternative forms of presentation. Thus, the display on the screen is not a static one; instead, it embodies animation as varying presentations are scanned. In a very real sense, the user “converses” with the machine about the stored data.’ (Pivar et al., 1963, p. 396)
What an amazing vision in 1963 – it would take another 30 years and even more before what they are describing became a reality!