At the State of the Map (EU) 2011 conference, held in Vienna on 15-17 July, I gave a keynote talk on the relationships between the OpenStreetMap (OSM) community and the GIScience research community. Of course, these relationships are especially important for researchers working on Volunteered Geographic Information (VGI), given the major role of OSM in this area of research.
The talk included an overview of what researchers have discovered about OpenStreetMap over the five years since we started to pay attention to OSM. One striking result is that the issue of positional accuracy does not require much more work by researchers. Other important outcomes are the understanding that quality is influenced by the number of mappers, and that the data can be used with confidence for mainstream geographical applications when certain conditions are met. These results are both useful and of interest to a wide range of groups, but there remain key areas that require further research – for example, specific facets of quality, community characteristics and how OSM data is used.
Reflecting on the body of research, we can start to form a ‘code of engagement’ for both academics and mappers who are engaged in researching or using OpenStreetMap. One such guideline would be that it is both prudent and productive for any researcher to do some mapping herself, and to understand the process of creating OSM data, if the research is to be relevant and accurate. Other aspects of the proposed ‘code’ are covered in the presentation.
12 May, 2011
GIS Research UK (GISRUK) is a long-running conference series, and the 2011 instalment was hosted by the University of Portsmouth at the end of April.
During the conference, I was asked to give a keynote talk about Participatory GIS. I decided to cover the background of Participatory GIS in the mid-1990s, and the transition to more advanced Web Mapping applications from the mid-2000s. Of special importance are the systems that allow user-generated content, and the geographical systems of this kind that are now leading to the generation of Volunteered Geographic Information (VGI).
The next part of the talk focused on Citizen Science, culminating with the ideas that are the basis for Extreme Citizen Science.
Interestingly, as in previous presentations, one of the common questions about Citizen Science came up. Professional scientists seem to have a problem with the suggestion that citizens are as capable as scientists in data collection and analysis. While there is acceptance of the concept, the idea that participants can suggest problems, collect data rigorously and analyse it seems to be too radical – or worrying.
What is important to understand is that the ideas of Extreme Citizen Science are not about replacing the role of scientists, but are a call to rethink the roles of the participants and the scientists in cases where Citizen Science is used. It is a way to consider science as a collaborative process of learning and exploration of issues. My own experience is that participants have a lot of respect for the knowledge of the scientists, as long as the scientists have a lot of respect for the knowledge and ability of the participants. The participants would like to learn more about the topic that they are exploring and are keen to know: ‘what does the data that I collected mean?’ At the same time, some of the participants can become very serious about data collection, reading about the specific issues and using the resources that are available online today to learn more. At some point, they become knowledgeable participants and it is worth seeing them as such.
The slides below were used for this talk, and include links to the relevant literature.
18 January, 2011
Yesterday, for the first time, I came across the phrase ‘GIS Systems’ in an academic paper written by geographers (not GIS experts). I have also noticed that the term is being used more often in recent times when people talk about packages such as ArcGIS or MapInfo.
On the face of it, talking about a ‘GIS System’ is ridiculous – how can you say ‘geographic information system system’? However, people have a reason for using this phrase and it makes some sense to them.
Maybe the reason is that GIS now stands for a class or type of computer software that can manage, manipulate and visualise geographic information, so a ‘GIS system’ is the specific hardware and software that is used. Personally, I’ll continue to find it odd and use GIS for what it is…
10 July, 2010
The slides below are from my presentation at State of the Map 2010 in Girona, Spain. While the conference is about OpenStreetMap, the presentation covers a range of spatially implicit and explicit crowdsourcing projects, as well as activities that we carried out in Mapping for Change. Together, these show that, unlike other crowdsourcing activities, geography (and place) both limits and motivates contribution.
In many ways, OpenStreetMap is similar to other open source and open knowledge projects, such as Wikipedia. These similarities include the patterns of contribution and the importance of participation inequality, in which a small group of participants contributes a very large share of the content, while a much larger group contributes only occasionally; the general demographic of participants, with strong representation from educated young males; and the temporal patterns of engagement, in which some participants go through a peak of activity and lose interest, while a small group joins and continues to invest its time and effort to help the progress of the project. These aspects have been identified by researchers who explored volunteering and leisure activities, and crowdsourcing, as well as those who explored commons-based peer production networks (Benkler & Nissenbaum 2006).
However, OpenStreetMap is a project about geography, and deals with the shape of features and information about places on the face of the Earth. Thus, the emerging question is: what influence does geography have on OSM? Does geography make some fundamental changes to the basic principles of crowdsourcing, or should OSM be treated as ‘Wikipedia for maps’?
In the presentation, which is based on my work as well as the work of Vyron Antoniou and Nama Budhathoki, we argue that geography plays a ‘tyrannical’ role in OSM and other projects that are based on crowdsourced geographical information, shaping the nature of these projects more than is usually acknowledged.
The first influence of geography is on motivation. A survey of OSM participants shows that specific geographical knowledge, acquired at first hand, and the wish to use this knowledge and see it mapped well, are important factors in participation in the project. We found that participants are driven to mapping activities by their desire to represent the places they care about and to fix errors on the map. Both of these motives require local knowledge.
A second influence is on the accuracy and completeness of coverage, with places that are highly populated, and therefore have a larger pool of potential participants, showing better coverage than suburban areas of well-mapped cities. Furthermore, there is an ongoing discussion within the OSM community about the value of mapping without local knowledge and the impact of such action on the willingness of potential contributors to fix errors and contribute to the map.
A third, and somewhat surprising, influence is the impact of mapping places that the participants have not visited or cannot visit, such as Haiti after the earthquake or Baghdad in 2007. Despite the willingness of participants to join in and help in the data collection process, the details that can be captured without being on the ground are fairly limited, even when multiple sources such as Flickr images, Google Street View and paper maps are used. The details are limited to what was captured at a certain point in time and to the limitations of the sensing device, so the mapping is, by necessity, incomplete.
We will demonstrate these and other aspects of what we termed ‘the tyranny of place’, and its impact on what can be covered by OSM without much effort and on which locations will not be covered without a concerted effort that requires some planning.
29 January, 2010
After the publication of the comparison of OpenStreetMap and Google Map Maker coverage of Haiti, Nicolas Chavent from the Humanitarian OpenStreetMap Team contacted me and drew my attention to the geographical dataset of the UN Stabilization Mission in Haiti (MINUSTAH), which is seen as the core dataset for the post-earthquake humanitarian effort; a comparison with this dataset might therefore be helpful, too. The comparison of the two Volunteered Geographical Information (VGI) datasets of OpenStreetMap and Google Map Maker with this core dataset also exposed an aspect of the usability of geographical information in emergency situations that is worth commenting on.
For the purpose of the comparison, I downloaded two datasets from GeoCommons – the detailed maps of Port-au-Prince and the Haiti road network. Both are reported on GeoCommons as originating from MINUSTAH. I combined them, and then carried out the comparison. As in the previous case, the comparison focused only on the length of the roads, with the hypothesis that, if there is a significant difference in the length of the roads in a given grid square, it is likely that the longer dataset is more complete. Other comparisons between established and VGI datasets lend support to this hypothesis, although caution must be applied when the differences are small. The following maps show the differences between the MINUSTAH and OpenStreetMap datasets and between the MINUSTAH and Google Map Maker datasets. I have also reproduced the original map that compares OpenStreetMap and Map Maker, for the purpose of comparison and consistency, as well as for cartographic quality.
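To make the mechanics of the comparison concrete, here is a minimal sketch (in Python, using the geopandas library) of how such a grid-based length comparison, and the completeness figures quoted below, might be computed. The file names, the grid and the exact completeness formula are assumptions for illustration, not a record of the workflow that was actually used.

```python
# A sketch of a grid-based road-length comparison; file names are hypothetical and
# the data are assumed to be in a metric projection (e.g. UTM zone 18N for Haiti).
# Requires a recent geopandas (overlay with line geometries).
import geopandas as gpd

def road_length_per_cell(roads, grid, column):
    """Intersect a road network with the grid and sum road length (km) per cell."""
    clipped = gpd.overlay(roads, grid[["cell_id", "geometry"]], how="intersection")
    lengths = clipped.geometry.length.groupby(clipped["cell_id"]).sum() / 1000.0
    grid[column] = grid["cell_id"].map(lengths).fillna(0.0)
    return grid

grid = gpd.read_file("haiti_grid.shp")          # analysis grid of square cells
grid["cell_id"] = range(len(grid))

for column, path in [("osm", "osm_roads.shp"),
                     ("mapmaker", "mapmaker_roads.shp"),
                     ("minustah", "minustah_roads.shp")]:
    roads = gpd.read_file(path).to_crs(grid.crs)
    grid = road_length_per_cell(roads, grid, column)

# The dataset with the longest total road length in a cell is treated as the most
# complete one for that cell; completeness of the VGI datasets is then expressed
# relative to MINUSTAH over cells with at least 100 m of MINUSTAH roads.
grid["most_complete"] = grid[["osm", "mapmaker", "minustah"]].idxmax(axis=1)
covered = grid[grid["minustah"] >= 0.1]
print("OSM completeness (%):      ", 100 * covered["osm"].sum() / covered["minustah"].sum())
print("Map Maker completeness (%):", 100 * covered["mapmaker"].sum() / covered["minustah"].sum())
```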
The maps show that MINUSTAH does provide fairly comprehensive coverage across Haiti (as expected) and that the volunteered efforts of OpenStreetMap and Map Maker provide further details in urban areas. There are areas that are only covered by one of the datasets, so they all have value.
The final comparison uses the 3 datasets together, with the same criteria as in the previous map – the dataset with the longest length of roads is the one that is considered the most complete.
It is interesting to note the south/north divide between OpenStreetMap and Google Map Maker, with Google Map Maker providing more details in the north, and OpenStreetMap in the south (closer to the earthquake epicentre). When compared over the areas in which there is at least 100 metres of MINUSTAH coverage, OpenStreetMap is, overall, 64.4% complete, while Map Maker is 41.2% complete. Map Maker covers a further 354 square kilometres that are not covered by MINUSTAH or OpenStreetMap, and OpenStreetMap covers a further 1,044 square kilometres that are missing from the other datasets, so there is clearly a benefit in integrating them. The grid that includes the analysis of the integrated datasets is available here in shapefile format, in case it is of any use or if you would like to carry out further analysis or visualise it.
While working on this comparison, it was interesting to explore the data fields in the MINUSTAH dataset, some of which are included to provide operational information, such as road condition, the length of time it takes to travel along a road, etc. These are the hallmarks of practical and operational geographical information, with details that are directly relevant to end-users in their daily tasks. The other two datasets have been standardised for universal coverage and delivery, and this is apparent in their internal data structure. The Google Map Maker schema is closer to traditional geographical information products in field names and semantics, exposing the internal engineering of the system – for example, including a country code, which is clearly meaningless when you are downloading a single country! OpenStreetMap (as provided by either CloudMade or GeoFabrik) keeps with the simplicity mantra and is fairly basic. Yet the schema is the same in Haiti as in England or any other place, so, just like Google, it takes a system view of the data and its delivery.
This means that, from an end-user perspective, while these VGI data sources were produced in a radically different way to traditional GI products, their delivery is similar to the way in which traditional products were delivered, burdening the user with the need to understand the semantics of the different fields before using the data.
In emergency situations, this is likely to present an additional hurdle for the use of any data, as it is not enough to provide the data for download through GeoCommons, GeoFabrik or Google – it is how it is going to be used that matters. Notice that the maps tell a story in which an end-user who wants to have full coverage of Haiti has to combine three datasets, so the semantic interpretation can be an issue for such a user.
So what should a user-centred design of GI for an emergency situation look like? The general answer is ‘find the core dataset that is used by the first responders, and adapt your data to this standard’. In the case of Haiti, I would suggest that the MINUSTAH dataset is a template for such a thing. Users of GI on the ground are more likely to have already been exposed to the core dataset and to be familiar with it. Its fields are relevant and operational, which makes it more ‘user-centred’ than the other two. Therefore, it would be beneficial for VGI providers who want to help in an emergency situation to ensure that their data conform to the local de facto standard – the dataset being used on the ground – and to bring their schema to fit it.
Of course, this is what GI ontologies are for: to allow semantic interoperability. The issue with them is that they add at least two steps – defining the ontology and working out the process of translating the dataset that you have acquired into the required format. Therefore, this is something that should be done by data providers, not by end-users who are dealing with the real situation on the ground. They have more important things to do than find a knowledge engineer who understands semantic interoperability…
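As an illustration of the kind of translation a data provider could do in advance, here is a minimal sketch of renaming fields from an OSM-style road extract to an operational schema. The target field names are invented for illustration and are not the actual MINUSTAH attributes.

```python
# A sketch of aligning a VGI extract with a 'core' operational schema before
# delivery; the target field names are hypothetical, not real MINUSTAH attributes.
import geopandas as gpd

# Source (OSM-style) field -> target operational field (hypothetical names).
FIELD_MAP = {
    "highway": "road_class",
    "surface": "road_condition",
    "name": "road_name",
}

def align_to_core_schema(src):
    """Rename the fields that map onto the core schema and drop the rest."""
    aligned = src.rename(columns=FIELD_MAP)
    keep = [c for c in list(FIELD_MAP.values()) + ["geometry"] if c in aligned.columns]
    return aligned[keep]

osm_roads = gpd.read_file("osm_roads.shp")               # hypothetical extract
align_to_core_schema(osm_roads).to_file("osm_roads_core_schema.shp")
```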
28 December, 2009
In my previous analysis of OpenStreetMap (OSM) data, I compared it to the Index of Deprivation, as a way to understand if there is any socio-economic spatial pattern in the coverage of OSM. Following numerous interactions with various parts of the OSM community, I had suspected that there might be a bias, with the result that affluent areas might be mapped more completely than deprived areas. I explored this systematically, as only empirical analysis could provide evidence one way or another.
Here are the details of the analytical process that was used.
The core data that was used for the comparison is the UK government’s Index of Multiple Deprivation 2007 (IMD 2007), which is calculated from a combination of governmental datasets and provides a score for each Lower Layer Super Output Area (LSOA) in England. The position of each LSOA within the IMD 2007 was used to calculate its percentile. Each percentile point includes about 325 LSOAs. Areas in the bottom percentile are the most deprived, while those at the 99th percentile are the most affluent places in England according to the index.
Following the same methodology that was used to evaluate completeness, the road datasets from OSM and from the Ordnance Survey’s Meridian 2 were clipped to each of the LSOAs, and then the total length of the two datasets was compared. Because the size of LSOAs varies, it is more meaningful to compare percentage completeness and not the absolute length.
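For readers who want to reproduce this kind of analysis, here is a minimal sketch (Python with geopandas and pandas) of the clipping, completeness and percentile steps described above. The file and field names are placeholders, and the exact percentile construction is an assumption rather than the original procedure.

```python
# A sketch of the LSOA completeness analysis; file and field names are placeholders,
# and the data are assumed to be in British National Grid (metres).
import geopandas as gpd
import pandas as pd

lsoas = gpd.read_file("lsoa_boundaries.shp")      # assumed to carry an 'imd_score' column
osm = gpd.read_file("osm_roads.shp").to_crs(lsoas.crs)
meridian = gpd.read_file("meridian2_roads.shp").to_crs(lsoas.crs)

def road_length_per_lsoa(roads, lsoas, column):
    """Clip a road dataset to each LSOA and sum road length in km."""
    clipped = gpd.overlay(roads, lsoas[["lsoa_code", "geometry"]], how="intersection")
    lengths = clipped.geometry.length.groupby(clipped["lsoa_code"]).sum() / 1000.0
    lsoas[column] = lsoas["lsoa_code"].map(lengths).fillna(0.0)
    return lsoas

lsoas = road_length_per_lsoa(osm, lsoas, "osm_km")
lsoas = road_length_per_lsoa(meridian, lsoas, "meridian_km")

# Percentage completeness of OSM relative to Meridian 2, and the IMD percentile
# (here percentile 1 = most deprived, 100 = least deprived).
lsoas = lsoas[lsoas["meridian_km"] > 0]
lsoas["completeness_pct"] = 100 * lsoas["osm_km"] / lsoas["meridian_km"]
lsoas["imd_percentile"] = pd.qcut(lsoas["imd_score"].rank(ascending=False),
                                  100, labels=False) + 1
print(lsoas.groupby("imd_percentile")["completeness_pct"].mean())
```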
The analysis of data from March 2008 showed a clear difference between the LSOAs at the bottom of the scale and those at the top. While the LSOAs at the bottom were not neglected, the level of coverage was far lower, even when taking into account the variability in LSOA areas. I wanted to explore whether the situation had changed since then and undertook further analysis using the same methodology.
Has the situation changed during the 19 months from March 2008 to October 2009?
The graph above shows that things have changed, but not for the better. The graph shows the level of completeness for each group of LSOAs. To avoid distortion from rural areas, where the size of an LSOA becomes very large, only LSOAs whose area is within one standard deviation of the mean are included. The effect of this is that the graph shows the results for mostly urban LSOAs.
I compared three datasets: March 2008, March 2009 and October 2009. A rather alarming trend is visible. Instead of shrinking, the gap between affluent and deprived LSOAs is growing. The average completeness of the bottom percentile was 40.7% in March 2008, grew to 65.7% a year later and to 71.8% by October 2009. For the most affluent percentile, completeness grew from 67.5% in March 2008 to 97.0% a year later and to 108.9% by October 2009. In other words, the gap between the top and the bottom has grown from 26.6% to 37.1% within the analysis period.
Within the OpenStreetMap community, there are activities such as those led by Mikel Maron to map informal settlements in Kenya and to ensure coverage of other marginalised parts of the world (see the posts on his blog). From the work that we are doing in Mapping for Change, it is clear to me that mapping can be an excellent motivator to encourage people to use digital tools, and therefore adding data to OSM can work as a way to increase digital inclusion. So maybe OSM coverage in the UK can be increased with some support from the government, which has stated an aim of increasing digital inclusion?
If you would like to explore the data by yourself, here is a spreadsheet with the information, including the LSOA codes, the position in IMD 2004 and IMD 2007, and the coverage percentage for March 2008, March 2009 and October 2009. Please note the terms and conditions for its use – and let me know what you have done with it!
12 December, 2009
Over the past decade, different people have either hailed or criticised the growing inability to forget in the digital age in which we are living. Logging on to Google Dashboard and seeing every single search that I have carried out since 2006 is rather scary – especially as there is no guarantee that, if I ask to delete my web history, it will also be deleted from Google’s servers; the information is merely anonymised, which is not much, really. An interesting point of view on the virtue of forgetting in today’s digital world is available in Viktor Mayer-Schonberger’s recent lecture at the RSA.
And then there is all the public information about you that is already on the open web and that is going to be there for as long as the servers and the web continue to be around. While looking for my earliest internet trails, I came across a posting to the usenet group comp.infosystems.gis from 1994. Back then I was working on a large-scale GIS project for the Israel Electric Corporation and, as far as I can recall, I was asked to write a briefing about the direction that we should take regarding the hardware and software platforms that would be used by the client in the roll-out of the system, which was designed for IBM RS/6000 workstations. The requests that I sent to the list and the discussion are summarised in a posting that is still accessible on Google Groups – so anyone can find it and read it …
In terms of internet memory, it does expose certain aspects that I’m now much more aware of – such as my command of English back then. Glossing over the grammar and spelling mistakes, the analysis makes interesting reading from a 15-year perspective.
Firstly, it is interesting to note that the need for high-end computing in terms of operating systems and hardware for GIS remains a relevant issue. See, for example, Manifold GIS’s use of a 64-bit operating system, or the issue of graphics capabilities and the use of General-Purpose computing on Graphics Processing Units (GPGPU) in GIS and Remote Sensing packages such as Geomatica. Another indication of the continued need for processing power is the description of ‘who might need this?’ for high-end workstations – although in 1994 no one in PC Magazine ever mentioned GIS.
However, for the absolute majority of end-users who are using GIS for basic map making and processing, this is not true anymore, and many manage quite well with standard desktop or laptop computers. Over the next few years, as more of the processing migrates to the ‘cloud’, the number of GIS users who need high-end machines will continue to decline. In 1994 the expectation was that most users would need a workstation, whereas very soon they will happily use a low-powered netbook.
Secondly, it is interesting to see the changes in data sizes – I note in the text that 1GB of data caused us difficulties with backups and the local network (10BASE-T). I recall complaints from the rest of the company, which was running mainframe systems with millions of alphanumeric operations, when we ran performance tests, because of the bandwidth that GIS processing consumed. This aspect of geographical information handling is still challenging, though usually not at the local level – even for large-scale processing, the cost of storage is now so low that it is not a problem. However, for the people who manage the backbone of large-scale applications, say Yahoo! Maps, this is still an issue – I assume that video, images and maps are now major consumers of bandwidth and disk storage that require special handling and planning.
Thirdly, there is a lesson about ‘disruptive technologies’. The PC was one such disruptive technology and, even over a decade after their introduction, PCs were not comparable to workstations in terms of memory, processing, multitasking and networking. The advantage of workstations was clear in 1994. Even as late as 1999, when we ran the Town Centres project on Sun workstations, there was still an advantage, but it was disappearing rapidly. Today, UNIX workstations occupy a very small niche.
This is an issue when we think forward to the way GIS will look in 2015 (as the AGI Foresight study is doing) or 2020. Some of the disruptions to the way GIS has operated for many years are gathering pace, such as the move to software and data as services, where organisations will receive the two bundled from a provider, or the use of more crowdsourced information.
So sometimes it is useful to come across old writing – it makes you re-evaluate the present and consider the future. At the same time, it is only because I forgot about the post that it was interesting to come across it – so Viktor Mayer-Schonberger is correct that there is a virtue in forgetting.
29 October, 2009
The discussion about the future of the GIS ‘profession’ has flared up in recent days – see the comments from Sean Gorman, Steven Feldman (well, citing me) and Don Meltz among others. My personal perspective is about the educational aspect of this debate.
I’ve been teaching GIS since 1995, and been involved in the MSc in GIS at UCL since 1998 – teaching on it since 2001. Around 1994 I was contemplating the excellent MSc in GIS programme in Edinburgh, though I opted to continue with my own mix of geography and computer science, which turned out to be great in the end – but I can say that I have been following the trends in GIS education for quite a while.
Based on this experience, I would argue that the motivation for studying an MSc in GIS over the past 20 years was to get the ‘ARC/INFO driving licence’. I use ARC/INFO as a metaphor – you can replace it with any other package, but ARC/INFO was the de facto package for teaching GIS (and its successor ArcGIS is today), so it is a suitable shorthand. What I mean by that is that, for a long time, GIS packages were hard to use and required a significant amount of training to operate successfully. Even if a fairly simple map was needed, the level of technical knowledge and the number of steps required were quite significant. So employers, who mostly wanted someone who could make them maps, recruited people who had gained skills in operating the complex packages that allow the production of maps.
The ‘ARC/INFO driving licence’ era included an interesting dissonance – the universities were telling themselves that they were teaching the principles of GIScience, but the students were mostly interested in learning how to operate a GIS proficiently enough to get a job. I’ve seen and talked with enough students to recognise that many of them, in their daily jobs, rarely used the spatial statistical analysis that we were teaching; mostly, they worked at ‘taming the beast’ that GIS was.
As expected, at UCL there was always a group that was especially interested in the principles of GIScience and that continued their studies beyond the MSc. But they were never the majority of the cohort.
The model worked well for everyone – universities taught GIS through a combination of principles and training in specific packages, and the students found jobs at the end and joined GIS departments in different organisations.
The disruption that changed this arrangement started in the late 1990s, with Oracle Spatial starting to show that GIS could be integrated into mainstream products. The whole process accelerated around 2005 with the emergence of the GeoWeb, Free and Open Source GIS (FOSS GIS) and the whole range of applications that come with them. Basically, you don’t need a licence any more. More and more employers (even GIS consultancies) are not recruiting from GIS education programmes – they are taking computing professionals and teaching them the GIS skills. Going through an MSc in GIS to become proficient with a tool is not necessary.
So in an era in which you don’t need a licence to join the party, what is the MSc in GIS for?
The answer is that it can be the time when you focus on principles and on improving specific skills. Personally, that was my route to education. I started working in GIS software development without much more than high school education in 1988. After hearing people around me talking about registers, bugs, polygons and databases I was convinced that I must understand these principles properly. So I went for a degree that provided me with the knowledge. In the same way, I would expect that MSc programmes cater for the needs of people who gain some practical experience with operating geospatial technologies and want to learn the principles or become specialists in specific aspects of these systems.
We already see people doing the MSc while working with GIS – studying an MSc by distance learning or in the evening is currently very popular and I expect that this will continue. However, the definition of what is covered by GIS must be extended – it should include everything from the Bing Maps API to PostGIS to ArcGIS.
I can also see the need for specialised courses – maybe to focus on the technical development of geospatial technologies or maybe on spatial statistical analysis for those who want to become geographical information analysts. I would also expect much more integration of GIS with other fields of study where it is taught as a tool – just look at the many MSc programmes that currently include GIS. I’m already finding myself teaching students of urban design, development planning or asset management.
All in all, I’m not going to feel sorry that the ‘ARC/INFO driving licence’ era is coming to its end.
While working on a text about HCI and GIS, I started to notice a general pattern: a delay of ten years or so between the date a new functionality starts to become ‘mainstream’ in general computing and the date it becomes common in GIS.
Here are some examples: the early use of computers in the business environment was in the mid to late 1950s, but we had to wait until the late 1960s to get the first full-scale GIS (and even that was fairly primitive). Personal computers and microcomputers appeared in the late 1970s with machines such as the Apple II, which started to be used by many small offices for word processing and accounting, but the first PC GIS application, MapInfo, appeared only in the second half of the 1980s. Human-Computer Interaction emerged as a field of research in the early 1980s, but only in the early 1990s was it recognised by GIS researchers. Graphical User Interfaces were first implemented in mainstream computing in the very early 1980s, but did not arrive in GIS until the 1990s. Finally, notice how e-commerce, e-mail and other Web applications were very successful in the early 1990s, but only in the mid-2000s did the GeoWeb emerge, with the success of Google Maps.
Several other examples of this gap exist – for example, the use of SQL databases. Even if you search for the earliest research paper or documentation for a major GIS functionality with a parallel in the mainstream, this lag appears. Some very early research appears around 5 years after the mainstream use (see the first HCI and GIS paper as an example) but it will take at least another 5 years to see it in real products that are used outside research labs.
This observation explains, to me, two puzzles: first, why is it that, for the two decades that I’ve been working with GIS, it keeps being referred to as an ‘emerging technology’? The answer is that it is always catching up so, for the journalist, who is familiar with other areas of computing, it feels like something that is emerging; second, why are companies that are getting into geotechnologies early either failing (examples aplenty in the Location-based services area in the 1990s) or needing about 10 years of survival to become successful? The reason here is that they are too optimistic about the technical challenges that they are facing.
I think that the lag is due to the complexities of dealing with geographical information, and the need for hardware and software to reach the stage where geographical applications are possible. Another reason is the relative lack of investment in the development of geotechnologies, which were for a long time considered niche applications.
What is your explanation for the gap?
14 February, 2009
In the post about the Engaging Geography seminar, I discussed how different levels of engagement with geography can be used to decide whether a person using a system should be considered a ‘public geographer’ or just a consumer of geographical information in a passive and ephemeral way.
Thinking more broadly on geotechnologies, it is appropriate to include the people who are producing many of the everyday geographical representations. Frequently, the people who are producing these representations use GIS.
When thinking about the Web, it is clear that the vast majority of the people involved in public geographies do not have any ‘formal’ geographical background. You might think that, in the case of GIS, because of the barriers to entry, the situation would be different.
This is not so. As Dave Unwin noted in his 2005 paper, many of the people operating GIS are actually ‘accidental geographers’. When you consider the number of GIS users worldwide, it is clear that only a few have gone through formal geographical education beyond basic school geography. Unwin notes that ‘accidental geographers’ have naïve conceptualisations of geography (for example, that it is all about the location of factual objects in space), a lack of understanding of spatial analysis, and sometimes a dismissive attitude towards the academic disciplines of geography and cartography.
Neogeography is putting these accidental geographers in a new light. Some users do indeed see geography as uncomplicated and GIS as simply ‘something that produces maps’. However, as a person is exposed to systems that deal with geography over a sustained period, she is more likely to start questioning the nature of this geography and the way that it is represented. After a while, these questions will lead to a process of learning about geographical concepts – and the fact that so much information is now available on the Web will certainly help. Sometimes, the commitment to geography might lead to joining organisations such as the AGI and maybe even becoming a Chartered Geographer (GIS).
So, in summary, there is a whole range of commitments and interests in geography, and both accidental geographers and neogeographers can be positioned along a continuum from ignorance to expert knowledge. I think that most will move through this continuum and enjoy the process of developing geographic knowledge.