On the last day of the INSPIRE conference, I attended a session about apps and applications and the final plenary, which focused on the knowledge-based economy and the role of INSPIRE within it. Below are some notes from the talks, including my interpretations and comments.
Dabbie Wilson from the Ordnance Survey highlighted the issues that the OS faces in designing next-generation products from an information architect's point of view. She noted that the core large-scale product, MasterMap, has been around for 14 years and has been provided in GML all the way through. The client base in the UK is now used to it and happy with it (I recall a short period of adjustment when it was introduced, but I assume that by now everything is routine). Many small-scale products are becoming open and are also provided as linked data. The user community is more savvy – they want the Ordnance Survey to push data to them and to access the data through existing or new services, not just to be given datasets without further interaction. They want ease of access and use across multiple platforms. The OS is considering moving from the provision of data towards online services as the main way for people to access the data. The OS is investing heavily in mobile apps for leisure, but is also helping the commercial sector to develop apps that are based on OS data and tools. For example, the OS Locate app works worldwide, so it is not limited to the UK. They also put effort into creating APIs and SDKs – such as OS OnDemands – and into allowing local authorities to update their address data. There is also a focus on cloud-based applications, such as those that support government activities during emergencies. On the information architecture side, the shift is from product to content. The OS will continue to maintain content that is product-agnostic and to run its internal systems over a long period of 10 to 20 years, so it needs to decouple outward-facing services from the internal representation. The OS needs to be flexible in responding to different needs – e.g. in file formats it will be GML, RDF and ontologies, but also CSV and GeoJSON. Managing the rules between the various formats is a challenging task.
Different representations of the same thing are another challenge – for example, 3D and 2D representations.
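To make the idea of decoupling concrete, here is a minimal Python sketch. It assumes a hypothetical internal `Feature` record, and the class, its fields and the function names are illustrative only. They are not the Ordnance Survey's actual model. The sketch holds one product-agnostic representation internally and produces each outward-facing format with a separate encoder. GeoJSON is shown with WGS84 longitude/latitude, as RFC 7946 expects.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical product-agnostic internal record (not an OS schema)."""
    feature_id: str
    theme: str
    lon: float  # WGS84 longitude
    lat: float  # WGS84 latitude


def to_geojson(features: list[Feature]) -> str:
    """Encode internal features as a GeoJSON FeatureCollection of points."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "id": f.feature_id,
                "geometry": {"type": "Point", "coordinates": [f.lon, f.lat]},
                "properties": {"theme": f.theme},
            }
            for f in features
        ],
    }
    return json.dumps(collection)


def to_csv(features: list[Feature]) -> str:
    """Encode the same internal features as CSV, one row per feature."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["feature_id", "theme", "lon", "lat"])
    for f in features:
        writer.writerow([f.feature_id, f.theme, f.lon, f.lat])
    return buf.getvalue()
```

In this design, each new output format adds an encoder rather than changing the internal store. The encoders are where the "rules between formats" end up living.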
Didier Leibovici presented work based on the COBWEB project, discussing quality assurance for crowdsourced data. In crowdsourcing, there are quality issues with both the authoritative and the crowdsourced data. The COBWEB project is one of a set of 5 citizen observatories, exploring air quality, noise, water quality, water management, flooding and land cover, odour perception and nuisance; they can be seen at http://www.citizen-obs.eu. COBWEB focuses on the infrastructure and management of the data. The pilot studies in COBWEB look at land use/land cover, species and habitat observations, and flooding. They mix sensors in the environment, receive the data in different formats, and manage it by validating the data, assuring its quality and making sure that it is compliant with needs. The project involves designing an app and then encouraging people to collect data, and there can be a lack of connection to other sources of data. The issues that they highlight are quality/uncertainty, accuracy, trust and relevance. One of the core questions is whether crowdsourced data needs QA/QC that is different from that of any other data (my view: yes, but depending on the trade-offs in terms of engagement and process). They see a role for crowdsourcing in NSDI, with QA during real-time data capture and QA after the dataset has been collected (they do both), and there is also the re-use and conflation of data sources. QA aims to establish what is collected – there are multiple ways to define the participants, which means different ways of involving people, and this has implications for QA. They suggest a stakeholder quality model with principles such as vagueness, ambiguity, judgement, reliability, validity and trust. There is a paper in AGILE 2014 about their framework.
The framework suggests that the people who build the application need to develop the QA/QC process, using a workflow authoring tool that is supported by an ontology, and then run it as a Web Processing Service. The temporality of the data needs to be considered in the metadata, as does how to update the metadata on data quality.
Patrick Bell considered the use of smartphone apps – in a project of the BGS and the EU JRC, they reviewed existing applications. The purpose of the survey was to explore what national geological organisations can learn from shared experience with developing smartphone apps, especially in the geological sector. Who is doing the development work, and which partnerships are created? What barriers are perceived, and what is the role of the INSPIRE directive in the development of these apps? They also tried to understand who the users are. There are 33 geological survey organisations in the EU, and they received responses from 16 of them. They found 23 different apps. From the BGS, iGeology (http://www.bgs.ac.uk/igeology/home.html) provides access to geological maps and gives access to subsidence and radon risk through in-app payment. They have soil information in the MySoil app, which allows people to get some data for free and also lets them add information and do citizen science. iGeology 3D adds AR to display a view of the geological map locally. aFieldWork is a way to capture information in the harsh environment of Greenland. GeoTreat provides information on sites of special value that are relevant to tourists or geology enthusiasts. From BRGM, i-infoTerre provides geological information to a range of users, with an emphasis on professional ones, while i-infoNappe tells you about groundwater levels. The Italian organisation developed Maps4You, which combines hiking routes with geological information in the Emilia-Romagna region. The Czech Geological Survey provides data in ArcGIS Online.
The apps deal with a wide range of topics, among them geohazards, coastlines, fossils, shipwrecks… The apps mostly provide map data and 3D, data collection and tourism. Many organisations that are not developing anything stated that they have no interest in doing so or that it is not a priority, and also cited a lack of skills. They see Android as the most important platform – all the apps are free but then rely on in-app purchases. The apps are updated on a yearly basis. About 50% develop the app in house, and most work in partnerships when developing apps. Some focus on web apps that work on mobile platforms, or on cross-platform frameworks, but these are not as good as native apps, though the latter are more difficult to develop and maintain. Many use the ESRI SDK, and they use open licences. Mostly, there is a lack of promotion of the reuse of tools – most organisations serve data. Barriers include supporting multiple platforms, software development skills, a lack of reusable software and limited support for reuse across communities; there is a heavy focus on data delivery, with OGC and REST services used to deliver data to an app. Most respondents suggested that there is no direct link to INSPIRE, but the principles of INSPIRE are at the basis of these applications.
Timo Aarmio presented the Oskari platform for releasing open data to end users (http://www.oskari.org/). It offers role-based security layers with authenticated users and four levels of permissions – viewing, viewing on embedded maps, publishing and downloading. The development of Oskari started in 2011; it is used by 16 member organisations, and the core team is run by the National Land Survey of Finland. It is used in Arctic SDI, ELF and the Finnish Geoportal – and in lots of embedded maps. End-user features allow searching metadata and searching map layers by data provider or INSPIRE theme. They have drag-and-drop layers and customisation of features in WFS. Sharing is also possible, with users uploading shapefiles. They also have printing functionality that supports PNG or PDF, and they provide embedded maps, so you can create a map and then embed it in your web page. The data sources that they support are OGC web services – WMS, WMTS, WFS, CSW – and also ArcGIS REST, data import for shapefiles and KML, and JSON for thematic maps. Spatial analysis is provided through the OGC Web Processing Service, offering 6 basic analysis methods – buffer, aggregate, union, intersect, union of analysed layers, and area and sector. They are planning to add thematic maps and more advanced spatial analysis methods, and to improve mobile device support. 20-30 people work on Oskari, with 6 people at its core.
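The four permission levels described above can be pictured as a role-to-layer grant table. The sketch below is a hypothetical Python illustration. The role names (`guest`, `registered`, `publisher`), the layer names and the function are made up. It is not Oskari's actual schema or API.

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW = auto()           # view the layer in the map client
    VIEW_EMBEDDED = auto()  # view the layer inside an embedded map
    PUBLISH = auto()        # include the layer when publishing a map (assumed meaning)
    DOWNLOAD = auto()       # download the underlying data


# Hypothetical grants: role -> layer -> set of permissions.
ROLE_GRANTS: dict[str, dict[str, set[Permission]]] = {
    "guest": {
        "orthophoto": {Permission.VIEW, Permission.VIEW_EMBEDDED},
    },
    "registered": {
        "orthophoto": {Permission.VIEW, Permission.VIEW_EMBEDDED, Permission.DOWNLOAD},
        "cadastral_parcels": {Permission.VIEW},
    },
    "publisher": {
        "orthophoto": {Permission.VIEW, Permission.VIEW_EMBEDDED, Permission.PUBLISH},
    },
}


def is_allowed(roles: list[str], layer: str, permission: Permission) -> bool:
    """A user may perform an action if any of their roles grants it on the layer."""
    return any(
        permission in ROLE_GRANTS.get(role, {}).get(layer, set())
        for role in roles
    )
```

Taking the union over a user's roles is one common design choice. A real deployment might also need explicit denials, which this sketch omits.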
The final session focused on knowledge based economy and the link to INSPIRE.
Andrew Trigg provided the perspective of HM Land Registry (HMLR) on fuelling the knowledge-based economy with open data. The Land Registry deals with 24 million titles and 5 million property transactions a year. They have provided open access to individual titles since 1990, and INSPIRE and the open data agenda have been important to the transition that they went through in the last 10 years. Their mission now includes an explicit reference to the management and reuse of land and property data, and this is important in terms of how the organisation defines itself. In the UK context, there is a shift to open data through initiatives such as INSPIRE, the Open Government Partnership, the G8 Open Data Charter (open by default) and national implementation plans. HMLR needs to be INSPIRE compliant, but in addition it has to deal with the Public Data Group, the outcomes of the Shakespeare review and the commitment to a national information infrastructure. As a result, HMLR now lists 150 datasets, but some are not open because of the need to protect against fraud and other factors. INSPIRE was the first catalyst indicating that HMLR needed to change its practices, and it allowed people in the organisation to drive change, secure resources and invest in infrastructure. It was also important for highlighting to the board that data will become important, and it was a driver for improving quality before releasing data. The parcel data is available for use without registration. There have been 30,000 downloads of the index polygons by people who can potentially use them. They aim to release everything that they can by 2018.
The challenges that HMLR experienced include data identification, infrastructure, governance, data formats and others. But the most important for the knowledge-based economy are awareness, customer insight, benefit measurement and sustainable finance. HMLR invested effort in promoting the reuse of its data; however, because there is no registration, there is no customer insight and no relationships are being developed with end users – a voluntary registration process might be an opportunity to develop such relationships. Evidence is growing that few people are using the data because they have low confidence in the commitment to keep providing it and to guarantee a stable format on which applications can be built; addressing this will require building trust, and knowing who got the data is critical here, too. Finally, sustainable finance is a major issue – HMLR is not allowed to cross-finance from other areas of activity, so it has to charge for some of its data.
Henning Sten Hansen from Aalborg University talked about the role of education. The talk was somewhat critical of the corporatisation of higher education, while also accepting some of its aspects, so what follows might misrepresent his views, though I think he mostly tried to raise questions. Henning started by noting that the OECD defines knowledge workers as people who work autonomously and reflectively, use tools effectively and interactively, and work well in heterogeneous groups (so are capable of communicating and sharing knowledge). The Danish government's current paradigm is to move from a ‘welfare society’ to a ‘competitive society’, so the economic aspects of education are seen as important, as is the contribution to the enterprise sector, with expectations that students will learn to be creative and entrepreneurial. The government requires more efficiency and performance from higher education and, as a result, reduces the autonomy of individual academics. There are also expectations of particular impacts from academic research: an emphasis on STEM for economic growth and on the social sciences for governance support, while the humanities need to contribute to creativity and social relationships. Commercialisation is emphasised, pushing patenting, research parks and commercial spin-offs. There is also a lot of corporate-style behaviour in the university sector – universities are sometimes managed as firms, and education is thought of as a consumer product. He sees a problem in the strange current focus on, and belief in, measuring everything with numbers alone. The ‘Google dream’ is also invoked – the assumption that anyone from any country can create global companies. However, researchers who need time to develop their ideas more deeply – such as Niels Bohr, who did not publish or secure funding – would not survive in the current system. But is there a link between education and success?
The LEGO founder did not have any formal education [though with this example, as with Bill Gates and Steve Jobs, their businesses strangely employ lots of PhDs – so there is a confusion between the person who starts a business and its realisation]. He then moved from this general context to INSPIRE. Geoinformation plays a strong role in e-governance and in the private sector, with the increasing importance of location-based services. In this context, projects such as GI-N2K (Geographic Information Need to Know) are important. This is a pan-European project to take the body of knowledge that was developed in the US and adapt it to current needs. They have already identified major gaps between the supply side (what people are being taught) and the demand side – there are 4 areas that are covered on the supply side, but the demand side wants wider areas to be covered. They aim to develop a new BoK for Europe and to facilitate knowledge exchange between institutions. He concluded that higher education is a prerequisite for the knowledge economy – without doubt – but the link to innovation is unclear. The challenges: highly educated people crowd the job market and end up doing routine work that does not match their skills, and the relationship to entrepreneurship and innovation, and to the knowledge needed to implement ideas, is unclear. What is the impact of control over universities – does it reduce innovation and education? And how can education respond quickly to market demands for skills when the timescales differ?
Giacomo Martirano provided the perspective of a micro-enterprise (http://www.epsilon-italia.it/IT/) in southern Italy. They are involved in INSPIRE across different projects – GeoSmartCities, Smart-Islands and SmeSpire – so lots of their R&D funding comes from the EU. They also provide GIS services in their very local environment. From an SME perspective, he sees organisational, technical and financial barriers. They have seen many cases of misalignment in the technical competencies of different organisations, which means that they cannot participate fully in projects. There is also misalignment between the technical abilities of clients and suppliers, and heterogeneity in client organisations' cultures adds challenges. The financial management of projects and the way organisations are paid make it hard for SMEs to join in, because of their sensitivity to cash flow. They have experienced cases where contracts were awarded to bids offering a price that was sometimes 40% below the reference price. There is a need to invest more and more time with less aware partners and clients. When moving to the next generation of INSPIRE, there is a need to engage micro-SMEs in the discussion – ‘don’t leave us alone’ – as the market is unfair. There is also a risk that member states, once the push for implementation eases and without the EU as a driver, will not continue to invest. His suggestion is to progress by thinking of INSPIRE as a Service – SDI as a Service can allow SMEs to join in. There is a need for cooperation between small and big players in the market.
Andrea Halmos (Public Services Unit, DG CONNECT) covered e-government, and noted her realisation that INSPIRE is more than ‘just environmental information’. From DG CONNECT's point of view, the focus is ICT-enabled open government, and the aim of the Digital Agenda for Europe is to empower citizens and businesses, strengthen the internal market, highlight efficiency and effectiveness, and recognise pre-conditions. One focus is the effort to put public services in digital format and provide them across borders. The principles are to be user-centred, with transparency and cross-border support – they have used life events in the design. There are specific activities on sharing identity details, procurement, patient prescriptions, business and justice. They see these projects as the building blocks for new services that work in different areas. They face challenges such as the financial crisis, but also the challenge of new technologies and social media, as well as more open data. So what is next for public administration? They need to deal with customers – open data, open processes and open services – with importance given to transparency, collaboration and participation (http://www.govloop.com/profiles/blogs/three-dimensions-of-open-government). The services are open to others to join in, allowing third parties to create different public services. They look at analogies for opening decision-making processes and supporting collaboration with people – this might increase the trust and accountability of government. The public service needs to collaborate with third parties to create better or new services. ICT is only an enabler – you need to deal with human capital, organisational issues, cultural issues, processes and business models – and it even questions the role of government and what it needs to do in the future. What is the governance issue – what is the public value that is created at the end? Can government become a platform that others use to create value?
They are focusing on Societal Challenges. Comments on their framework proposals are welcome – it is available at http://ec.europa.eu/digital-agenda/en/news/vision-public-services
After these presentations, and when Alessandro Annoni (who was chairing the panel) had completed the first round of questions, I was bothered that in all these talks about the knowledge-based economy only the government and the private sector were mentioned as actors, and even when discussing the development of new services on top of open data and services, the expectation was only for the private sector to act. I therefore asked about the role of the third sector and civil society within INSPIRE and the visions that the different speakers presented. I even provided the example of mySociety – mainly to demonstrate that third-sector organisations have a role to play.
To my astonishment, Henning, Giacomo, Andrea and Alessandro answered this question first by not treating much of civil society as organisations but mostly as individual citizens – a framing that allows commercial bodies, large and small, to act, while citizens do not have a clear role in coming together and acting. Secondly, the four of them saw the role of citizens only as providers of data and information – such as the reporting in FixMyStreet. Moreover, each one repeated that, despite the fact that this is low-quality data, it is useful in some ways. For example, Alessandro highlighted OSM mapping in Africa as an example of a case where you accept it because there is nothing else (really?!?), but in other places it should be used only when it is needed because of the quality issue – for example, in emergency situations when it is timely.
Apart from yet another repetition of the dismissal of citizen-generated environmental information on the false argument of data quality (see Caren Cooper's post on this issue), the views presented in the talks helped me to crystallise some of my thoughts about the conference.
As one would expect, because the participants are civil servants, on stage and in presentations they follow the main line of the decision makers for whom they work, and therefore you could hear the official line about efficiency, managing to do more with reduced budgets and investment, and emphasising economic growth and a very narrow definition of the economy that matters. Different views were expressed during breaks.
The extent to which citizens are left out of the picture was unsurprising, given the mode of thinking expressed at the conference about the aim of information as ‘economic fuel’. While the tokenism of improving transparency, or even empowering citizens, appeared on some slides and in discussions, citizens are not explicitly included in a meaningful and significant way in the consideration of the services or in the visions of ‘government as platform’. They are perceived as customers or service users. The lessons learned in environmental policy areas in the 1980s and 1990s, which were to provide an explicit role for civil society, NGOs and social enterprises within the process of governance and decision making, are missing. Maybe this is because a thriving civil society needs active government investment (community centres need to be built, and someone needs to be employed to run them), so it doesn't match the goals of those who are using austerity as a political tool.
Connected to that is the fact that although, again at the level of tokenism, INSPIRE is about environmental applications, the implementation is now driven entirely by narrow economic arguments. As with citizenship issues, environmental aspects are marginalised at best, or ignored.
The comment about data quality and some responses to my talk reminded me of Ed Parsons' commentary from 2008 about the UK GIS community's reaction to Web Mapping 2.0/Neogeography/GeoWeb. Six years on, the people who are running the most important geographic information infrastructure project currently under way (and it is progressing well by the look of it) seem somewhat resistant to the trends that are happening around them. Within the core area that INSPIRE is supposed to handle (environmental applications), citizen science has the longest history and is already used extensively. VGI is no longer new, and crowdsourcing as a source of actionable information now has a decade of history and more behind it. Yet, at least in the presentations and the talks, citizens and civil-society organisations have very little role unless they are controlled and marshalled.
Despite all this critique, I have to end on a positive note. It has been a while since I've been to a GIS conference that includes the people who work in government and other large organisations, so I did find the conference very interesting for reconnecting and learning about the nature of geographic information management at this scale. It was also good to see how individuals champion the use of GeoWeb tools, and the degree to which people are doing user-centred design.
Following the last post, which focused on an assertion about crowdsourced geographic information and citizen science, I continue with another observation. As was noted in the previous post, these can be treated as ‘laws’ as they seem to emerge as common patterns from multiple projects in different areas of activity – from citizen science to crowdsourced geographic information. The first assertion was about the relationship between the number of volunteers who can participate in an activity and the amount of time and effort that they are expected to contribute.
This time, I look at one aspect of data quality, which is about consistency and coverage. Here the following assertion applies:
‘All information sources are heterogeneous, but some are more honest about it than others’
By this, I am referring to the ongoing argument about authoritative and crowdsourced information sources (Flanagin and Metzger 2008 frequently come up in this context), which was also at the root of the Wikipedia vs. Britannica debate, and to the mistrust of citizen science observations and the constant questioning of whether they can do ‘real research’.
There are many aspects to these concerns, and the assertion deals with comprehensiveness and consistency, which are used as reasons to dismiss crowdsourced information when comparing it to authoritative data. However, on closer inspection we can see that all these information sources are fundamentally heterogeneous. Despite all the effort to define precise standards for data collection in authoritative data, heterogeneity creeps in because of budget and time limitations, decisions about what is worth collecting and how, and the clash between reality and the specifications. Here are two examples:
Take one of the Ordnance Survey Open Data sources – the maps present themselves as consistent and covering the whole country in an orderly way. However, dig into the details of the mapping, and you discover that the Ordnance Survey uses different standards for mapping urban, rural and remote areas. Yet the derived products that are generalised and manipulated in various ways, such as Meridian or Vector Map District, do not provide a clear indication of which parts originated from which scale – so the heterogeneity of the source disappears in the final product.
The census is also heterogeneous, and it is a good case of specifications vs. reality. Not everyone fills in the forms, and even with the best efforts of enumerators it is impossible to collect all the data; therefore, statistical analysis and manipulation of the results are required to produce a well-reasoned assessment of the population. This is expected, even though it is not always understood.
Therefore, even the best information sources that we accept as authoritative are heterogeneous, but, as I've stated, they are just not completely honest about it. The ONS doesn't release the full original set of data before all the manipulations, nor does it completely disclose all the assumptions that went into reaching the final values. The Ordnance Survey doesn't tag every line with metadata about the date of collection and scale.
Somewhat counter-intuitively, exactly because crowdsourced information is expected to be inconsistent, we approach it as such and ask questions about its fitness for use. So in that way it is more honest about the inherent heterogeneity.
Importantly, the assertion should not be taken as dismissive of authoritative sources, or as ignoring that the heterogeneity within crowdsourced information sources is likely to be much higher than in authoritative ones. Of course, all the investment in making things consistent and the effort to get universal coverage is worth it, and it would be foolish and counterproductive to consider that such sources of information can be replaced, as is suggested for the census, or that it's not worth investing in the Ordnance Survey to update the authoritative datasets.
Moreover, when commercial interests meet crowdsourced geographic information or citizen science, the ‘honesty’ disappears. For example, even though we know that Google Map Maker is now used in many parts of the world (see the figure), even in cases where Google provides access to vector data, you cannot find out who contributed, when and where. It is also presented as an authoritative source of information.
Despite the risk of misinterpretation, the assertion can be useful as a reminder that the differences between authoritative and crowdsourced information are not as big as it may seem.
5 December, 2012
Recently, I attended a meeting with people from a community that is concerned with vibration and noise caused by a railway near their homes. We discussed the potential of using citizen science to measure the vibrations that pass the sensory threshold and that people classify as unpleasant, together with other perceptions and feelings about these incidents. This can form the evidence for a discussion with the responsible authorities to see what can be done.
As a citizen science activity, this is not dissimilar from the work carried out around Heathrow to measure the level of noise nuisance or air pollution monitoring that ExCiteS and Mapping for Change carried out in other communities.
In the meeting, the participants felt that they needed to emphasise that they are not against the use of the railway or the development of new railway links. Like other groups that I have met in the past, they felt that it is important to emphasise that their concern is not only about their locality – in other words, this is not a case of ‘Not In My Back Yard’ (NIMBY), which is the most common dismissal of local concerns. The concern over NIMBY and citizen science is an obvious one, and it frequently comes up in questions about the value and validity of data collected through this type of citizen science.
During my master's studies, I was introduced to Maarten Wolsink's (1994) analysis of NIMBY as compulsory reading in one of the courses. It is one of the papers that I keep referring to from time to time, especially when complaints about participatory work and NIMBY come up.
In essence, what Wolsink demonstrates is that conceptualising the people who are involved in the process as selfish and focused only on their own area is wrong. Through engagement with environmental and community concerns, people explore issues at wider scales and many times will argue for ‘Not in Anyone’s Back Yard’ or for a balance between the needs of infrastructure development and their own quality of life. Studies on environmental justice have also demonstrated that what the people who are involved in such activities ask for is not narrow, but many times mixes aspects of the need for recognition, expectations of respect, arguments of justice, and participation in decision-making (Schlosberg 2007).
In other words, citizen science and systematic data collection are a way for the community to bring to the table evidence that can enhance arguments beyond NIMBY – while NIMBY might be part of the story, it is not the whole story.
For me, these interpretations are part of the reason that such ‘activism’-based citizen science should receive the same attention and respect as any other data collection, most notably by the authorities.
Wolsink, M. (1994) Entanglement of Interests and Motives: Assumptions Behind the NIMBY-Theory on Facility Siting, Urban Studies, 31(6), pp. 851-866.
Schlosberg, D. (2007) Defining Environmental Justice: Theories, Movements, and Nature. Oxford University Press.
In March 2008, I started comparing OpenStreetMap in England to the Ordnance Survey Meridian 2, as a way to evaluate the completeness of OpenStreetMap coverage. The rationale behind the comparison is that Meridian 2 represents a generalised geographic dataset that is widely used in national-scale spatial analysis. At the time that the study started, it was not clear that OpenStreetMap volunteers could create highly detailed maps, as can be seen on the ‘Best of OpenStreetMap’ site. Yet even today, Meridian 2 provides a minimum threshold for OpenStreetMap when the question of completeness is asked.
So far, I have carried out 6 evaluations, comparing the two datasets in March 2008, March 2009, October 2009, March 2010, September 2010 and March 2011. While the work on the statistical analysis and verification of the results continues, Oliver O’Brien helped me take the results of the analysis for Britain and turn them into an interactive online map that can help in exploring the progression of the coverage over the various time periods.
Notice that the visualisation shows the total length of all road objects in OpenStreetMap, so it does not discriminate between roads, footpaths and other types of objects. This is the most basic level of completeness evaluation, and it is fairly coarse.
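A length-based comparison of this kind can be sketched by summing line length per grid cell in each dataset and taking the difference. The Python below is a minimal sketch under several assumptions. Coordinates are already projected in metres (e.g. British National Grid). Cells are 1 km squares. Each segment is assigned to the cell containing its midpoint rather than being clipped at cell boundaries, which a more careful analysis would do. The function names are hypothetical.

```python
import math
from collections import defaultdict

Point = tuple[float, float]
Line = list[Point]
Cell = tuple[int, int]

CELL_SIZE_M = 1000.0  # assumed 1 km grid squares


def length_by_cell(lines: list[Line], cell_size: float = CELL_SIZE_M) -> dict[Cell, float]:
    """Total line length per grid cell, assigning each segment to its midpoint's cell."""
    totals: dict[Cell, float] = defaultdict(float)
    for line in lines:
        for (x1, y1), (x2, y2) in zip(line, line[1:]):
            mid_x, mid_y = (x1 + x2) / 2, (y1 + y2) / 2
            cell = (math.floor(mid_x / cell_size), math.floor(mid_y / cell_size))
            totals[cell] += math.hypot(x2 - x1, y2 - y1)
    return dict(totals)


def length_difference(osm_lines: list[Line], reference_lines: list[Line],
                      cell_size: float = CELL_SIZE_M) -> dict[Cell, float]:
    """OSM length minus reference length per cell; negative values suggest gaps in OSM."""
    osm = length_by_cell(osm_lines, cell_size)
    ref = length_by_cell(reference_lines, cell_size)
    return {cell: osm.get(cell, 0.0) - ref.get(cell, 0.0) for cell in osm.keys() | ref.keys()}
```

A negative value for a cell means that OpenStreetMap has less total line length there than the reference dataset. Because all line types are summed together, this stays as coarse as the evaluation described above.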
The application will allow you to browse the results and to zoom to a specific location, and as Oliver integrated the Ordnance Survey Street View layer, it will allow you to see what information is missing from OpenStreetMap.
Finally, note that for the periods before September 2010, the coverage is for England only.
Some details on the development of the map are available on Oliver’s blog.
How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information
10 January, 2011
The paper “How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information” has appeared in The Cartographic Journal. The proper citation for the paper is:
Haklay, M., Basiouka, S., Antoniou, V. and Ather, A. (2010) How Many Volunteers Does It Take To Map An Area Well? The validity of Linus’ law to Volunteered Geographic Information. The Cartographic Journal, 47(4), 315-322.
The abstract of the paper is as follows:
In the area of volunteered geographical information (VGI), the issue of spatial data quality is a clear challenge. The data that are contributed to VGI projects do not comply with standard spatial data quality assurance procedures, and the contributors operate without central coordination and strict data collection frameworks. However, similar to the area of open source software development, it is suggested that the data hold an intrinsic quality assurance measure through the analysis of the number of contributors who have worked on a given spatial unit. The assumption that as the number of contributors increases so does the quality is known as ‘Linus’ Law’ within the open source community. This paper describes three studies that were carried out to evaluate this hypothesis for VGI using the OpenStreetMap dataset, showing that this rule indeed applies in the case of positional accuracy.
To access the paper on the journal’s website, you can follow the link: 10.1179/000870410X12911304958827. However, if you don’t hold a subscription to the journal, a postprint of the paper is available at the UCL Discovery repository. If you would like to get hold of the printed version, email me.
29 November, 2010
The website GPS Business News published an interview with me in which I covered several aspects of OpenStreetMap and crowdsourced geographical information, including aspects of spatial data quality, patterns of data collection, inequality in coverage and the implications of these patterns for the wider area of Volunteered Geographical Information.
The interview is available here.
21 October, 2010
One issue that remained open in the studies on the relevance of Linus’ Law for OpenStreetMap was that they looked only at areas with more than 5 contributors, and the link between the number of users and the quality was not conclusive, although the quality was above 70% for areas with that number of contributors or more.
Now, as part of writing up the GISRUK 2010 paper for journal publication, we had an opportunity to fill this gap, to some extent. Vyron Antoniou has developed a method to evaluate positional accuracy on a larger scale than we have done so far. The methodology uses the geometric position of the Ordnance Survey (OS) Meridian 2 road intersections to evaluate positional accuracy. Although Meridian 2 is created by applying a 20-metre generalisation filter to the centrelines of the OS Roads Database, this generalisation process does not affect the positional accuracy of node points, and thus their accuracy is the best available. An algorithm was developed to identify the corresponding nodes in Meridian 2 and OSM, and the average positional error was calculated for each square kilometre in England. With this data, which provides an estimated positional accuracy for an area of over 43,000 square kilometres, it was possible to estimate the contribution that additional users make to the quality of the data.
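The node-matching idea can be illustrated with a deliberately simplified sketch. It is not Vyron Antoniou’s algorithm, which is not described here; it assumes both sets of road intersections are already in the same projected coordinate system in metres (for example British National Grid), matches each Meridian 2 intersection to its nearest OSM intersection within an arbitrary, hypothetical 20 m search radius, and averages the matched distances per 1 km square.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres


def nearest(p: Point, candidates: List[Point], max_dist: float) -> Optional[float]:
    """Distance from p to the closest candidate within max_dist, or None if none qualifies."""
    best = None
    for q in candidates:
        d = math.dist(p, q)
        if d <= max_dist and (best is None or d < best):
            best = d
    return best


def mean_error_per_km(meridian: List[Point], osm: List[Point],
                      max_dist: float = 20.0) -> Dict[Tuple[int, int], float]:
    """Mean matched positional error (metres) per 1 km square, keyed by km indices."""
    errors = defaultdict(list)
    for p in meridian:
        d = nearest(p, osm, max_dist)
        if d is not None:  # unmatched intersections are left out of the average
            key = (int(p[0] // 1000), int(p[1] // 1000))
            errors[key].append(d)
    return {k: sum(v) / len(v) for k, v in errors.items()}


# Hypothetical intersections: two match within 20 m, the third has no OSM counterpart.
meridian_nodes = [(530010.0, 180020.0), (530500.0, 180500.0), (531200.0, 180100.0)]
osm_nodes = [(530013.0, 180024.0), (530506.0, 180508.0), (531900.0, 180900.0)]
print(mean_error_per_km(meridian_nodes, osm_nodes))
```

The brute-force search is fine for a toy example; for tens of thousands of square kilometres a spatial index (for example a grid bucket or k-d tree) would be needed, and a real matcher would also check that the matched nodes join the same roads.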
As can be seen in the chart below, positional accuracy remains fairly level when the number of users is 13 or more – as we have seen in previous studies. On the other hand, up to 13 users, each additional contributor considerably improves the dataset’s quality. In grey you can see the maximum and minimum values, so the area represents the possible range of positional accuracy results. Interestingly, as the number of users increases, positional accuracy seems to settle close to 5m, which is somewhat expected when considering the source of the information – GPS receivers and aerial imagery. However, this is an aspect of the analysis that clearly requires further testing of the algorithm and the datasets.
It is encouraging to see that the results of the analysis are significantly correlated. For the full dataset the correlation is weak (-0.143) but significant at the 0.01 level (2-tailed). However, for the average values for each number of contributors (blue line in the graph), the correlation is strong (-0.844) and significant at the 0.01 level (2-tailed).
An important caveat is that the number of tiles with more than 10 contributors is fairly small, so that is another aspect that requires further exploration. Moreover, spatial data quality is not just positional accuracy, but also attribute accuracy, completeness, update and other properties. We can expect that they will also exhibit similar behaviour to positional accuracy, but this requires further studies – as always.
However, as this is a large-scale analysis that adds to the evidence from the small-scale studies, it is becoming highly likely that Linus’ Law affects the quality of OSM data, and possibly of other so-called Volunteered Geographical Information (VGI) sources, and that the gain in positional accuracy diminishes once the number of contributors passes about 10 or so.
Completeness in volunteered geographical information – the evolution of OpenStreetMap coverage (2008-2009)
13 August, 2010
The Journal of Spatial Information Science (JOSIS) is a new open access journal in GIScience, edited by Matt Duckham, Jörg-Rüdiger Sack, and Michael Worboys. In addition, the journal adopted an open peer review process, so readers are invited to comment on a paper while it goes through the formal peer review process. So this seems to be the most natural outlet for a new paper that analyses the completeness of OpenStreetMap over 18 months, from March 2008 to October 2009. The paper was written in collaboration with Claire Ellul. The abstract of the paper is provided below, and you are very welcome to comment on the paper on the JOSIS forum that is dedicated to it, where you can also download it.
Abstract: The ability of lay people to collect and share geographical information has increased markedly over the past 5 years as a result of the maturation of web and location technologies. This ability has led to a rapid growth in Volunteered Geographical Information (VGI) applications. One of the leading examples of this phenomenon is the OpenStreetMap project, which started in the summer of 2004 in London, England. This paper reports on the development of the project over the period March 2008 to October 2009 by focusing on the completeness of coverage in England. The methodology that is used to evaluate the completeness is comparison of the OpenStreetMap dataset to the Ordnance Survey dataset Meridian 2. The analysis evaluates the coverage in terms of physical coverage (how much area is covered), followed by estimation of the percentage of England’s population which is covered by completed OpenStreetMap data and finally by using the Index of Deprivation 2007 to gauge socio-economic aspects of OpenStreetMap activity. The analysis shows that within 5 years of project initiation, OpenStreetMap already covers 65% of the area of England, although when details such as street names are taken into consideration, the coverage is closer to 25%. Significantly, this 25% of England’s area covers 45% of its population. There is also a clear bias in data collection practices: more affluent areas and urban locations are better covered than deprived or rural locations. The implications of these outcomes for studies of volunteered geographical information are discussed towards the end of the paper.
4 April, 2010
The opening of Ordnance Survey datasets at the beginning of April 2010 is bound to fundamentally change the way OpenStreetMap (OSM) information is produced in the UK. So just before this major change starts to influence OpenStreetMap, it is worth evaluating what has been achieved so far without this data. It is also a good time to update the completeness study, as the previous ones were conducted with data from March 2008 and March 2009.
Following the same method that was used in all the previous studies (which is described in detail here), the latest version of Meridian 2 from OS OpenData was downloaded and compared with OSM data downloaded from GeoFabrik. The processing is now streamlined with MapBasic scripts, PostGIS scripts and final processing in Manifold GIS, so it is possible to complete the analysis within 2 days. The colour scheme for the map is based on Cynthia Brewer and Mark Harrower’s ColorBrewer 2.
By the end of March 2010, OpenStreetMap coverage of England had grown to 69.8% from 51.2% a year earlier. When attribute information is taken into account, the coverage grew to 24.3% from 14.7%. The chart on the left shows how the coverage progressed over the past 2 years, using the 4 data points that were used for analysis: March 2008, March 2009, October 2009 and March 2010. Notice that, in terms of capturing the geometry, less than 5% is now significantly under-mapped when compared to Meridian 2. Another interesting aspect is the decline in empty cells, that is, grid cells that don’t have any feature in Meridian 2 but now have features from OSM appearing in them. So in terms of capturing road information for England, it seems that the goal of capturing the whole country with volunteer effort was within reach, even without the release of Ordnance Survey data.
On the other hand, when attributes are included in the analysis, the picture is very different.
The progression of coverage is far from complete and, although the area that is empty of features that include a street or road name in Meridian 2 is much larger, the progress of OSM mappers in completing the information is much slower. While geometric coverage went up by 18.6 percentage points over the past year, coverage that takes attributes into account went up by less than 10 points (9.6, to be precise). The reason for this is likely to be the need to carry out a ground survey to find the street name without using other copyrighted sources.
Attributes are where I would expect the benefits of the Ordnance Survey data release to OSM mapping to show. Products such as StreetView and VectorMap District can be used either to copy the street name (StreetView) or to write an algorithm that copies the street name and other attributes from a vector dataset, such as Meridian 2 or VectorMap District.
Of course, this is a failure of the ‘crowd’ in the sense that, because this bit of information required an actual visit on the ground, finding volunteers to collect it was more challenging than finding people who are happy to volunteer their time to digitise maps.
As in the previous cases, there are local variations, and the geography of the coverage is interesting. The information includes 4 time points, so the most appropriate visualisation is one that allows for comparison and transition between maps. Below is a presentation (you can download it from SlideShare) that provides maps for the whole of England as well as 5 regional maps, roughly covering the South West, London, Birmingham and the Midlands, Manchester and Liverpool, and Newcastle upon Tyne and the North East.
If you want to create your own visualisation, or use the results of this study, you can download the results in shapefile format from here.
For a very nice visualisation of Meridian 2 and OpenStreetMap data, see Ollie O’Brien’s SupraGeography blog.
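The algorithmic route for street names could look roughly like this: for each unnamed OSM way, find the nearest named road in the open vector dataset and copy its name if the two are close enough. Everything here is an assumption for illustration: roads are simplified to single straight segments in metres, the function names are hypothetical, the 15 m threshold is arbitrary, and a real conflation would also compare direction, length and topology before trusting a match.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest distance from point p to the straight segment seg."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate segment: treat it as a point
        return math.dist(p, (x1, y1))
    # Project p onto the segment's line and clamp the parameter to [0, 1].
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (x1 + t * dx, y1 + t * dy))


def borrow_name(osm_way: Segment, reference: List[Tuple[str, Segment]],
                max_dist: float = 15.0) -> Optional[str]:
    """Name of the reference road nearest the OSM way's midpoint, or None if none is close."""
    (x1, y1), (x2, y2) = osm_way
    mid = ((x1 + x2) / 2, (y1 + y2) / 2)
    best_name, best_d = None, max_dist
    for name, seg in reference:
        d = point_segment_distance(mid, seg)
        if d <= best_d:
            best_name, best_d = name, d
    return best_name


# Hypothetical named reference roads (for example, taken from an open vector product).
reference_roads = [("High Street", ((0.0, 0.0), (100.0, 0.0))),
                   ("Mill Lane", ((0.0, 50.0), (100.0, 50.0)))]
print(borrow_name(((10.0, 4.0), (90.0, 6.0)), reference_roads))
```

Returning None rather than the least-bad candidate keeps uncertain cases for a human or a ground survey to resolve.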
29 January, 2010
After the publication of the comparison of OpenStreetMap and Google Map Maker coverage of Haiti, Nicolas Chavent from the Humanitarian OpenStreetMap Team contacted me and drew my attention to the geographical dataset of the UN Stabilization Mission in Haiti (known as MINUSTAH), which is seen as the core set for the post-earthquake humanitarian effort, so a comparison with this dataset might be helpful, too. The comparison of the two Volunteered Geographical Information (VGI) datasets of OpenStreetMap and Google Map Maker with this core dataset also exposed an aspect of the usability of geographical information in emergency situations that is worth commenting on.
For the purpose of the comparison, I downloaded two datasets from GeoCommons: the detailed maps of Port-au-Prince and the Haiti road network. Both are reported on GeoCommons as originating from MINUSTAH. I combined them and then carried out the comparison. As in the previous case, the comparison focused only on the length of the roads, with the hypothesis that, if there is a significant difference in the total road length in a given grid square, the dataset with the longer length is likely to be more complete. Other comparisons between established and VGI datasets support this hypothesis, although caution must be applied when the differences are small. The following maps show the differences between the MINUSTAH dataset and OpenStreetMap, and between MINUSTAH and Google Map Maker. I have also reproduced the original map that compares OpenStreetMap and Map Maker, for the purpose of comparison and consistency, as well as for cartographic quality.
The maps show that MINUSTAH does provide fairly comprehensive coverage across Haiti (as expected) and that the volunteered efforts of OpenStreetMap and Map Maker provide further details in urban areas. There are areas that are only covered by one of the datasets, so they all have value.
The final comparison uses the 3 datasets together, with the same criteria as in the previous map – the dataset with the longest length of roads is the one that is considered the most complete.
It is interesting to note the south/north divide between OpenStreetMap and Google Map Maker, with Google Map Maker providing more detail in the north, and OpenStreetMap in the south (closer to the earthquake epicentre). When compared over the areas in which there is at least 100 metres of MINUSTAH coverage, OpenStreetMap is, overall, 64.4% complete, while Map Maker is 41.2% complete. Map Maker covers a further 354 square kilometres that are not covered by MINUSTAH or OpenStreetMap, and OpenStreetMap covers a further 1044 square kilometres that are missing from the other datasets, so there is clearly a benefit in integrating them. The grid that includes the analysis of the integrated datasets is available here in shapefile format, in case it is of any use, or if you would like to carry out further analysis or visualise it.
While working on this comparison, it was interesting to explore the data fields in the MINUSTAH dataset, some of which were included to provide operational information, such as road condition, the length of time that it takes to travel through it, etc. These are the hallmarks of practical and operational geographical information, with details that are directly relevant to end-users in their daily tasks. The other two datasets have been standardised for universal coverage and delivery, and this is apparent in their internal data structure. The Google Map Maker schema is closer to traditional geographical information products in field names and semantics, exposing the internal engineering of the system; for example, it includes a country code, which is clearly meaningless when you are downloading one country! OpenStreetMap (as provided by either CloudMade or GeoFabrik) keeps to the simplicity mantra and is fairly basic. Yet the schema is the same in Haiti as in England or any other place. So, just like Google, it takes a system view of the data and its delivery.
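The ‘longest length wins’ rule for the three datasets, and the count of cells covered by only one of them, can be sketched as follows. The sketch assumes per-cell road lengths in metres on 1 km cells (so a count of cells is also an area in square kilometres); the dataset labels are the post’s, but the cell values are invented and the 100 m minimum for counting a cell as covered is an illustrative choice, borrowed loosely from the threshold mentioned above.

```python
from typing import Dict, Optional

Lengths = Dict[str, Dict[str, float]]  # dataset name -> {cell id -> road length in metres}


def most_complete(lengths: Lengths, cell: str) -> Optional[str]:
    """Dataset with the longest road length in the cell (ties go to the first listed)."""
    best = max(lengths, key=lambda name: lengths[name].get(cell, 0.0))
    return best if lengths[best].get(cell, 0.0) > 0 else None


def unique_cells(lengths: Lengths, name: str, min_len: float = 100.0) -> int:
    """Number of cells where only `name` has at least min_len metres of road."""
    cells = set().union(*(d.keys() for d in lengths.values()))
    count = 0
    for c in cells:
        covered = [n for n in lengths if lengths[n].get(c, 0.0) >= min_len]
        if covered == [name]:
            count += 1
    return count


# Hypothetical per-cell road lengths.
lengths = {
    "MINUSTAH": {"A": 5000.0, "B": 2000.0},
    "OpenStreetMap": {"A": 7000.0, "C": 800.0},
    "Map Maker": {"B": 2600.0, "D": 300.0},
}
for cell in "ABCD":
    print(cell, most_complete(lengths, cell))
print(unique_cells(lengths, "OpenStreetMap"), unique_cells(lengths, "Map Maker"))
```

Given the caution about small differences, a tolerance could be added so that cells where the two longest datasets are within some margin are reported as a tie rather than a winner.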
This means that, from an end-user perspective, while these VGI data sources were produced in a radically different way to traditional GI products, their delivery is similar to the way in which traditional products were delivered, burdening the user with the need to understand the semantics of the different fields before using the data.
In emergency situations, this is likely to present an additional hurdle for the use of any data, as it is not enough to provide the data for download through GeoCommons, GeoFabrik or Google – it is how it is going to be used that matters. Notice that the maps tell a story in which an end-user who wants to have full coverage of Haiti has to combine three datasets, so the semantic interpretation can be an issue for such a user.
So what should a user-centred design of GI for an emergency situation look like? The general answer is ‘find the core dataset that is used by the first responders, and adapt your data to this standard’. In the case of Haiti, I would suggest that the MINUSTAH dataset is a template for such a thing. Users of GI on the ground are more likely to be already exposed to, and familiar with, the core dataset. Its fields are relevant and operational, and show that it is more ‘user-centred’ than the other two. Therefore, it would be beneficial for VGI providers who want to help in an emergency situation to ensure that their data comply with the local de facto standard, which is the dataset being used on the ground, and to adapt their schema to fit it.
Of course, this is what GI ontologies are for, to allow for semantic interoperability. The issue with them is that they add at least two steps – define the ontology and figure out the process to translate the dataset that you have acquired to the required format. Therefore, this is something that should be done by data providers, not by end-users when they are dealing with the real situation on the ground. They have more important things to do than to find a knowledge engineer that can understand semantic interoperability…
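At its simplest, adapting a VGI dataset to the local de facto standard is a field-by-field translation that the data provider writes once, so that end-users receive records already in the familiar schema. The sketch below maps OpenStreetMap-style tags (highway, name and surface are real OSM keys, and the values used are common OSM values) onto a target schema; the target field names and category labels are hypothetical placeholders, not the actual MINUSTAH attributes, which are not listed here.

```python
from typing import Dict, Optional

# Hypothetical target schema modelled on an operational 'core' dataset.
# The field names (ROAD_NAME, ROAD_CLASS, CONDITION) and labels are illustrative only.
CLASS_MAP = {"primary": "Main road", "secondary": "Secondary road", "track": "Track"}
CONDITION_MAP = {"asphalt": "Paved", "paved": "Paved", "unpaved": "Unpaved", "dirt": "Unpaved"}


def to_core_schema(tags: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Translate one OSM way's tags into the target schema; None for non-road features."""
    highway = tags.get("highway")
    if highway is None:
        return None
    return {
        "ROAD_NAME": tags.get("name", ""),
        "ROAD_CLASS": CLASS_MAP.get(highway, "Other road"),
        # Unknown or missing surface values are flagged rather than guessed.
        "CONDITION": CONDITION_MAP.get(tags.get("surface", ""), "Unknown"),
    }


print(to_core_schema({"highway": "primary", "name": "Example Road", "surface": "asphalt"}))
print(to_core_schema({"building": "yes"}))
```

Writing these lookup tables is a small-scale version of the ontology work described above; the point is that it is done once by the provider rather than by each user in the field.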