PhD studentship in collaboration with the Ordnance Survey – identifying systematic biases in crowdsourced geographic information

Deadline 31st August 2017

UCL Department of Geography and the Ordnance Survey are inviting applications for a PhD studentship to explore the internal systematic biases in crowd-sourced geographic information datasets (also known as Volunteered Geographic Information – VGI).

The studentship provides an exciting opportunity for a student to work with Ordnance Survey on understanding the use of crowd-sourced geographic information, and potentially contributing to the use of such data sources by national mapping agencies. Ordnance Survey is an active partner in its sponsored research and offers students opportunities to work on-site and to contribute to workshops and innovation within the business. In addition, the student will be part of the Extreme Citizen Science group at UCL, which is one of the leading research groups in the area of crowdsourced geographic information and the study thereof.

For more information about the project, the studentship and details of how to apply, please see below:


Start Date: October 2017

Funding status: Applications are invited from UK and EU citizenship holders.

Funding Body: EPSRC and Ordnance Survey

Funding Details: The scholarship covers UCL student fees at the Home/EU rate and provides a stipend of £16,553 per annum tax free. Travel expenses and research equipment will also be provided to the successful candidate.

Project Description:

UCL Department of Geography and the Ordnance Survey are inviting applications for a PhD studentship to explore the internal systematic biases in crowd-sourced geographic information datasets (also known as Volunteered Geographic Information – VGI).


There has been a rapid increase in information gathered by people from all walks of life who are using connected devices with an ability to collect and share geographic information, such as GPS tracks, photographs with location information, or observations of the natural environment in citizen science projects. There is now a vast array of projects and activities that use this type of information, and each project has its own characteristics. Yet, it can be hypothesised that some of the characteristics of crowd-sourced geographic information will be systematically biased, and these biases differ between projects and data sources.


Crowd-sourced datasets will have some systematic biases that repeat across crowd-sourcing platforms – for example, the impact of population density, business activity, and tourism on the places where data is available, or a weekend or seasonal bias in the temporal period of data collection. Other biases are project-specific – for example, some projects manage to attract more young men, and therefore places that are of interest to this demographic will be over-represented. One of the major obstacles that limit the use of such data sources is understanding and separating systematic and project-level biases, and then developing statistical methods to evaluate their impact. In order to use such datasets to identify hidden features and patterns, there is a need to identify the relationships between a dataset and the world.
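A weekend bias of the kind described can be checked with a few lines of code. The sketch below uses made-up timestamps and an illustrative helper (`weekend_share` is not from any of the projects mentioned); it compares the observed share of weekend contributions with the share expected if collection were uniform across the week:

```python
from datetime import datetime

def weekend_share(timestamps):
    """Fraction of contributions made on a Saturday or Sunday."""
    weekend = sum(1 for t in timestamps if t.weekday() >= 5)
    return weekend / len(timestamps)

# Hypothetical contribution timestamps from a crowdsourcing platform.
contributions = [
    datetime(2017, 6, 3, 10),   # Saturday
    datetime(2017, 6, 4, 15),   # Sunday
    datetime(2017, 6, 5, 9),    # Monday
    datetime(2017, 6, 10, 11),  # Saturday
]

observed = weekend_share(contributions)
expected = 2 / 7  # weekend share if collection were uniform over the week
bias_ratio = observed / expected  # > 1 indicates over-representation of weekends
```

A ratio well above 1 would flag the temporal bias described above; a real analysis would, of course, test this over a much larger sample.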

The aim of this research project, therefore, is to create a large collection of crowd-sourced GPS tracks and pedestrian trajectories, and to use conflation techniques and advanced analytics to develop methodologies for identifying and estimating the biases. Once this is done, the aim will be to identify hidden characteristics in order to be more confident about the patterns that are being observed.
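As a minimal illustration of the kind of bias estimation involved, the hypothetical sketch below correlates contribution counts with population density across grid cells – a strong positive correlation would suggest the dataset reflects where people live rather than the phenomenon itself. All numbers are invented for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical grid cells: population density vs. number of VGI contributions.
population = [100, 500, 2000, 8000, 12000]
contributions = [3, 10, 45, 160, 240]

r = pearson(population, contributions)  # close to 1 here: a population bias
```

In practice this would be one covariate among many (tourism, business activity, demographics), but the principle – regress contribution density against candidate drivers of bias – is the same.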

Studentship Description

The studentship provides an exciting opportunity for a student to work with Ordnance Survey on understanding the use of crowd-sourced geographic information, and potentially contributing to the use of such data sources by national mapping agencies. Ordnance Survey is an active partner in its sponsored research and offers students opportunities to work on-site and to contribute to workshops and innovation within the business. In addition, the student will be part of the Extreme Citizen Science group at UCL, which is one of the leading research groups in the area of crowdsourced geographic information and the study thereof.

The project will run for four years and will be supervised by Prof Muki Haklay from UCL and Jeremy Morley from Ordnance Survey. Muki Haklay is a professor in the UCL Department of Geography with a track record of research and publication relating to crowdsourced data management and quality. Jeremy Morley is the Chief Geospatial Scientist at Ordnance Survey, leading the long-term business research programme, and has research experience in crowd-sourced geographic information.

Person Specification

Applicants should possess a strong bachelor’s degree (1st Class or 2:1 minimum) or a Master’s degree in Computer Science, Spatial Statistics, Ecology, Geomatics, Geographic Information Science or a related discipline. The skills required to build the database of case studies, and the programming and analytical skills to assess biases and develop algorithms for their identification, are highly desirable. Candidates will ideally have some relevant previous research experience and should also have excellent communication and presentation skills.

The funding is provided for 4 years, and will involve spending time at the Ordnance Survey in Southampton.


Applications are invited from UK and EU citizens residing in the UK. In particular, applicants must meet the EPSRC eligibility and residency requirements found here:

Application Procedure

Applicants should send the following by e-mail to Judy Barrett and Prof Haklay:

  1. Cover letter, including a personal statement explaining your interest in the project.
  2. Examples of academic writing and outputs from past work (e.g. a dissertation or assignment)
  3. Academic transcripts
  4. A CV

Shortlisted applicants will be invited to interview in August 2017. Incomplete applications will not be considered.


Building Centre – from Mapping to Making

The London-based Building Centre organised an evening event – from Mapping to Making – which looked at how the “radical evolution in the making and meaning of maps is influencing creative output. New approaches to data capture and integration – from drones to crowd-sourcing – suggest maps are changing their impact on our working life, particularly in design.” The event included 5 speakers (including me, on behalf of Mapping for Change) and a short discussion.

Lewis Blackwell of the Building Centre opened the evening by noting that in a dedicated exhibition on visualisation and the city, the Building Centre is looking at new visualisation techniques. He realised that a lot of the visualisations are connected to mapping – it’s circular: mapping can ask and answer questions about the design process of the built environment, and changes in the built environment create new data. The set of talks in the evening explored the role of mapping.

Rollo Home, Geospatial Product Development Manager, Ordnance Survey (OS), started by describing the OS as the ‘oldest data company in the world‘. The OS thinks of itself as a data company – the traditional mapping products that are so familiar represent only 5% of turnover. The history of the OS goes back to 1746 and William Roy’s work on accurately mapping Britain. The first maps were produced in Kent, for the purpose of positioning ordnance. The maps of today, when visualised, look somewhat the same as maps from 1800, but the current maps are in machine-readable formats, which means that the underlying information is very different. Demands for mapping have changed over the years: originally for ordnance, then for land information and taxation, and later helping the development of the railways. During WWI and WWII the OS led many technological innovations – from the national grid in the 1930s to photogrammetry. In 1973 the first digital maps were produced, and the process was completed in the 1980s. This was, in terms of data structures, still structured as a map. Only in 2000 did MasterMap appear, with a more machine-readable format that is updated 10,000 times a day, based on an Oracle database (the biggest spatial database in the world) – but it’s not a map. Real-world information is modelled to allow for structure and meaning. The ability to answer questions from the database is critical to decision-making. Much of the information in the data can be made explicit – from the area of rear gardens to the height of a building. They see developments in the areas of oblique image capture, 3D data, details under the roof, and facades, and they do a lot of research to develop their future directions – e.g. the challenges of capturing data in point clouds. They see data coming from different sources, including social media, satellites, UAVs, and official sources. Most Smart Cities and transport applications need geospatial information, and the OS is moving from mapping to data, and to enabling better decisions.

Rita Lambert, Development Planning Unit, UCL, covered the ReMap Lima project – running since 2012 and looking at marginalised neighbourhoods in the city. The project focused on the questions of what we are mapping and what we are making through representations. Maps contain the potential of what might become – making maps and models that are about ideas and possibilities for more just cities. The project is a collaboration between the DPU and CASA at UCL, with 3 NGOs in Lima and 40 participants from the city. They wanted to explore the political agency of mapping, open up spaces to negotiate outcomes, and expand the possibilities of spatial analysis in marginalised areas through a participatory action-learning approach. The use of technology is in the context of very specific theoretical aims; the use of UAVs is deliberate, to explore their progressive potential. They mapped the historic centre, which is overmapped and marginalised through over-representation (e.g. using maps to show that it needs regeneration), while the periphery is undermapped – a large part of the city (50% of the area) – and marginalised through omission. Maps can act through undermapping or overmapping. The issues are very different – evictions, lack of services, and loss of cultural heritage (people and buildings) at the centre, while in the informal settlements there are risks, land trafficking, destruction of ecological infrastructure, and a lack of coordination in spatial planning between places. The process that they followed included mapping from the sky (with a drone) and mapping from the ground (through participatory mapping using aerial images). The drones provided imagery of an area that changes rapidly, and the outputs were used in participatory mapping, with the people on the ground deciding what to map and where to map. The results allowed evictions to be identified through changes to buildings that can be observed from above. The mapping process itself was also a means to strengthen community organisations. The use of 3D visualisation at the centre and at the periphery helped in understanding the risks that are emerging and the changes to the area. Data collection uses both maps and tools such as EpiCollect+ and community mapping, as well as printed 3D models that can be used in discussions and conversations. The work carries on as the local residents continue it. The conclusion: careful consideration of the use of technology in context, with mapping from the sky and from the ground going hand in hand – the new representations that we are producing are significant.

Simon Mabey, Digital Services Lead for City Modelling, Arup, discussed city modelling at Arup – the move from visualisation to more sophisticated models. He has led on modelling cities in 3D since 1988, when visualisation of future designs was done by stitching together pieces of paper and photos. The rebuilding of Manchester in the mid 1990s led to the development of 3D urban modelling, with animations and an interactive CD-ROM. They continued to develop the data about Manchester and then shared it with others. The models were used in different ways – from gaming software to online – trying to find ways to allow people to use them in a real-world context. Many models are used in interactive displays – e.g. for attracting inward investment. They went on to model many cities across the UK, with different levels of detail and coverage. They are also starting to identify features underground – utilities and such. Models are kept up to date through collaboration, with clients providing information back about things that they are designing, and through integrating BIM data. In Sheffield, they also enhance the model through the planning of new projects and activities. Models are used to communicate information to other stakeholders – e.g. traffic model outputs, and the same with pedestrian movement – and different information is used to colour-code the model (e.g. energy), or for acoustic modelling or flooding. More recently, they have moved to city analytics, understanding the structure within models – for example, understanding solar energy potential in relation to the use and consumption of a building. They find themselves needing information about what utility data exist, which need to be mapped and integrated into their analysis. They are also using mobile phone data to predict the trips and journeys that people make.

I was the next speaker, on behalf of Mapping for Change. I provided the background of Mapping for Change and the approach that we are using for mapping. In the context of the other talks, which focused on technology, I emphasised that just as we try to reach out to people in the places that they use daily and fit the participatory process into their life rhythms, we need to do the same in the online environment. That means that conversations need to go where people are – so linking to Facebook, Twitter or WhatsApp. We should also recognise that people use different ways to access information – some will use just their phone, others laptops, and for others we need to think of a laptop/desktop environment. In a way, this complicates participatory mapping much more than earlier participatory web mapping systems, when participants were more used to the idea of using multiple websites for different purposes. I also mentioned the need to listen to the people that we work with, and to decide with them whether information should be shown online or not – taking into account what they would like to do with the data. I mentioned work that involves citizen science (e.g. air quality monitoring), but more generally the ability to collect facts and evidence to deal with a specific issue. Finally, I also showed some examples of our new community mapping system, which is based on GeoKey.

The final talk was from Neil Clark, Founder, EYELEVEL. He is from an architectural visualisation company that works in the North East and operates in the built environment area. They do architectural modelling, using Ordnance Survey data to position the designs so they can be rendered accurately. Many of the processes are very expensive and complex. They have developed a tool called EYEVIEW for accurate augmented reality – working on an iPad to allow viewing models in real time. This can cut the costs of producing these models. They use a tripod to make it easier to control. The tool is the outcome of 4 years of development, and allows navigating the architectural model so it overlays the camera image. They are aiming at Accurate Visual Representation, and they follow the detailed framework that is used in London for this purpose.

The discussion that followed explored the political nature of information, and who is represented and how. A question to the OS was how open it will be with the detailed data; Rollo explained that access to the data is a complicated issue and it needs to be funded. I found myself defending the justification for charging for highly detailed models, by suggesting to imagine a situation where the universal provision of high-quality data at the national level wasn’t there, and you had to deal with each city’s data model separately.

The last discussion point was about truth in mapping and the positions that were raised – is it about the way that people understand their own truth, or is there an absolute truth that is captured in models and maps, or represented in 3D visualisations? Interestingly, 3 of the talks assumed that there is a way to capture specific aspects of reality (structures, roads, pollution) and model them with numbers, while Rita and I took a more interpretive and culturally led approach to representation.

Third day of INSPIRE 2014 – any space for civil society and citizens?

On the last day of the INSPIRE conference, I attended a session about apps and applications, and the final plenary, which focused on the knowledge-based economy and the role of INSPIRE within it. Below are some notes from the talks, including my interpretations and comments.

Debbie Wilson from the Ordnance Survey highlighted the issues that the OS is facing in designing next-generation products from an information architect’s point of view. She noted that the core large-scale product, MasterMap, has been around for 14 years and has been provided in GML all the way through. The client base in the UK is now used to it and happy with it (when it was introduced, there was a short period of adjustment that I recall, but I assume that by now everything is routine). Lots of small-scale products are becoming open and are also provided as linked data. The user community is more savvy – they want the Ordnance Survey to push data to them, and to access the data through existing or new services, not just be given the datasets without further interaction. They want to see ease of access and use across multiple platforms. The OS is considering moving away from the provision of data towards online services as the main way for people to get access to the data. The OS is investing heavily in mobile apps for leisure, but also in helping the commercial sector to develop apps that are based on OS data and tools. For example, the OS Locate app is designed to work worldwide, not only in the UK. They also put effort into creating APIs and SDKs – such as OS OnDemand – and into allowing local authorities to update their address data. There is also a focus on cloud-based applications – such as applications to support government activities during emergencies. The information architecture side is moving from product to content. The OS will continue to maintain content that is product-agnostic, and will run the internal systems for a long period of 10 to 20 years, so they need to decouple outward-facing services from the internal representation. The OS needs to be flexible to respond to different needs – e.g. in file formats this means GML, RDF and ontologies, but also CSV and GeoJSON. Managing the rules between the various formats is a challenging task, and different representations of the same thing – for example 3D and 2D representations – are another challenge.
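To make the multi-format point concrete, the sketch below serialises one product-agnostic feature record as both GeoJSON and CSV from the same internal representation. The record, identifiers and field names are invented for illustration, not actual OS content:

```python
import csv
import io
import json

# A single hypothetical product-agnostic feature record.
feature = {"id": "osgb1000012345", "type": "Building", "height_m": 7.2,
           "lon": -0.1276, "lat": 51.5072}

# GeoJSON representation: geometry plus non-spatial properties.
geojson = json.dumps({
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [feature["lon"], feature["lat"]]},
    "properties": {k: feature[k] for k in ("id", "type", "height_m")},
})

# CSV representation of the same record, flattened to one row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(feature))
writer.writeheader()
writer.writerow(feature)
csv_text = buf.getvalue()
```

The point of decoupling is visible even at this toy scale: the internal record never changes, only the outward-facing serialisation does.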

Didier Leibovici presented work from the COBWEB project, discussing quality assurance of crowdsourced data. In crowdsourcing there are issues with the quality of both the authoritative and the crowdsourced data. COBWEB is part of a set of 5 citizen observatories, exploring air quality, noise, water quality, water management, flooding and land cover, and odour perception and nuisance. COBWEB is focusing on the infrastructure and management of the data. The pilot studies in COBWEB look at land use/land cover, species and habitat observations, and flooding. They mix sensors in the environment with data arriving in different formats, and the way to manage it all is to validate the data, approve its quality, and make sure that it is compliant with needs. A project involves designing an app and then encouraging people to collect the data, and there can be a lack of connection to other sources of data. The issues that they highlight are quality/uncertainty, accuracy, trust and relevance. One of the core questions is ‘does crowd-sourced data need QA/QC that is different from any other?’ (my view: yes, but depending on the trade-offs in terms of engagement and process). They see a role for crowdsourcing in NSDI, with real-time QA during data capture and QA after dataset collection (they do both), and there is also re-use and conflation of data sources. QA is aimed at knowing what is collected – there are multiple ways to define the participants, which mean different ways of involving people, and this has implications for QA. They suggest a stakeholder quality model with principles such as vagueness, ambiguity, judgement, reliability, validity, and trust. There is a paper in AGILE 2014 about their framework. The framework suggests that the people who build the application need to develop the QA/QC process, using a workflow authoring tool, which is supported by an ontology and then run as a web processing service. The temporality of the data needs to be considered in the metadata, as does how to update the metadata on data quality.
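As a rough sketch of the kind of rule-based QA/QC described above – the rules, species list and observation records here are invented for illustration, not COBWEB’s actual workflow – crowdsourced observations can be passed through simple validity checks:

```python
from datetime import datetime, timedelta

KNOWN_SPECIES = {"oak", "ash", "beech"}  # hypothetical controlled vocabulary

def qa_checks(obs, now):
    """Apply simple validity rules to one crowdsourced observation."""
    issues = []
    if not (-90 <= obs["lat"] <= 90 and -180 <= obs["lon"] <= 180):
        issues.append("invalid position")
    if obs.get("species") not in KNOWN_SPECIES:
        issues.append("unknown species")
    if now - obs["timestamp"] > timedelta(days=365):
        issues.append("stale record")
    return issues

now = datetime(2014, 6, 18)
good = {"lat": 52.4, "lon": -4.1, "species": "oak",
        "timestamp": datetime(2014, 6, 1)}
bad = {"lat": 95.0, "lon": -4.1, "species": "dragon",
       "timestamp": datetime(2012, 1, 1)}
```

In a workflow-authoring setup like the one described, each rule would be a reusable step chained into a processing service rather than hard-coded, but the per-observation logic is of this shape.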

Patrick Bell considered the use of smartphone apps – in a project of the BGS and the EU JRC, they reviewed existing applications. The purpose of the survey was to explore what national geological organisations can learn from the shared experience of developing smartphone apps – especially in the geological sector. Who is doing the development work, and which partnerships are created? What barriers are perceived, and what is the role of the INSPIRE directive in the development of these apps? They also tried to understand who the users are. There are 33 geological survey organisations in the EU, and they received responses from 16 of them. They found 23 different apps. From the BGS, iGeology provides access to geological maps and gives access to subsidence and radon risk with in-app payment. They have soil information in the mySoil app, which allows people to get some data for free, with the ability to add information and do citizen science. iGeology 3D adds AR to display a view of the geological map locally. aFieldWork is a way to capture information in the harsh environment of Greenland. GeoTreat provides information on sites of special value that is relevant to tourists or geology enthusiasts. From BRGM, i-infoTerre provides geological information to a range of users with an emphasis on professionals, while i-infoNappe tells you about the groundwater level. The Italian organisation developed Maps4You with hiking routes, combining geology with this information in the Emilia-Romagna region. The Czech Geological Survey provides data in ArcGIS Online.

The apps deal with a wide range of topics, among them geohazards, coastlines, fossils, shipwrecks and more. The apps mostly provide map data and 3D, data collection, and tourism information. Many organisations that are not developing anything stated that it is not an interest or a priority for them, or that they lack the skills. They see Android as the most important platform; all the apps are free, but some then offer in-app purchases. The apps are updated on a yearly basis. About 50% develop the app in-house, and most work in partnerships when developing apps. Some focus on web apps that work on mobile platforms, or on cross-platform frameworks, but these are not as good as native apps, though the latter are more difficult to develop and maintain. Many use the ESRI SDK, and they use open licences. Mostly there is a lack of promotion of reusing the tools – most organisations serve data. Barriers: supporting multiple platforms, software development skills, a lack of reusable software, and limited support for reuse across communities – there is a heavy focus on data delivery, with OGC and REST services used to deliver data to apps. Most respondents suggested no direct link to INSPIRE, but the principles of INSPIRE are at the basis of these applications.

Timo Aarnio presented the OSKARI platform for releasing open data to end users. It offers role-based security layers with authenticated users and four levels of permissions – viewing, viewing on embedded maps, publishing, and downloading. The development of Oskari started in 2011; it is used by 16 member organisations, and the core team runs it from the National Land Survey of Finland. It is used in the Arctic SDI, ELF and the Finnish Geoportal – and in lots of embedded maps. The end-user features allow searching of metadata and searching map layers by data provider or INSPIRE theme. They have drag-and-drop layers and customisation of features in WFS. Sharing is also possible, with uploading of shapefiles by users. They also have printing functionality, which allows PNG or PDF output, and they provide embedded maps so you can create a map and then embed it in your web page. The data sources that they support are OGC web services – WMS, WMTS, WFS, CSW – and also ArcGIS REST, data import for shapefiles and KML, and JSON for thematic maps. Spatial analysis is provided with the OGC Web Processing Service, offering 6 basic analysis methods – buffer, aggregate, union, intersect, union of analysed layers, and area and sector. They are planning to add thematic maps and more advanced spatial analysis methods, and to improve mobile device support. 20-30 people work on Oskari, with 6 people at the core of it.
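As an illustration of what the simplest of those analysis methods – a buffer – actually computes (a hand-rolled approximation for illustration, not Oskari’s or the WPS standard’s actual implementation), a point buffer can be approximated by a regular polygon and its area checked against the expected circle area:

```python
import math

def buffer_point(x, y, radius, segments=64):
    """Approximate the buffer of a point as a regular polygon (vertex list)."""
    return [(x + radius * math.cos(2 * math.pi * i / segments),
             y + radius * math.sin(2 * math.pi * i / segments))
            for i in range(segments)]

def polygon_area(ring):
    """Shoelace formula for a closed ring given as a list of vertices."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2

ring = buffer_point(0, 0, 100)
area = polygon_area(ring)  # approaches pi * 100**2 as segments increase
```

Real GIS engines buffer lines and polygons too, and handle projections, but the trade-off is the same: more segments give a better approximation of the true circular buffer at the cost of more geometry.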

The final session focused on the knowledge-based economy and the link to INSPIRE.

Andrew Trigg provided the perspective of HMLR on fuelling the knowledge-based economy with open data. The Land Registry deals with 24 million titles, with 5 million property transactions a year. They have provided open access to individual titles since 1990, and INSPIRE and the open data agenda have been important to the transition that they went through in the last 10 years. Their mission now includes an explicit reference to the management and reuse of land and property data, and this is important in terms of how the organisation defines itself. In the UK context there is a shift to open data through initiatives such as INSPIRE, the Open Government Partnership, the G8 Open Data Charter (open by default) and national implementation plans. For HMLR, there is the need to be INSPIRE compliant, but in addition they have to deal with the Public Data Group, the outcomes of the Shakespeare review, and the commitment to a national information infrastructure. As a result, HMLR now lists 150 datasets, though some are not open due to the need to protect against fraud and other factors. INSPIRE was the first catalyst to indicate that HMLR needed to change its practices, and it allowed the people in the organisation to drive changes, secure resources and invest in infrastructure. It was also important in highlighting to the board of the organisation that data will become important, and was a driver to improving quality before releasing data. The parcel data is available for use without registration. They have had 30,000 downloads of the index polygons by people who can potentially use them. They aim to release everything that they can by 2018.

The challenges that HMLR experienced include data identification, infrastructure, governance, data formats and others. But the most important for the knowledge-based economy are awareness, customer insight, benefit measurement and sustainable finance. HMLR has invested effort in promoting the reuse of their data; however, because there is no registration, there is no customer insight, and no relationships are being developed with end users – a voluntary registration process might be an opportunity to develop such relations. Evidence is growing that few people are using the data, because they have low confidence in the commitment to keep providing it in a stable format – which they would need in order to build applications on top of it – and overcoming that will require building trust. Knowing who got the data is critical here, too. Finally, sustainable finance is a major issue – HMLR is not allowed to cross-finance from other areas of activity, so they have to charge for some of their data.

Henning Sten Hansen from Aalborg University talked about the role of education. The talk was somewhat critical of the corporatisation of higher education, while also accepting some of its aspects, so what follows might misrepresent his views, though I think he mostly tried to raise questions. Henning started by noting that knowledge workers are defined by the OECD as people who work autonomously and reflectively, use tools effectively and interactively, and work well in heterogeneous groups (so are capable of communicating and sharing knowledge). The Danish government’s current paradigm is to move from the ‘welfare society’ to the ‘competitive society’, so the economic aspects of education are seen as important, as is the contribution to the enterprise sector, with expectations that students will learn to be creative and entrepreneurial. The government requires more efficiency and performance from higher education, and as a result reduces the autonomy of individual academics. There is also an expectation of certain impacts from academic research, with an emphasis on STEM for economic growth, governance support from the social sciences, and the humanities needing to contribute to creativity and social relationships. Commercialisation is highlighted, pushing patenting, research parks and commercial spin-offs. There is also a lot of corporate-style behaviour in the university sector – universities are sometimes managed as firms and thought of as a consumer product. He sees a problem in today’s strange focus on, and opinion that, everything can be measured with numbers alone. The ‘Google dream’ is also invoked – assuming that anyone from any country can create global companies. However, researchers who need time to develop their ideas more deeply – such as Niels Bohr, who didn’t publish much or chase funding – wouldn’t survive in the current system. But is there a link between education and success? The LEGO founder didn’t have any formal education [though with this example, as with Bill Gates and Steve Jobs, strangely their businesses employ lots of PhDs – so there is a confusion between the person who starts a business and its realisation]. He then moved from this general context to INSPIRE. Geoinformation plays a strong role in e-governance and in the private sector, with the increasing importance of location-based services. In this context, projects such as GI-N2K (Geographic Information: Need to Know) are important. This is a pan-European project to take the body of knowledge that was formed in the US and adapt it to current needs. They have already identified major gaps between the supply side (what people are being taught) and the demand side – there are 4 areas that are covered on the supply side, but the demand side wants wider areas to be covered. They aim to develop a new BoK for Europe and to facilitate knowledge exchange between institutions. He concluded that higher education is without doubt a prerequisite for the knowledge economy, but the link to innovation is unclear. Challenges: highly educated people crowd out the job market and do routine work that does not match their skills; the relationship to entrepreneurship and innovation, and the knowledge needed to implement ideas, are unclear; and there is the question of the impact of controlling universities on innovation and education – and how to respond quickly to market demands for skills when the time scales are so different.

Giacomo Martirano provided the perspective of a micro-enterprise in southern Italy. They are involved in INSPIRE across different projects – GeoSmartCities, Smart-Islands and SmeSpire – so lots of R&D funding from the EU. They are also involved in providing GIS services in their very local environment. From the perspective of an SME, he sees barriers that are organisational, technical and financial. They have seen many cases of misalignment of the technical competencies of different organisations, which means that they can’t participate fully in projects; misalignment of the technical abilities of clients and suppliers; and heterogeneity in client organisation culture, which adds challenges. The financial management of projects, and payments to organisations, create problems for SMEs joining in, because of their sensitivity to cash flow. They have experienced cases where contracts were awarded at a price that is sometimes 40% below the reference one. There is a need to invest more and more time with less aware partners and clients. In moving to the next generation of INSPIRE, there is a need to engage micro-SMEs in the discussion – ‘don’t leave us alone’ – as the market is unfair. There is also a risk that member states, once the push for implementation is reduced and the EU driver is gone, will not continue to invest. His suggestion is to progress and think of INSPIRE as a service – SDI as a Service can allow SMEs to join in. There is a need for cooperation between small and big players in the market.

Andrea Halmos (public services unit, DG CONNECT), covering e-government, noted her realisation that INSPIRE is more than ‘just environmental information’. From DG CONNECT’s view, ICT enables open government, and the aim of the Digital Agenda for Europe is to empower citizens and businesses, strengthen the internal market, highlight efficiency and effectiveness, and recognise pre-conditions. One focus is the effort to put public services in digital format and to provide them in a cross-border way. The principles are to be user-centred, with transparency and cross-border support – they have used life events for the design. There are specific activities in sharing identity details, procurement, patient prescriptions, business, and justice. They see these projects as building blocks for new services that work in different areas. They are facing challenges such as the financial crisis, but also the challenge of new technologies and social media, as well as more open data. So what is next for public administration? It needs to deal with the customer: open data, open processes and open services – with an emphasis on transparency, collaboration and participation. The services are open for others to join in, allowing third parties to create different public services. There are analogies with opening up decision-making processes and supporting collaboration with people – it might increase trust in, and the accountability of, government. Public services need to collaborate with third parties to create better or new services. ICT is only an enabler – you need to deal with human capital, organisational issues, cultural issues, processes and business models – it even questions the role of government and what it needs to do in the future. What is the governance issue – what is the public value that is created at the end? Can government become a platform that others use to create value?
They are focusing on societal challenges. Comments on their framework proposals are welcomed.

After these presentations, and when Alessandro Annoni (who was chairing the panel) completed the first round of questions, I was bothered that in all this talk about the knowledge-based economy only the government and the private sector were mentioned as actors, and that even when discussing the development of new services on top of the open data and services, the expectation was only for the private sector to act. I therefore asked about the role of the third sector and civil society within INSPIRE and the visions that the different speakers presented. I even gave the example of mySociety – mainly to demonstrate that third-sector organisations have a role to play.

To my astonishment, Henning, Giacomo, Andrea and Alessandro answered this question by, first, not treating civil society as organisations but mostly as individual citizens – a framing in which commercial bodies, large and small, can act, but citizens have no clear role in coming together and acting. Secondly, the four of them saw the role of citizens only as providers of data and information – such as the reporting in FixMyStreet. Moreover, each one repeated that despite this being low-quality data, it is useful in some ways. For example, Alessandro highlighted OSM mapping in Africa as a case where you accept it because there is nothing else (really?!?), whereas in other places it should be used only where it is needed, because of the quality issue – for example, in an emergency situation when it is timely.

Apart from yet another repetition of dismissing citizen-generated environmental information on the false argument of data quality (see Caren Cooper’s post on this issue), the views presented in the talks helped me crystallise some of my thoughts about the conference.

As one would expect, because the participants are civil servants, on stage and in presentations they follow the main line of the decision makers for whom they work, and therefore you could hear the official line about efficiency, managing to do more with reduced budgets and investment, emphasising economic growth and a very narrow definition of the economy that matters. Different views were expressed during breaks.

The degree to which citizens are excluded from the picture was unsurprising, given the mode of thinking expressed in the conference about information as ‘economic fuel’. While the tokenism of improving transparency, or even empowering citizens, appeared on some slides and in discussions, citizens are not explicitly included in a meaningful and significant way in the consideration of the services or in the visions of ‘government as platform’. They are perceived as customers or service users. The lessons that were learned in environmental policy in the 1980s and 1990s – to provide an explicit role for civil society, NGOs and social enterprises within the process of governance and decision-making – are missing. Maybe this is because a thriving civil society needs active government investment (community centres need to be built, someone needs to be employed to run them), so it doesn’t match the goals of those who are using austerity as a political tool.

Connected to this is the fact that although, again at the level of tokenism, INSPIRE is about environmental applications, the implementation is now driven entirely by a narrow economic argument. As with citizenship issues, environmental aspects are marginalised at best, or ignored.

The comment about data quality, and some responses to my talk, reminded me of Ed Parsons’ commentary from 2008 about the UK GIS community’s reaction to Web Mapping 2.0/Neogeography/GeoWeb. Six years on, the people running the most important geographic information infrastructure project currently under way – and it is progressing well, by the look of it – seem somewhat resistant to the trends that are happening around them. Within the core area that INSPIRE is supposed to handle (environmental applications), citizen science has the longest history and is already used extensively. VGI is no longer new, and crowdsourcing as a source of actionable information now has a decade of history and more behind it. Yet, at least in the presentations and talks, citizens and civil-society organisations have very little role unless they are controlled and marshalled.

Despite all this critique, I have to end on a positive note. It has been a while since I attended a GIS conference that includes people who work in government and other large organisations, so I found the conference very interesting as a way to reconnect and learn about the nature of geographic information management at this scale. It was also good to see how individuals champion the use of GeoWeb tools, and the degree to which people are doing user-centred design.

Usability of Geographical Information – the case of Code-Point Open

One of the surprises of the Ordnance Survey OpenData release at the beginning of April was the inclusion of the Code-Point Open dataset, which lists the location of all postcodes in England, Wales and Scotland. This was clearly a very important dataset because of the way postcode geography drives many services and activities in the UK. Before the release, the costs of using postcodes in geographical analysis were prohibitive for many small organisations.

So how usable is this free Code-Point data? The principle of ‘do not look a gift horse in the mouth’ doesn’t apply here. The whole point of releasing the data is to make it as useful as possible to encourage innovation, so it should be made available in a way that makes it easy to reuse. I evaluated it while analysing a dataset of 11,000 volunteers’ postcodes that I received from a third sector organisation.

The download process is excellent and easy, apart from the fact that there is no clear, short, non-technical description of each product on the download page. To find a description, you need to go to the product page – so you are at least two clicks away from the product details. It would be better to link from each product and include a brief description on the download page. We will see in a second why this is important…

The next step was the download itself and the opening of the zip file, which was clear and easy. There is an oddity with all Ordnance Survey data: it contains a redundant sub-directory – in this case the data resides under \codepo_gb\Code-Point Open\ . The fact that the data is broken up into postcode areas instead of one big file of 157MB is fine, but it would be helpful to remind users that they can concatenate files using simple commands – this is especially necessary for less tech-savvy users. An explanation for Windows users that you can open the command window using ‘cmd.exe’ and run ‘type a.csv b.csv > common.csv’ could save some people plenty of time.
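For those who prefer scripting over the command line, the same concatenation can be done in a few lines of Python. This is a minimal sketch, not part of the Ordnance Survey delivery – the paths and function name are my own:

```python
# Concatenate the per-postcode-area CSV files into a single file.
# The directory and file names below are hypothetical; replace them
# with the actual \codepo_gb\Code-Point Open\ path on your machine.
import csv
import glob

def concatenate_csvs(pattern, out_path):
    """Append every CSV matching `pattern` into one output file."""
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as in_file:
                for row in csv.reader(in_file):
                    writer.writerow(row)

# Example: concatenate_csvs("Code-Point Open/*.csv", "common.csv")
```

Writing the output outside the input directory (or using a pattern that cannot match the output file) avoids the output being picked up as an input.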

But the real unpleasant surprise was that nowhere in the downloaded package is there a description of the fields in the files! So you open the files and need to figure out what the fields are. The user manual hides four clicks away from the download page, and luckily I knew that the ‘user manual’ is stored under ‘technical information’ on the product page, which is not at all obvious on a first visit. Why not deliver the user manual with the product?!? The Doc directory is an obvious place to store it.

The user manual reveals that there are 19 fields in the file, of which 9 (nearly half!) are ‘not available in Code-Point Open’ – so why are they delivered? After figuring out the fields, I created a single header line that can be attached to the files before importing them into a GIS:

Postcode,Positional Quality,PR Delete,TP Delete,DQ Delete,RP Delete,BP Delete,PD Delete,MP Delete,UM Delete,Easting,Northing,Country,Regional Health Authority,Health Authority,County,District,Ward,LS Delete.

Of course, all the fields with ‘Delete’ in the name mean that they should be deleted once imported.
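Attaching the header and dropping the placeholder columns can be automated. A minimal sketch using the 19 field names listed above – the function name is my own, and nothing here is an official Ordnance Survey tool:

```python
# Attach the 19 field names from the user manual to Code-Point Open rows
# and drop the nine placeholder columns marked 'Delete'.

FIELDS = ["Postcode", "Positional Quality", "PR Delete", "TP Delete",
          "DQ Delete", "RP Delete", "BP Delete", "PD Delete", "MP Delete",
          "UM Delete", "Easting", "Northing", "Country",
          "Regional Health Authority", "Health Authority", "County",
          "District", "Ward", "LS Delete"]

# Indices of the columns that actually carry data in Code-Point Open.
KEEP = [i for i, name in enumerate(FIELDS) if not name.endswith("Delete")]

def strip_placeholder_fields(rows):
    """Yield a header row, then each data row reduced to the populated columns."""
    yield [FIELDS[i] for i in KEEP]
    for row in rows:
        yield [row[i] for i in KEEP]
```

Feeding the concatenated CSV rows through this generator before import leaves only the ten meaningful columns, from Postcode through to Ward.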

Interestingly, once you delete these fields, the total size of Code-Point Open drops from 157MB to 91MB – which means that the Ordnance Survey could save bandwidth and carbon emissions by making the files smaller.

Another interesting point is that the user manual includes detailed instructions on how to change the postcode to a ‘single spaced postcode’. The instructions are for Excel, Mapinfo and ArcGIS. This is the type of information that can help end-users start using the data faster. Finally, you can use this wonderful information to create lovely maps.
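The same normalisation is easy to script outside those packages. A minimal sketch, assuming the common UK convention that the inward code is the last three characters – the exact rules are in the user manual, and the function name is my own:

```python
def single_spaced(postcode):
    """Normalise a postcode to 'single spaced' form: the outward code,
    one space, then the inward code.
    Assumes the usual UK convention that the inward code is the
    last three characters."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3] + " " + compact[-3:]
```

For example, `single_spaced("SW1A1AA")` yields `"SW1A 1AA"`, and variable spacing in the input is collapsed to a single space.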

All these problems are minor, apart from the missing description of the fields, which is a major usability error. A similar analysis can be carried out for any of the Ordnance Survey datasets, to ensure that they are useful to their users. There are some easy improvements, such as including the user manual with the distribution, and I’m sure that, over time, the team at the Ordnance Survey will find the time to sort these issues.

Usability of VGI in Haiti earthquake response and the 2nd workshop on usability of geographic information

On the 23rd March 2010, UCL hosted the second workshop on usability of geographic information, organised by Jenny Harding (Ordnance Survey Research), Sarah Sharples (Nottingham) and myself. This workshop extended the range of topics covered in the first one, which we reported on during the AGI conference last year. This time we had about 20 participants and it was an excellent day, covering a wide range of topics – from a presentation by Martin Maguire (Loughborough) on the visualisation and communication of climate change data, to Johannes Schlüter (Münster) discussing the use of XO computers with schoolchildren, to a talk by Richard Treves (Southampton) on the impact of Google Earth tours on learning. Especially interesting was the combination of sound and other senses in the work of Nick Bearman (UEA) and Paul Kelly (Queens University, Belfast).

Jenny’s introduction highlighted the different aspects of GI usability, from those that are specific to data to issues with application interfaces. The integration of data with software that creates the user experience in GIS was discussed throughout the day, and it is one of the reasons that the usability of the information itself is important in this field. The Ordnance Survey is currently running a project to explore how it can integrate usability into the design of its products – Michael Brown’s presentation discussed the development of a survey as part of this project. The integration of data and application was also central to Philip Robinson’s (GE Energy) presentation on the use of GI by utility field workers.

My presentation focused on some preliminary thoughts based on an analysis of the OpenStreetMap and Google Map communities’ responses to the earthquake in Haiti at the beginning of 2010. It discussed a set of issues that, if explored, will provide insights relevant beyond the specific case and that can illuminate issues relevant to the daily production and use of geographic information – for example, the very basic metadata provided on portals such as GeoCommons, and what users can do to evaluate the fitness for use of a specific dataset (see also Barbara Poore’s (USGS) discussion of the metadata crisis).

Interestingly, the day after giving this presentation I had a chance to discuss GI usability with Map Action volunteers who gave a presentation at GEO-10. Their presentation filled in some gaps, but also reinforced the value of researching GI usability for emergency situations.

For a detailed description of the workshop and abstracts – see this site. All the presentations from the conference are available on SlideShare and my presentation is below.

OpenStreetMap and Meridian 2 – releasing the outputs

Back in September, during AGI Geocommunity ’09, I had a chat with Jo Cook about the barriers to the use of OpenStreetMap data by people who are not experts in the way the data was created and don’t have the time and resources to evaluate its quality. One of the difficulties is deciding whether the coverage is complete (or close to complete) for a given area.

To help with this problem, I obtained permission from the Ordnance Survey research unit to release the results of my analysis, which compares OpenStreetMap coverage to the Ordnance Survey Meridian 2 dataset (see below about the licensing conundrum that the analysis produced as a by-product).

Before using the data, it is necessary to understand how it was created. The methodology can be used for the comparison of completeness as well as the systematic analysis of other properties of two vector datasets. The methodology is based on the evaluation of two datasets A and B, where A is the reference dataset (Ordnance Survey Meridian 2 in this case) and B is the test dataset (OpenStreetMap), and a dataset C which includes the spatial units that will be used for the comparison (1km grid squares across England).

The first step in the analysis is to decide on the spatial units that will be used in the comparison process (dataset C). This can be a reference grid with standard cell size, or some other meaningful geographical unit such as census enumeration units or administrative boundaries (see previous post, where lower level super output areas were used). There are advantages to the use of a regular grid, as this avoids problems that arise from the Modifiable Areal Unit Problem (MAUP) to some extent.

The two datasets (A and B) are then split along the boundaries of the geographical units, while preserving the attributes in each part of the object, to ensure that no information is lost. The splitting is necessary to support queries that address only objects that fall within each geographical unit.

The next step involves the creation of very small buffers around the geographical units. This is necessary because, due to computational errors in the intersection algorithm and in the implementation of operators in the specific GIS package used, the co-ordinates where an object was split might be near, but not exactly at, the boundary of the reference geographical unit. The buffers should be very small, to ensure that only objects that belong inside the unit’s area are included in the analysis. In our case, the buffers are 25cm around grid squares that are 1km on each side.

Finally, spatial queries can be carried out to evaluate the total length, area or any other property of dataset A that falls within each unit, and to compare these values to the results of the analysis of dataset B. The whole process is described in the image above.
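The steps above can be sketched in code. This is an illustrative stand-in rather than the GIS workflow actually used: it treats roads as straight segments, expands a grid cell by the 25cm buffer, and sums the clipped segment lengths using Liang–Barsky line clipping. All names are my own:

```python
# Sum the length of road segments falling inside one (slightly buffered)
# grid cell. A stdlib-only sketch of the comparison methodology; a real
# analysis would use a GIS package on the full vector datasets.
import math

def clipped_length(seg, cell, buffer=0.25):
    """Length of segment ((x1,y1),(x2,y2)) inside the rectangle
    (xmin,ymin,xmax,ymax) expanded by `buffer` on each side,
    via Liang-Barsky clipping."""
    (x1, y1), (x2, y2) = seg
    xmin, ymin, xmax, ymax = cell
    xmin -= buffer; ymin -= buffer; xmax += buffer; ymax += buffer
    dx, dy = x2 - x1, y2 - y1
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x1 - xmin), (dx, xmax - x1),
                 (-dy, y1 - ymin), (dy, ymax - y1)):
        if p == 0:
            if q < 0:
                return 0.0          # parallel to this edge and outside
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)     # entering the rectangle
            else:
                t1 = min(t1, t)     # leaving the rectangle
    if t0 > t1:
        return 0.0                  # no intersection with the cell
    return (t1 - t0) * math.hypot(dx, dy)

def total_length_in_cell(segments, cell):
    """Total length (e.g. of dataset A or B) inside one spatial unit of C."""
    return sum(clipped_length(s, cell) for s in segments)
```

Running `total_length_in_cell` over every 1km cell for datasets A and B gives the per-unit totals that the comparison below is based on.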

The shape file provided here contains values from -4 to +4, and these values correspond to the difference between OpenStreetMap and Meridian 2. In each grid square, the following equation was calculated:

∑(OSM roads length)-∑(Meridian roads length)

If the value is negative, the total length of Meridian 2 objects is greater than the length of OpenStreetMap objects. A value of -1, for example, means ‘there are between 0 and 1000 metres more Meridian 2’ in this grid square, whereas 1 means ‘there are between 0 and 1000 metres more OpenStreetMap’. Importantly, 4 and -4 mean anything with a positive or negative difference of over 3000 metres. In general, the analysis shows that if the difference is at level 3 or 4, you can consider OpenStreetMap complete, while 1 and 2 usually mean that some minor roads are likely to be missing. A value of -1 should be easy to complete. In areas where the values are -2 to -4, the OpenStreetMap community still has work to do to complete the map.
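The grading described above can be written as a small function. A minimal sketch, with the 1000-metre banding inferred from the description; the function name is my own:

```python
# Map the signed difference in road length (metres) between OSM and
# Meridian 2 in a grid square to a grade from -4 to +4, in 1000 m bands,
# with +/-4 capping any difference over 3000 m.
import math

def difference_grade(osm_length_m, meridian_length_m):
    diff = osm_length_m - meridian_length_m
    if diff == 0:
        return 0
    band = min(4, math.ceil(abs(diff) / 1000))
    return band if diff > 0 else -band
```

So a square with 500 m more OSM than Meridian 2 grades as 1, one with 2500 m more Meridian 2 grades as -3, and anything beyond 3000 m either way saturates at 4 or -4.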

Finally, a licensing conundrum that shows the problems both with Ordnance Survey’s principles – which state that anything derived from its maps is Crown copyright and part of Ordnance Survey’s intellectual property – and with the use of the Creative Commons licence for OpenStreetMap data.

Look at the equation above. The left-hand side is indisputably derived from OpenStreetMap, so it is under the CC-By-SA licence. The right-hand side is indisputably derived from Ordnance Survey, so it is clearly Crown copyright. The equation, however, includes a lot of UCL’s work and, most importantly, does not contain any geometrical object from either dataset – the grid was created afresh. Yet, without ‘deriving’ the total length from each dataset, it is impossible to compute the results presented here – but they are not derived from one dataset or the other. So what is the status of the resulting dataset? It is, in my view, UCL copyright – but it is an interesting problem, and I might be wrong.

You can download the data from here – the file includes a metadata document.

If you use the dataset, please let me know what you have done with it.

OpenStreetMap and Ordnance Survey Master Map – Beyond good enough

OSM overlap with Master Map ITN for A and B roads

In June, Aamer Ather, an M.Eng. student at the department, completed his research comparing OpenStreetMap (OSM) to Ordnance Survey Master Map Integrated Transport Layer (ITN). This was based on the previous piece of research in which another M.Eng. student, Naureen Zulfiqar, compared OSM to Meridian 2.

The results are really surprising. The analysis shows that when A-roads, B-roads and a motorway from the ITN are compared to OSM data, the overlap can reach values of over 95%. When the comparison with Master Map was completed, it became clear that OSM is of better quality than Meridian 2. It is also interesting that the higher overlap with the ITN was achieved under stricter criteria for the buffering procedure used in the comparison.

As noted, in the original analysis Meridian 2 was used as the reference dataset – the ground truth. However, comparing Meridian 2 and OSM is not a like-with-like comparison, because OSM is not generalised and Meridian 2 is. The justification for treating Meridian 2 as the reference was that its nodes are derived from high-accuracy datasets and it was expected that the 20-metre filter would not change positions significantly. It turns out that the generalisation affects the quality of Meridian 2 more than I anticipated. Yet the advantage of Meridian 2 is that it allows comparisons across the whole of England, since the file size remains manageable, while the complexity of the ITN would make such an extensive comparison difficult and time-consuming.

The results show that for the 4 Ordnance Survey London tiles that we’ve compared, the results put OSM only 10-30% from the ITN centre line. Rather impressive when you consider the knowledge, skills and backgrounds of the participants. My presentation from the State of the Map conference, below, provides more details of this analysis – and the excellent dissertation by Aamer Ather, which is the basis for this analysis, is available to download here.

The one caveat, which will need to be explored in future projects, is that the comparison was done in London, which means OSM mappers had access to very high-resolution imagery from Yahoo! that has been georeferenced and rectified. Therefore, the high precision might be the result of tracing these images, and the question is what happens in places where high-resolution imagery is not available. We thus need to test more tiles in other places to validate the results in other areas of the UK.

Another student is currently comparing OSM to a 1:10,000 map of Athens, so by the end of the summer I hope it will be possible to estimate quality in other countries. The comparison to the ITN in other areas of the UK will wait for a future student who is interested in this topic!