The Vespucci initiative has been running for over a decade, bringing together participants from a wide range of academic backgrounds and experiences to explore, in a ‘slow learning’ way, various aspects of geographic information science research. The Vespucci Summer Institutes are week-long summer schools, most frequently held at Fiesole, a small town overlooking Florence. This year, the focus of the first summer institute was on crowdsourced geographic information and citizen science.

The workshop was supported by COST ENERGIC (a network that links researchers in the area of crowdsourced geographic information, funded by the EU research programme), the EU Joint Research Centre (JRC), Esri and our Extreme Citizen Science research group. The summer school included about 30 participants and facilitators, ranging from master’s students who are about to start their PhD studies to established professors who came to learn and share knowledge. This is a common feature of the Vespucci Institutes, and the funding from the COST network allowed more early career researchers to participate.

Apart from the pleasant surroundings, Vespucci Institutes are characterised by relaxed yet detailed discussions that can be carried over long lunches and coffee breaks, as well as team work in small groups on a task that each group presents at the end of the week. Moreover, the programme is very flexible, so changes and adaptations in response to the requests of the participants and to the general progression of the learning are part of the process.

This is the second time that I am participating in Vespucci Institutes as a facilitator, and in both cases it was clear that participants take the goals of the institute seriously, and make the most of the opportunities to learn about the topics that are explored, explore issues in depth with the facilitators, and work with their groups beyond the timetable.

The topics that were covered in the school were designed to provide a holistic overview of geographical crowdsourcing or citizen science projects, especially in the area where these two types of activities meet. This can be when a group of citizens want to collect and analyse data about local environmental concerns, or oceanographers want to work with divers to record water temperature, or when details that are emerging from social media are used to understand cultural differences in the understanding of border areas. These are all examples that were suggested by participants from projects that they are involved in. In addition, citizen participation in flood monitoring and water catchment management, sharing information about local food, and exploring the data quality of spatial information that can be used by wheelchair users also came up in the discussion. The crossover between the two areas provided a common ground for the participants to explore issues that are relevant to their research interests.

The holistic aspect that was mentioned before was a major goal for the school: to consider the tools that are used to collect information, how to engage and work with the participants, how to manage the data that is provided by the participants and how to ensure that it is useful for other purposes. To start the process, after introducing the topics of citizen science and volunteered geographic information (VGI), the participants learned about data collection activities, including noise mapping, OpenStreetMap contribution, bird watching and balloon and kite mapping. As can be expected, the balloon mapping raised a lot of interest and excitement, and this exercise in local mapping was linked to OpenStreetMap later in the week.

The experience with data collection provided the context for discussions about data management and interoperability and design aspects of citizen science applications, as well as more detailed presentations from the participants about their work and research interests. With all these details, the participants were ready to work on their group task: to suggest a research proposal in the area of VGI or citizen science. Each group of 5 participants explored the issues that they agreed on – 2 groups focused on citizen science projects, another 2 focused on data management and sustainability, and finally another group explored the area of perception mapping and a more social-science-oriented project.

Some of the most interesting discussions were initiated at the request of the participants, such as the exploration of ethical aspects of crowdsourcing and citizen science. This is possible because of the flexibility in the programme.

Now that the institute is over, it is time to build on the connections that started during the wonderful week in Fiesole, and see how the network of Vespucci alumni develops the ideas that emerged this week.

 

Today marks the publication of the report ‘crowdsourced geographic information in government’. The report is the result of a collaboration that started in the autumn of last year, when the World Bank Global Facility for Disaster Reduction and Recovery (GFDRR) requested a study of the way crowdsourced geographic information is used by governments. The identification of barriers and success factors was especially needed, since GFDRR invests in projects across the world that use crowdsourced geographic information to help in disaster preparedness, through activities such as the Open Data for Resilience Initiative. By providing an overview of factors that can help those who implement such projects, either in governments or in the World Bank, we can increase the chances of successful implementations. To develop the ideas of the project, Robert Soden (GFDRR) and I ran a short workshop during State of the Map 2013 in Birmingham, which helped in shaping the details of the project plan as well as in some preliminary information gathering. The project team included myself, Vyron Antoniou, Sofia Basiouka, and Robert Soden (GFDRR). Later on, Peter Mooney (NUIM) and Jamal Jokar (Heidelberg) volunteered to help us – demonstrating the value of research networks such as COST ENERGIC, which linked us.

The general methodology that we decided to use is the identification of case studies from across the world, at different scales of government (national, regional, local) and domains (emergency, environmental monitoring, education). We expected that with a large group of case studies, it would be possible to analyse common patterns and hopefully reach conclusions that can assist future projects. In addition, the analysis would also identify common barriers and challenges.

We have paid special attention to information flows between the public and the government, looking at cases where the government absorbed information that was provided by the public, and also cases where two-way communication happened.

Originally, we were aiming to ‘crowdsource’ the collection of the case studies. We identified the information that is needed for the analysis by using a few case studies that we knew about, and constructing the way in which they would be represented in the final report. After constructing these ‘seed’ case studies, we aimed to open the questionnaire to other people who would submit case studies. Unfortunately, the development of a case study proved to be too much effort, and we received only a small number of submissions through the website. However, throughout the study we continued to look out for cases and gather all the information so we could compile them. By the end of April 2014 we had identified about 35 cases, but found clear and useful information only for 29 (which are all described in the report). The cases range from basic mapping to citizen science. The analysis workshop was especially interesting, as it was carried out over a long Skype call, with members of the team in Germany, Greece, UK, Ireland and US (Colorado) while working together using Google Docs collaborative editing functionality. This approach proved successful and allowed us to complete the report.

You can download the full report from the UCL Discovery repository.

Or download a high resolution copy for printing and find much more information about the project on the Crowdsourcing and government website.

On the last day of the INSPIRE conference, I attended a session about apps and applications, and the final plenary, which focused on the knowledge based economy and the role of INSPIRE within it. Below are some notes from the talks, including my interpretations and comments.

Dabbie Wilson from the Ordnance Survey highlighted the issues that the OS is facing in designing next generation products from an information architect’s point of view. She noted that the core large scale product, MasterMap, has been around for 14 years and has been provided in GML all the way through. She noted that the client base in the UK is now used to it and happy with it (when it was introduced, there was a short period of adjustment that I recall, but I assume that by now everything is routine). Lots of small scale products are becoming open and are also provided as linked data. The user community is more savvy – they want the Ordnance Survey to push data to them, and to access the data through existing or new services rather than just being given the datasets without further interaction. They want to see ease of access and use across multiple platforms. The OS is considering moving away from provision of data to online services as the main way for people to get access to the data. The OS is investing heavily in mobile apps for leisure but is also helping the commercial sector in developing apps that are based on OS data and tools. For example, the OS locate app provides mechanisms to work worldwide, so it’s not only for the UK. They also put effort into creating APIs and SDKs – such as OS OnDemands – and into allowing local authorities to update their address data. There is also a focus on cloud-based applications – such as applications to support government activities during emergencies. The information architecture side is moving from product to content. The OS will continue to maintain content that is product agnostic and to run the internal systems for a long period of 10 to 20 years, so they need to decouple outward facing services from the internal representation. The OS needs to be flexible to respond to different needs – e.g. in file formats it will be GML, RDF and ontology but also CSV and GeoJSON. Managing the rules between the various formats is a challenging task.
Different representations of the same thing are another challenge – for example 3D representation and 2D representation.
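To give a sense of what moving between two of the formats mentioned here involves, this is a minimal sketch of my own (not something shown in the talk) that turns a CSV of points into GeoJSON using only the Python standard library. The column names `lon` and `lat`, the function name and the sample rows are all assumptions for illustration; real OS products have their own schemas and would need far more rules than this.

```python
import csv
import io
import json


def csv_points_to_geojson(csv_text, lon_field="lon", lat_field="lat"):
    """Convert CSV rows with lon/lat columns into a GeoJSON FeatureCollection.

    The column names are hypothetical defaults, not taken from any OS product.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Pull the coordinates out; everything else becomes a property.
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude, latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample data, for illustration only.
SAMPLE_CSV = "name,lon,lat\nExample point A,-1.40,50.90\nExample point B,-0.12,51.50\n"

collection = csv_points_to_geojson(SAMPLE_CSV)
geojson_text = json.dumps(collection, indent=2)
```

Even in this toy case there are decisions to make (axis order, which columns become properties, how numbers are typed), which hints at why maintaining such rules across GML, RDF, CSV and GeoJSON is hard.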

Didier Leibovici presented work that is based on the COBWEB project, discussing quality assurance for crowdsourced data. In crowdsourcing there are issues with the quality of both the authoritative and the crowdsourced data. The COBWEB project is part of a set of 5 citizen observatories, exploring air quality, noise, water quality, water management, flooding and land cover, odour perception and nuisance, and they can be seen at http://www.citizen-obs.eu. COBWEB is focusing on the infrastructure and management of the data. The pilot studies in COBWEB look at land use/land cover, species and habitat observations and flooding. They are mixing in sensors in the environment, so they get the data in different formats, and the way to manage it is to validate the data, approve its quality and make sure that it’s compliant with needs. The project involves designing an app, then encouraging people to collect the data, and there can be a lack of connection to other sources of data. The issues that they are highlighting are quality/uncertainty, accuracy, trust and relevance. One of the core questions is ‘does crowdsourced data need QA/QC that is different from any other?’ (my view: yes, but depending on the trade offs in terms of engagement and process). They see a role for crowdsourcing in NSDI, with real time data capture QA and post dataset collection QA (they do both), and they are also re-using and conflating data sources. QA is aimed at knowing what is collected – there are multiple ways to define the participants, which means different ways of involving people, and this has implications for QA. They are suggesting a stakeholder quality model with principles such as vagueness, ambiguity, judgement, reliability, validity, and trust. There is a paper in AGILE 2014 about their framework.
The framework suggests that the people who build the application need to develop the QA/QC process and do that with a workflow authoring tool, which is supported by an ontology, and then run it as a web processing service. The temporality of data needs to be considered in the metadata, as well as how to update the metadata on data quality.

Patrick Bell considered the use of smartphone apps – in a project of the BGS and the EU JRC they reviewed existing applications. The purpose of the survey was to explore what national geological organisations can learn from the shared experience of developing smartphone apps – especially in the geological sector. Who is doing the development work and which partnerships are created? What barriers are perceived and what is the role of the INSPIRE directive within the development of these apps? They also tried to understand who the users are. There are 33 geological survey organisations in the EU and they received responses from 16 of them. They found 23 different apps. From BGS, iGeology (http://www.bgs.ac.uk/igeology/home.html) provides access to geological maps and gives access to subsidence and radon risk with in-app payment. They have soil information in the MySoil app, which allows people to get some data for free, and there is also the ability to add information and do citizen science. iGeology 3D adds AR to display a view of the geological map locally. aFieldWork is a way to capture information in the harsh environment of Greenland. GeoTreat provides information on sites with special value that is relevant to tourists or geology enthusiasts. From BRGM, i-infoTerre provides geological information to a range of users with an emphasis on professional ones, while i-infoNappe tells you about the ground water level. The Italian organisation developed Maps4You with hiking routes, combining geology with this information in the Emilia-Romagna region. The Czech Geological Survey provides data in ArcGIS Online.

The apps deal with a wide range of topics, among them geohazards, coastline, fossils, shipwrecks … The apps mostly provide map data and 3D, data collection and tourism. Many organisations that are not developing anything stated that they have no interest or that it is not a priority, and also cited a lack of skills. They see Android as the most important platform – all apps are free but then offer in-app purchases. The apps are updated on a yearly basis. About 50% develop the app in house, and most work in partnerships when developing apps. Some focus on web apps that work on mobile platforms, or on cross-platform frameworks, but these are not as good as native apps, though the latter are more difficult to develop and maintain. Many people use the ESRI SDK and they use open licenses. Mostly there is a lack of promotion of reusing the tools – most people serve data. Barriers – supporting multiple platforms, software development skills, lack of reusable software and limited support for reuse across communities. There is a heavy focus on data delivery; OGC and REST services are used to deliver data to an app. Most respondents suggested no direct link to INSPIRE, but the principles of INSPIRE are at the basis of these applications.

Timo Aarmio presented the OSKARI platform for releasing open data to end users (http://www.oskari.org/). They offer role-based security layers with authenticated users and four levels of permissions – viewing, viewing on embedded maps, publishing and downloading. The development of Oskari started in 2011; it is used by 16 member organisations and the core team is run from the National Land Survey of Finland. It is used in Arctic SDI, ELF and the Finnish Geoportal – and in lots of embedded maps. The end-user features allow searching of metadata and searching map layers by data providers or INSPIRE themes. They have drag and drop layers and customisation of features in WFS. Sharing is also possible, with users uploading shapefiles. They also have printing functionality which allows PNG or PDF, and they provide embedded maps so you can create a map and then embed it in your web page. The data sources that they support are OGC web services – WMS, WMTS, WFS, CSW – and also ArcGIS REST, data import for Shapefiles and KML, and JSON for thematic maps. Spatial analysis is provided with the OGC Web Processing Service – providing basic analysis with 6 methods – buffer, aggregate, union, intersect, union of analysed layers, and area and sector. They are planning to add thematic maps and more advanced spatial analysis methods, and to improve mobile device support. 20-30 people work on Oskari, with 6 people at the core of it.

The final session focused on knowledge based economy and the link to INSPIRE.

Andrew Trigg provided the perspective of HMLR on fuelling the knowledge based economy with open data. The Land Registry deals with 24 million titles, with 5 million property transactions a year. They have provided open access to individual titles since 1990, and INSPIRE and the open data agenda are important to the transition that they went through in the last 10 years. Their mission now includes an explicit reference to the management and reuse of land and property data, and this is important in terms of how the organisation defines itself. In the UK context there is a shift to open data through initiatives such as INSPIRE, the Open Government Partnership, the G8 Open Data Charter (open by default) and national implementation plans. For HMLR, there is the need to be INSPIRE compliant, but in addition, they have to deal with the public data group, the outcomes of the Shakespeare review and the commitment to a national information infrastructure. As a result, HMLR now lists 150 datasets, but some are not open due to the need to protect against fraud and other factors. INSPIRE was the first catalyst to indicate that HMLR needed to change practices, and it allowed the people in the organisation to drive changes, secure resources and invest in infrastructure for it. It was also important in highlighting to the board of the organisation that data will become important. It was also a driver for improving quality before releasing data. The parcel data is available for use without registration. They have had 30,000 downloads of the index polygons by people who can potentially use them. They aim to release everything that they can by 2018.

The challenges that HMLR experienced include data identification, infrastructure, governance, data formats and others. But the most important ones for the knowledge based economy are awareness, customer insight, benefit measurement and sustainable finance. HMLR invested effort in promoting the reuse of their data. However, because there is no registration, there is no customer insight and no relationships are being developed with end users – a voluntary registration process might be an opportunity to develop such relations. Evidence is growing that few people are using the data because they have low confidence in the commitment to keep providing the data and to guarantee stability in format, which they need in order to build applications on top of it, and addressing that will require building trust. Knowing who got the data is critical here, too. Finally, sustainable finance is a major thing – HMLR is not allowed to cross finance from other areas of activities, so they have to charge for some of their data.

Henning Sten Hansen from Aalborg University talked about the role of education. The talk was somewhat critical of the corporatisation of higher education, but also accepting of some of its aspects, so what follows might be misrepresenting his views, though I think he tried mostly to raise questions. Henning started by noting that knowledge workers are defined by the OECD as people who work autonomously and reflectively, use tools effectively and interactively, and work well in heterogeneous groups (so capable of communicating and sharing knowledge). The Danish government’s current paradigm is to move from the ‘welfare society’ to the ‘competitive society’, so economic aspects of education are seen as important, as well as contribution to the enterprise sector, with expectations that students will learn to be creative and entrepreneurial. The government requires more efficiency and performance from higher education and as a result reduces the autonomy of individual academics. There is also an expectation of certain impacts from academic research: an emphasis on STEM for economic growth, governance support from social science, and the humanities need to contribute to creativity and social relationships. Commercialisation is highlighted, pushing patenting, research parks and commercial spin-offs. There is also a lot of corporate style behaviour in the university sector – sometimes managed as firms and thought of as a consumer product. He sees a problem in today’s strange focus on, and opinion that, you can measure everything with numbers only. Also the ‘Google dream’ is invoked – assuming that anyone from any country can create global companies. However, researchers who need time to develop their ideas more deeply – such as Niels Bohr, who didn’t publish and secure funding – wouldn’t survive in the current system. But is there a link between education and success?
The LEGO founder didn’t have any formal education [though with this example, as with Bill Gates and Steve Jobs, strangely their businesses employ lots of PhDs - so there is a confusion between a person who starts a business and the realisation of it]. He then moved from this general context to INSPIRE. Geoinformation plays a strong role in e-Governance and in the private sector, with the increased importance of location based services. In this context, projects such as GI-N2K (Geographic Information Need to Know) are important. This is a pan-European project to develop the body of knowledge that was formed in the US and adapt it to current needs. They have already identified major gaps between the supply side (what people are being taught) and the demand side – there are 4 areas that are covered on the supply side but the demand side wants wider areas to be covered. They aim to develop a new BoK for Europe and facilitate knowledge exchange between institutions. He concluded that higher education is, without doubt, a prerequisite for the knowledge economy, but the link to innovation is unclear. Challenges – highly educated people crowd out the job market and do routine work which does not match their skills, and the relationship to entrepreneurship and innovation, and to the knowledge needed to implement ideas, is unclear. What is the impact of control over universities in reducing innovation and education, and how can they respond quickly to market demands for skills when there are differences in time scale?

Giacomo Martirano provided the perspective of a micro-enterprise (http://www.epsilon-italia.it/IT/) in southern Italy. They are involved in INSPIRE across different projects – GeoSmartCities, Smart-Islands and SmeSpire – so lots of R&D funding from the EU. They are also involved in providing GIS services in their very local environment. From the perspective of an SME, he sees barriers that are organisational, technical and financial. They have seen many cases of misalignment of the technical competencies of different organisations, which means that they can’t participate fully in projects. There is also misalignment of the technical ability of clients and suppliers, and heterogeneity in client organisation culture, which add challenges. Financial management of projects and payment to organisations create problems for SMEs in joining in, because of their sensitivity to cash-flow. They have experienced cases where contracts were awarded to bids offering a price which is sometimes 40% below the reference one. There is a need to invest more and more time with less aware partners and clients. When moving to the next generation of INSPIRE, there is a need to engage with micro-SMEs in the discussion, ‘don’t leave us alone’, as the market is unfair. There is also a risk that member states, once the push for implementation is reduced and without the EU driver, will not continue to invest. His suggestion is to progress and think of INSPIRE as a Service – SDI as a Service can allow SMEs to join in. There is a need for cooperation between small and big players in the market.

Andrea Halmos (public services unit, DG CONNECT) covered e-government, and noted her realisation that INSPIRE is more than ‘just environmental information’. From the DG CONNECT view, ICT enables open government, and the aim of the digital agenda for Europe is to empower citizens and businesses, strengthen the internal market, highlight efficiency and effectiveness, and recognise pre-conditions. One focus is the effort to put public services in digital format and provide them in a cross-border way. The principles are to try to be user centred, with transparency and cross-border support – they have used life events for the design. There are specific activities in sharing identity details, procurement, patient prescriptions, business, and justice. They see these projects as the building blocks for new services that work in different areas. They are seeing challenges such as the financial crisis, but there is also the challenge of new technologies and social media, as well as more opening of data. So what is next for public administration? They need to deal with customers – open data, open process and open services – with importance given to transparency, collaboration and participation (http://www.govloop.com/profiles/blogs/three-dimensions-of-open-government). The services are open to others to join in and allow third parties to create different public services. We look at analogies of opening decision making processes and supporting collaboration with people – it might increase trust and accountability of government. The public service needs to collaborate with third parties to create better or new services. ICT is only an enabler – you need to deal with human capital, organisational issues, cultural issues, processes and business models – it even questions the role of government and what it needs to do in the future. What is the governance issue – what is the public value that is created at the end? Can government become a platform that others use to create value?
They are focusing on Societal Challenge. Comments on their framework proposals are welcomed – it’s available at http://ec.europa.eu/digital-agenda/en/news/vision-public-services

After these presentations, and when Alessandro Annoni (who was chairing the panel) completed the first round of questions, I was bothered that in all these talks about the knowledge-based economy only the government and the private sector were mentioned as actors, and even when discussing the development of new services on top of the open data and services, the expectation is only for the private sector to act in it. I therefore asked about the role of the third-sector and civil-society within INSPIRE and the visions that the different speakers presented. I even provided the example of mySociety – mainly to demonstrate that third-sector organisations have a role to play.

To my astonishment, Henning, Giacomo, Andrea and Alessandro answered this question by first not treating much of civil-society as organisations but mostly as individual citizens, so a framing that allows commercial bodies, large and small, to act, but in which citizens do not have a clear role in coming together and acting. Secondly, the four of them saw the role of citizens only as providers of data and information – such as the reporting in FixMyStreet. Moreover, each one repeated that despite the fact that this is low quality data, it is useful in some ways. For example, Alessandro highlighted that OSM mapping in Africa is an example of a case where you accept it, because there is nothing else (really?!?), but in other places it should be used only when it is needed because of the quality issue – for example, in an emergency situation when it is timely.

Apart from yet another repetition of dismissing citizen generated environmental information on the false argument of data quality (see Caren Cooper’s post on this issue), the views that were presented in the talks helped me in crystallising some of the thoughts about the conference.

As one would expect, because the participants are civil servants, on stage and in presentations they follow the main line of the decision makers for whom they work, and therefore you could hear the official line that is about efficiency, managing to do more with reduced budgets and investment, emphasising economic growth and a very narrow definition of the economy that matters. Different views were expressed during breaks.

The level to which citizens are not included in the picture was unsurprising under the mode of thinking that was expressed in the conference about the aims of information as ‘economic fuel’. While the tokenism of improving transparency, or even empowering citizens, appeared on some slides and in discussions, citizens are not explicitly included in a meaningful and significant way in the consideration of the services or in the visions of ‘government as platform’. They are perceived as customers or service users. The lessons that were learned in environmental policy areas in the 1980s and 1990s, which are to provide an explicit role for civil society, NGOs and social-enterprises within the process of governance and decision making, are missing. Maybe this is because for a thriving civil society, there is a need for active government investment (community centres need to be built, someone needs to be employed to run them), so it doesn’t match the goals of those who are using austerity as a political tool.

Connected to that is the fact that although, again at the tokenism level, INSPIRE is about environmental applications, the implementation now is all driven by a narrow economic argument. As with citizenship issues, environmental aspects are marginalised at best, or ignored.

The comment about data quality and some responses to my talk remind me of Ed Parsons’ commentary from 2008 about the UK GIS community reaction to Web Mapping 2.0/Neogeography/GeoWeb. Six years on from that, the people who are doing the most important geographic information infrastructure project that is currently going, and it is progressing well by the look of it, seem somewhat resistant to trends that are happening around them. Within the core area that INSPIRE is supposed to handle (environmental applications), citizen science has the longest history and it is already used extensively. VGI is no longer new, and crowdsourcing as a source of actionable information now has a decade of history and more behind it. Yet, at least in the presentations and the talks, citizens and civil-society organisations have very little role unless they are controlled and marshaled.

Despite all this critique, I have to end on a positive note. It has been a while since I’ve been at a GIS conference that includes the people who work in government and other large organisations, so I did find the conference very interesting as a way to reconnect and learn about the nature of geographic information management at this scale. It was also good to see how individuals champion the use of GeoWeb tools, or the degree to which people are doing user-centred design.

The INSPIRE 2014 conference marks the middle of the implementation process of the INSPIRE directive (Infrastructure for Spatial Information in the European Community). The directive is aimed at establishing a pan-European Spatial Data Infrastructure (SDI), and that means lots of blueprints, pipes, machine rooms and protocols for enabling the sharing of geographic information. In GIS jargon, blueprints translate to metadata, which is a standardised way to describe a GIS dataset; pipes and machine rooms translate to data portals and servers; and the protocols translate to web services that use known standards (here you’ll have a real acronym soup of WMS, WCS, WFS and OGC). It is all aimed at allowing people across Europe to share data in an efficient way so data can be found and used. In principle, at least!
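For readers who have not met the acronym soup before, here is a minimal sketch of what one of these protocols looks like in practice: building a WMS 1.3.0 GetMap request with the Python standard library. The request parameters are the standard ones, but the server address `https://example.org/wms` and the layer name `hypothetical:land_cover` are placeholders that I made up, so the URL will not return a real map.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_wms_getmap_url(base_url, layer, bbox, width=512, height=512,
                         crs="EPSG:4326"):
    """Build a WMS 1.3.0 GetMap URL for a single layer.

    base_url and layer are supplied by the caller; the values used below
    are hypothetical placeholders, not a real INSPIRE service.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks the server for its default style
        "CRS": crs,
        # In WMS 1.3.0 with EPSG:4326 the axis order is latitude, longitude:
        # minLat, minLon, maxLat, maxLon.
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# A small box around Florence, as an arbitrary example extent.
url = build_wms_getmap_url(
    "https://example.org/wms",      # hypothetical endpoint
    "hypothetical:land_cover",      # hypothetical layer name
    (43.7, 11.2, 43.9, 11.4),
)

# Parse the query string back to check what would be sent to the server.
parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

The point is that the ‘protocol’ is just an agreed set of named parameters, which is what lets a client talk to any compliant server once the metadata has told it where the service is and which layers exist.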

This is the stuff of governmental organisations that are producing the data (national mapping agencies, government offices, statistical offices etc.) and the whole INSPIRE language and aims are targeted at the producers of the information, encouraging them to publish information about their data and share it with others. It is a domain of well-established bureaucracies (in the positive sense of the word) and organisations that are following internal procedures in producing, quality checking and distributing their information products. At first sight, this seems like the opposite of the world of ‘upscience’, where sometimes there are only ad-hoc structures and activities.

That is why giving a talk in the plenary session that was dedicated to Governance and Information, and aimed to “assess how INSPIRE is contributing to a more effective and participated environmental policy in Europe, and how it provides connectivity with other policies affecting our environment, society, and the economy”, was a cause for concern. So where are the meeting points of INSPIRE and citizen science?

One option is to try a top-down approach and force those who collect data to provide it in an INSPIRE-compliant way. Of course, this is destined to fail. So the next option is to force the intermediaries to do the translation – and projects such as COBWEB are doing that, although it remains to be seen what compromises will be needed. Finally, there is an option to adapt and change procedures such as INSPIRE to reflect the change in the way the world works.

To prepare the talk, I teamed up with Dr Claire Ellul, who specialises in metadata (among many other things) and knows more about INSPIRE than I do.

The talk started with my previous work about the three eras of environmental information, noting the move from data by experts, and for experts (1969-1992) to by experts & the public, for experts & the public (2012 on).

As the diagrams show, a major challenge of INSPIRE is that it is a regulation that was created on the basis of the “first era” and “second era”, and it inherently assumes stable institutional practices in creating, disseminating and sharing environmental information.

Alas, the world has changed – and one particular moment of change is August 2004 when OpenStreetMap started, so by the time INSPIRE came into force, crowdsourced geographic information and citizen science had become a legitimate part of the landscape. These data sources are coming from a completely different paradigm of production and management, and now, with 10 years of experience in OSM and a growing understanding of citizen science data, we can notice the differences in production, organisation and practices. For example, while being a very viable source of geographic information, OSM still doesn’t have an office and ‘someone to call’.

Furthermore, data quality methods also require different framing for these data. We have metadata standards and quality standards that assume the second era, but we need to find ways to integrate into sharing frameworks like INSPIRE the messy, noisy but also rich and important data from citizen science and crowdsourcing.

Claire provided a case study that analyses the challenges in the area of metadata in particular. The case looks at different noise mapping sources and how they can be understood. Her analysis demonstrates how the ‘producer centric’ focus of INSPIRE is challenging when trying to create systems that record and use metadata for crowdsourced information. The case study is based on our own experiences over the past 6 years and in different projects, so there is information that is explicit in the map, some that is in documentation – but some that is only hidden (e.g. calibration and quality of smartphone apps).

We conclude with the message that the INSPIRE community needs to start noticing these sources of data and consider how they can be integrated in the overall infrastructure.

The slides from the talk are provided below.

 

About a month ago, Francois Grey put out a suggestion that we should replace the term ‘bottom-up’ science with upscience – do read his blog-post for a fuller explanation. I met Francois in New York in April, when he discussed with me the ideas behind the concept, and why it is worth trying to use it.

At the end of May I had my opportunity to use the term and see how well it might work. I was invited to give a talk as part of the series ‘Trusting the crowd: solving big problems with everyday solutions’ at the Oxford Martin School. The two previous talks in the series, about citizen science in the 19th Century and about crowdsourced journalism, set a high bar (and both are worth watching). My talk was originally titled ‘Beyond the screen: the power and beauty of ‘bottom-up’ citizen science projects’, so for the talk itself I used ‘Beyond the screen: the power and beauty of ‘up-science’ projects’ and it seemed to go fine.

For me, the advantage of using up-science (or upscience) is in the avoidance of putting the people who are active in this form of science at the immediate disadvantage of defining themselves as ‘bottom’. For a very similar reason, I dislike the term ‘counter-mapping’, as it puts those who are active in it in a confrontational position, and therefore it can act as an additional marginalisation force. For a few people, who are in favour of fights, this might make them more ‘fired up’, but for others, that might be a reason to avoid the process. Self-marginalisation is not a great position to start a struggle from.

In addition, I like the ability of upscience to be the term that catches the range of practices that Francois includes in it, from DIY science to community-based projects, civic science, etc.

The content of the talk included a brief overview of the spectrum of citizen science, some of the typologies that help to make sense of it, and then a focus on the type of practices that are part of up-science. Finally, some of the challenges and current solutions to them are covered. Below you can find a video of the talk and the discussion that followed it (which I found interesting and relevant to the discussion above).

If any of the references that I have noted in the talk is of interest, you can find them in the slide set below, which is the one that I used for the talk.

 

 

‘More or Less’ is a good programme on BBC Radio 4, regularly exploring the numbers and the evidence behind news stories and other important things, and checking if they stand up. However, the piece that was broadcast this week about golf courses and housing in the UK provides a nice demonstration of when not to use crowdsourced information. The issue that was discussed was how much actual space golf courses occupy, when compared to space that is used for housing. All was well, until they announced in the piece the use of clever software (read GIS) with a statistical superhero to do the analysis. Interestingly, the data that was used for the analysis was OpenStreetMap – and because the news item was about Surrey, they started doing the analysis with it.

For the analysis to be correct, you need to assume that all the building polygons in OpenStreetMap and all the golf courses have been identified and mapped. My own guess is that in Surrey, this could be the case – especially with all the wonderful work that James Rutter catalysed. However, assuming that this is the case for the rest of the country is, well, a bit fanciful. I wouldn’t dare to state that OpenStreetMap is complete to such a level without lots of quality testing, which I haven’t seen. There is only the road length analysis of ITO World! and other bits of analysis, but we don’t know how complete OSM is.

While I like OpenStreetMap very much, it is utterly unsuitable for any sort of statistical analysis that works at the building level and then sums up to the country level, because of the heterogeneity of the data. For that sort of thing, you have to use a consistent dataset, or at least one that attempts to be consistent, and that data comes from the Ordnance Survey.

As with other statistical affairs, the core case that is made about the assertion as a whole in the rest of the clip is relevant here. First, we should question the unit of analysis (is it right to compare the footprint of a house to the area of golf courses? Probably not) and what is to be gained by adding up individual buildings’ footprints to the level of the UK while ignoring roads, gardens, and all the rest of the built environment. Just because it is possible to add up every building’s footprint doesn’t mean that you should. Second, this analysis is an example of the ‘Big Data’ fallacy, which goes: analyse first, then question (if at all) what the relationship between the data and reality is.

Thursday marked the launch of The Conservation Volunteers (TCV) report on volunteering impact, where they summarised a three-year project that explored motivations, changes in pro-environmental behaviour, wellbeing and community resilience. The report is worth a read as it goes beyond the direct impact on the local environment of TCV activities, and demonstrates how involvement in environmental volunteering can have multiple benefits. In a way, it is adding ingredients to a more holistic understanding of ‘green volunteering’.
One of the interesting aspects of the report is the longitudinal analysis of volunteers’ motivation (copied here from the report). The comparison is from 784 baseline surveys, 202 second surveys and 73 third surveys, which were done with volunteers while they were involved with TCV. The second survey was taken after 4 volunteering sessions, and the third after 10 sessions.

The results of the surveys are interesting in the context of online activities (e.g. citizen science or VGI) because they provide an example of an activity that happens offline – in green spaces such as local parks, community gardens and the like. Moreover, the people who are participating in them come from all walks of life, as previous analysis of TCV data demonstrated that they are recruiting volunteers across the socio-economic spectrum. So here is an activity that can be compared to online volunteering. This is valuable, because if the patterns in the TCV information are similar, then we can understand online volunteering as part of general volunteering and not assume that technology changes everything.

So the graph above attracted my attention because of the similarities to Nama Budhathoki’s work on the motivation of OpenStreetMap volunteers. First, there is a difference between the reasons that are influencing the people who join just one session and those who are involved for a longer time. Secondly, social and personal development aspects are becoming more important over time.

There is a clear need to continue and explore the data – especially because the numbers that are being surveyed at each period are different – but this is an interesting finding, and there is surely more to explore. Some of it will be explored by Valentine Seymour in ExCiteS, who is working with TCV as part of her PhD.

It is also worth listening to the qualitative observations by volunteers, as expressed in the video that opened the event, which is provided below.

TCV Volunteer Impacts from The Conservation Volunteers on Vimeo.

Some ideas take a long time to mature into a form that you are finally happy to share. This is an example of such a thing.

I got interested in the area of Philosophy of Technology during my PhD studies, and have continued to explore it since. During this journey, I found a lot of inspiration in, and links to, Andrew Feenberg’s work, for example, in my paper about neogeography and the delusion of democratisation. The links are mostly due to Feenberg’s attention to ‘hacking’ or appropriating technical systems for functions and activities that are outside what their designers or producers had in mind.

In addition to Feenberg, I became interested in the work of Albert Borgmann, because he explicitly analysed GIS, dedicating a whole chapter to it in Holding on to Reality. In particular, I was intrigued by his formulation of The Device Paradigm and the notion of Focal Things and Practices, which are linked to information systems in Holding on to Reality, where three forms of information are presented – Natural Information, Cultural Information and Technological Information. It took me some time to see that these five concepts are linked, with technological information being a demonstration of the trouble with the device paradigm, while natural and cultural information are part of focal things and practices (more on these concepts below).

I first used Borgmann’s analysis as part of the ‘Conversations Across the Divide’ session in 2005, which focused on Complexity and Emergence. In a joint contribution with David O’Sullivan about ‘complexity science and Geography: understanding the limits of narratives’, I used Borgmann’s classification of information. Later on, we tried to turn it into a paper, but in the end David wrote a much better analysis of complexity and geography, while the attempt to focus mostly on the information concepts was not fruitful.

The next opportunity to revisit Borgmann came in 2011, for an AAG pre-conference workshop on VGI where I explored the links between The Device Paradigm, Focal Practices and VGI. By 2013, when I was invited to the ‘Thinking and Doing Digital Mapping’ workshop that was organised by the ‘Charting the Digital’ project, I was able to articulate the link between all five elements of Borgmann’s approach in my position paper. This week, I was able to come back to the topic in a seminar in the Department of Geography at the University of Leicester. Finally, I feel that I can link them in a coherent way.

So what is it all about?

Within the areas of VGI and Citizen Science, there is a tension between the different goals of the projects and the identification of practices in terms of what they mean for the participants – are we using people as a ‘platform for sensors’ or are we dealing with fuller engagement? The use of Borgmann’s ideas can help in understanding the difference. He argues that modern technologies tend to adopt the myopic ‘Device Paradigm’, in which a specific interpretation of efficiency, productivity and a reductionist view of human actions takes precedence over ‘Focal Things and Practices’ that bring people together in a way meaningful to human life. In Holding On to Reality (1999), he differentiates three types of information: natural, cultural and technological. Natural information is defined as information about reality: for example, scientific information on the movement of the earth or the functioning of a cell. This is information that was created in order to understand the functioning of reality. Cultural information is information that is being used to shape reality, such as engineering design plans. Technological information is information as reality, and leads to decreased human engagement with fundamental aspects of reality. Significantly, these categories do not relate to the common usage of the words ‘natural’, ‘cultural’ and ‘technological’, but rather describe the changing relationship between information and reality at different stages of socio-technical development.

When we explore general geographical information, we can see that some of it is technological information, for example SatNavs and the way that they communicate with the people who use them, or virtual globes that try to claim to be a representation of reality with ‘current clouds’ and all. The paper map, on the other hand, provides a conduit to the experience of hiking and walking through the landscape, and is part of cultural information.

Things are especially interesting with VGI and Citizen Science. In them, information and practices need to be analysed in a more nuanced way. In some cases, the practices can become focal to the participants – for example in iSpot, where the experience of identifying a species in the field is also linked to the experiences of the amateurs and experts who discuss the classification. It’s an activity that brings people together. On the other hand, in crowdsourcing projects that grab information from SatNav devices, there is a demonstration of The Device Paradigm, with the potential of reducing a meaningful holiday journey to ‘getting from A to B in the shortest time’. The slides below go through the ideas and then explore the implications for GIS, VGI and Citizen Science.

Now for the next stage – turning this into a paper…

Following the two previous assertions, namely that:

‘you can be supported by a huge crowd for a very short time, or by few for a long time, but you can’t have a huge crowd all of the time (unless data collection is passive)’ (original post here)

And

‘All information sources are heterogeneous, but some are more honest about it than others’  (original post here)

The third assertion is about the pattern of participation. It is one that I’ve mentioned before and in some ways it is a corollary of the two assertions above.

‘When looking at crowdsourced information, always keep participation inequality in mind’ 

Because crowdsourced information, either Volunteered Geographic Information or Citizen Science, is created through a socio-technical process, all too often it is easy to forget the social side – especially when you are looking at the information without the metadata of who collected it and when. So when working with OpenStreetMap data, or viewing the distribution of bird species in eBird (below), even though the data source is expected to be heterogeneous, each observation is treated as similar to other observations and assumed to be produced in a similar way.

Distribution of House Sparrow

Yet, data is not only heterogeneous in terms of consistency and coverage, it is also highly heterogeneous in terms of contribution. One of the most persistent findings from studies of various systems – for example in Wikipedia, OpenStreetMap and even in volunteer computing – is that there is a very distinctive heterogeneity in contribution. The phenomenon was termed Participation Inequality by Jakob Nielsen in 2006 and it is summarised succinctly in the diagram below (from Visual Liberation blog) – a very small number of contributors add most of the content, while most of the people who are involved in using the information will not contribute at all. Even when examining only those who actually contribute, in some projects over 70% contribute only once, with a tiny minority contributing most of the information.

Therefore, when looking at sources of information that were created through such a process, it is critical to remember the nature of contribution. This has far-reaching implications for quality, as it is dependent on the expertise of the heavy contributors, on their spatial and temporal engagement, and even on their social interaction and practices (e.g. abrasive behaviour towards other participants).

Because of these factors, it is critical to remember the impact and implications of participation inequality on the analysis of the information. There will be some analyses on which it will have less impact and some where it will have a major one. In either case, it needs to be taken into account.
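As a rough illustration of how one might check for participation inequality before analysing a dataset, here is a minimal Python sketch. It assumes that you have some record of who made each contribution; the function name, the `top_fraction` parameter and the sample log are all hypothetical and are not taken from any of the projects mentioned above.

```python
from collections import Counter


def participation_summary(contributor_ids, top_fraction=0.1):
    """Summarise how unevenly contributions are spread across contributors.

    contributor_ids: one entry per contribution, naming who made it
    (a hypothetical input format, e.g. the user recorded on each edit).
    """
    counts = Counter(contributor_ids)
    if not counts:
        raise ValueError("no contributions to summarise")
    # Contributions per person, heaviest contributors first.
    per_contributor = sorted(counts.values(), reverse=True)
    total = sum(per_contributor)
    # At least one contributor is always counted in the 'top' group.
    top_n = max(1, round(len(per_contributor) * top_fraction))
    one_timers = sum(1 for c in per_contributor if c == 1)
    return {
        "contributors": len(per_contributor),
        "contributions": total,
        "top_share": sum(per_contributor[:top_n]) / total,
        "one_time_share": one_timers / len(per_contributor),
    }


# Made-up log: one heavy contributor, two occasional ones and
# seven people who contributed exactly once.
edits = ["a"] * 80 + ["b"] * 8 + ["c"] * 5 + list("defghij")
summary = participation_summary(edits, top_fraction=0.1)
print(summary)
```

With this made-up log, the top 10% of contributors (a single person) account for 80% of the contributions, and 70% of the contributors appear only once. If a real dataset shows a pattern like this, it is a reminder to ask who the heavy contributors are, and where and when they were active, before treating every observation as equivalent.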

Following the last post, which focused on an assertion about crowdsourced geographic information and citizen science, I continue with another observation. As was noted in the previous post, these can be treated as ‘laws’ as they seem to emerge as common patterns from multiple projects in different areas of activity – from citizen science to crowdsourced geographic information. The first assertion was about the relationship between the number of volunteers who can participate in an activity and the amount of time and effort that they are expected to contribute.

This time, I look at one aspect of data quality, which is about consistency and coverage. Here the following assertion applies: 

‘All information sources are heterogeneous, but some are more honest about it than others’

What I mean by that relates to the on-going argument about authoritative and crowdsourced information sources (Flanagin and Metzger 2008 frequently comes up in this context), which was also at the root of the Wikipedia vs. Britannica debate, and the mistrust in citizen science observations and the constant questioning of whether they can do ‘real research’.

There are many aspects to these concerns, so the assertion deals with the aspects of comprehensiveness and consistency, which are used as a reason to dismiss crowdsourced information when comparing it to authoritative data. However, at a closer look we can see that all these information sources are fundamentally heterogeneous. Despite all the effort to define precise standards for data collection in authoritative data, heterogeneity creeps in because of budget and time limitations, decisions about what is worth collecting and how, and the clash between reality and the specifications. Here are two examples:

Take one of the Ordnance Survey Open Data sources – the maps present themselves as consistent and covering the whole country in an orderly way. However, dig into the details of the mapping, and you discover that the Ordnance Survey uses different standards for mapping urban, rural and remote areas. Yet, the derived products that are generalised and manipulated in various ways, such as Meridian or Vector Map District, do not provide a clear indication of which parts originated from which scale – so the heterogeneity of the source disappears in the final product.

The census is also heterogeneous, and it is a good case of specifications vs. reality. Not everyone fills in the forms, and even with the best effort of enumerators it is impossible to collect all the data, and therefore statistical analysis and manipulation of the results are required to produce a well-reasoned assessment of the population. This is expected, even though it is not always understood.

Therefore, even the best information sources that we accept as authoritative are heterogeneous, but as I’ve stated, they are just not completely honest about it. The ONS doesn’t release the full original set of data before all the manipulations, nor completely disclose all the assumptions that went into reaching the final value. The Ordnance Survey doesn’t tag every line with metadata about the date of collection and scale.

Somewhat counter-intuitively, exactly because crowdsourced information is expected to be inconsistent, we approach it as such and ask questions about its fitness for use. So in that way it is more honest about the inherent heterogeneity.

Importantly, the assertion should not be taken to be dismissive of authoritative sources, or as ignoring that the heterogeneity within crowdsourced information sources is likely to be much higher than in authoritative ones. Of course all the investment in making things consistent and the effort to get universal coverage is indeed worth it, and it would be foolish and counterproductive to consider that such sources of information can be replaced, as is suggested for the census, or that it’s not worth investing in the Ordnance Survey to update the authoritative data sets.

Moreover, when commercial interests meet crowdsourced geographic information or citizen science, the ‘honesty’ disappears. For example, even though we know that Google Map Maker is now used in many parts of the world (see the figure), even in cases when access to vector data is provided by Google, you cannot find out who contributed, when and where. It is also presented as an authoritative source of information.

Despite the risk of misinterpretation, the assertion can be useful as a reminder that the differences between authoritative and crowdsourced information are not as big as they may seem.
