Citizen Science & Scientific Crowdsourcing – week 2 – Google Local Guides

The first week of the “Introduction to Citizen Science and Scientific Crowdsourcing” course was dedicated to an introduction to the field of citizen science, using history, examples, and typologies to demonstrate the breadth of the field. The second week was dedicated to the second half of the course name – crowdsourcing in general, and its use in scientific contexts. In the lecture, after a brief introduction to the concepts, I wanted to use a concrete example that shows maturity in the implementation of commercial crowdsourcing. I also wanted something that is relevant to citizen science and from which many parallels can be drawn, so that lessons can be learned. This gave me the opportunity to use Google Local Guides as a demonstration.

My interest in Google Local Guides (GLG) comes from two core aspects of it. As I have pointed out in OpenStreetMap studies, I’m increasingly annoyed by claims that OpenStreetMap is the largest Volunteered Geographical Information (VGI) project in the world. It’s not. I guessed that GLG was, and by digging into it, I’m fairly confident that with 50,000,000 contributors (most of whom are, as usual, one-timers), Google has created the largest VGI project around. The contributions fall within my “distributed intelligence” category and are voluntary. The second aspect that makes the project fascinating for me is linked to a talk from 2007 at one of the early OSM conferences about the usability barriers that OSM (or VGI more generally) needs to cross to reach a wide group of contributors – basically about user-centred design. The design of GLG is outstanding and shows how much has been learned by the Google Maps team, and by Google more generally, about crowdsourcing. I had very little information from Google about the project (Ed Parsons gave me several helpful comments on the final slide set), but by experiencing it as a participant who can notice the design decisions and implementation, it is hugely impressive to see how VGI is being implemented professionally.

As a demonstration project, it provides examples of recruitment, nudging participants to contribute, intrinsic and extrinsic motivation, participation inequality, micro-tasks and longer tasks, incentives, basic principles of crowdsourcing such as the “open call” that supports flexibility, location- and context-aware alerts, and much more. Below is the segment from the lecture that focuses on Google Local Guides, and I hope to provide a more detailed analysis in a future post.

The rest of the lecture is available on UCLeXtend.


Science Foo Camp 2016

Science Foo Camp (SciFoo) is an invitation-based science unconference organised by O’Reilly Media, Google, Nature, and Digital Science. Or to put it another way, a weekend event (from Friday evening to Sunday afternoon) where 250 scientists, science communicators and journalists, technology people from areas that relate to science, artists, and ‘none of the above’ come and talk about their interests, other people’s interests, and new ideas, in a semi-structured way.

As this is an invitation-only event, when I got the invitation I wasn’t sure if it was real – only to replace this feeling with excitement after checking some of the information about it (on Wikipedia and other sites). I was also a little bit concerned after noticing how many of the participants are from traditional natural science disciplines, such as physics, computer science, neuroscience, chemistry, and engineering (‘impostor syndrome‘). However, the journey into citizen science since 2010 and the first Citizen Cyberscience Summit has led me to fascinating encounters at ecology conferences and with physicists, environmental scientists, synthetic biologists, epidemiologists, and experimental physicists, in addition to links with Human-Computer Interaction researchers, educational experts, environmental policy makers, and many more. So I hoped that I could also communicate with the scientists who come to SciFoo.

I was especially looking forward to seeing how the unconference is organised and run. I’ve experienced unconferences (e.g. WhereCampEU in 2010, parts of State of the Map) and organised the Citizen Cyberscience Summits in 2012 & 2014, where we mashed up a formal academic conference with an unconference. I was intrigued to see how it works when the O’Reilly Media team runs it, as they popularised the approach.

The event itself ran from Friday evening to early Sunday afternoon, with a very active 45 hours in between.

The opening of the event included the following information (from Sarah Winge, Cat Allman, Chris DiBona, Daniel Hook, and Tim O’Reilly): Foo Camp is an opportunity for a bunch of really interesting people to get together and tell each other interesting stories – talk about the most interesting story that you’ve got. The main outputs are new connections between people. This is an opportunity to recharge and to get new ideas – helping each person to recharge using someone else’s battery. The ground rules include: go to sessions outside your field of expertise – an opportunity to see the world from a different perspective; be as extroverted as you can possibly be – don’t sit with people that you know, as you’ll have a better weekend if you talk to different people. The aim is to make a conference that is made mostly of breaks – it’s totally OK to spend time not in a session; the law of two feet – it’s OK to come and go from sessions. It’s a DIY event. There are interesting discussions between competitors, commercial or academic – so it is OK to say that part of a conversation will be kept confidential.

The expected scramble to suggest sessions and fill the board led to a very rich programme with huge variety – 110 sessions over a day and a half, ranging from ‘Origami Innovations’ and ‘Are there Global Tipping Points?’ to ‘Growth Hacking, Rare disease R&D’ and ‘What we know about the universe? and what we don’t know?’. Multiple sessions explored open science (open collaborations, reproducibility, open access publication), issues with science protocols, increasing engagement in science, gender, and social justice, side by side with designer babies, geoengineering, life extension, artificial intelligence, and much more.

In addition, several curated sessions of lightning talks (5-minute rapid presentations by participants) provided a flavour of the range of areas that participants cover. For example, Carrie Partch talked about understanding how circadian cycles work – including the phenomenon of social jet-lag, with people sleeping much more at weekends to compensate for lack of sleep during the weekdays. Elaine Chew demonstrated her mathematical analysis of different music performances and her work as a concert pianist.

I followed the advice from Sarah and started conversations with different people during meals, on the bus to and from SciFoo, or during coffee breaks. Actually, everyone around was doing it – it was just wonderful to see people all around introducing themselves and starting to talk about what they are doing. I found myself learning about research on common drugs that can extend the life of mice, brain research with amputees, and discussing how to move academic publications to open access (but somehow ending with the impact of the Cold War on investment in science).

I organised a session about citizen science, crowdsourcing and open science, in which the discussion included questions about science with monks in Tibet, and patients’ active involvement in research about their conditions. I joined two other sessions: ‘Making Science Communication Thrilling for the Lay Person‘ with Elodie Chabrol (who runs Pint of Science) and Adam Davidson; and ‘Science Communication: What? What? How? Discuss‘ with Suze Kundu, Jen Gupta, Simon Watt & Sophie Meekings. Plenty of ideas (and even a sub-hashtag to get responses to specific questions) came from these sessions, but also a realisation of the challenges for early career academics in developing their skills in this area, with discouraging remarks from more senior academics and potential career risks – so we also dedicated some thought to appropriate mechanisms to support public engagement activity.

Another fantastic discussion was led by Kevin Esvelt about ‘Better than nature: ethics of ecological engineering‘ – when this involves gene editing with techniques such as CRISPR, with potentially far-reaching impacts on ecological systems. This session demonstrated how valuable it is to have an interdisciplinary conference where the expertise of the people in the room ranges from geoengineering to ecology and ethics. It was also a mini-demonstration of Responsible Research and Innovation (RRI) in action, where potential directions of scientific research are discussed with a range of people with different backgrounds and knowledge.

The amount of input, encounters and discussion at SciFoo is overwhelming, and the social activities after the sessions (including singing and sitting by the ‘fire’) are part of the fun – though these were a very exhausting 40 hours.

Because SciFoo invitees include a whole group of people from science communication, and as SciFoo coincided with Caren Cooper’s stint on the @IamSciComm Twitter account, where she discussed the overlap between citizen science and science communication, I paid attention to the overlap during the meeting. The good news is that many of the scientists had some idea of what citizen science is. I always check that people know the term before explaining my work, so it’s great to see that the term is gaining traction. The less good news is that it is still categorised under ‘science communication’, and maybe a useful session would have been ‘What is the problem of scientists with citizen science?’.


For me, SciFoo raised the question about the value of interdisciplinary meetings and how to make them work. With such a list of organisers, the location, the exclusiveness and the mystery of the invitation (several people, including me, wondered ‘It’s great being here, but how did they find out about my work?’) – all make it possible to get such an eclectic collection of researchers. While it’s obvious that the list is well curated, with considerations of research areas, expertise, background, academic career stage, and diversity, the end result and the format open up the possibility of creative and unexpected meetings (e.g. during lunch). My own experience is that achieving something that approaches such a mix of disciplines in a common ‘bottom-up’ academic conference is very challenging and needs a lot of work. The Citizen Cyberscience Summits, the ECSA conference, or the coming Citizen Science Association conference are highly interdisciplinary in terms of the traditional academic areas from which participants come – but they require convincing people to submit papers and come to the conference. Usually, the interdisciplinary event is an additional commitment on top of their disciplinary focus, and this creates a special challenge. Maybe it would be possible to achieve similar interdisciplinary meetings by getting endorsements from multiple disciplinary societies, or by getting support from bodies with a wide remit, like the Royal Society and the Royal Academy of Engineering.

Another thought is that the model of reaching out to people and convincing them that it is worth their while to come to such a meeting might also work better in allowing mixing, as open calls are affected by ‘self-deselection’, where people decide that the conference is not for them (e.g. getting active participants to a citizen science conference, or ensuring that papers come from all flavours of citizen science).

Another delightful aspect was noticing how the unconference format worked with people who (mostly) hadn’t experienced it before – the number of slots and opportunities was enough for people to put their sessions forward. Despite the call for people to be extroverts, those with less confidence will prepare their ideas more slowly and can end up outside the grid. It was nice to see how some places in the grid were blocked off during the early stages, and then released for ideas that came up during breaks, or for sessions that were proposed more slowly and didn’t secure a spot. There might also be value in restricting people to one session at first, and then allowing them to progress to more. What are the steps that are required to make an unconference format inclusive at the session-setting stage?

In contrast to the approach in academic meetings of controlling the number of parallel sessions (to ensure enough people show up to each session), SciFoo has so many that most sessions end up with a small group of about 10 or 20 people. This makes them more valuable and suitable for exploratory discussions – which worked well in the sessions that I attended. In a way, at its best, SciFoo is many short brainstorming sessions that leave you wishing you could discuss for longer.

If you get an invitation (and being flattered is part of the allure of SciFoo), it is worth going on the wiki, giving a bit of a description of yourself, and thinking about a session that you’d like to propose – ‘+1’s can help you get a feeling that people will be interested in it. Think about a catchy title that includes keywords, and remember that you are talking to intelligent lay people from outside your discipline, so prepare to explain some core principles for the discussion in 5 minutes or so. Don’t dedicate the time to telling people only about your research – think of an issue that bothers you to some degree and that you want to explore (for me it was the connection between citizen science and open science), and consider that you’ll have one hour to discuss it.

Follow the advice – say hello to everyone and have great conversations during breaks, and don’t go to sessions if the conversation is more interesting. Another take on the meeting is provided by Bjoern Brembs on his blog; he is the person with whom I had the open access conversation (and I’m still unsure how we ended up with the Cold War). Also remember to enjoy the experience, sit by the ‘fire’ and talk about things other than science!

 

 

AAG 2015 notes – day 4 – Citizen Science & OpenStreetMap Studies

The last day of AAG 2015 is about citizen science and OpenStreetMap studies.

The session Beyond motivation? Understanding enthusiasm in citizen science and volunteered geographic information was organised together with Hilary Geoghegan. We were interested to ‘explore and debate current research and practice moving beyond motivation, to consider the associated enthusiasm, materials and meanings of participating in citizen science and VGI.’

As Hilary couldn’t attend the conference, we started the session with a discussion about experiences of enthusiasm – for example, my own experience with IBM World Community Grid. Jeroen Verplanke raised the addictive quality of volunteer thinking projects, such as logging in to Zooniverse or Tomnod and finding that time flies by. Mairead de Roiste described mapping wood pigeons in New Zealand – the public got involved because they wanted to help, but when they heard that the data wasn’t used, they might lose interest. Urgency can also be a factor influencing participation.

Britta Ricker – University of Washington Tacoma – Look what I can do! Harnessing drone enthusiasm for increased motivation to participate. This is ongoing research. Looking at the Geoweb – it allows people to access information and has made imagery available to the public, but the data is at the whim of whoever gives it to us. With drones, we can send them up when we want or need to. Citizen science is deeply related to the Geoweb – the challenge is to get people involved and to keep them involved. We can harness drone enthusiasm – drones evoke negative connotations, but people are also thinking about using them for good, for example in humanitarian applications. Evidence for the enthusiasm is provided by YouTube, where there are plenty of drone videos – 3.44M – with lots of action photography: the surfing community and GoPro development. People are attached to their drones – jumping into the water to save them. So how can the enthusiasm for drones be harnessed to help participatory mapping? We need to design a workflow around stages: pre-flight, flight, and post-processing. She partnered with water scientists to explore local issues. There are considerations of cost and popularity – she selected a quadcopter for that, the DJI Phantom Vision 2+. With drones you need to read the manual and plan the flight. There are legal issues about where it is OK to fly, and Esri & MapBox provide information on where you can fly them. You need to think about camera angle, and also to correct fisheye distortion, before processing the images. Stitching imagery can be done manually (MapKnitter/QGIS/ArcGIS). It is possible to do it in automated software, but open source options (e.g. OpenDroneMap) are not yet good enough in terms of ease of use. Software such as Pix4D is useful but expensive. Working with raster data is difficult, drones require practice, and the software/hardware is expensive – not yet ready for everyone, though NGOs can start using it. Idea: sharing photos and having volunteers classify the images together.

Brittany Davis – Allegheny College – Motivated to Kill: Lionfish Derbies, Scuba Divers, and Citizen Science. Lionfish are stunning under water – it is challenging to differentiate between the two sub-species, but that doesn’t matter if you’re trying to catch them. They are an invasive species without predators, and their numbers have exploded – especially since 2010. There are a lot of informational campaigns encouraging people to hunt them, especially in dive centres – telling people that it is a way to save Caribbean reefs. People transform themselves from ‘benign environmental activity’ to ‘you’re telling me that I can hunt? cool!’. Lionfish is tasty, so having the meat for dinner is a motivation. Then there are ‘lionfish derbies’ – how many can you kill in a day? She has seen a lot of enthusiasm for lionfish derbies. REEF is trying to sign people up to record where they go, but they are not recording where they hunt the lionfish; people go to another site for the competition as they want to capture more. REEF is trying to encourage a protocol for capturing them, and there are cash prizes for the hunting, with the catch used to encourage people to hunt lionfish. Derbies are increasing in size – 14,832 lionfish were removed from 2009 to 2014, and there is some evidence for the success of the methodology. There was pressure to ‘safely and humanely capture and euthanise these fish’ – a challenge for PADI, who run special scuba courses that are linked to conservation. People hear about the hunting, and that motivates them to go diving. There is a very specific process for a REEF-sanctioned lionfish derby, which tries to include recording and public information. But there are challenges below the depth of recreational divers. She also explored whether it is possible to improve data collection for scientists.

Cheryl Gilge – University of Washington – The rhetorical flourish of citizen participation (or, the formation of cultural fascism?) offered a theoretical analysis of citizen science and Web 2.0 as part of a wider project to understand labour relationships and power. She argues that the average citizen has agency to link to their environment. The ability to contribute and to receive information is part of Web 2.0. As a technology layer, it changes things at both the individual and societal levels. Collaboration and participation in Web 2.0 are framed around entrepreneurialism, efficiency, and innovation. The web offers many opportunities to help wider projects, where amateur and expert knowledge are both valued. However, there is a risk of reducing the politics of participation – a semblance of agency. There is democratic potential – but co-opting of the spirit is also in evidence. There are plenty of examples of inducing individuals to contribute data and information, and researchers are eager to understand motivation over a long period. A rational system to explain what is going on can’t explain the competing goals and values that are in action. The desire to participate is spread across motivations – fun, boredom, etc. From understanding people as ‘snowflakes’ to unashamed exploitation. Why do people contribute to the wider agenda? As a provocation: harnessing crowd potential serves the neoliberalisation agenda of universities. We give freedom to the efficiency and promise of digital tools. Governments promise ‘open government’ or ‘smart cities’ that put efficiency as the top value. A deep libertarian desire for small government is expressed through technology. Governments have sensors that reduce the cost of monitoring what is happening. In the academic environment – reduced funding, hiring freezes, increased pressure to publish – there is an assumption that it is possible to mechanically produce top research. Trading in ideas is less valued. There is a desire for capacity in information processing, or for dealing with humanitarian efforts – projects like Galaxy Zoo require more people to analyse the masses of data that research produces, and mapathons deal with emergencies. Participants are induced to do more through commitment to the project and the harnessing of enthusiasm, adding inducements for the participants. She introduced the concept of micro-fascism from Guattari – the taking over of freedoms in the hope of future promises. It enables large group formation to happen – e.g. identities such as ‘I’m a Mac/PC’ – and it is harder to disconnect. Fascism can be defined as an ideology that relies on the masses believing in the larger goals – here, the unquestioned authority of data in Web 2.0. Belief in technology induces researchers to get data and participation regardless of the costs. Open source is presented as democracy, but there are also similarities with fascism: participation in the movement, and participants must continue to perform. It brings uncomfortable participation – putting hope on these activities – and it happens in both top-down and bottom-up projects, and in Web 2.0. What is the ethical role of researchers who are involved in these projects? How do we value this labour? We need to admit that it is political.

In a final comment, Teresa Scassa pointed out that we need to consider the implications of legitimising drones, killing fish, or employing unpaid labour – underlying all of them is a moral discomfort.

In the afternoon, the two sessions on OpenStreetMap that Alan McConchie and I organised took the 10th birthday of OSM as a starting point to survey the state of geographical research on OpenStreetMap, recognising that OSM studies are different from VGI. The sessions were supported by the European COST Energic network (COST Action IC1203): European Network Exploring Research into Geospatial Information Crowdsourcing.

OpenStreetMap Studies 1 

Jennings Anderson, Robert Soden, Mikel Maron, Marina Kogan & Ken Anderson – University of Colorado, Boulder – The Social Life of OpenStreetMap: What Can We Know from the Data? New Tools and Approaches. OSM provides a platform to understand human-centred computing. There is very valuable information in the OSM history file, and they built a framework (EPIC OSM) that can run spatial and temporal queries and produces JSON output that can then be analysed. They use existing tools and software frameworks to deliver it. The framework was demonstrated: you can ask questions by day or by month, and even bin them by week and in other ways. The questions are evaluated in Ruby, so it is easy to add more questions and change them. They have already used the framework in a CHI paper about the Haiti earthquake (see video below). Once they had created the underlying framework, they also developed an interface – OSM Markdown – with which you can embed code and see changesets, the cumulative nodes collected, and a classification by type of user. They also provide information on tags. When analysing the Haiti response, they see a spike in nodes added and in buildings tagged collapse=yes.
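To make the kind of query such a framework supports more concrete, here is a minimal sketch of binning changesets by week. This is only an illustration in Python of the idea described above, not the EPIC OSM code (which is Ruby-based and works on the full OSM history file); the record structure is an assumption.

```python
# Illustrative sketch only: counting changesets per ISO week, the kind of
# temporal query EPIC OSM answers. The record structure below is assumed.
import json
from collections import Counter
from datetime import datetime

def bin_changesets_by_week(changesets):
    """Count changesets per ISO week from records with a 'created_at' timestamp."""
    weeks = Counter()
    for cs in changesets:
        ts = datetime.fromisoformat(cs["created_at"])
        year, week, _ = ts.isocalendar()
        weeks[(year, week)] += 1
    return dict(sorted(weeks.items()))

# Hypothetical records resembling the JSON output described in the talk.
sample = [
    {"id": 1, "created_at": "2010-01-13T10:15:00", "user": "mapper_a"},
    {"id": 2, "created_at": "2010-01-14T09:02:00", "user": "mapper_b"},
    {"id": 3, "created_at": "2010-01-21T17:40:00", "user": "mapper_a"},
]
counts = bin_changesets_by_week(sample)
print(json.dumps({f"{y}-W{w:02d}": n for (y, w), n in counts.items()}, indent=2))
```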

Christian Bittner – Diverse crowds, diverse VGI? Comparing OSM and Wikimapia in Jerusalem. Christian looked at differences between Wikimapia and OSM as sources of VGI. He is especially interested in the social implications, such as the way exclusion plays out in VGI – the challenges between Palestine/Israel – two contradicting stories that play out in a contested space, with conflicts and fights over the narratives that the two sides enact in different areas. With new tools, there is a ‘promise’ of democratisation – so a narrative of collaboration and participation. In crowdsourced geographic information we can ask: who is the crowd, and who is not? Studying social bias in OSM is a topic that is being discussed in the literature. The process was to look at the OSM database, analyse the data and metadata, and use the municipal boundaries of Jerusalem – a simplified representation of the city, with regions classified by majority population, Arab or Jewish. He then used cartograms scaled by population size and by the amount of information collected. In OSM, Jewish areas are over-represented, while Arab areas are under-represented – a bias towards male participants from privileged socio-economic backgrounds. In Wikimapia, the process is tagging places, using visual information from Google. Wikimapia is about qualitative information, so objects are messy and overlap, with no definition of what constitutes a place. In Wikimapia, there are many more descriptions of the Arab areas, which are over-represented. The amount of information in Wikimapia is smaller – 2,679 objects, compared to 33,411 ways in OSM. In OSM there is little Arabic and more Hebrew, though Latin script is the most used; Wikimapia is the other way around, with Hebrew in the minority. The crowd is different between the projects. There are wider implications – a diverse crowd, so diverse VGI? VGI is a diverse form of data, produced in different ways from different knowledge cultures. He called for very specific studies on each community before claiming that VGI is a general form of information.

Tim Elrick & Georg Glasze – University of Erlangen-Nuremberg, Germany – Changing mapping practices? Representation of Places of Worship in OpenStreetMap and other sources. The start of the process was noticing that churches are presented on official maps but mosques are not, and noticing how maps are used to produce specific narratives. What happens in new forms of mapping? In Google Maps, the mosque is presented but not the church; in OSM both are mapped. What is happening? In the old topographic maps, the official NMAs argue that they provide a precise representation – but they fail to do so in terms of religious differences. Some states do not include non-Christian places of worship – the federal mapping agency came up with symbols for such places (mosques, synagogues), but the preference of the state NMAs was for a generic mark for all non-Christian places that does not differentiate between religions. The USGS just has a single mark for a house of worship – with a cross. The USGS suggested carrying out crowdsourcing to identify places of worship, so they are willing to change. In OSM there is free tagging, with marks for religion, but the rendering recognises only some tags. In 2007 there was a suggestion to change the rendering of non-Christian places, and Steve Chilton created cartographic symbols for the change. OSM’s do-ocracy can lead to change, but in other places that use OSM this was not accepted – there are different symbols in OpenCycleMap. In Germany, there are conflicts about non-visible places of worship (e.g. a mosque in a social club) and an adaptive approach to dealing with location in OSM. Google uses a whole set of data sources, but also crowdsourcing which goes to moderators in Google – with no accountability or local knowledge. The handling of places of worship is not transparent. Categorisation and presentation change with new actors – corporate and open data. Google uses the economy of attention.

Alan McConchie – University of British Columbia – Map Gardening in Practice: Tracing Patterns of Growth and Maintenance in OpenStreetMap. Looking at the history of OSM, editing existing features is as important as adding new ones – it means collaborating and dealing with other people’s data. In the US, OSM is a mix of volunteered and imported data – imports are an ongoing aspect of the project. Questions: do the ‘explorers’ – the people who like empty spaces – stick around? Do imports hinder the growth of the community? And does activity shift to ‘gardening’? The TIGER import in 2007 was significant to the growth of the project. There are also many other imports – addresses in Denmark, French land cover, incomplete land cover imports in Canada. There was community backlash from people who were concerned about the impact of imports (e.g. Crowe 2011; Frederik Ramm 2012; Tobias Knerr 2015). The debate is also between different regional factions. There is an assumption that only empty areas are exciting, which is problematic for someone joining now in Germany. New best practices are evolving: imports in Seattle were used to encourage and build the community. Zielstra et al. (2013) explored imports and showed different growth patterns, but it is not so simple as to pin everything on imports. Alan takes the ‘wiki gardening’ concept – people who like to keep things tidy and well maintained – and analyses small areas, identifying blank spots while trying to normalise across cities of the world, e.g. using population from the Gridded Population of the World, and exploring edits per month. We see many imports happening all the time. At the individual city level, he explores the behaviour of explorers and of those who never mapped the unknown. In London, new mappers keep coming in, while in Vancouver the original mappers are the ones who continue to maintain the map. There are power-law effects that trump anything else, and the shift to new contributors is not clear-cut.

Monica G. Stephens – University at Buffalo – Discussant: she started looking at OSM only a few years ago, because of a statement from Mike Goodchild that women are not included, and so ran a survey of internet users about Google Maps and OSM. She found that geotagging is much more male-dominated – more than just sharing images. In her survey she noticed gender bias in OSM. Maps are biased by the norms, traditions, assumptions and politics of the map maker (Harley 1989). There are biases – but biases of the map maker – bikes in Denver (what interests them), the uneven representation of Hebrew in Jerusalem, or religious attributes. There is also how the community makes decisions – how to display information? what to import? There are issues of ethos – there are fundamental differences between the UK and German communities and the US mapping communities, which leads to interesting conversations between them. There are also comparisons – Wikimapia, Google Maps, topo maps – that tell us what OSM is doing. OSM democracy is more efficient and responsive to community ideas. The proposal for tagging childcare was rejected, but there were discussions that led to remapping of tags in response to the critique. Compared to Google Maps, who was creating local knowledge? In Google Map Maker (2012), 96% of reviewers were male, so the question is who is the authority that governs Wikimapia.

OpenStreetMap Studies 2 included the following:

Martin Loidl – Department of Geoinformatics, University of Salzburg – An intrinsic approach for the detection and correction of attributive inconsistencies and semantic heterogeneity in OSM data. Martin comes from a data modelling perspective, accepting that OSM is based on a bottom-up approach, with flat data modelling and attributes and no restriction on tag usage. There are attributive inconsistencies, and semantic heterogeneity influences visualisation, statistics and spatial analysis. He suggests improving results through harmonisation and correction by estimation. There have been many comparisons of OSM quality over the years, but there is little work on attribute information. Martin suggested an intrinsic approach that relies only on the data in OSM – expecting major roads, for example, to be connected and consistently tagged – and showed how completeness of attributes can be assessed. Most of the roads in OSM are local roads and there is high heterogeneity, but we need them and we should care about them. There are issues with keeping the freedom to tag – it exposes the complexity of OSM.
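As a toy illustration of the kind of intrinsic check described above (this is not Martin’s method, and the expected-tag rules are invented for the example), one can scan ways and flag those whose road class lacks attributes that similar roads usually carry:

```python
# Toy illustration of an intrinsic attribute check on OSM ways (not the actual
# method from the talk): flag major roads whose expected tags are missing,
# using only the data itself as a reference.
EXPECTED_TAGS = {"motorway": {"ref", "maxspeed"}, "primary": {"name"}}  # assumed rules

def find_attribute_gaps(ways):
    """Return (way_id, missing_tags) for ways whose highway class lacks expected attributes."""
    gaps = []
    for way in ways:
        highway = way["tags"].get("highway")
        expected = EXPECTED_TAGS.get(highway, set())
        missing = expected - set(way["tags"])
        if missing:
            gaps.append((way["id"], sorted(missing)))
    return gaps

# Hypothetical extract of OSM ways.
ways = [
    {"id": 101, "tags": {"highway": "motorway", "ref": "A1"}},          # missing maxspeed
    {"id": 102, "tags": {"highway": "primary", "name": "High Street"}}, # complete
]
print(find_attribute_gaps(ways))  # -> [(101, ['maxspeed'])]
```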

Peter A. Johnson – University of Waterloo – Challenges and Constraints to Municipal Government Adoption of OpenStreetMap. The collaboration of MapBox with NYC – an agreement on data sharing – was his starting point and motivation to explore how we can connect government and citizens to share data. Potentially, the OSM community could help with official data, improve it and send it back. Just delivering municipal data over an OSM base map is not much – maybe we need to look at mirroring. Questions about currency, improvement of services, and whether data becomes cheaper or easier to get are core when evaluating official data against OSM data. He carried out interviews with governments in Canada across a range of sizes – adoption is easy in large cities, at basic steps in medium ones, and there is little progress in rural places. There is no official use of OSM, but they do make data available to the OSM community, and there is anecdotal evidence of it being used for different jobs unofficially. They do not see benefits in mirroring data, as they are the authoritative source of information and no other data is relevant. Constraints: they are not sure that OSM is more accurate, and there is a risk-averse culture. They question its fit with organisational needs, note that it lacks required attributes, and they do see costs in altering existing data. OSM might be relevant to rural areas and small cities where data is not being updated.

Muki Haklay – University College London – COST Energic – A European Network for research of VGI: the role of OSM/VGI/Citizen Science definitions. I used some of the concepts that I first presented at SOTM 2011 in Vienna and extended them to the general area of citizen science and VGI, arguing that academics need to be ‘critical friends’, in a nice way, to OSM and other communities. The different talks, and Monica’s points about changes in tagging, demonstrate that this approach is effective and helpful.

Discussant: Alan McConchie – University of British Columbia. The latter session looked at intrinsic and extrinsic analyses of OSM – such as Martin’s work on internal consistency – and there are issues of knowing the specific people in the process who can lead to change. There is a very tiny group of people that makes the decisions, but there is a slow opening towards accountability (e.g. the OSM rendering style on GitHub). There are translations of knowledge and representation that happen in different groups, and questions of how to get the information right. There is a sense that ‘no one has got the right answer’. Industry and NGOs also need to act as critical friends – it will make OSM a better project. There are also critical GIS conversations – is there a ‘fork’ within OSM studies? We can have conversations about these issues.

Follow-up questions explored the privacy of participants, whether research should be mentioned to participants and the community, and the position of the researcher as a participant or as someone who alters the data – the implications of participatory observation.

Second day of INSPIRE 2014 – open and linked data

Opening up geodata is an interesting issue for the INSPIRE directive. INSPIRE was set up before the hype around Government 2.0 grew and the pressure to open up data became apparent, so it was not designed with these aspects explicitly in mind. Therefore, the way in which the organisations that are implementing INSPIRE deal with the provision of open and linked data is bound to bring up interesting challenges.

Dealing with open and linked data was the topic that I followed on the second day of the INSPIRE 2014 conference. The notes below are my interpretation of some of the talks.

Tina Svan Colding discussed the Danish attempt to estimate the value (mostly economic) of open geographic data. The study was done in collaboration with Deloitte, and they started with a theory of change – expectations that they would see increased demand from existing customers and from new ones. The next assumption is that there would be new products, new companies and lower prices, and that this would lead to efficiency and better decision making across the public and private sector, but also increased transparency to citizens. In short, they were trying to capture the monetary value, with a bit on the side. They used statistics and interviews with key people in the public and private sector, and followed that with a wider survey – all with existing users of the data. The number of users of their data increased from 800 to over 10,000 within a year. The Danish system requires users to register to get the data, so these are bulk numbers, but it also meant they could contact users to ask further questions. Among the new users, many are citizens (66%) and NGOs (3%). A further 6% are in the public sector, which had access in principle in the past, but the improved accessibility of the data made it usable to new people in this sector. In the private sector, construction, utilities and many other companies are using the data. The environmental bodies are aiming to use the data in new ways to make environmental consultation more engaging to audiences (is this another Deficit Model assumption – that people don’t engage because it’s difficult to access data?). Issues that people experienced include accessibility for users who don’t know that they need to use GIS and other datasets. They also identified requests for further data releases. In the public sector, 80% identified potential for savings with the data (though that is the type of expectation that they live within!).

Roope Tervo, from the Finnish Meteorological Institute, talked about the implementation of their open data portal. Their methodology was very much with users in mind, and it is a nice example of a user-centred data application. They hold a lot of data – from meteorological observations to air quality data (of course, it all depends on the role of the institute). They chose to provide WFS download services, with GML as the data format and coverage data in meteorological formats (e.g. GRIB). He showed that the selection of data models (all of which can be compatible with the legislation) can have very different outcomes in file size and in the complexity of parsing the information. It was nice to see that they considered user needs – though not formally. They created an open source JavaScript library that makes it easy to use the data – going beyond just releasing the data to supporting how it is used. They have API keys that are based on registration, and they had to limit the number of requests per day, and the same for the view service. After a year, they have 5,000 users and 100,000 data downloads per day, and the numbers are increasing slowly. They are considering how to help clients with complex data models.
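For a sense of what using such a portal involves, here is a rough Python sketch of a WFS GetFeature request followed by a GML parse. The endpoint, stored-query identifier and parameter names are assumptions for illustration only, and should be checked against the current FMI open data documentation (registration and API keys may also be required):

```python
# Rough sketch of downloading observations from a WFS service such as the one
# described above. The endpoint, stored query id and parameters are assumed.
import requests
import xml.etree.ElementTree as ET

WFS_URL = "https://opendata.fmi.fi/wfs"  # assumed endpoint

params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "storedquery_id": "fmi::observations::weather::simple",  # assumed stored query
    "place": "Helsinki",
}

response = requests.get(WFS_URL, params=params, timeout=30)
response.raise_for_status()

# The response is GML; parsing it is where the choice of data model shows up
# as more or less complexity, as the talk pointed out.
root = ET.fromstring(response.content)
print(root.tag, "with", len(list(root.iter())), "elements")
```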

Panagiotis Tziachris explored the clash between the ‘heavy duty’ and complex INSPIRE standards and the usual lightweight approaches that are common in open data portals (I think he meant those in the commercial sector that allow some reuse of data). This is a project of 13 Mediterranean regions in Spain, Italy, Slovenia, Montenegro, Greece, Cyprus and Malta. The HOMER project (website http://homerproject.eu/) used different mechanisms, including hackathons, to share knowledge and experience between more experienced players and those that are new to the area. They found them to be a good way to share practical knowledge between partners. This is an interesting use of a purposeful hackathon among people who know each other within a project, and I think it can be useful in other cases. Interestingly, on the legal side, they had to go beyond the usual documents that are provided in an EU consortium: in order to allow partners to share information, they created a memorandum of understanding for the partners, as this was needed to deal with IP and similar issues. Practices from open data – such as the CKAN API, which is a common one for open data websites – were also used. They noticed a separation between central administration and local or regional administration – the competency of the more local organisations (municipality or region) is sometimes limited because knowledge sits elsewhere (in central government), or they are at different stages of implementation, and disagreements about releasing the data can arise. Another issue is that open data is sometimes provided through regional portals while another organisation at the national level (environment ministry or cadastral body) is responsible for INSPIRE. The lack of capabilities at different levels of government adds to the challenges of setting up open data systems. Sometimes open data legislation is only about the final stage of the process and not about how to get there, while INSPIRE is all about the preparation and not about the release of data – this also creates mismatches.

Adam Iwaniak discussed how “over-engineering” makes the INSPIRE directive inoperable or irrelevant to users, on the basis of his experience in Poland. He asks “what are the user needs?” and demonstrated the point by noting that after half a term of teaching students about the importance of metadata, when it came to actively searching for metadata in an assignment, the students didn’t use any of the specialist portals but just Google. Based on this and similar experiences, he suggested the creation of a thesaurus that describes keywords and features in the products so that searching according to user needs becomes possible. Of course, the implementation is more complex, and therefore he suggests an approach that works within the semantic web and uses RDF definitions, making the data searchable and indexable in search engines so it can be found. The core message was to adapt the delivery of information to the way the user is most likely to search for it – metadata is relevant when the producer makes sure that a search in Google finds it.
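A minimal sketch of what publishing such machine-readable descriptions could look like, using the rdflib library with DCAT and Dublin Core vocabularies; the dataset URI and keywords below are made up for illustration and are not from the talk:

```python
# Minimal sketch (not from the talk) of describing a dataset with RDF so its
# keywords become machine-readable and crawlable by search engines.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
dataset = URIRef("http://example.org/datasets/land-cover-2014")  # hypothetical URI
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Land cover 2014")))
for keyword in ["land cover", "CORINE", "environment"]:  # thesaurus-style keywords
    g.add((dataset, DCAT.keyword, Literal(keyword)))

print(g.serialize(format="turtle"))
```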

Jesus Estrada Vilegas from the SmartOpenData project (http://www.smartopendata.eu/) discussed the implementation of some ideas that can work within the INSPIRE context while providing open data. In particular, he discussed Spanish and Portuguese data sharing. Within the project, they provide access to the data by harmonising it and then making it linked data. Not all the data is open, and the focus of their pilot is agroforestry land management. They are testing delivery of the data in both INSPIRE-compliant formats and the internal organisation format to see which is more efficient and useful. INSPIRE is a good point from which to start developing linked data, but there is also a need to compare it to other ways of linking the data.

Massimo Zotti talked about linked open data from earth observations in the context of business activities, since he works in a company that provides software for data portals. He explored the business model of open data, INSPIRE and the Copernicus programme. Data that comes from earth observation can be turned into information – for example, identifying the parts of the soil that get sealed and don’t allow water to be absorbed, or information about forest fires or floods. These are the bits of useful information that are needed for decision making. Once the information exists, it is possible to identify increases in land use or other aspects that can inform policy. However, we need to notice that dealing with open data means a lot of work is put into bringing datasets together. Standardisation of data transfer and the development of approaches that help machine-to-machine analysis are important for this aim. By fusing datasets, they become more useful and relevant to the knowledge production process. A dashboard approach to displaying the information and the processing can help end users access the linked data ‘cloud’. Standardisation of data is very important to facilitate such automatic analysis, and standard ontologies are also necessary. In my view, this is not a business model, but rather typical of operations in the earth observation area, where a lot of energy is spent on justifying that it can be useful and important to decision making – but without quantification of the effort that is required to go through the process, or of the speed at which results can be achieved (will the answer come in time for the decision?). A member of the audience also raised the point that the assumption that machine-to-machine automatic models will produce valuable information all by themselves is questionable.

Maria Jose Vale talked about the Portuguese experience in delivering open data. The organisation she works in deals with cadastre and land use information, and she also discussed activities of the SmartOpenData project. She described the principles of open data that they considered: data must be complete, primary, timely, accessible and processable; data formats must be well known; there should be permanence; and usage costs should be addressed properly. For good governance you need to know the quality of the data and the reliability of delivery over time, so having automatic ways for the data to propagate to users is within these principles. The benefits of open data that she identified are mostly technical, but also economic (and these are mentioned many times – but you need evidence similar to the Danish case to prove them!). The issues or challenges of open data are: how to deal with fuzzy data when releasing it (my view: tell people that it needs cleaning); safety, as there are both national and personal issues; financial sustainability for the producers of the data; rates of update; and addressing user and government needs properly. In a case study that she described, they looked at land use and land cover changes to assess changes in river use in a watershed. They needed about 15 datasets for the analysis and used CORINE land cover information from different years. For example, they have seen forest change to woodland because of fire, which influences water quality too. Data interoperability and linking data allow integrated modelling of the evolution of the watershed.

Francisco Lopez-Pelicer covered the Spanish experience and the PlanetData project (http://www.planet-data.eu/), which looks at large-scale public data management – specifically, a pilot on VGI and linked data, against a background of SDI and INSPIRE. There is big potential, but many GI producers don’t do it yet. The issue is legacy GIS approaches such as WMS and WFS, which are standards endorsed in INSPIRE but which do not necessarily fit into a linked data framework. In the work he was involved in, they try to address complex GI problems with linked data. To do that, they convert a WMS into a linked data server by adding URIs and POST/PUT/DELETE resources. A semantic client sees this as a linked data server even though it remains compliant with other standards. To try it out, they use the open national map as the authoritative source and OpenStreetMap as the VGI source, and release both as linked data. They are exploring how to convert a large authoritative GI dataset into linked data and how to link it to other sources. They are also using it as an experiment in crowdsourcing platform development – creating a tool that helps to assess the quality of each dataset. The aim is to run quality experiments and measure the data quality trade-offs associated with using authoritative or crowdsourced information. Their service can behave as both a WMS and a “Linked Map Server”. LinkedMap, which is the name of this service, provides the ability to edit the data and explore OpenStreetMap and the government data – they aim to run the experiment in the summer, and it can be found at http://linkedmap.unizar.es/. The reason for choosing WMS as a delivery standard was a previous crawl of the web which showed that WMS is the most widely available service, so it is assumed to be relevant to users, or one that most users can consume.
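The idea of layering URIs and REST verbs over map features can be sketched schematically as below. This is not the LinkedMap implementation – the route names, feature identifiers and in-memory store are invented purely to illustrate the pattern of giving each feature its own addressable, editable resource:

```python
# Schematic illustration (not the LinkedMap implementation) of exposing map
# features as linked-data-style resources with their own URIs and REST verbs.
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store standing in for features served by a map back end.
features = {"n123": {"@id": "http://example.org/features/n123",
                     "name": "Puente de Piedra", "source": "OSM"}}

@app.route("/features/<fid>", methods=["GET"])
def get_feature(fid):
    if fid not in features:
        abort(404)
    return jsonify(features[fid])  # JSON-LD-style document for the resource

@app.route("/features/<fid>", methods=["PUT"])
def put_feature(fid):
    created = fid not in features
    features[fid] = request.get_json()  # edits to the crowdsourced layer
    return jsonify(features[fid]), 201 if created else 200

@app.route("/features/<fid>", methods=["DELETE"])
def delete_feature(fid):
    features.pop(fid, None)
    return "", 204

if __name__ == "__main__":
    app.run(debug=True)
```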

Paul van Genuchten talked about the GeoCat experience in a range of projects, which include support to Environment Canada and other activities. INSPIRE meeting open data can be a clash of cultures, and he highlighted neogeography as the term he uses to describe the open data culture (going back to the neogeo/paleogeo debate, which I thought was over and done with – but clearly it is relevant in this context). INSPIRE recommends publishing data openly, and this is important to ensure that it reaches a big potential audience, as well as the ‘innovation energy’ that exists among the ‘neogeo’/‘open data’ people. The common things within this culture are expectations that APIs are easy to use, clean interfaces, etc., but under the hood there are similarities in the way things work. There is a perceived complexity among the community of open data users towards INSPIRE datasets. Many of the open data people are focused on and interested in OpenStreetMap, look at companies such as MapBox as a role model, and favour formats such as GeoJSON and TopoJSON. Data is versioned and managed in a git-like process, and the projection that is very common is Web Mercator. There are now not only raster tiles but also vector tiles. These characteristics of the audience can be used by data providers to help people use their data, but there are also intermediaries that deliver the data and convert it to more ‘digestible’ forms. He noted CitySDK by Waag.org, which grabs data from INSPIRE and then delivers it to users in ways that suit open data practices. He demonstrated the case of Environment Canada, where they created a set of files that are suitable for both human and machine use.
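To illustrate why formats like GeoJSON feel lightweight to that audience, here is a minimal feature collection generated from Python; the coordinates and attributes are invented for the example:

```python
# A minimal GeoJSON feature collection (coordinates invented), illustrating the
# kind of human-readable, easy-to-consume format this audience expects compared
# with GML-based INSPIRE services.
import json

feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [4.9041, 52.3676]},  # lon, lat
    "properties": {"name": "Example place of interest"},
}
print(json.dumps({"type": "FeatureCollection", "features": [feature]}, indent=2))
```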

Ed Parsons finished the set of talks for the day (talk link goo.gl/9uOy5N) with a talk about a multi-channel approach to maximise the benefits of INSPIRE. He highlighted that it’s not all about linked data, although linked data is part of the solution to making data accessible. Accessibility always wins online – and people make compromises (e.g. sound quality on CD versus Spotify). Google Earth can be seen as a new channel that makes things accessible, and while the back-end technology is not new, the ease of access made a big difference. Denmark’s use of Minecraft to release geographic information is an example of another channel. Notice the change over the past 10 years in video delivery, for example: in the early days, video delivery was complex and required many steps, expensive software and infrastructure, which is somewhat comparable to current practice within geographic information. Making things accessible through channels like YouTube, and the whole ecosystem around it, changed the way video is used, uploaded and consumed, and of course changes in devices (e.g. recording on the phone) made it even easier. Focusing on maps themselves, people might want different things that are maps and not only the latest searchable map that Google provides – e.g. the administrative map of medieval Denmark, flood maps, or something specific that is not part of general web mapping. In some cases people are searching for something and you want to give them maps for some queries and images for others (as in searching for ‘Yosemite trails’ vs. ‘Yosemite’). There are plenty of maps that people find useful, and for that Google is now promoting Google Maps Gallery – with tools to upload, manage and display maps. It is also important to consider that mapping information needs to be accessible to people who are using mobile devices. The web infrastructure of Google (or ArcGIS Online) provides the scalability to deal with many users and the ability to deliver to different platforms such as mobile. The gallery allows people to brand their maps. Google wants to identify authoritative data that comes from official bodies, and then to have additional information that is displayed differently. But separating facts and authoritative information from commentary is difficult, and that is where semantics plays an important role. He also noted that Google Maps Engine is just maps – a visual representation without an aim to provide GIS analysis tools.

Google Research Award – Identifying Learning Benefits of Google Earth Tours in Education


It is always nice to announce good news. Back in February, together with Richard Treves at the University of Southampton, I submitted an application to Google’s Faculty Research Award program for a grant to investigate Google Earth Tours in education. We were successful in getting a grant worth $86,883 USD. The project builds on my expertise in usability studies of geospatial technologies, including the use of eye tracking and other usability engineering techniques for GIS, and on Richard’s expertise in Google Earth tours and education, and his longstanding interest in usability issues.

In this joint UCL/Southampton project, UCL will be the lead partner and we will appoint a junior researcher for a year to develop and run experiments that will help us understand the effectiveness of Google Earth Tours in geographical learning, and we aim to come up with guidelines for their use. If you are interested, let me know.

Our main contact at Google for the project is Ed Parsons. We were also helped by Tina Ornduff and Sean Askay who acted as referees for the proposal.
The core question that we want to address is “How can Google Earth Tours be used to create an effective learning experience?”

So what do we plan to do? Previous research on Google Earth Tours (GETs) has shown them to be an effective visualization technique for teaching geographical concepts, yet their use in this way is essentially passive. Active learning is a successful educational approach where student activity is combined with instruction to enhance learning. In the proposal we suggest that there is great educational value in combining the advantages of the rich visualization of GETs with student activities. Evaluating the effectiveness of this combination is the purpose of the project, and we plan to do this by creating educational materials that consist of GETs and activities, and testing them against other versions of the materials using student tests, eye tracking and questionnaires as data gathering techniques.

We believe that by improving the techniques by which spatial data is visualized we are improving spatial information access overall.
A nice aspect of getting the project funded is that it works well with a project led by Claire Ellul and Kate Jones and funded by JISC. The G3 project, or “Bridging the Gaps between the GeoWeb and GIS”, touches on similar aspects, and we will surely share knowledge with them.
For more background on Richard Treves, see his blog (where the same post is published!)