Algorithmic governance in environmental information (or how technophilia shapes environmental democracy)

These are the slides from my talk at the Algorithmic Governance workshop (for which there are lengthy notes in the previous post). The workshop explored the many ethical, legal and conceptual issues with the transition to Big Data and algorithm based decision-making.

My contribution to the discussion is based on previous thoughts on environmental information and public use of it. Inherently, I see the relationships between environmental decision-making, information, and information systems as something that needs to be examined through the prism of the long history that links them. This way we can make sense of the current trends. These three areas have been deeply linked throughout the history of the modern environmental movement since the 1960s (hence the Apollo 8 earth image at the beginning), and the Christmas message from the team, with its reference to Genesis (see below), helped in making the message stronger.

To demonstrate the way this triplet evolved, I’m using texts from official documents – Stockholm 1972 declaration, Rio 1992 Agenda 21, etc. They are fairly consistent in their belief in the power of information systems in solving environmental challenges. The core aspects of environmental technophilia are summarised in slide 10.

This leads to environmental democracy principles (slide 11) and the assumptions behind them (slide 12). While information is open, it doesn’t mean that it’s useful or accessible to members of the public. This was true when raw air monitoring observations were released as open data in 1997 (before anyone knew the term), and although we have better tools (e.g. Google Earth) there are consistent challenges in making information meaningful – what do you do with Environment Agency DSM if you don’t know what it is or how to use a GIS? How do you interpret Global Forest Watch analysis about change in tree cover in your area if you are not used to interpreting remote sensing data (a big data analysis and algorithmic governance example)? I therefore return to the hierarchy of technical knowledge and ability to use information (in slide 20) that I covered in the ‘Neogeography and the delusion of democratisation‘ and look at how the opportunities and barriers changed over the years in slide 21.

The last slides show that despite all the technical advancement, we can have situations such as the water contamination in Flint, Michigan, which demonstrate that some of the problems from the 1960s that were supposed to be solved, well monitored, and covered by clear regulations and processes came back because of negligence and lack of appropriate governance. This is not going to be solved with information systems, although citizen science has a role to play in dealing with the governmental failure. This whole sorry mess and the re-emergence of air quality as a Western world environmental problem is a topic for another discussion…

Algorithmic Governance Workshop (NUI Galway)

Algorithmic Governance Workshop (source: Niall O Brolchain)

The workshop ‘Algorithmic Governance’ was organised as an intensive one-day discussion and the development of research needs. As the organisers, Dr John Danaher and Dr Rónán Kennedy, identified:

‘The past decade has seen an explosion in big data analytics and the use  of algorithm-based systems to assist, supplement, or replace human decision-making. This is true in private industry and in public governance. It includes, for example, the use of algorithms in healthcare policy and treatment, in identifying potential tax cheats, and in stopping terrorist plotters. Such systems are attractive in light of the increasing complexity and interconnectedness of society; the general ubiquity and efficiency of ‘smart’ technology, sometimes known as the ‘Internet of Things’; and the cutbacks to government services post-2008.
This trend towards algorithmic governance poses a number of unique challenges to effective and legitimate public-bureaucratic decision-making. Although many are already concerned about the threat to privacy, there is more at stake in the rise of algorithmic governance than this right alone. Algorithms are step-by-step computer coded instructions for taking some input (e.g. tax return/financial data), processing it, and converting it into an output (e.g. recommendation for audit). When algorithms are used to supplement or replace public decision-making, political values and policies have to be translated into computer code. The coders and designers are given a set of instructions (a project ‘spec’) to guide them in this process, but such project specs are often vague and underspecified. Programmers exercise considerable autonomy when translating these requirements into code. The difficulty is that most programmers are unaware of the values and biases that can feed into this process and fail to consider how those values and biases can manifest themselves in practice, invisibly undermining fundamental rights. This is compounded by the fact that ethics and law are not part of the training of most programmers. Indeed, many view the technology as a value-neutral tool. They consequently ignore the ethical ‘gap’ between policy and code. This workshop will bring together an interdisciplinary group of scholars and experts to address the ethical gap between policy and code.

The workshop was structured around 3 sessions of short presentations of about 12 minutes each, with an immediate discussion, and then a workshop to develop research ideas emerging from the sessions. This very long post is my notes from the meeting. These are my takes, not necessarily those of the presenters. For another summary of the day, check John Danaher’s blog post.

Session 1: Perspective on Algorithmic Governance

Professor Willie Golden (NUI Galway), ‘Algorithmic governance: Old or New Problem?’ focused on an information science perspective. We need to consider the history – an RO Mason paper from 1971 already questioned the balance between the decision-making that should be done by humans and the part that needs to be done by the system. The issue is the level of assumptions that are being integrated into the information system. Today the amount of data that is being collected, and the assumptions about what it does in the world, keep growing, but we need to remain sceptical about the value of the actionable information. Algorithms need managers too. Davenport, in HBR in 2013, pointed out that the questions asked by decision makers before and after the processing are critical to effective use of data analysis systems. In addition, people are very concerned about data – we’re complicit in handing over a lot of data as consumers, and the Internet of Things (IoT) will reveal much more. Deborah Estrin, in CACM in 2014, provided a viewpoint – ‘small data, where n = me’ – in which she highlighted the importance of health information: the monitoring of personal information can provide a baseline on you. However, this information can be handed over to health insurance companies, and the question is what control you have over it. Another aspect is Artificial Intelligence – Turing, in 1950, proposed the famous ‘Turing test’ to test for AI. In the past 3-4 years, it became much more visible. The difference is that AI learns, which raises the question of how you can monitor a thing that learns, changes over time and gets better. AI doesn’t have self-awareness, as Davenport noted in 2015 in ‘Just How Smart are Smart Machines’, and there are arguments that machines can be more accurate than humans in analysing images. We may need to be more proactive than we used to be.

Dr Kalpana Shankar (UCD), ‘Algorithmic Governance – and the Death of Governance?’ focused on digital curation/data sustainability and the implications for governance. We invest in data curation as a socio-technical practice, but need to explore what it does and how effective current practices are. What are the implications if we don’t do the ‘data labour’ to maintain it, to avoid ‘data tumbleweed’? We are selecting data sets and preserving them for the short and long term. There is an assumption that ‘data is there’ and that it doesn’t need special attention. Choices that people make to preserve data sets will influence the patterns of what appears later and the directions of research. Downstream, there are all sorts of business arrangements to make data available and to preserve it – the decisions shape disciplines and the discourses around them – for example, preserving census data influenced many of the social sciences and directed them towards certain types of questions. Data archives influenced the social science disciplines – e.g. using large data sets and dismissing ethnographic and qualitative data. We need to get into the governance of data institutions and how it influences the information that is stored and shared. What the role of curating data is when data becomes open is another question. An example of the complexity is provided by a study of a system for ‘match making’ of refugees to mentors, which is used by an NGO: the system is from 2006 and the update of the job classification is from 2011, but the organisation that uses the system cannot afford to update it, and there are impacts on those who are influenced by the system.

Professor John Morison (QUB), ‘Algorithmic Governmentality’. From a law perspective, there is an issue of techno-optimism. He is interested in e-participation and participation in government. There are issues of open and big data, where we are given a vision of open and accountable government and growth in democratisation – e.g. the social media revolution, or opening government through data. We see a fantasy of abundance, and there are also new feedback loops – technological solutionism, which answers problems in politics with technical fixes: simplistic solutions to complex issues. For example, in research into cybersecurity, there are expectations of creating code as a scholarly output. Big Data has different creators (from Google to national security bodies) and they don’t have the same goals. There are also issues of technological authoritarianism as a tool of control. Algorithmic governance requires us to engage with epistemology, ontology or governance. We need to consider the impact on democracy – the AI approach argues for democratisation through the N=all argument. Leaving aside the ability to ingest all the data, it seems to assume that subjects are no longer viewed as individuals but as an aggregate that can be manipulated and acted upon. In algorithmic governance, there is a false emancipation through the promise of inclusiveness, but instead it is responding to predictions that are created from data analysis. The analysis is claimed to be a scientific way to respond to social needs. Ideas of individual agency disappear. Here we can use Foucault’s analysis of power to understand agency. Finally, we also see government without politics – arguing that we make subjects and objects amenable to action. There is no selfhood, just a group prediction. This transcends and obviates many aspects of citizenship.

Niall O’Brolchain (Insight Centre), ‘The Open Government’. There is a difference between government and governance. The eGov unit in the Galway Insight Centre for Data Analytics acts as an Open Data Institute node and is part of the Open Government Partnership (OGP). The OGP involves 66 countries and aims to promote transparency, empower citizens, fight corruption, and harness new technologies to strengthen governance. It started in 2011 and now involves 1500 people, with ministerial-level involvement. The OGP has a set of principles, with eligibility criteria that involve civic society and government on equal terms – the aim is to provide information so that it increases civic participation, it requires the highest standards of professional integrity throughout administration, and there is a need to increase access to new technologies for openness and accountability. Generally, it considers that the benefits of technology outweigh the disadvantages for citizenship. The grand challenges are improving public services, increasing public integrity, public resources, safer communities, and corporate accountability. Not surprisingly, corporate accountability is one of the weakest.

Discussion:

Using the Foucault framework, the question is about the potential for resistance that is created because of the increase in power. There are cases to discuss about hacktivism and the use of technologies. There is an issue of the ability to resist power – e.g. passing details between companies based on prediction. The issue is not about who uses the data and how they control it. Sometimes there is a need to use the approaches that illegal actors use to hide their tracks in order to resist it.
A challenge to the workshop is that the area is so wide, and we need to focus on specific aspects – e.g. the use of systems in governments while technology is changing, or interoperability. There are overlaps between environmental democracy and open data, with many similar actors – and with much more buy-in from government and officials. There was also technological change that made it easier for governments (e.g. Mexico releasing environmental data under OGP).
Sovereignty is also an issue – with loss of it to technology and corporations over the last years, and indeed corporate accountability is noted in the OGP framework as one that needs more attention.
There is also an issue about information that is not allowed to exist: absences and silences are important. There are issues of consent – the network effects prevent options of consent, and therefore society and academics can force businesses to behave socially in a specific way. Keeping information and attributing it to individuals is the crux of the matter and where governance should come in. You have to communicate over the internet about who you are, but that doesn’t mean that we can’t dictate to corporations what they are allowed to do and how to use it. We can also consider privacy by design.

Session 2: Algorithmic Governance and the State

Dr Brendan Flynn (NUI Galway), ‘When Big Data Meets Artificial Intelligence will Governance by Algorithm be More or Less Likely to Go to War?’. When looking at autonomous weapons we can learn about general algorithmic governance. Algorithmic decision support systems have a role to play in a very narrow scope – to do what the stock market does – identifying very dangerous responses quickly and stopping them. In terms of politics, many things will continue. One thing that comes from military systems is that there is always a ‘human in the loop’ – and that is sometimes the problem. There will be HCI issues with making decisions quickly based on algorithms, and things can go very wrong. There are false positive cases, such as the example of the USS Vincennes, which used a DSS to make the decision to shoot down a passenger plane. The decision taking is limited by the decision shaping, which is handed more and more to algorithms. There are issues with the way military practices understand command responsibility in the Navy, which sets a very high standard of responsibility for failure. There is a need to see how to interpret information from black boxes on false positives and false negatives. We can use this extreme example to learn about civic cases. We need to have high standards for officials. If we apply some version of command responsibility to those who are using algorithms in governance, it is possible to put responsibility on the user of the algorithm and not only on the creators of the code.

Dr Maria Murphy (Maynooth), ‘Algorithmic Surveillance: True Negatives’. We all know that algorithmic interrogation of data for crime prevention is becoming commonplace, and it is also used in companies. We know that decisions can be about life and death. When considering surveillance, there are many issues. Consider the probability of assuming someone to be a potential terrorist or extremist. In human rights law we can use the concept of private life, and algorithmic processing can challenge that. Article 8 of the European Convention on Human Rights is not absolute, and the right can be limited in specific cases – and the ECHR asks for justifications from governments, to show that they follow the guidelines. Surveillance regulations need to explicitly identify the types of people and crimes that are open to observation. You can’t say that everyone is open to surveillance. This is workable when there are specific keywords that can be judged, but what about AI and machine learning, where the creator can’t know what will come out? There is also a need to show proportionality to prevent social harm. False positives in algorithms are a concern – because terrorism is so rare, there is a lot of risk of a bad impact on the prevention of terrorism or crime. With the assumption that more data is better data, we are left with a problem of generalised surveillance that is seen as highly problematic. Interestingly, the ECHR does see a lot of potential in technologies and their potential use.
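To make the false-positive point concrete, here is a minimal sketch of the base-rate arithmetic. The numbers are purely illustrative assumptions of mine (1 real case in 100,000 people, a system that catches 99% of real cases and wrongly flags 1% of everyone else); they were not given in the talk.

```python
def share_of_flags_that_are_real(prevalence, sensitivity, false_positive_rate):
    """Fraction of flagged people who are real cases (Bayes' rule)."""
    true_flags = prevalence * sensitivity
    false_flags = (1 - prevalence) * false_positive_rate
    return true_flags / (true_flags + false_flags)


# Illustrative, assumed figures only: not taken from the talk.
ppv = share_of_flags_that_are_real(
    prevalence=1e-5,           # 1 real case per 100,000 people
    sensitivity=0.99,          # 99% of real cases are flagged
    false_positive_rate=0.01,  # 1% of innocent people are flagged
)
print(f"{ppv:.4%} of flagged people are real cases")
```

Under these assumptions roughly one flag in a thousand is a real case, which is why a seemingly accurate system can still mean generalised suspicion when the thing it looks for is very rare.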

Professor Dag Wiese Schartum (University of Oslo), ‘Transformation of Law into Algorithm’. His focus was on how algorithms are created, and on thinking about this within government systems. They are the bedrock of our welfare systems – which is the way they appear in law. Algorithms are a form of decision-making: general decisions about what should be regarded, and then making decisions. This is the translation of decisions into computer code, but the raw material is the legal decision-making process, which is transformed into algorithms. Programmers do have autonomy when translating requirements into code – the Norwegian experience shows close work with experts to implement the code. You can think of an ideal transformation model of a system to algorithms that exists within a domain – a service or authority of a government – and is done for the purpose of addressing decision-making. The process is the qualification of legal sources and interpretations that are done in natural language, which then turn into a specification of rules, and then into a formal language which is used for programming and modelling. There are iterations throughout the process: the system is tested, goes through a process of confirming the specification, and then gets into use. It’s too complex to test every aspect of it, but once the specifications are confirmed, it is used for decision-making. In terms of research, we need to understand the transformation process in different agencies – overall organisation, model of system development, competences, and degree of law-making effects. The challenge is the need to reform the system: adapting to political and social change over time. The system needs to be flexible in its design, to allow openness and not rigidity.

Heike Felzmann (NUI Galway), ‘The Imputation of Mental Health from Social Media Contributions’, came from a philosophy and psychology background. Algorithms can access different sources – blogs, social media – and this personal data is being used for mood analysis, which can lead to observations about mental health. In 2013, there were examples of identifying affective disorders, and the research doesn’t consider the ethical implications. The data that is being used includes content and individual metadata like the time of online activities, length of contributions, and typing speed, as well as network characteristics and biosensing such as voice and facial expressions. Some ethical challenges include: first, contextual integrity (Nissenbaum 2004/2009) – privacy expectations are context-specific and not constant rules. Secondly, lack of vulnerability protection – analysis of mental health breaches the rights of people to protect their health. Third, potential negative consequences, with impacts on employment, insurance, etc. Finally, the irrelevance of consent – some studies included consent in the development, but what about applying it in the world? We see no informed consent, no opt-out, no content-related vulnerability protections, no duty of care and risk mitigation, there is no feedback, and the number of participants is unlimited. All these are in contrast to practices in Human Subjects Research guidelines.

Discussion:

In terms of surveillance, we should think about self-surveillance, in which citizens provide the details for surveillance themselves. Surveillance is not only negative – modern approaches are not only used for negative reasons. There is a hoarding mentality in the military-industrial complex.
The area of command responsibility received attention, with discussion of liability and different ways in which courts are treating military versus civilian responsibility.

Session 3: Algorithmic Governance in Practice

Professor Burkhard Schafer (Edinburgh), ‘Exhibit A – Algorithms as Evidence in Legal Fact Finding’. The discussion about legal aspects can easily go back to 1066 – you can go through a whole history. There are many links from medieval law to today. As a regulatory tool, there is the issue with the rule of proof. Legal scholars don’t focus enough on the importance of evidence and how to understand it. Regulation of technology is not about the law but about the implementation on the ground, for example in the case of data protection legislation. In a recent NESTA meeting, there was a discussion about the implications of Big Data – using personal data is not the only issue. For example, a citizen science project might show low exposure to emissions, leading to a decision that the location in which the citizens monitored their area is the perfect location for a polluting activity – so harming the people who collected the data. This is not strictly a case of data protection. How can a citizen object to the ‘computer says no’ syndrome? What are the minimum criteria to challenge such a decision? What are the procedural rules of fairness? Having a meaningful cross-examination in such cases is difficult. Courts are sometimes happy to accept and use computer models, and at other times reluctant to take them. There are issues about the burden of proof from systems (e.g. to show that an ATM was working correctly when a fraud was committed). DNA tests rely on computer modelling, but on systems that are proprietary and closed. Many algorithms are hidden for business confidentiality and there are explorations of these issues. One approach is to rely on open source tools. Replication is another way of ensuring the results. Escrow ownership of the model by a third party is another option. Next, there is the possibility of questioning software in natural language.

Dr Aisling de Paor (DCU), ‘Algorithmic Governance and Genetic Information’ – there is an issue in law, and massive applications of genetic information. There is rapid technological advancement in many settings – genetic testing, pharma and many other aspects – with indications of behavioural traits, disability, and more. There are competing rights and interests. There are rapid advances in this area – use in health care, and the technology is becoming cheaper (already below $1000). In commercial settings, genetic information is used in insurance, and it is valuable for economy and efficiency in medical settings. There is also a focus on personalised medicine. A lot of the concerns are about misuse of algorithms, for example the predictive assumptions about impact on behaviour and health. The current state of predictability is limited, especially regarding the environmental impacts on the expression of genes. There are conflicting rights – efficiency and economic benefits, but a challenge to human rights – e.g. the right to privacy. There is also the right to non-discrimination – making decisions on the basis of probability may be deemed discriminatory. There are wider societal and public policy concerns – the possible creation of a genetic underclass and the potential to exacerbate societal stigma about disability, disease and difference. We need to identify the gaps between law, policy and code, and decide on use, commercial interests and the potential abuses.

Anthony Behan (IBM, but in a personal capacity), ‘Ad Tech, Big Data and Prediction Markets: The Value of Probability’. Advertising is a very useful use case for considering what happens in such governance processes: what happens in the 200 milliseconds of an advertising transaction, which is the standard on the internet. The process of real-time bidding is becoming standardised. It starts from a click – the publisher invokes an API and gives information about the interactions of the user based on their cookie, and there are various IDs. The Supply Side Platform (SSP) opens an auction. On the demand side, there are advertisers that want to push content to people – by age group, demographic, day, time, and objectives such as click-through rates. The Demand Side Platform (DSP) looks at the SSPs. Each SSP is connected to hundreds of DSPs. Complex relationships exist between these systems. The advertisers hold probability scores for whether a user will click or engage in the way that they want, and they offer how much it is worth to them – all in micropayments. The data management platform (DMP) is important to improve the bidding: e.g. getting information about users/platform/context at specific times, places, etc. is important for guessing how people tend to behave. The economy of advertising on the internet is based on this structure. We get abstractions of intent – the more privacy was invaded to understand personality and intent, the less they were interested in a specific person and the more in the probability and the aggregate. People are viewed as a current identity and a current intent, and it’s all about mathematics – there are huge amounts of transactions, and the inventory becomes more valuable. The interactions become more diverse with the Internet of Things. The Internet becomes a ‘data farm’ – we started with a concept that people are valuable, and moved to the view that data is valuable and to how we can extract it from people. Advertising goes into the whole commerce element.
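The bidding step described above can be sketched in a few lines. This is only a toy model under my own assumptions: the class names, the fields, the rule that each advertiser bids its estimated click probability times the value of a click, and the rule that the highest bid wins are all hypothetical simplifications, not the actual protocol or anything shown in the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BidRequest:
    """What the supply side shares about one impression (hypothetical fields)."""
    user_segment: str
    hour_of_day: int


@dataclass
class Advertiser:
    """A demand-side bidder with its own engagement estimates (hypothetical)."""
    name: str
    value_per_click: float               # what one click is worth to this advertiser
    click_probability: Dict[str, float]  # estimated chance of a click, per segment

    def bid(self, request: BidRequest) -> float:
        # Bid the expected value: probability of engagement times its worth.
        p = self.click_probability.get(request.user_segment, 0.0)
        return p * self.value_per_click


def run_auction(request: BidRequest, advertisers: List[Advertiser]) -> Tuple[str, float]:
    """Collect one bid per advertiser and return the highest (name, bid)."""
    bids = {a.name: a.bid(request) for a in advertisers}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


request = BidRequest(user_segment="cyclist", hour_of_day=8)
advertisers = [
    Advertiser("bike_shop", 2.0, {"cyclist": 0.05}),
    Advertiser("car_dealer", 5.0, {"cyclist": 0.01}),
]
winner, price = run_auction(request, advertisers)
print(winner, round(price, 4))
```

Note that nothing in the sketch refers to a named person: the bid depends only on a segment and a probability, which is the ‘abstraction of intent’ point from the talk.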

I’ll blog about my talk ‘Algorithmic Governance in Environmental Information (or How Technophilia Shapes Environmental Democracy)’ later.

Discussion:

There are issues with genetics and eugenics. Eugenics fell out of favour because of science issues, and the new genetics is claiming much more predictive power. In neuroscience there are issues about brain scans, which are not handled well and are based on insufficient scientific evidence. There is an issue with discrimination – we shouldn’t assume that it’s only negative. We need to think about unjustified discrimination. There are different semantics to the word. There are issues with institutional information infrastructure.

Data and the City workshop (day 2)

The second day of the Data and City Workshop (here are the notes from day 1) started with the session Data Models and the City.

Pouria Amirian started with Service Oriented Design and Polyglot Binding for Efficient Sharing and Analysing of Data in Cities. The starting point is that management of the city needs data, and therefore technologies to handle data are necessary. In the traditional pipeline, we start from sources, then use tools to move them to a data warehouse, and then do the analytics. The problems in the traditional approach are the size of the data – the management of the data warehouse is very difficult; the need to deal with real-time data, where answers are needed very fast; and finally new data types – from sensors, social media and cloud-born data that is created outside the organisation. Therefore, it is imperative to stop moving data around and instead analyse it where it is. Big Data technologies aim to resolve these issues – e.g. from the development of the Google distributed file system that led to Hadoop to similar technologies. Big Data relates to the technologies that are being used to manage and analyse it. The stack for managing big data now includes over 40 projects to support different aspects of governance, data management, analysis, etc. Data Science includes many areas: statistics, machine learning, visualisation and so on – and no one expert can know all these areas (such experts exist as much as unicorns exist). There is interaction between data science researchers and domain experts, and that is necessary for ensuring reasonable analysis. In the city context, these technologies can be used for different purposes – for example deciding on the allocation of bikes in the city using real-time information that includes social media (Barcelona). We can think of data scientists as active actors, but there are also opportunities for citizen data scientists using tools and technologies to perform the analysis. Citizen data scientists need data and tools – such as a visual analysis language (AzureML) that allows them to create models graphically and set a process in motion.
Finding the data and accessing it must be facilitated – interoperability is important. Service oriented architecture (which uses web services) is an enabling technology for this, and the current Open Geospatial Consortium (OGC) standards require some further development and changes to make them relevant to this environment. Different services can be provided to different users with different needs [comment: but that increases maintenance and complexity]. No single stack provides for all the needs.

Next Mike Batty talked about Data about Cities: Redefining Big, Recasting Small (his paper is available here) – exploring how Big Data was always there: locations can be seen as bundles of interactions – flows in systems. However, visualisation of flows is very difficult, which makes it challenging to understand the results and check them. The core issue is that among N locations there are N^2 interactions, and this quadratic growth as N grows is a continuing challenge in understanding and managing cities. In 1964, Brian Berry suggested a system based on location, attributes and time – but the temporal dimension was suppressed for a long time. With Big Data, the temporal dimension is becoming very important. An example of how difficult understanding data can be is travel flows – the more regions are included, the bigger the interaction matrix, but it is then difficult to show and make sense of all these interactions. Even trying to create scatter plots is complex and does not help to reveal much.
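A quick way to get a feel for the size of the interaction matrix: the number of origin-destination pairs grows with the square of the number of locations. The helper below is my own illustration (the function name and the example zone counts are assumptions, not from the talk).

```python
def interaction_pairs(n_locations, include_self=True):
    """Number of origin-destination cells in a flow matrix for n locations."""
    if include_self:
        return n_locations * n_locations
    # Without the diagonal (flows from a zone to itself).
    return n_locations * (n_locations - 1)


# Hypothetical zone counts, just to show the growth.
for n in (10, 100, 1000):
    print(f"{n:>5} zones -> {interaction_pairs(n):>9,} possible flows")
```

Going from 10 to 1,000 zones multiplies the number of zones by 100 but the number of possible flows by 10,000, which is why plotting or even inspecting the full matrix quickly stops being practical.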

The final talk was from Jo Walsh, titled Putting Out Data Fires; life with the OpenStreetMap Data Working Group (DWG). Jo noted that she was speaking as a volunteer in OSM, and recalled that 10 years ago she gave a talk about technological determinism that offered a not completely utopian picture of cities, in which OpenStreetMap (OSM) was considered part of the picture. Now, in order to review the current state of OSM activities relevant for her talk, she asked the OSM mailing list for examples. She also highlighted that OSM is big, but it’s not Big Data – it can still fit into a single PostgreSQL installation. There is no anonymity in the system – you can find out quite a lot about people from their activity, and that is built into the system. There are all sorts of projects that demonstrate how OSM data is relevant to cities – such as OSM Buildings, which creates 3D buildings from the database, or the use of OSM with 3D modelling data such as DTM. OSM provides support for editing in the browser or with an offline editor (JOSM). Importantly, OSM is not only a map but also a database (like the new OSi database) – as can be shown by running searches on the database from a web interface. There are unexpected projects, such as custom clothing from maps, or Dressmap. More serious surprises are projects like the Humanitarian OSM Team and the Missing Maps project – there are issues with the quality of the data, but also with the fact that mapping is imposed from the outside on an area that is not mapped, with some elements of colonial thinking in it (see Gwilym Eddes’ critique). The InaSAFE project is an example of disaster modelling with OSM. In Poland, they extended the model to mark road areas and other details. All these demonstrate that OSM is getting close to the next level of using geographic information, and there are current experiments with it. Projects such as UTC by Mappa Mercia are linking OSM to transport simulations.
Another activity is the use of historical maps – townland.ie.
One of the roles that Jo plays in OSM is as part of the Data Working Group, which she joined following a discussion about diversity in OSM within the community. The DWG needs some help, and its role is geodata thought police/Janitorial judicial service/social work arm of the volunteer fire force. The DWG cleans up messy imports and deals with vandalism, but also with dispute resolution. It is similar to a volunteer fire service: when something happens, you can see the sys admins sparking into action to deal with an emerging issue. For example, someone from Uzbekistan says that they found corruption in some new information, so you need to find the changeset, and ask people to annotate more and say what they are changing and why. OSM is self-policing and self-regulating – but different people have different ideas about what they are doing, and different groups have their own view of what they want to do. There are also clashes between armchair mappers and surveying mappers – a discussion between someone who is doing things remotely and a local person who says that they know the road and asks to change the classification. The DWG doesn’t have a legal basis, and some issues come up because of the global scope – for example, translated names that do not reflect local practices. There are tensions between commercial actors that work on OSM and ordinary volunteer mappers. OSM doesn’t have privileges over other users – so the DWG is recognised by the community and gathers authority through consensus.

The discussion that followed this session explored examples from OSM, where there are conflicted areas such as Crimea and other contested territories. Pouria explained that in current models of distributed computing there are data nodes, and the data is kept static while the code is transferred instead of the data. There is a growing bottleneck in network latency due to the amount of data. There is a hierarchy of packaging systems that you need to use in order to work with a distributed web system, so tightening up code is an issue.
Rob – there are limits to Big Data, such as hardware and software, as well as the analytics of information. There are also limits to how far you can foster community when the size is very large and the organisation is managed by volunteers. Mike – the quality problems of big data are rather different from those of traditional data, so while things are automated, making sense of it is difficult – e.g. tap in but without tap out in the Oyster data. The bigger the dataset, the bigger the issues with it might be. The knowledge that we get is heterogeneous in time and shifts the focus to the routine. But evidence is important to policy making and making cases. Martijn – how do we move the technical systems to allow the move to focal community practice? Mike – transport modelling is based on funders promoting the use of digital technology, and it can be done for a specific place, and the question is who are the users? There is no clear view of who they are, and there is a wide variety of users playing different roles – ‘policy analysts’ are the first users of models – they are domain experts who advise policy people. There is less thinking about informed citizens. On how people react to big infrastructure projects – the articulation of the policy is different from what is coming out of the models. There are projects that have an open mandate and others with a closed one. Jo – OSM has a tradition of mapping parties that bring people together, but it needs a critical mass already there – so the question is how to bootstrap this process, such as how to support a single mapper in Houston, Texas. There are cases where companies used the data while local people used historical information, and this created conflict over the way the data is used. There are cases in which the tension gets very high, and it does need negotiation. Rob – issues about data citizens and digital citizenship concepts.
Jo – in terms of community governance, the OSM Foundation is very hands off, and there isn’t a detailed process for dealing with corporate employees who are mapping as part of their job. Evelyn – the conventions are matters of dispute and negotiation between participants. The conventions are being challenged all the time. One of the challenges of dealing with citizenship is to challenge the boundaries and protocols that go beyond the state. Retain the term to separate it from the subject.

The last session in the workshop focused on Data Issues: surveillance and crime 

David Wood talked about Smart City, Surveillance City: human flourishing in a data-driven urban world. He considers the smart city as an archetype of the surveillance society. Because it is part of the Surveillance Society, one way to deal with it is to consider resisting and abolishing it to allow human flourishing. His interest is in rights – beyond privacy. What is it that we really want for human beings in this data-driven environment? We want all to flourish, and that means starting from the most marginalised, at the bottom of the social order. The idea of flourishing comes from Spinoza and also from Luciano Floridi – his anti-entropic information principle. Starting with smart cities – business and government are dependent on large quantities of data, and increase surveillance. Social science ignores that these technologies provide the ground for social life. The smart city concept includes multiple visions, for example, a European vision that is about government first – how to make good government in cities, with technology as part of a wider whole. The US approach is about how we can use information management for complex urban systems. This relies on other technologies – pervasive computing, IoT and things that are woven into the fabric of life. The third vision is the Smart Security vision – technology used in order to control urban terrain, with military techniques used in cities (as they are in war zones), for example biometric systems for refugees in Afghanistan, which are used both for control and for the provision of services. The history goes back to cybernetics and to policing initiatives from the colonial era. The visions overlap – security is not overtly part of them (apart from military actors). Smart Cities are inevitably surveillance cities – a collection of data for purposeful control of the population.
Specific concerns of researchers are the targeting of people who fit a certain kind of profile, and the aggregation of private data for profit at the expense of those who are involved. The critique of surveillance is about the issues of sorting, unfair treatment of people, etc. Beyond that – as discussed in the special issue on surveillance and empowerment – there are positive potentials. Many of these systems have a role for the common good. We need to think about the city within neoliberal capitalism, which separates people in space along specific lines and areas, from borders to buildings. It tries to make the city into a tamed zone – but the dangerous parts of city life are also a source of opportunities and creativity. The smart city fits well with this aspect – stopping the city from being disorderly. A paper from 1995 critiques pervasive computing as surveillance: the more it reduces the distance between us and things, the more the world becomes a surveillance device and stops us from acting on it politically. In many of the visions of pervasive computing, the human is actually marginalised. This is still the case. There are opportunities for social empowerment, say to allow the elderly to move into areas that they stopped exploring, or to use it to overcome disability. Participation, however, is flawed – who can participate in what, where and how? Additional concerns are that participation is limited to a very small group of highly technical people, and that participation can also become instrumental – ‘sensors on legs’. The smart city could enable us to discover the beach under the pavement (a concept from the situationists) – and some are being hardened. The problem is corporate ‘walled garden’ systems, and we need to remember that we might need to bring them down.

Next, Francisco Klauser talked about Michel Foucault and the smart city: power dynamics inherent in contemporary governing through code. He is interested in the power dynamics of governing through data. He takes from Foucault the concept of understanding how we can explain power put into action. He is also thinking about different modes of power: Referentiality – how does security relate to governing? Normativity – what is the norm and where did it come from? Spatiality – how discipline and security are spread across space. Discipline is how to impose a model of behaviour on others (the panopticon). Security works in another way – it frees things up within limits. So the two modes work together. Power starts from the study of a given reality. Data is about the management of flows. He examined the specific relevance to data in cities by looking at refrigerated warehouses that are used within the framework of the smart grid to balance energy consumption – storing and releasing the energy that is preserved in them. The whole warehouse has been objectified and quantified – down to the specific product and the opening and closing of doors. He sees the core of the control in connections, processes and flows. Think of liquid surveillance – beyond the human.

Finally, Teresa Scassa explored Crime Data and Analytics: Accounting for Crime in the City. Crime data is used in planning, allocation of resources and public policy making – a broad range of uses. It is part of oppositional social justice narratives, and it is an artefact of the interaction of citizen and state, as understood and recorded by the agents of the state operating within particular institutional cultures. She looked at crime statistics that are provided to the public as open data – derived from police files under some guidelines – and also at emergency call data, which is made from calls to the police, to provide crime maps. The data that is used in visualisations about the city is not the same data that is used for official crime statistics. There are limits to the data – institutional factors: it measures the performance of the police, not crime. It’s about how the police are doing their job – and there are lots of acts of ‘massaging’ the data by those who are observed. The stats are manipulated to produce the results that are requested. The police are the sensors, and there is under-reporting of crime according to the opinion of the police officer – e.g. sexual assault – and also through privatised policing, which doesn’t report. Crime maps are offered by private sector companies that sell analytics and then provide a public-facing option – the narrative is controlled – what will be shared and how. Crime maps are declared as ‘public awareness or civic engagement’ but not as transparency or accountability. The focus is on property offences and not white collar ones. There are ‘alternalytics’ – using other sources, such as victimisation surveys, legislation, data from hospitals, sexual assault crisis centres, and crowdsourcing. An example of bottom-up reporting is HarassMap, which started in Egypt. The legal questions include how the relationship between private and public sector data affects ownership, access and control. Another is how the state structure affects data comparability and interoperability.
There is also a question about how the law prescribes and limits what data points can be collected or reported.

The session closed with a discussion that explored some examples of solutionism, like crowdsourcing that asks the most vulnerable people in society to contribute data about assaults against them, which is highly problematic. Crime data is popular in portals such as the London one, but it is mixed up with multiple concerns such as property prices. David – the utopian concept of platform independence, and the assumption that platforms are without values, are inherently wrong.

The workshop closed with a discussion of the main ideas and lessons that emerged from it. How are all these things playing out? Some of the questions that started emerging are about how crowdsourcing can be bottom-up (OSM) and sometimes top-down, with issues about data cultures in Citizen Science, for example. There are questions about the degree to which the political aspects of citizenship and subjectivity are playing out in citizen science. Re-engineering information in new ways and the rural/urban divide are issues that bodies such as Ordnance Survey need to face; the conflicts within data are an interesting piece, as is ensuring that the data is useful. ‘Sensors on legs’ is a concept that can be relevant to bodies such as Ordnance Survey. The concept of the stack is also relevant to where we position our research and what different researchers do: starting from the technical aspects to how people engage, and the workshop gave a slice through these layers. An issue that was left out is the business aspect – who will use it, and how it is paid for. We need public libraries with the information, but also the skills to do things with these data. The data economy is important and some data will only be produced by the state, but there are issues with the data practices within the data agencies of the state – and the data is not ready to get out. If data is garbage, you can’t do much with it – there is no economy that can be based on it. Open questions are: when does data produce software? When does it fail? Can we produce data with and without connection to software? There is also the physical presence and the environmental impacts. Citizen engagement about infrastructure is lacking, and we need to tease out how things are opened up for people to get involved. There was also a need to be nuanced about the city in the same way that we focused on data.
Try to think about the way the city is framed: as a site of activities, subjectivity, practices; city as a source of data – mined; city as political jurisdiction; city as aspiration – the city of tomorrow; city as concentration of flows; city as a social-cultural system; city as a scale for analysis/laboratory. The title is data and the city – is it for a city? Back to environmental issues – data is not ephemeral and does have tangible impacts (e.g. energy use in blockchain, inefficient algorithms, electronic waste (WEEE) that is left in the city). There are also issues of access and control – huge volumes of data. These issues are covered in papers such as device democracy. There are wider issues that link technology to wider systems of thought and considerations.

Data and the City workshop (day 1)

The workshop, which is part of the Programmable City project (funded by the European Research Council), is being held in Maynooth today and tomorrow. The papers and discussions touched on multiple current aspects of technology and the city: Big Data, Open Data, crowdsourcing, and critical studies of data and software. The notes below focus on aspects that are relevant to Volunteered Geographic Information (VGI), Citizen Science and participatory sensing – aspects of Big Data/Open Data are noted more briefly.

Rob Kitchin opened with a talk to frame the workshop, highlighting the history of city data (see his paper on which the talk is based). We are witnessing a transformation from data-informed cities to data-driven cities. Within these data streams we can include Big Data, official data, sensors, drones and other sources. The sources also include volunteered information such as social media, mapping, and citizen science. Cities are becoming instrumented and networked, and the data is assembled through urban informatics (focusing on interaction and visualisation) and urban science (which focuses on modelling and analysis). There is a lot of critique – in relation to data, there are questions about the politics of urban data, the corporatisation of governance, the use of buggy, brittle and hackable urban systems, and social and ethical aspects. Examples of these issues include politics: accepting that data is not value-free or objective, and is influenced by organisations with specific interests and goals. Another issue is the corporatisation of data, with questions about data ownership and data control. There are further issues of data security and data integrity when systems are buggy and brittle – there have already been cases of hacking into city systems. Social, political, and ethical aspects include data protection and privacy, dataveillance/surveillance, social sorting through algorithms, control creep, dynamic pricing and anticipatory governance (expecting someone to be a criminal). There are also technical questions: coverage, integration between systems, data quality and governance (and the communication of information about quality), and the skills and organisational capabilities to deal with the data.
The workshop aims to think critically about the data, asking questions about how this data is constructed and run.

The talk by Jim Thatcher & Craig Dalton explored provenance models of data. A core question is how to demonstrate that data is what it says it is, and where it came from. In particular, they consider how provenance applies to urban data. There is an epistemological leap from an individual (person) to data point(s) – there can be up to 1500 data attributes per person in a corporate database. City governance requires more provenance in information than commercial imperatives do. They suggest that data users and producers need to be aware of the data and how it is used.

Evelyn Ruppert asked: where are the data citizens? She discussed the politics in data, thinking about people as subjects in data – seeing people as actors who are intentional and political in their acts of creating data. Being digital mediates between people and technology and what they do. There are myriad forms of subjectivation – there are issues of rights and how people exercise these rights. Being a digital citizen means being not just a recipient of rights but also having the ability to take and assert rights. She used the concept of cyberspace as it is useful for understanding the rights of the people who use it, while being careful about what it means. There is conflation of cyberspace and the Internet, and failures to see it as a completely separate space. She sees cyberspace as the set of relations and engagements that are happening over the Internet. She referred to her recent book ‘Being Digital Citizens‘. Cyberspace has relationships to real space – in relation to Lefebvre’s concepts of space. She uses speech-act theory, which explores the ability to act through saying things, and there is a theoretical possibility of performativity in speech. We are not in command of what will happen with speech and what the act will be. We can assert acts through the things we do, and not only in the things we say, and that’s what is happening with how people use the Internet and construct cyberspace.

Jo Bates talked about data cultures and power in the city, starting from the hierarchy of data and information. Data can be thought of as ‘alleged evidence’ (Buckland) – data can be thought of as material, as specific things – data have dimensionality, weight and texture, and exist as something. Cox, in 1981, viewed the relationship between ideas, institutions and material capabilities – and the tensions between them – with institutions seen as a stabilising force compared to ideas and material capabilities, although the institutions may be outdated. She noted that sites of data cultures are historically constituted but also dynamic and porous – but we need to look at who participates and how data moves.

The session was followed by a discussion. Some of the issues: I raised the point of the impact of methodological individualism on Evelyn’s and Jim’s analysis – for Evelyn, digital citizenship is for collectives, and for Jim, the provenance and use of devices is done as part of collectives and data cultures. Jo explored the idea of “progressive data culture” and suggested that we don’t yet understand what the conditions for it are – the inclusive, participatory culture is not there. For Evelyn, data is only possible through the action of the people who are involved in its making, and the private ownership of this data does not necessarily make sense in the long run. Regarding the hybrid space view of cyberspace/urban spaces – they are overlapping and it is not helpful to try and separate them. Progressive data cultures require organisational change in government and other organisations. Tracey asked about work on indigenous data, and the way it is owned by the collective – and noted that there are examples in the Arctic with a whole setup for changing practices towards traditional and local knowledge. The provenance goes all the way to the community, and in the Arctic Spatial Data Infrastructure there are lots of issues with integrating indigenous knowledge into the general data culture of the system. The discussion ended with an exploration of the special case of urban/rural – noting the code/space nature of agricultural spaces, such as the remote control of John Deere tractors, the use of precision agriculture, control over space (so people can’t get into it), tagged livestock, as well as variable access to the Internet, speed of broadband, etc.

The second session looked at Data Infrastructure and platforms, starting with Till Straube, who looked at Situating Data Infrastructure. He highlighted that Git (GitHub) blurs the lines between code and data, which is also the case in functional programming – code is data and data is code. He also looked at software or conceptual technology stacks, with hardware at the bottom. He therefore uses the concept of topology from Science and Technology Studies and Actor-Network Theory to understand the interactions.

Tracey Lauriault: ontologizing the city – her research looked at the transition of Ordnance Survey Ireland (OSi) with their core GIS – the move towards an object-oriented and rules-based database. How is the city translated into data, and how does the code influence the city? She looked at OSi and the way it produces the data for the island and provides infrastructure for other bodies. OSi started as a colonial project, and moved from cartographical maps and a digital data model to a full object-oriented structure. The change is about understanding and conceptualising the mapping process. The ontology is about what things are important for OSi to record and encode – and the way in which the new model allows it to reconceptualise space. She had access to a lot of information about the engineering, tendering and implementation process, and also followed some specific places in Dublin. She explored her analysis methods and the problems of trying to understand how the process works even when you have access to information.

The discussion that followed explored the concept of the ‘stack’, but also ideas of considering the stack at planetary scale. The stack is pervading other ways of thinking – the stack is more than a metaphor: it’s a way of thinking about IT development, but it can be flattened. It gets people to think about the inter-relations between different parts. Tracey: it is difficult to separate the different parts of the system because there is so much interconnection. Evelyn suggested that we can think about the way maps were assembled and for what purpose, and understand how the new system is aiming to give certain outcomes. To which Tracey responded that the system moved from a map to a database, and that Ian Hacking’s approach to classification systems needs to be tweaked to make it relevant and effective for understanding systems like the one that she’s exploring. The discussion expanded to questions about how large systems are developed and what methodologies can be used to create systems that can deal with urban data, including discussion of software engineering approaches, organisational and people change over time, ‘war stories’ of building and implementing different systems, etc.

The third and last session was about data analytics and the city – although the content wasn’t exactly that!

Gavin McArdle covered his and Rob Kitchin’s paper on the veracity of open and real-time urban data. He highlighted the value of open data – from claims of transparency and enlightened citizens to very large estimates of the business value. Yet, while data portals are opening in many cities, there are issues with the veracity of the data – metadata is not provided along with the data. He covered spatial data quality indicators from ISO, ICA and transport systems, but questioned whether the typical standards for data are relevant in the context of urban data, and suggested that we may need to reconsider how to record it. By looking at 2 case studies, he demonstrated that the data is problematic (e.g. indicating travel in the city of 6km in 30 sec). Communicating the changes in the data to other users is an issue, as well as getting information from the data providers – it may be possible to have a metadata catalogue that adds information about a dataset and an explanation of how to report veracity. There are facilities in Paris and Washington DC, but they are not used extensively.
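To make the 6km in 30 sec example concrete, here is a minimal Python sketch of the kind of plausibility check that could flag such a record. The function names and the 120 km/h threshold are my own assumptions for illustration, not something taken from the paper.

```python
def implied_speed_kmh(distance_km, elapsed_seconds):
    """Average speed implied by covering distance_km in elapsed_seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return distance_km * 3600.0 / elapsed_seconds


def is_plausible(distance_km, elapsed_seconds, max_speed_kmh=120.0):
    """Flag records whose implied speed exceeds an assumed urban maximum.

    The 120 km/h default is an arbitrary illustrative threshold.
    """
    return implied_speed_kmh(distance_km, elapsed_seconds) <= max_speed_kmh


# The record mentioned in the talk: 6 km covered in 30 seconds.
speed = implied_speed_kmh(6, 30)
print(f"implied speed: {speed:.0f} km/h, plausible: {is_plausible(6, 30)}")
```

A check like this only catches gross errors. It says nothing about how the error arose, which is why the metadata and reporting mechanisms discussed in the talk still matter.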

Next, Chris Speed talked about blockchain city – spatial, social and cognitive ledgers, exploring the potential of distributed recording of information as a way to create all forms of markets in information that can be controlled by different actors.

I closed the session with a talk that is based on my paper for the workshop, and the slides are available below.

The discussion that followed explored aspects of representation and noise (produced by people who are monitored, instruments or ‘dirty’ open data), and some clarification of the link between the citizen science part and the philosophy of technology part of my talk – highlighting that Borgmann’s use of ‘natural’, ‘cultural’ and ‘technological’ information should not be confused with the everyday use of these words.

Beyond quantification: a role for citizen science and community science in a smart city

Arduino sensing in Malta

The Data and the City workshop will run on the 31st August and 1st September 2015, in Maynooth University, Ireland. It is part of the Programmable City project, led by Prof Rob Kitchin. My contribution to the workshop is titled Beyond quantification: a role for citizen science and community science in a smart city and is extending a short article from 2013 that was published by UCL’s Urban Lab, as well as integrating concepts from philosophy of technology that I have used in a talk at the University of Leicester. The abstract of the paper is:

“When approaching the issue of data in Smart Cities, there is a need to question the underlying assumptions at the basis of Smart Cities discourse and, especially, to challenge the prevailing thought that efficiency, costs and productivity are the most important values. We need to ensure that human and environmental values are taken into account in the design and implementation of systems that will influence the way cities operate and are governed. While we can accept science as the least worst method of accumulating human knowledge about the natural world, and appreciate its power to explain and act in the world, we need to consider how it is applied within the city in a way that does leave space for cultural, environmental and religious values. This paper argues that a specific form of collaborative science – citizen science and community science – is especially suitable for making Smart Cities meaningful and democratic. The paper uses concepts from Albert Borgmann’s philosophy of technology – especially those of the Device Paradigm and Focal Practices, to identify the areas where sensing the city can gain meaning for the participants.”

The paper itself can be accessed here.

Other papers from the same workshop that are already available include:

Rob Kitchin: Data-Driven, Networked Urbanism

Gavin McArdle & Rob Kitchin: Improving the Veracity of Open and Real-Time Urban Data

Michael Batty: Data About Cities: Redefining Big, Recasting Small

More details on the workshop will appear on the project website