Transport data in an insecure world

"Data tends to gain value when it is correlated with other data"

Transport Systems Catapult CEO Paul Campion talks data security, ownership, privacy and GDPR, and how they affect data use in the transport sector.

A commentator on my recent blog post about the need for transport data sharing made an important observation. He commented: “So long as there are no bad actors attempting to misuse the data this would be perfect. Any thought about how to benefit from technology should consider how it could be abused, and how to safeguard against such abuse.” Well said, and a good challenge. So let me try to respond to that.

Simon Bradley’s excellent 2015 book “The Railways: Nation, Network and People” taught me many things, not least just how dangerous the railways were in their early years. The first and most famous casualty, the Liverpool MP William Huskisson, who was killed in 1830 on the opening day of the Liverpool and Manchester Railway, is perhaps to be compared to the unfortunate pedestrian killed by an Uber autonomous car on trial recently; it is understandable, if not excusable, that new transport technologies may have unrecognised dangers on first introduction.

Thirty years later, the railways were still taking a heavy toll. Between 1860 and 1864 there were an average of 53 train crashes a year, growing to 145 by 1870-74. In 1861, the toll was 141 fatalities to passengers and bystanders (railway workers’ deaths not included). The UK rail network over the last ten years, by contrast, has not suffered a single passenger fatality, despite a significantly greater number of journeys and passenger miles than 150 years ago.

What happened in that century and a half is that society came to better understand the risks of the new technology and, through the political process, caused the industry to seek ways to ameliorate those risks. The risks were managed in a variety of ways, not least engineering and the way that the technology was used and managed through processes, procedures, protocols and rules.

The widespread use of data about the consumers of goods and services is, very roughly speaking, about twenty years old (a simplification, of course, but the sort of collection and use of data that my commentator was concerned about can, I think, be associated with the use of the internet, whose widespread use is about two decades old). The techniques of data collection and use, their commercialisation, and the abuses of that data, whether deliberate, indirect or unintended, are only now beginning to come to the notice of society as a whole (although warning voices have been heard for many years).

I am a technological optimist. I believe that the many highly talented engineers working in the IT, communications and transport industries are perfectly capable of coming up with the technical tools and techniques to enable the safe and secure use of data – if the social, commercial and legal requirements oblige them to do so.

Who owns data?

Let us think, then, about what the constraints and safeguards might look like in a shared transport data future.

Firstly, a few words about words. Data can be tricky to think about because it behaves differently to a lot of the usual, physical stuff that we are comfortable thinking about in the transport world. Let me make a statement that might be too obvious to be helpful: if I own a car you do not. That is in the nature of physical things: if a car is on my drive it is not on yours. Of course, that is a simplification of the actual real-world situation because the car on my drive is quite likely to be leased, which means that I have the beneficial ownership rights to the car even though the legal rights reside somewhere else. That is fine by me: I am happy that I get to drive the car around at my discretion, knowing that someone else has the right to sell the car after, say, three years. But, anyway, at least it is unequivocally on my drive.

But what does it mean to say I ‘own’ data? When I sign up to the loyalty scheme of a chain of coffee shops I agree to tell them certain things about me – in return for payment. Specifically, I tell them my email and perhaps my birthday, and allow them to send me emails, and to collect data about when I visit their stores and what I buy there, in return for free coffee.

They pay for the costs of collecting that data and for holding and processing it, and they are doing so because that makes them more money. In other words, the coffee shop thinks they ‘own’ the data about me. But the data is about me, so why don’t I own it? Where data is concerned the word ‘own’ can be a false friend. There is a bundle of rights and obligations that can be distributed amongst a set of different people, and the data can be in multiple places at the same time. It is different from a car.

The Impact of GDPR

Society’s growing awareness of, and concerns about, this new technology are causing the lawmakers to take action (thankfully more quickly than in the historical case of the railway). The EU’s GDPR legislation is a major step forward and is likely to be the de facto common denominator in the western world.

GDPR sets out some basic rights and obligations on the various players. In the example I just gave I have the right to enter into a contractual relationship with a third party (e.g. the coffee shop) to agree that they can collect data about me in return for payment. They have the obligation to tell me how they are going to use that data (and with whom they are going to share it), and to respond to my requests to correct or erase the data if I so choose.

This sounds simple and fair but is, in fact, a dramatically different situation to where we have been. GDPR will force some companies to radically re-engineer their businesses to be able to comply.

Now I am not saying that GDPR is the final word, or that GDPR magically makes the world of data safe. My commentator’s use of the phrase “bad actors” is a reminder that passing laws is one thing and enforcing them another. Speed limits on roads are a safety measure intended to limit the number of people killed and maimed on the roads (24,101 seriously injured or killed in 2016 in the UK makes the 1861 railway look very safe) but speed limits are regularly flouted and measures to enforce them (like safety cameras) are resented by many people.

All the same, comprehensive legislation setting out the minimum standards is welcome and a good base from which business can work to create a stronger and more consistent framework. The aim needs to be safety by design…like the railway.

Thinking about solutions

A couple of thoughts about how we can begin to think about the solutions to the challenge.

Data is a collective noun and by this we mean two things:

  1. In the transport context, as a general observation, the more data the more useful. It may be interesting to know the route, travel time and speed at any given moment of an individual vehicle, but if I am an operator or an authority, the data only generates actionable information when I can look at the routes and speeds of all the vehicles in motion at any given moment. If I can add to that the travel plans for vehicles not yet in motion I may be able to make very valuable interventions. To put it another way, data tends to gain value when it is correlated with other data.
  2. Different datasets have different sources and uses. Think of a connected car. The stream of data from engine or transmission components may be very valuable to the manufacturer of the vehicle. It may also have value to the provider of the fuel. The engine manufacturer may not see value in the information about which station the radio is tuned to, or the proximity of the vehicle to the nearest branch of the coffee chain whose loyalty card the driver is holding – but the coffee shop might. The route I am taking might be very helpful to the local authority, and perhaps my insurance company, but may not be interesting to the manufacturer of the tyres on the car.

As we think about how we engineer our business models to commercialise the data, and how we engineer our systems to respect the privacy and ensure the security of the datasets, it is vital to keep in mind these two thoughts.
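For the technically minded, the first of those thoughts can be made concrete with a very small sketch. It is only an illustration under assumptions of my own: the vehicle identifiers, road segment names, speeds and the 40 km/h threshold are all hypothetical. A single vehicle's speed says little on its own, but grouping the speeds of all vehicles by road segment starts to reveal where an operator might need to intervene.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (vehicle_id, road_segment, speed_kmh).
observations = [
    ("car-1", "A5-north", 62.0),
    ("car-2", "A5-north", 18.0),
    ("car-3", "A5-north", 22.0),
    ("bus-7", "ring-road", 45.0),
]


def mean_speed_by_segment(obs):
    """Group the reported speeds by road segment and average them."""
    speeds = defaultdict(list)
    for _vehicle_id, segment, speed in obs:
        speeds[segment].append(speed)
    return {segment: mean(values) for segment, values in speeds.items()}


def congested_segments(obs, threshold_kmh=40.0):
    """Return segments whose average speed is below an assumed threshold."""
    averages = mean_speed_by_segment(obs)
    return sorted(seg for seg, avg in averages.items() if avg < threshold_kmh)


if __name__ == "__main__":
    print(congested_segments(observations))
```

Note that "car-1" alone, travelling at 62 km/h, would suggest the road is clear; it is only the combined view that shows the segment is slow on average.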

The security of the in-vehicle systems is very important: I don’t want just anyone to have the ability to interact with the braking systems in my car, for instance. By contrast there is a limited privacy concern about how close my car is to its next service, but I might not want my wife knowing I am driving to our favourite restaurant on my own to discuss a birthday surprise for her – even though this information is not really a security threat to me if it does get disclosed.

Different datasets have different risk factors and have different values in different situations. The reason this gets complicated is that without the sort of systematic safeguards being introduced by GDPR, it is not easy to know where the data is going, once it is collected, and to predict the ways in which correlated data can produce new insights.
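One way to picture "different risk factors for different datasets" is to label each field of a connected car's data with its own policy. The sketch below is a thought experiment, not a description of any real system or of what GDPR prescribes: the field names, the recipients and the `FieldPolicy` structure are all assumptions made for illustration. The design choice it shows is "deny by default": a field is shared only with recipients explicitly listed for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldPolicy:
    """Hypothetical per-field policy for connected-car data."""
    security_critical: bool      # could misuse endanger the vehicle?
    privacy_sensitive: bool      # does it reveal something personal?
    allowed_recipients: frozenset


# Illustrative policies only; names and recipients are assumptions.
POLICIES = {
    "brake_control": FieldPolicy(True, False, frozenset()),
    "next_service_km": FieldPolicy(False, False, frozenset({"manufacturer", "garage"})),
    "route": FieldPolicy(False, True, frozenset({"local_authority", "insurer"})),
    "radio_station": FieldPolicy(False, True, frozenset({"coffee_chain"})),
}


def share(record, recipient):
    """Return only the fields this recipient is explicitly allowed to see.

    Fields with no policy are withheld rather than shared.
    """
    return {
        name: value
        for name, value in record.items()
        if name in POLICIES and recipient in POLICIES[name].allowed_recipients
    }


if __name__ == "__main__":
    record = {
        "brake_control": "locked",
        "next_service_km": 1200,
        "route": "home -> restaurant",
        "radio_station": "Radio 4",
        "cabin_temperature": 21,  # no policy defined, so never shared
    }
    print(share(record, "insurer"))
    print(share(record, "manufacturer"))
```

Even a toy like this makes the hard part visible: the policy covers who receives each field, but says nothing about what a recipient learns by correlating it with data from elsewhere.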

This is a hugely important topic and I am very glad that my commentator raised it. I am optimistic that it is solvable. In fact, the TSC is here to help the UK companies who ARE solving it to be successful because that is good for the UK economy and for jobs, but also because it can help us to benefit from better transport systems, sooner.

You can keep up to date with our latest insights and news by following our LinkedIn page.
