De-identification isn’t anonymization. How can we truly anonymize location data while maximizing its value for developing location-based services?
In Part I of this article, we identified some of the many ways that location data privacy can be breached by linking relatively simple insights with publicly available information. In my view, it’s a good thing that those breaches were made public, because they serve as an excellent reminder of why companies like HERE must prioritize customer privacy.
The distinction between privacy and security
If you consider the examples of privacy breaches in Part I, you will notice that every data reconstruction that exposed private information was carried out using publicly available data. At no point was there a security breach in the strict sense: no one used an ill-gotten key or password to access information they should never have had.
Privacy issues arise when you openly provide information to data consumers, who can then choose to use it with positive or negative intent. In the first case, developers or researchers can use rich open data to build smarter data-driven solutions. In the second case, they can reveal information that was never intended to be shared by combining the data with outside information. We cannot tell the two apart in advance, because both kinds of data consumer receive the same data; only their intentions differ.
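To make the linking mechanism concrete, here is a minimal, hypothetical sketch of such an attack in Python. It assumes a de-identified dataset that still contains each pseudonymous user’s inferred night-time location, plus a public directory of named addresses. The classes `Trip` and `PublicRecord`, the function `link`, the coordinates, and the 0.001-degree tolerance are all invented for illustration; this is not a reconstruction of any specific breach from Part I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    pseudonym: str    # de-identified ID: no name attached
    night_lat: float  # hypothetical inferred night-time (home) location
    night_lon: float


@dataclass(frozen=True)
class PublicRecord:
    name: str         # publicly available, e.g. an address directory
    lat: float
    lon: float


def link(trips, directory, tolerance=0.001):
    """Re-identify pseudonyms whose night-time location matches exactly
    one public record within `tolerance` degrees."""
    matches = {}
    for trip in trips:
        candidates = [
            record.name
            for record in directory
            if abs(record.lat - trip.night_lat) <= tolerance
            and abs(record.lon - trip.night_lon) <= tolerance
        ]
        if len(candidates) == 1:  # a unique match is a re-identification
            matches[trip.pseudonym] = candidates[0]
    return matches


# Invented example data.
trips = [Trip("u-17", 52.5200, 13.4050), Trip("u-42", 52.5300, 13.4000)]
directory = [
    PublicRecord("Alice", 52.5201, 13.4049),
    PublicRecord("Bob", 52.5301, 13.4001),
    PublicRecord("Carol", 52.5305, 13.4004),
]
print(link(trips, directory))  # {'u-17': 'Alice'}
```

Here `u-17` is re-identified because only one public record lies near its night-time location, while `u-42` is not, because two records fall within the tolerance. Neither step requires a stolen key or password, only data that was shared openly.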
Any company that provides consumer data to outside parties, including HERE, can inadvertently disclose information that makes it possible to identify the people the data relates to. Some companies have the advantage that their data stays internal and is used only to improve their own services. But companies like HERE, whose business is providing open data that will power solutions we have yet to imagine, must think carefully before disclosing any information.
We need to understand what the data is to be used for
If you publish or disclose data with an overly restrictive approach to anonymization and privacy, the resulting limited data is less likely to yield useful insights for smart data-driven services. If you publish or disclose extensive, rich data with only its value for creating new services in mind, there is a much bigger chance that it will reveal more than intended.
There is no single, perfect solution; if one exists, we have yet to find it. Our approach to this challenge is to first consider carefully how the data is expected to be used, and whether it is potentially too revealing for those intended uses. Specifying the use case also shows us where we can anonymize data while maintaining the high quality of data-driven services.
Take the example of estimating traffic. If there is no congestion, we don’t need redundant speed updates from vehicles. Similarly, if there is a slowdown, we don’t need every vehicle stuck in traffic to report the same situation. In fact, we don’t need to publish any information about individual cars at all. Instead, we can simply report when a traffic-jam threshold is reached, along with the number of cars above that threshold.
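A minimal sketch of this kind of threshold-based reporting in Python might look like the following. It assumes raw probe readings of the form (vehicle ID, road segment, speed), and it reads "the number of cars above that threshold" as the count of distinct slow vehicles on a segment once that count reaches the threshold. The `Probe` type, the `jam_reports` function, and the default values of 20 km/h and 5 vehicles are hypothetical choices for illustration, not HERE’s actual parameters or pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Probe(NamedTuple):
    vehicle_id: str   # used only for de-duplication, never published
    segment_id: str
    speed_kmh: float


def jam_reports(probes: Iterable[Probe], slow_kmh: float = 20.0,
                jam_threshold: int = 5) -> Dict[str, int]:
    """Return {segment_id: slow_vehicle_count} only for segments where at
    least `jam_threshold` distinct vehicles are slower than `slow_kmh`."""
    slow_vehicles: Dict[str, Set[str]] = defaultdict(set)
    for probe in probes:
        if probe.speed_kmh < slow_kmh:
            # A set ignores repeated reports from the same stuck vehicle.
            slow_vehicles[probe.segment_id].add(probe.vehicle_id)
    # Publish aggregate counts only; segments below the threshold are dropped.
    return {
        segment: len(ids)
        for segment, ids in slow_vehicles.items()
        if len(ids) >= jam_threshold
    }


# Invented example: six slow cars on A12 (car0 reports twice), one on B7.
probes = [Probe(f"car{i}", "A12", 8.0) for i in range(6)]
probes += [Probe("car0", "A12", 7.5),
           Probe("car9", "B7", 12.0), Probe("car10", "B7", 95.0)]
print(jam_reports(probes))  # {'A12': 6}
```

The output contains no vehicle IDs, positions, or individual speeds, and a segment with too few slow vehicles is not reported at all, so a single car is never published on its own. Aggregation alone does not rule out every attack (comparing counts over time, for example), which is one reason to measure risk explicitly rather than assume it away.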
By tailoring our data to the intended use case, we limit the processed information to data that is informative for our services, yet not revealing.
Our commitment to protecting privacy
No one can claim a perfect location data privacy solution. However, we can hold ourselves accountable by quantifying privacy risk on one side and data value on the other. Only then can we tune our anonymization solutions to achieve a win-win situation, where privacy is protected and data value for our services is maximized.
Therefore, within our group we are developing a variety of privacy-as-a-service solutions, ranging from an anonymization toolkit to black-box vulnerability testing that applies a variety of reconstruction attacks to private information. Proving our methods on paper isn’t sufficient, so we make sure our approach protects privacy in real-world scenarios.
We believe the services of the future will come from bringing together many different parties to share and learn collaboratively from the diverse data sources in the world. By continuing to advance our techniques and capabilities for providing valuable data, we’re creating a trusted data resource for all our partners, present and future.