Anonymizing smartphone data is no longer enough -- users can be identified with just a few details

There are solutions to anonymize data, but they need to be carefully implemented.

Mihai Andrei
January 27, 2022 @ 1:12 am


Smartphone companies have access to vast amounts of user data. Companies assure us that this data is anonymized, stripped of personal indicators that could pinpoint individual users. But these assurances are hollow, a new study claims: a skilled attacker can identify individuals in anonymous datasets.

Image credits: Olia Nayda.

When the pandemic started and lockdowns were enforced, the world seemed to grind to a halt. You could see that easily just by looking around, but the data also confirmed it. For instance, mobility trends published by the likes of Apple and Google showed that a significant part of the population had stopped commuting to work, and that people were increasingly traveling by car rather than by public transit.

At first, users were understandably spooked by the data. Do tech companies know where I go and what I do? That’s not how it goes, the companies assured us. The data is anonymized: they know a user went somewhere and did something, but they don’t know who that user is. Other apps also scoop up vast quantities of data from your smartphone, either for ad targeting or for other purposes, though in many cases they are still legally required to anonymize the data by removing identifying details like names and phone numbers.

But that’s no longer enough. With just a few details (for instance, how they communicate through an app like WhatsApp), researchers were able to identify many users from anonymized data. Yves-Alexandre de Montjoye, associate professor at Imperial College London and one of the study authors, told AFP it’s time to “reinvent what anonymisation means”.

What is anonymous?

The researchers started by looking at anonymized data from around 40,000 smartphone users, mostly gathered from messaging apps. They then “attacked” the data, mimicking what a malicious actor would do. Essentially, this involved searching the data for patterns that reveal who individual users are.

With only the direct contacts included in the dataset, they were able to pinpoint individual users 15% of the time. When, in addition, further interactions between those primary contacts were included, they were able to identify 52% of the users.
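To give a rough intuition for why wider interaction data makes users easier to single out, here is a minimal toy sketch. It is not the study's method: the contact graph in `EDGES`, the pseudonyms, and the "fingerprint" definition are all hypothetical, chosen only for illustration. A user counts as identifiable here when no other user shares their fingerprint.

```python
from collections import Counter

# Hypothetical toy contact graph: each pair is two pseudonymous users
# who exchanged messages. No names or phone numbers are present.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("d", "e"), ("e", "f")]


def build_graph(edges):
    """Turn a list of undirected edges into a user -> set-of-contacts map."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def fingerprint(graph, user, hops):
    """Describe a user by the shape of their interactions (assumed definition).

    hops=1: only the number of direct contacts.
    hops=2: additionally, how many contacts each of those contacts has.
    """
    degree = len(graph[user])
    if hops == 1:
        return (degree,)
    neighbor_degrees = tuple(sorted(len(graph[n]) for n in graph[user]))
    return (degree, neighbor_degrees)


def unique_users(graph, hops):
    """Return the users whose fingerprint is shared by nobody else."""
    prints = {u: fingerprint(graph, u, hops) for u in graph}
    counts = Counter(prints.values())
    return sorted(u for u, p in prints.items() if counts[p] == 1)


graph = build_graph(EDGES)
print("unique with direct contacts only:", unique_users(graph, hops=1))
print("unique with contacts of contacts:", unique_users(graph, hops=2))
```

In this toy graph, two of the six users stand out when only direct contacts are considered, and three stand out once the contacts' own connections are included. The real attack works on far richer data, but the direction of the effect is the same as the one the researchers report.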

This doesn’t mean that we should give up on anonymization, the researchers explain. However, anonymization standards need to be strengthened so that the data is truly anonymous.

“Our results provide evidence that disconnected and even re-pseudonymised interaction data remain identifiable even across long periods of time,” the researchers wrote. “These results strongly suggest that current practices may not satisfy the anonymisation standard set forth by (European regulators) in particular with regard to the linkability criteria.”

“Our results provide strong evidence that disconnected and even re-pseudonymized interaction data can be linked together,” the researchers conclude.

Researchers suggest restricting access to large datasets to simple question-and-answer systems, or using differential privacy systems, which add random noise to the data to protect individual users' privacy.
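As a minimal sketch of what such a system could look like, the example below answers a single counting question with noise added, using the Laplace mechanism commonly associated with differential privacy. The record layout, the `private_count` helper, and the choice of `epsilon` are assumptions made for illustration, not details from the study.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Answer "how many records match?" without exposing the raw data.

    Adding or removing one user changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise and
    stronger privacy, at the cost of a less accurate answer.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical dataset: messages sent per day by pseudonymous users.
records = [{"user": i, "messages_per_day": (i * 7) % 50} for i in range(1000)]
rng = random.Random(0)

answer = private_count(
    records, lambda r: r["messages_per_day"] > 40, epsilon=0.5, rng=rng
)
print(f"noisy count of heavy users: {answer:.1f}")
```

The analyst only ever sees the noisy answer, never the underlying records. The noise is small relative to a count over many users, but large enough to mask whether any one individual is in the dataset.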

The study was published in Nature Communications.
