Twenty-Four Numbers

A Worked Example in Deanonymisation

"They removed your name. They feel safe now. You shouldn't."

The Wrong Question

Most conversations about anonymisation get stuck on the same loop:

This is column thinking. It treats a dataset like a spreadsheet where privacy is a property of individual cells. Delete the dangerous cells, ship the rest, sleep well.

The problem is that identity isn't a column. It's an emergent property of combinations - the shape your behaviour leaves across multiple data points, multiple sources, and a little bit of public record.

A single hashed user ID tells you nothing. A hashed user ID with a postcode, a browser fingerprint, and a timestamp tells you a lot. Stitch in a second dataset and the hash becomes a name.

This post is a worked example of that idea. No fancy maths. No reidentification papers. Just a 24x2 matrix and a few public sources anyone can use from a laptop.

Exhibit A: A Matrix

Here are 24 rows of 2 values. They came out of a "fully anonymised" dataset. No name, no device ID, no account. Just numbers.

53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4476, -2.2391
53.4612, -2.2418
53.4839, -2.2446
53.4839, -2.2446
53.4839, -2.2446
53.4836, -2.2438
53.4839, -2.2446
53.4839, -2.2446
53.4839, -2.2446
53.4839, -2.2446
53.4838, -2.2447
53.4815, -2.2410
53.4831, -2.2365
53.4612, -2.2418
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392

Look at it for a moment. A reasonable person would call this "anonymous data". There's no identifier. Two values per row. It could be anyone. It could be no one.

It isn't.

Exhibit B: The Reveal

The rows are ordered. The first row is 00:00, the second is 01:00, and so on: counting from zero, row N is hour N of a single 24-hour day. The two values are latitude and longitude.

Now re-read it.

Hour | Lat, Lon | What it tells us
00:00 – 06:00 | 53.4475, -2.2392 | Stationary for seven hours overnight. This is where they sleep.
07:00 | 53.4475, -2.2392 | Still home. Probably waking up.
08:00 | 53.4476, -2.2391 | Tiny drift - the phone moved within the building. Getting ready.
09:00 | 53.4612, -2.2418 | In transit, on a recognisable corridor between the overnight location and the daytime one.
10:00 – 17:00 | 53.4839, -2.2446 (mostly) | Stationary for eight hours during typical working hours. This is where they work.
13:00 | 53.4836, -2.2438 | Small lunch-time hop, 50–80 metres away. A café or canteen.
18:00 | 53.4815, -2.2410 | Leaving the work cluster.
19:00 | 53.4831, -2.2365 | A stop on the way home. A pub, gym, or shop.
20:00 | 53.4612, -2.2418 | Back on the commute corridor.
21:00 – 23:00 | 53.4475, -2.2392 | Home again.
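To put metres on the "tiny drift" and the 50–80 metre lunch hop, here is a minimal sketch using the haversine formula. It assumes a spherical Earth with a mean radius of 6,371 km, which is close enough at these distances; the constant and variable names are illustrative, not from any library.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; treats the Earth as a sphere (an approximation)


def haversine_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


HOME = (53.4475, -2.2392)        # overnight pin
HOME_DRIFT = (53.4476, -2.2391)  # 08:00
WORK = (53.4839, -2.2446)        # most of 10:00-17:00
LUNCH = (53.4836, -2.2438)       # 13:00

print(f"08:00 drift:  {haversine_m(HOME, HOME_DRIFT):7.0f} m")
print(f"13:00 hop:    {haversine_m(WORK, LUNCH):7.0f} m")
print(f"home to work: {haversine_m(HOME, WORK):7.0f} m")
```

On these coordinates the three distances come out at roughly 13 m, a little over 60 m and just over 4 km: a phone moving around a house, a short walk for lunch, and a cross-town commute.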

We haven't been told a single "PII" field. We have, however, been told two things that - combined - are devastating: where this person sleeps and where this person works.
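Recovering those two places doesn't need a human squinting at the table. A minimal sketch: take the most common location inside a night window and inside a working-hours window. It assumes exact repeated coordinates (real traces jitter, so a real pipeline would cluster nearby points first), and the window boundaries and all names are illustrative choices, not a standard method.

```python
from collections import Counter

Point = tuple[float, float]

HOME: Point = (53.4475, -2.2392)
HOME_DRIFT: Point = (53.4476, -2.2391)
CORRIDOR: Point = (53.4612, -2.2418)
WORK: Point = (53.4839, -2.2446)
LUNCH: Point = (53.4836, -2.2438)
WORK_EDGE: Point = (53.4838, -2.2447)
LEAVING: Point = (53.4815, -2.2410)
STOP: Point = (53.4831, -2.2365)

# One reading per hour, index 0 = 00:00, following the table above.
readings: list[Point] = (
    [HOME] * 8                                          # 00:00-07:00
    + [HOME_DRIFT, CORRIDOR]                            # 08:00, 09:00
    + [WORK] * 3 + [LUNCH] + [WORK] * 3 + [WORK_EDGE]   # 10:00-17:00
    + [LEAVING, STOP, CORRIDOR]                         # 18:00-20:00
    + [HOME] * 3                                        # 21:00-23:00
)
assert len(readings) == 24


def modal_location(points: list[Point], hours: range) -> Point:
    """Most frequent exact coordinate among the readings for the given hours."""
    return Counter(points[h] for h in hours).most_common(1)[0][0]


home = modal_location(readings, range(0, 7))    # assumed sleep window: 00:00-06:00
work = modal_location(readings, range(10, 18))  # assumed work window: 10:00-17:00
print("home:", home)
print("work:", work)
```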

Exhibit C: Turning Coordinates Into a Person

Nothing below requires a subpoena, a leak, or a vendor account. Everything is publicly available.

Step 1 - Geocode the home pin

Drop 53.4475, -2.2392 into any maps tool. You get a street, a small cluster of buildings, and - in most of the UK - a postcode that covers fewer than fifty addresses. In a low-density area, fewer than five. And a four-decimal pin is precise to roughly ten metres, so on a street of suburban semis it can land on exactly one house.

We haven't named anyone yet. We have a doormat.
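Reverse geocoding needs an outside service, so this sketch doesn't call one. What it shows instead is how small the area behind a coordinate is at a given number of decimal places. It assumes a spherical Earth, and the function name is illustrative.

```python
import math

# Metres per degree of latitude on a sphere of radius 6,371 km (an approximation).
METRES_PER_DEGREE = 2 * math.pi * 6_371_000 / 360


def cell_size_m(lat_deg: float, decimals: int) -> tuple[float, float]:
    """North-south and east-west size of one unit in the last reported decimal place."""
    step = 10 ** -decimals
    north_south = step * METRES_PER_DEGREE
    # Meridians converge towards the poles, so an east-west degree shrinks by cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


for decimals in (4, 3, 2):
    ns, ew = cell_size_m(53.4475, decimals)
    print(f"{decimals} dp: about {ns:,.0f} m north-south by {ew:,.0f} m east-west")
```

Each decimal place dropped multiplies both sides of the cell by ten: about 11 m by 7 m at four places, about 110 m by 66 m at three.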

Step 2 - Geocode the work pin

53.4839, -2.2446 resolves to a specific block, and maps tells us the building. The 13:00 lunch drift suggests it isn't a self-contained single-tenant tower with its own canteen (there, the user would more likely stay put). It looks like a multi-occupancy building with food nearby - a normal city office.

Fire up Companies House and search by registered office address. You will typically get a list of every company that calls that building home. Often dozens. Sometimes one or two.

For each company you now have:

If the company is small, you may be done already. Each director's correspondence address is on the public register, and many small-company directors simply use their home address - it might match Step 1.
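Registered-office addresses are free text, so "search by address" in practice means matching messy strings. A minimal sketch of that matching, run over a hypothetical list of records rather than any real Companies House API; the record fields, company names and abbreviation table are all assumptions.

```python
import re

# Naive abbreviation expansion; "st" can also mean "saint", so treat matches as leads, not proof.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalise_address(address: str) -> list[str]:
    """Lower-case, strip punctuation, split into tokens and expand common abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def contains_run(tokens: list[str], run: list[str]) -> bool:
    """True if `run` appears as a contiguous sequence of whole tokens inside `tokens`."""
    n = len(run)
    return any(tokens[i : i + n] == run for i in range(len(tokens) - n + 1))


def at_address(records: list[dict], target: str) -> list[str]:
    """Names of records whose registered office contains the target address."""
    want = normalise_address(target)
    return [r["name"] for r in records if contains_run(normalise_address(r["registered_office"]), want)]


# Hypothetical records standing in for the results of a register search.
companies = [
    {"name": "Example Consulting Ltd", "registered_office": "1 Example St., Floor 3"},
    {"name": "Sample Digital Ltd", "registered_office": "Example House, 1 EXAMPLE STREET"},
    {"name": "Other Ltd", "registered_office": "11 Example Street"},  # different building
]

print(at_address(companies, "1 Example Street"))
```

Matching on whole tokens rather than substrings is what keeps "11 Example Street" out of the results.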

Step 3 - Filter by working hours

The pin sits in that building from 10:00 to 17:00, with a small lunch hop. That eliminates night-shift operations, hospitality businesses, and anything that wouldn't have a desk worker there during those exact hours.

What's left looks like a professional services firm, a tech company, an agency. The behavioural fingerprint narrows the candidate set further.
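Registers don't record staffing hours, so in this sketch they are hypothetical inputs, standing in for whatever you might infer from a company's website or its SIC code. The filter keeps only candidates whose staffed hours cover the whole observed dwell window.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    opens: float   # hypothetical field: hour the site is normally staffed from (24h clock)
    closes: float  # hypothetical field: hour it is staffed until; >24 means past midnight


# Observed dwell window from the trace: present at 10:00 and still there at 17:00.
OBSERVED_FIRST_HOUR, OBSERVED_LAST_HOUR = 10, 17


def fits_dwell(c: Candidate) -> bool:
    """Keep a candidate only if its staffed hours cover the whole observed window."""
    return c.opens <= OBSERVED_FIRST_HOUR and c.closes >= OBSERVED_LAST_HOUR


candidates = [
    Candidate("Example Consulting Ltd", 9, 17.5),
    Candidate("Sample Digital Ltd", 8.5, 18),
    Candidate("Late Kitchen Ltd", 12, 23),     # hospitality: opens after the pin arrives
    Candidate("Night Logistics Ltd", 22, 30),  # overnight shift, 22:00 to 06:00 next day
]

print([c.name for c in candidates if fits_dwell(c)])
```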

Step 4 - Cross-reference with LinkedIn

Search LinkedIn for people whose current employer matches one of the candidate companies and whose location is within commuting distance of the home pin. For most small-to-mid companies, this returns a list you can read in an afternoon.
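This sketch doesn't touch LinkedIn itself; it filters a hypothetical list of profile records by employer and by a set of location labels assumed to be within commuting range of the home pin. Every name and label here is made up. In practice, employer names on profiles rarely match registered company names exactly, so a real version would need fuzzy matching on that field.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    name: str
    employer: str
    location: str


# Hypothetical inputs: the employer shortlist and location labels near the home pin.
CANDIDATE_EMPLOYERS = {"Example Consulting Ltd", "Sample Digital Ltd"}
NEARBY_LOCATIONS = {"Home City", "Home City Area"}

profiles = [
    Profile("Person A", "Example Consulting Ltd", "Home City Area"),
    Profile("Person B", "Sample Digital Ltd", "Home City"),
    Profile("Person C", "Example Consulting Ltd", "Distant City"),  # right employer, too far away
    Profile("Person D", "Unrelated Ltd", "Home City"),              # nearby, wrong employer
]

shortlist = [
    p.name
    for p in profiles
    if p.employer in CANDIDATE_EMPLOYERS and p.location in NEARBY_LOCATIONS
]
print(shortlist)
```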

Step 5 - Reverse the home pin against open registers

The morning and evening pin gives you a postcode. Plug that postcode into:

Intersect the LinkedIn shortlist with the names that appear at that postcode. The intersection is usually one person. Sometimes two - a household.
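The intersection itself is a set operation. A minimal sketch, assuming both sides are plain lists of names and normalising only case and whitespace; real registers tend to hold full legal names while profiles use preferred names, so exact matching like this will miss some true matches. All names are hypothetical.

```python
def norm(name: str) -> str:
    """Case-fold and collapse whitespace so 'PERSON  B' and 'Person B' compare equal."""
    return " ".join(name.casefold().split())


def intersect_names(shortlist: set[str], register: set[str]) -> list[str]:
    """Names present in both sets after normalisation, reported as spelled in the shortlist."""
    by_key = {norm(n): n for n in shortlist}
    register_keys = {norm(n) for n in register}
    return sorted(by_key[k] for k in by_key.keys() & register_keys)


linkedin_shortlist = {"Person A", "Person B"}
names_at_postcode = {"PERSON  B", "Person F", "Person G"}

print(intersect_names(linkedin_shortlist, names_at_postcode))
```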

Step 6 - Confirm with the 19:00 stop

The evening detour at 53.4831, -2.2365 is a behavioural tell. Is it a gym? A specific pub? A regular Pilates studio? Some of those venues have public club rosters, social media tags, or check-in patterns that surface the same name a third time.

When three independent paths lead to the same name, it's no longer a coincidence. It's an identification.
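The "three independent paths" test can be written as a vote: count how many lookups name each candidate and keep only those named by all of them. A minimal sketch with hypothetical lookup results; it treats the paths as equally trustworthy, which is a simplifying assumption.

```python
from collections import Counter

# Hypothetical outputs of three independent lookups, each a set of candidate names.
paths = {
    "employer + commute (Steps 2-4)": {"Person A", "Person B"},
    "home postcode (Step 5)": {"Person B", "Person F"},
    "evening venue (Step 6)": {"Person B", "Person H"},
}

# How many paths name each person.
votes = Counter(name for names in paths.values() for name in names)
identified = [name for name, n in votes.items() if n == len(paths)]
print(identified)
```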

What Just Happened

We started with a dataset that - by every conventional definition - was anonymous:

We ended with a name, an employer, a home address, and a likely evening haunt.

The data didn't betray the subject. The combination did. Two columns of numbers became identity the moment we knew what they meant and could pair them with public registers.

This is the mistake column thinking makes. It assumes privacy lives in the cells you can see. It doesn't. Privacy lives in the joinability of your data with everything else in the world.

Properties That Look Innocent and Aren't

How to Actually Think About Anonymisation

  1. Threat-model the join, not the column. Ask: what public or purchasable dataset could be merged with this? If you can think of one, your release isn't anonymous.
  2. Aggregate before you publish, not after. Pre-compute group statistics. Don't ship the rows and trust analysts to behave.
  3. Bound the precision. Coordinates to three decimal places, timestamps to the day, ages to a band. Precision is a privacy budget - spend it deliberately.
  4. Suppress small cells. If a row, a group, or an intersection contains fewer than k people, it identifies them. Drop it or merge it.
  5. Use formal tools when the stakes are real: k-anonymity, l-diversity, t-closeness, and - better - differential privacy.
  6. Assume adversaries are patient and have Companies House open in another tab. Because they do.
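Rules 2 to 4 can be combined in a few lines: coarsen the coordinates, publish counts instead of rows, and drop any cell with fewer than k people. A minimal sketch, assuming each person is reduced to a (home, work) pair; the threshold k=5 and the sample population are illustrative, not recommendations. It gives none of the formal guarantees of rule 5.

```python
from collections import Counter

LatLon = tuple[float, float]
Cell = tuple[LatLon, LatLon]


def coarsen(lat: float, lon: float, decimals: int = 3) -> LatLon:
    """Bound precision: keep only `decimals` places (3 dp is roughly 110 m of latitude)."""
    return (round(lat, decimals), round(lon, decimals))


def release_counts(people: list[tuple[float, float, float, float]], k: int = 5) -> dict[Cell, int]:
    """Aggregate coarsened (home, work) pairs and suppress any cell with fewer than k people.

    Each person is (home_lat, home_lon, work_lat, work_lon). Only counts leave this
    function, never individual rows.
    """
    cells = Counter(
        (coarsen(h_lat, h_lon), coarsen(w_lat, w_lon))
        for h_lat, h_lon, w_lat, w_lon in people
    )
    return {cell: n for cell, n in cells.items() if n >= k}


# Hypothetical population: six people sharing a home/work cell, one person on their own.
people = [(53.4471, -2.2392, 53.4839, -2.2446)] * 6 + [(51.5, -0.12, 51.52, -0.08)]
released = release_counts(people, k=5)
print(released)
```

The lone commuter disappears from the output entirely, while the shared cell is published only as a count.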

Closing

"PII removal" is a comforting phrase. It implies a clean operation: identify the bad fields, delete them, declare the dataset safe. The framing is wrong. Personally identifying information is not a set of fields; it's a property that emerges when your data meets the rest of the world's data.

Twenty-four pairs of numbers told us where someone lives, where they work, what kind of job they probably have, and gave us a credible shortlist of who they are - without ever naming them.

The next time someone tells you a dataset is anonymous, ask them what it joins to.