Twenty-Four Numbers
A Worked Example in Deanonymisation
"They removed your name. They feel safe now. You shouldn't."
The Wrong Question
Most conversations about anonymisation get stuck on the same loop:
- "We stripped the names."
- "We hashed the email addresses."
- "We replaced the user IDs with UUIDs."
- "We removed the date of birth."
This is column thinking. It treats a dataset like a spreadsheet where privacy is a property of individual cells. Delete the dangerous cells, ship the rest, sleep well.
The problem is that identity isn't a column. It's an emergent property of combinations - the shape your behaviour leaves across multiple data points, multiple sources, and a little bit of public record.
A single hashed user ID tells you nothing. A hashed user ID with a postcode, a browser fingerprint, and a timestamp tells you a lot. Stitch in a second dataset and the hash becomes a name.
This post is a worked example of that idea. No fancy maths. No reidentification papers. Just a 24x2 matrix and a few public sources anyone can use from a laptop.
Exhibit A: A Matrix
Here are 24 rows of 2 values. They came out of a "fully anonymised" dataset. No name, no device ID, no account. Just numbers.
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
53.4476, -2.2391
53.4612, -2.2418
53.4839, -2.2446
53.4839, -2.2446
53.4839, -2.2446
53.4836, -2.2438
53.4839, -2.2446
53.4839, -2.2446
53.4839, -2.2446
53.4838, -2.2447
53.4815, -2.2410
53.4831, -2.2365
53.4612, -2.2418
53.4475, -2.2392
53.4475, -2.2392
53.4475, -2.2392
Look at it for a moment. A reasonable person would call this "anonymous data". There's no identifier. Two values per row. It could be anyone. It could be no one.
It isn't.
Exhibit B: The Reveal
The rows are ordered. Row N is hour N of a single 24-hour day. The two values are latitude and longitude.
Now re-read it.
| Hour | Lat, Lon | What it tells us |
|---|---|---|
| 00:00 – 06:00 | 53.4475, -2.2392 | Stationary for seven hours overnight. This is where they sleep. |
| 07:00 | 53.4475, -2.2392 | Still home. Probably waking up. |
| 08:00 | 53.4476, -2.2391 | Tiny drift - phone moved within the building. Getting ready. |
| 09:00 | 53.4612, -2.2418 | In transit, on a recognisable corridor between home and work. |
| 10:00 – 17:00 | 53.4839, -2.2446 (mostly) | Stationary for eight hours during typical working hours. This is where they work. |
| 13:00 | 53.4836, -2.2438 | Small lunch-time hop, 50–80 metres away. A café or canteen. |
| 18:00 | 53.4815, -2.2410 | Leaving the work cluster. |
| 19:00 | 53.4831, -2.2365 | Stop on the way home. A pub, gym, or shop. |
| 20:00 | 53.4612, -2.2418 | Back on the commute corridor. |
| 21:00 – 23:00 | 53.4475, -2.2392 | Home again. |
We haven't been told a single "PII" field. We have, however, been told two things that - combined - are devastating: where this person sleeps and where this person works.
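Reading the table above is mechanical enough to script. The sketch below assumes the hourly trace exactly as the table interprets it, plus a hypothetical 100-metre "same place" radius. It greedily clusters nearby fixes, a crude form of what mobility research calls stay-point detection. It then takes the cluster that dominates 00:00 to 06:00 as home and the one that dominates 09:00 to 17:00 as work. The thresholds and helper names are illustrative choices, not a standard algorithm.

```python
import math

EARTH_RADIUS_M = 6_371_000   # mean Earth radius in metres
SAME_PLACE_M = 100           # assumption: fixes within 100 m count as one place

def metres_between(a, b):
    """Approximate distance in metres between two (lat, lon) points.

    Equirectangular approximation: fine at city scale, not for long distances.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return math.hypot(x, y) * EARTH_RADIUS_M

# Hour -> (lat, lon), following the hour-by-hour reading in the table above.
trace = {h: (53.4475, -2.2392) for h in range(0, 8)}          # overnight, waking
trace[8] = (53.4476, -2.2391)                                 # drift at home
trace[9] = trace[20] = (53.4612, -2.2418)                     # commute corridor
trace.update({h: (53.4839, -2.2446) for h in range(10, 18)})  # office hours
trace[13] = (53.4836, -2.2438)                                # lunch hop
trace[18] = (53.4815, -2.2410)                                # leaving work
trace[19] = (53.4831, -2.2365)                                # evening stop
trace.update({h: (53.4475, -2.2392) for h in range(21, 24)})  # home again

def cluster(points, radius_m=SAME_PLACE_M):
    """Greedy clustering: each fix joins the first cluster whose anchor is close enough."""
    clusters = []
    for hour, point in sorted(points.items()):
        for c in clusters:
            if metres_between(c["anchor"], point) <= radius_m:
                c["hours"].append(hour)
                break
        else:  # no existing cluster was close enough, so start a new one
            clusters.append({"anchor": point, "hours": [hour]})
    return clusters

def busiest(clusters, hours, exclude=None):
    """The cluster with the most fixes during the given hours."""
    pool = [c for c in clusters if c is not exclude]
    return max(pool, key=lambda c: sum(h in hours for h in c["hours"]))

clusters = cluster(trace)
home = busiest(clusters, range(0, 7))
work = busiest(clusters, range(9, 18), exclude=home)
print("home:", home["anchor"])
print("work:", work["anchor"], "hours", min(work["hours"]), "to", max(work["hours"]))
```

Run as-is, it prints the home pin, the work pin and a 10:00 to 17:00 dwell window. It needs no identifier at any point.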
Exhibit C: Turning Coordinates Into a Person
Nothing below requires a subpoena, a leak, or a vendor account. Everything is open data.
Step 1 - Geocode the home pin
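Drop 53.4475, -2.2392 into any maps tool. You get a street, a small cluster of buildings, and - in most of the UK - a postcode that resolves to fewer than fifty addresses. In a low-density area, fewer than five. The pin itself is sharper still: at this latitude, four decimal places pin down a patch roughly 11 metres north-south by 7 metres east-west. On a street of suburban semis, that often means exactly one house.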
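Step 1 depends on how precise the pin is. One unit in the fourth decimal place of latitude is about 11 metres, and a unit of longitude shrinks with the cosine of the latitude. The sketch below tabulates the patch of ground each precision level pins down at the home latitude. It assumes a spherical Earth, which is close enough at this scale.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def cell_size_m(lat_deg, decimals):
    """Ground size of one unit in the last kept decimal place, as (north-south, east-west) metres."""
    step_rad = math.radians(10 ** -decimals)
    north_south = step_rad * EARTH_RADIUS_M
    # Meridians converge towards the poles, so east-west shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

for decimals in (2, 3, 4, 5):
    ns, ew = cell_size_m(53.4475, decimals)
    print(f"{decimals} dp: {ns:7.1f} m north-south x {ew:7.1f} m east-west")
```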
We haven't named anyone yet. We have a doormat.
Step 2 - Geocode the work pin
53.4839, -2.2446 resolves to a specific block, and the map tells us the building. The 13:00 lunch drift of about 60 metres suggests people leave this building to eat. A large single-tenant headquarters with its own staff canteen might keep them in. That points to a multi-occupancy building with food nearby - a normal city office.
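The haversine (great-circle) formula can check the table's 50–80 metre estimate for the lunch hop and the scale of the rest of the day. The sketch below uses pins from the trace and a mean Earth radius, which is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

home = (53.4475, -2.2392)
work = (53.4839, -2.2446)
lunch = (53.4836, -2.2438)
evening_stop = (53.4831, -2.2365)

print(f"work -> lunch:        {haversine_m(work, lunch):6.0f} m")
print(f"work -> evening stop: {haversine_m(work, evening_stop):6.0f} m")
print(f"home -> work:         {haversine_m(home, work):6.0f} m")
```

The lunch hop comes out at roughly 60 metres, the evening stop at about half a kilometre from the office, and the commute at around 4 kilometres. That is a walk to lunch, a detour, and a short city commute.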
Fire up Companies House and search by registered office address. You will typically get a list of every company registered at that building. Often dozens. Sometimes one or two.
For each company you now have:
- A list of directors (full names, month and year of birth, sometimes a partial address).
- Filing history that hints at company size.
- PSC (persons with significant control) records.
If the company is small, you may be done already. Each director's correspondence address is on the filing. For small-company directors it is often their home address - and it might match Step 1.
Step 3 - Filter by working hours
The pin sits in that building from 10:00 to 17:00, with a small lunch hop. That eliminates night-shift operations, hospitality businesses, and anything that wouldn't have a desk worker there during those exact hours.
What's left looks like a professional services firm, a tech company, an agency. The behavioural fingerprint narrows the candidate set further.
Step 4 - Cross-reference with LinkedIn
Search LinkedIn for people whose current employer matches one of the candidate companies and whose location is within commuting distance of the home pin. For most small-to-mid companies, this returns a list you can read in an afternoon.
Step 5 - Reverse the home pin against open registers
The home pin gives you a postcode. Plug that postcode into:
- The open electoral register (where the subject hasn't opted out).
- 192.com and other publicly listed phone directories.
- Land Registry title information (for owners).
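- Companies House director records. Filings from before October 2009, when service addresses replaced residential ones on the public register, can still show full home addresses. Service addresses can also coincide with home.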
Intersect the LinkedIn shortlist with the names that appear at that postcode. The intersection is usually one person. Sometimes two - a household.
Step 6 - Confirm with the 19:00 stop
The evening detour at 53.4831, -2.2365 is a behavioural tell. Is it a gym? A specific pub? A Pilates studio they attend regularly? Many of those venues have public member lists, social media tags, or check-in patterns that surface the same name a third time.
Three independent paths to the same name is no longer a coincidence. It's an identification.
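Mechanically, Steps 4 to 6 are set intersection. Each independent source can only keep or drop candidates, never add them. The toy sketch below uses entirely made-up names, with no real records involved. It shows the shrinkage, including the "sometimes two - a household" case from Step 5.

```python
# Every name below is invented for illustration; none refers to a real person.
sources = {
    "Step 4, LinkedIn shortlist": {"A. Patel", "J. Smith", "K. Smith", "R. Okafor", "L. Chen"},
    "Step 5, names at the home postcode": {"J. Smith", "K. Smith", "D. Evans", "S. Hughes"},
    "Step 6, tagged at the 19:00 venue": {"J. Smith", "T. Walsh", "P. Nowak"},
}

candidates = None
for label, names in sources.items():
    # Intersecting with each new source can only keep or drop names.
    candidates = set(names) if candidates is None else candidates & names
    print(f"{label}: {sorted(candidates)}")
```

After Step 5, two people in one household remain. The 19:00 venue breaks the tie.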
What Just Happened
We started with a dataset that - by every conventional definition - was anonymous:
- No name.
- No email.
- No phone number.
- No device ID.
- No account ID.
- No date of birth.
- No demographic field.
We ended with a name, an employer, a home address, and a likely evening haunt.
The data didn't betray the subject. The combination did. Two columns of numbers became identity the moment we knew what they meant and could pair them with public registers.
This is the mistake column thinking makes. It assumes privacy lives in the cells you can see. It doesn't. Privacy lives in the joinability of your data with everything else in the world.
Properties That Look Innocent and Aren't
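- Coarse location + timestamp. Even rounded to a city block, daily routine collapses anonymity sets fast. The classic result comes from de Montjoye et al., "Unique in the Crowd" (2013). In a mobile-phone dataset covering 1.5 million people, with hourly timestamps and cell-tower locations, four spatio-temporal points were enough to uniquely identify 95% of them.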
- Browser fingerprint fragments. User-Agent + screen size + timezone + language is often unique within a user base.
- Salary band + job title + company size + region. Especially in senior roles.
- Date of first event + date of last event + count. A rough behavioural shape.
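- Postcode + date of birth + gender. The original Sweeney result: US ZIP code, full date of birth and sex were enough to uniquely identify an estimated 87% of the population. Still works.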
- Step counts and heart-rate baselines. Persistent, distinctive, and increasingly leaked.
- "Hashed" identifiers with a small input space. Hashing a phone number is a rainbow-table exercise, not a privacy control.
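The last bullet is easy to demonstrate. UK mobile numbers are 07 followed by nine digits, so there are at most a billion candidates. Hashing every one of them is a brute-force job well within reach of a single machine. Strictly that is plain enumeration rather than a rainbow table, but the effect is the same. The sketch below assumes a publisher that applied unsalted SHA-256 to the bare digit string. It searches only the 1,000 numbers Ofcom reserves for drama (07700 900000 to 07700 900999), so it finishes instantly and touches nobody's real number.

```python
import hashlib

def sha256_hex(text):
    """Unsalted SHA-256 of a string, as a hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Assumption: the publisher "pseudonymised" phone numbers with plain, unsalted SHA-256.
# 07700 900461 is in Ofcom's drama range, so it is not anyone's real number.
published_value = sha256_hex("07700900461")

def reverse_hash(target, prefix="07700900", free_digits=3):
    """Hash every number with the given prefix until one matches the target."""
    for n in range(10 ** free_digits):
        candidate = f"{prefix}{n:0{free_digits}d}"
        if sha256_hex(candidate) == target:
            return candidate
    return None  # the target was not in the searched range

print(reverse_hash(published_value))
```

Widening `prefix` and `free_digits` to the whole 07 range is the same loop, run longer.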
How to Actually Think About Anonymisation
- Threat-model the join, not the column. Ask: what public or purchasable dataset could be merged with this? If you can think of one, your release isn't anonymous.
- Aggregate before you publish, not after. Pre-compute group statistics. Don't ship the rows and trust analysts to behave.
- Bound the precision. Coordinates to three decimal places (about 110 metres north-south), timestamps to the day, ages to a band. Precision is a privacy budget - spend it deliberately.
- Suppress small cells. If a row, a group, or an intersection contains fewer than k people, it identifies them. Drop it or merge it.
- Use formal tools when stakes are real. k-anonymity, l-diversity, t-closeness, and - better - differential privacy.
- Assume adversaries are patient and have Companies House open in another tab. Because they do.
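Three of the rules above compose into a small release pipeline: aggregate, bound the precision, and suppress small cells. The sketch below assumes a hypothetical list of inferred home pins (one per person) and an illustrative threshold of K = 5. It is small-cell suppression, not differential privacy. A differentially private release would add calibrated noise to the counts, and would also need care about which cells it reveals exist at all, rather than trusting a threshold.

```python
from collections import Counter

K = 5          # assumption: publish only cells containing at least K people
DECIMALS = 3   # coordinates to three decimal places, as suggested above

def coarsen(point, decimals=DECIMALS):
    """Bound the precision of a (lat, lon) pin by rounding both coordinates."""
    lat, lon = point
    return round(lat, decimals), round(lon, decimals)

def release_counts(home_pins, k=K):
    """Aggregate before publishing: people per coarse cell, small cells suppressed."""
    counts = Counter(coarsen(p) for p in home_pins)
    return {cell: n for cell, n in counts.items() if n >= k}

# Hypothetical inferred home pins, one per person (synthetic, not from any real dataset).
home_pins = [
    (53.4471, -2.2391), (53.4472, -2.2389), (53.4468, -2.2393),
    (53.4473, -2.2386), (53.4466, -2.2394), (53.4470, -2.2390),
    (53.5203, -2.3011),  # a lone household in its own cell
]

print(release_counts(home_pins))
```

The six neighbours survive as a single count. The lone household, which would otherwise identify itself, is dropped.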
Closing
"PII removal" is a comforting phrase. It implies a clean operation: identify the bad fields, delete them, declare the dataset safe. The framing is wrong. Personally identifiable information is not a set of fields; it's a property that emerges when your data meets the rest of the world's data.
Twenty-four pairs of numbers told us where someone lives, where they work, what kind of job they probably have, and gave us a credible shortlist of who they are - without the data ever naming them.
The next time someone tells you a dataset is anonymous, ask them what it joins to.