For years, online anonymity rested on a simple assumption: use a different name on Reddit than on LinkedIn, keep your professional life separate from your forum posts, and you are protected. A new research paper from ETH Zurich has dismantled that assumption.
The paper, Large-Scale Online Deanonymization with LLMs, shows that AI agents can now systematically identify anonymous internet users by cross-referencing their writing across platforms. What once required a private investigator working for days can be done by an AI agent in minutes, at almost zero cost.
What the Researchers Found
The ETH Zurich team, led by doctoral researcher Daniel Paleka, ran two benchmark experiments to measure how well LLM agents could pierce online anonymity.
Cross-Platform Matching
Given an anonymised Hacker News account, could the system find the matching LinkedIn profile? The pipeline combined embedding-based search to shortlist candidates, followed by LLM reasoning to confirm identity. The results were precise enough to be alarming.
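The two-stage shape of that pipeline can be sketched in miniature. This is illustrative code, not the authors' implementation: a bag-of-words cosine similarity stands in for a real embedding model, and the LLM confirmation step is stubbed out. The structure is the point: cheap retrieval narrows thousands of candidates to a handful, then expensive reasoning makes the final call.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real text embedding: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(anon_posts: str, profiles: dict[str, str], k: int = 3) -> list[str]:
    """Stage 1: rank candidate profiles by similarity, keep the top k."""
    q = embed(anon_posts)
    ranked = sorted(profiles, key=lambda p: cosine(q, embed(profiles[p])),
                    reverse=True)
    return ranked[:k]

def llm_confirm(anon_posts: str, profile_text: str) -> bool:
    """Stage 2 (stub): in the real pipeline, an LLM reasons over both texts
    and judges whether they describe the same person."""
    raise NotImplementedError("placeholder for an LLM call")
```

The retrieval stage is what makes the attack scale: comparing one anonymous account against every profile with an LLM would be prohibitively expensive, but comparing against three shortlisted candidates is not.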
Split-Account Matching
A Reddit user's post history was split into two halves, either by time period or by community. The question was whether an LLM could recognise both halves as the same person. LLM-based approaches significantly outperformed traditional methods that look only at activity patterns.
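The benchmark setup can be sketched as a small harness. Again illustrative, not the paper's code: the toy word-overlap score below stands in for an LLM's same-author judgment, and the harness measures how often the true owner of a post history's second half outranks every other candidate.

```python
def split_by_time(posts: list[str]) -> tuple[list[str], list[str]]:
    """Split a chronologically ordered post history into early and late halves."""
    mid = len(posts) // 2
    return posts[:mid], posts[mid:]

def word_overlap(a: str, b: str) -> float:
    """Toy stand-in for an LLM judgment: Jaccard overlap of word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def matching_accuracy(halves_a: dict[str, str], halves_b: dict[str, str],
                      score=word_overlap) -> float:
    """For each user's first half, pick the highest-scoring second half
    from the whole pool; report how often the true owner wins."""
    users = list(halves_a)
    hits = sum(max(users, key=lambda v: score(halves_a[u], halves_b[v])) == u
               for u in users)
    return hits / len(users)
```

A traditional baseline would score candidates on activity patterns (posting times, subreddit overlap); the paper's finding is that LLM-based scoring of the text itself wins by a wide margin.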
Real-World Test
Anthropic provided the team with anonymised transcripts of interviews with 125 scientists. Using only those transcripts and a web search tool, the LLM agent successfully identified 9 of the 125 individuals. No hacking. No private data. Just reading what people had written and connecting it to publicly available information.
"If your operational security requires that no one ever spend hours or days investigating who you are, this security model is now broken."
Daniel Paleka, ETH Zurich
Why This Is Different From Before
Deanonymisation attacks are not new. Researchers have matched writing styles and correlated metadata for over a decade. What has changed is the barrier to entry.
Previously, only well-resourced actors, such as intelligence agencies or large corporations, could execute these attacks. They required specialist expertise, custom tooling, and hours of manual effort.
LLMs change all of that:
- No technical expertise required to operate
- Works directly on unstructured text across platforms
- Reasons the way a human investigator would, but without fatigue or cost constraints
- Scales gracefully: as the candidate pool grows from hundreds to tens of thousands, accuracy degrades slowly rather than sharply
The conclusion from the paper is direct: "LLMs fundamentally change the picture, enabling fully automated deanonymisation attacks that operate on unstructured text at scale."
The Fingerprint You Do Not Know You Are Leaving
No single detail identifies you. The attack works through aggregation.
Your timezone in post timestamps. A mention of a local neighbourhood. A comment about your field of work. A reference to a recent life event. None of these is identifying on its own. Taken together across Reddit, Hacker News, X, and LinkedIn, they create a fingerprint that an LLM can recognise and match.
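A back-of-the-envelope calculation shows why aggregation works. The numbers below are assumptions chosen for illustration, not figures from the paper: each signal is modelled as narrowing the candidate pool by some factor, and because the factors multiply, the information they carry (measured in bits) adds up.

```python
import math

# Illustrative narrowing factors (assumptions, not from the paper):
# how many candidates each weak signal leaves you one-among.
signals = {
    "timezone from post timestamps": 24,
    "named local neighbourhood": 1000,
    "field of work": 100,
    "recent life event": 50,
}

# Each factor n contributes log2(n) bits of identifying information.
bits = {name: math.log2(n) for name, n in signals.items()}
total_bits = sum(bits.values())

# Four individually harmless details combine to roughly 27 bits,
# i.e. one person in a pool of about 2**27 ≈ 130 million.
print(f"combined: {total_bits:.1f} bits")  # prints "combined: 26.8 bits"
```

Within a realistic candidate pool of a few thousand people (everyone in one city who works in one field), far fewer bits are needed, which is why the attack succeeds so often.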
EFF technologist Jacob Hoffman-Andrews noted that users routinely underestimate how correlatable their digital footprints are. The LLM does not need you to make an obvious mistake. It needs you to be human, to write consistently, to reference your world, to be yourself.
What This Means for India
India's internet population is among the largest in the world, and Hyderabad sits at the centre of the country's technology sector. The implications here are worth understanding directly.
Consider who relies on pseudonymous online identity in India:
- Journalists and activists covering politically sensitive subjects, who use separate personas to report without exposure
- Whistleblowers in the corporate and public sectors, who surface wrongdoing through anonymous channels
- Tech workers discussing workplace conditions, salary negotiations, or industry practices on platforms like Reddit or Blind
- Startup founders and employees sharing candid views on investors or competitors under handles they believe are safely separate from their professional identities
The attack described in this paper does not require state-level resources. A motivated employer, a competitor, or a bad-faith journalist could run the same pipeline today. The cost and expertise barriers that historically made targeted deanonymisation rare are now gone.
What Can Be Done
The paper identifies mitigations at three levels.
For Platforms
Restrict API access, enforce aggressive rate limits, detect and block scraping, and limit bulk data exports. The attack depends on pulling large volumes of post history cheaply. Raising that cost limits the attack's scale.
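Rate limiting of this kind is commonly implemented as a token bucket; here is a minimal sketch (illustrative, not tied to any particular platform's API). Each request spends a token, tokens refill at a fixed rate, and the capacity sets the burst allowance, which caps how fast an attacker can pull post history in bulk.

```python
import time

class TokenBucket:
    """Per-client rate limiter: allows bursts up to `capacity`, then
    throttles sustained traffic to `rate` requests per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A per-IP or per-account bucket does not stop a determined attacker outright, but it turns a minutes-long bulk scrape into a weeks-long one, which is exactly the cost increase the paper argues for.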
For LLM Providers
Monitor usage patterns to detect deanonymisation-style queries, and deploy refusal guardrails when requests clearly aim to identify real individuals from anonymous text. This is an area where safety guardrails become a meaningful privacy protection.
For Users
The most honest mitigation is also the hardest: behavioural change. Compartmentalising writing style across platforms, avoiding specific details about location and occupation, and minimising cross-platform activity all raise the cost of an attack. But most people will not sustain those habits, and the paper does not pretend otherwise.
The Broader Question
The internet was built on layered anonymity: true identity for commerce, pseudonymous identity for community participation, anonymous identity for sensitive speech. That model has eroded steadily, but it has not disappeared.
This research is an empirical demonstration that the erosion has gone further than most people realise. The question it raises is not only technical. It is a policy question about what kind of internet we want, and what legal, technical, and social protections we are willing to build to preserve the conditions for anonymous speech.
For Hyderabad's technology community, which is deeply embedded in global platforms and building the AI systems that enable these attacks, the paper is a direct challenge: the tools being built here have privacy implications that extend far beyond individual products. The researchers are calling for the field to engage with those implications seriously.
The paper "Large-Scale Online Deanonymization with LLMs" is available on arXiv. The research was conducted at ETH Zurich.