Large-Scale Online Deanonymization with LLMs

Posted by DalasNoin 1 day ago

Large-Scale Online Deanonymization with LLMs (simonlermen.substack.com)

Pdf: https://arxiv.org/pdf/2602.16800 (via https://arxiv.org/abs/2602.16800)

193 points | 164 comments | page 4

comrh 3 hours ago|

we need the scramble suits from a scanner darkly but for your online text

bitwize 8 hours ago||

Somebody I know irl has figured out I'm me here on Hacker News, based on the fact that my writing style here matches my verbal style. Fingerprinting people based on their words is one of the things I actually expect LLMs to be really absurdly good at.

georgeburdell 10 hours ago||

Good thing I always lie on the internet

greesil 10 hours ago||

But do you lie with the same writing style?

yu3zhou4 10 hours ago||

Liar paradox

zikduruqe 9 hours ago||

Everything I type is a lie.

zoklet-enjoyer 9 hours ago||

I used to make new accounts every few months but got lazy. Time to start doing that again.

GorbachevyChase 9 hours ago|

You may want to also do a little stylistic obfuscation. ChatGPT, please rewrite my response in the style of Michelangelo from the Ninja Turtles.

zoklet-enjoyer 2 hours ago||

Also don't make usernames that reference old message boards or any of my interests. Maybe sprinkle in some mentions of fake hobbies and jobs and places I've lived too.

casey2 9 hours ago||

The obvious retort is to just use an AI to rewrite everything you post, but this will open other attack vectors.

Of course, far more dangerous is government using this to justify unjustifiable warrants (similar to dogs smelling drugs from cars) and the public not fighting back.

DalasNoin 9 hours ago|

We essentially don't use stylometry but semantic information revealed from people's comments – clues and interests.

(We use a little stylometry in a single experiment in section 5)

Zigurd 10 hours ago||

What this tells me is that major social media sites, some of which claim to be developing frontier models, have no excuse for bots waging influence campaigns on their sites.

DalasNoin 10 hours ago|

We do advocate for stricter controls on data access on social platforms because of this. There is a bit of an unfortunate trade-off, but I think mass scraping or bulk downloads of data from social sites can be misused in a growing number of ways.

reducesuffering 10 hours ago||

I remember there being a previous post about stylometry analysis of HN accounts. And people confirmed the top account correlations. It basically identified all the HN alt accounts.

jacquesm 6 hours ago|

And HN asked the author to take it down if I'm not mistaken.

ranger_danger 10 hours ago||

IMO this is just taking advantage of OPSEC failures. Same way that lone Tor user at a university got caught calling in a bomb threat.

aplomb1026 9 hours ago||

[dead]

DalasNoin 9 hours ago||

We use semantic information inferred from comments and submissions. I think using stylometry would be a great addition, but it would be hard to google for "guy who writes fancifully using many puns" rather than "indie developer in Switzerland". I think stylometry could be better used for verification: once you have a small set of candidates, stylometry could narrow them down further and be used to make a decision.
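
To make the "stylometry for verification" idea concrete, here is a rough sketch of how a small candidate set could be ranked by writing style. This is only an illustration, not the method from the paper: it assumes character trigram counts as the style feature and cosine similarity as the score, and the function names (`char_ngrams`, `cosine`, `rank_candidates`) and the sample texts are made up for the example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(unknown_text: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate authors by stylistic similarity to the unknown text, best first."""
    profile = char_ngrams(unknown_text)
    scores = [(name, cosine(profile, char_ngrams(sample)))
              for name, sample in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates, e.g. a shortlist produced by a semantic search step.
    shortlist = {
        "candidate_a": "Honestly, I reckon puns are the highest form of wit, honestly.",
        "candidate_b": "The deployment failed. Logs attached. Please advise.",
    }
    for name, score in rank_candidates("Honestly, I reckon that is a pun too far.", shortlist):
        print(f"{name}: {score:.3f}")
```

With short samples like these the scores are noisy, which fits the point above: a signal like this seems more useful for choosing among a handful of candidates than for searching the whole web.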

switchbak 9 hours ago||

Time to scrub those naughty Glassdoor rants!

newzino 6 hours ago|

[dead]