
Posted by grep_it 4/16/2025

Reproducing Hacker News writing style fingerprinting (antirez.com)
325 points | 155 comments
SnorkelTan 4/17/2025|
I remember the original post the author is referring to. I was captivated by it and thought it was cool. When I ran the original tool mentioned in the post, it detected one of my alts that I had forgotten about. OP's newer implementation, using different methodologies, did not detect the alt. For reference, the alt was created in 2010 and its last post was in 2012. Perhaps my writing style has changed?
SchemaLoad 4/17/2025|
I usually just create a new account every time I get a new computer or reinstall the OS. I thought most of the results here were noise, but on closer inspection it found 10 accounts I forgot I had. Actually incredible, and a little scary, how well it works.
wruza 4/17/2025||
Dang's analysis was funny:

don't site comment we here post that users against you're

Quite a stance, man :)

And me clearly inarticulate and less confident than some:

it may but that because or not and even these

I noticed that randomly remembered usernames tend to produce either lots of utility words like the above, or very few of them. Interestingly, it doesn't really correlate with my overall impression about them.

Boogie_Man 4/16/2025||
No matches higher than .7-something and no mutual matches. Let's go boys, I'm a special unique snowflake.
morkalork 4/16/2025||
I wonder if such an analysis could tease apart the authors of intentionally anonymous publications. Things like peer review notes for papers or legal opinions (afaik in countries that are not the USA, the authors of a dissenting supreme court decision are not named).
atiedebee 4/16/2025||
It looks like I don't use the word "and" very often. I do notice that I tend to avoid concatenating sentences like that, although it is likely there just isn't enough data on my account, as I haven't been on HN for that long.
0xWTF 4/16/2025||
There are some interesting similarities in o.g. accounts aaronsw, pg, and jedberg.

  - aaronsw and jedberg share danielweber
  - aaronsw and jedberg share wccrawford
  - aaronsw and pg share Natsu
  - aaronsw and pg share mcphage
byearthithatius 4/16/2025||
This is so cool. The user who talks most like me (and I can confirm he does) is ajb257.
nottorp 4/16/2025||
Interesting, the top 3 accounts most similar to me are two US users and an Australian. I'm Romanian (and living in Romania). I probably read too many books and too much news in English :)

Well, that, and I've worked a lot with Americans over text-based communication...

jmward01 4/16/2025||
I think an interesting use of this is potentially finding LLMs trained to mimic the style of a person. Unfortunately, these days just because a post has my style doesn't mean it was me. I promise I am not a bot. Honest.
formerly_proven 4/16/2025|
I'm surprised no one has made this yet with a clustered visualization.
PaulHoule 4/16/2025||
Personally I like this approach a lot

https://scikit-learn.org/stable/modules/generated/sklearn.ma...

I think other methods are more fashionable today

https://scikit-learn.org/stable/modules/manifold.html

particularly multidimensional scaling, but personally I think t-SNE plots are less pathological (they don't have as many of those crazy cusps that make me think it's projecting down from a higher-dimensional surface that is near-parallel to the page)

After processing documents with BERT, I really like the clusters generated by the simple and old k-means algorithm

https://scikit-learn.org/stable/modules/generated/sklearn.cl...

It has the problem that it always finds 20 clusters if you set k=20, and a cluster which really ought to be one big cluster might get treated as three little clusters, but the clusters I get from it reflect the way I see things.
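For reference, a rough sketch of that kind of pipeline (the embeddings below are random stand-ins for BERT output, and the parameters are just illustrative):

  # Illustration only: embed, cluster with k-means, project with t-SNE.
  import numpy as np
  from sklearn.cluster import KMeans
  from sklearn.manifold import TSNE

  # Stand-in for BERT-style document embeddings (e.g. 768 dimensions).
  embeddings = np.random.rand(500, 768)

  # k-means returns exactly k clusters whether or not k is "right".
  labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(embeddings)

  # 2D t-SNE projection for plotting, to be colored by cluster label.
  coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)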

antirez 4/16/2025|||
Redis supports random projection to a lower dimensionality, but the reality is that projecting a 350d vector into 2d is nice yet does not remotely capture the "reality" of what is going on. Still, it is a nice idea to try at some point. However, I would do it with more than the top 350 words: when I used 10k words it captured interests much more than style, so a 2D projection of that is going to be much more interesting, I believe.
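Not the Redis code path, just the idea: a minimal sketch of random projection, where the vectors are random stand-ins for the per-user style vectors:

  # Illustration only: project 350d vectors to 2D with a fixed random matrix.
  import numpy as np

  rng = np.random.default_rng(42)
  style_vectors = rng.random((1000, 350))    # stand-in for per-user vectors
  projection = rng.normal(size=(350, 2))     # random Gaussian projection matrix
  coords = style_vectors @ projection        # 1000 x 2, ready to scatter-plot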
layer8 4/16/2025||
Given that some matches are “mutual” and others are not, I don’t see how that could translate to a symmetric distance measure.
antirez 4/16/2025||
Imagine the 2D space: it has the same property!

You have three points close to each other, and a fourth one a bit more distant. Point 4's best match is point 1, but point 1's best matches are points 2 and 3.
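A tiny numeric sketch of the same effect (the points are made up):

  # Made-up 1D points: nearest-neighbor relations need not be mutual.
  import numpy as np

  points = np.array([0.0, 1.0, 1.5, 5.0])    # point 4 sits far to the right
  for i, p in enumerate(points):
      dists = np.abs(points - p)
      dists[i] = np.inf                       # ignore self
      print(f"point {i + 1} -> nearest is point {np.argmin(dists) + 1}")
  # point 4's nearest neighbor is point 3, but point 3's nearest is point 2.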

layer8 4/16/2025||
Good point, but the similarity score between mutual matches is still different, so it doesn’t seem to be a symmetric measure?
antirez 4/16/2025||
Your observation is really acute: the small difference is due to quantization. When we search for element A, which is int8-quantized by default, the code path de-quantizes it, then re-quantizes it and searches. This produces a small loss of precision, like this:

  redis-cli -3 VSIM hn_fingerprint ELE pg WITHSCORES | grep montrose

  montrose 0.8640020787715912

  redis-cli -3 VSIM hn_fingerprint ELE montrose WITHSCORES | grep pg

  pg 0.8639097809791565

So while cosine similarity is commutative, the quantization steps lead to slightly different results. But the difference is .000092, which in practical terms is not important. Redis can use non-quantized vectors via the NOQUANT option in VADD, but this makes the vector elements use 4 bytes per component: given that the recall difference is minimal, it is almost always not worth it.
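For the curious, a toy illustration of the effect (not Redis's actual quantization code, just an int8 round trip on random vectors):

  # Illustration only: quantizing the query to int8 and back nudges cosine similarity.
  import numpy as np

  def quantize_roundtrip(v):
      scale = np.abs(v).max() / 127.0
      return np.round(v / scale).astype(np.int8) * scale

  def cosine(a, b):
      return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

  rng = np.random.default_rng(0)
  a, b = rng.normal(size=350), rng.normal(size=350)
  print(cosine(a, b))                        # exact similarity
  print(cosine(quantize_roundtrip(a), b))    # query re-quantized: slightly off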
