Posted by ilamont 1 day ago

Opus 4.7 knows the real Kelsey (www.theargumentmag.com)
282 points | 143 comments
vslira 8 hours ago|
Hm, that’s a multinomial classification with a very high cardinality. It’s really weird it works. I’m sure it does as the author states, but for how many authors (out of the whole web) does this work?
londons_explore 8 minutes ago||
There are ~8 billion people. Sounds big, but it's only 2^33. I.e., if you can find 33 things about the text that each halve the number of possible writers, you've narrowed it down to one person.

A few more features than that and you can also accommodate some of them being mistaken, wrong, or uncertain.
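The arithmetic behind the 33-bit claim is easy to check (a minimal sketch; the round 8-billion figure is the one used above):

```python
import math

WORLD_POPULATION = 8_000_000_000

# Each independent binary stylistic feature that halves the candidate
# pool contributes one bit of identifying information. The number of
# bits needed to single out one person is the base-2 log of the
# population, rounded up.
bits_to_identify_one_person = math.ceil(math.log2(WORLD_POPULATION))
print(bits_to_identify_one_person)  # 33
```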

dmd 6 hours ago|||
It worked on me, and I would be shocked if my blog (dmd.3e.org) has more than a dozen readers. I am stunned.
skeledrew 5 hours ago||
It's not about the readers, just that there's a large enough sample it can use, with sufficient differentiation from other content.
dmd 5 hours ago||
I’ve posted on average 3 things a year.
kelseyfrog 7 hours ago|||
Sure, the cardinality is high, but the model isn't using a uniform prior. What do you suppose the values of each of the terms are in P(Kelsey Piper | Text sample) = P(Text sample | Kelsey Piper) * P(Kelsey Piper) / P(Text sample)?
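The Bayes terms being alluded to can be sketched with toy numbers (all priors and likelihoods below are hypothetical, purely for illustration):

```python
# A prolific, heavily indexed writer gets far more prior mass than a
# random internet user, so even a modest likelihood edge produces a
# confident posterior. All numbers here are made up.
priors = {            # P(author)
    "Kelsey Piper": 1e-5,
    "random blogger": 1e-9,
}
likelihoods = {       # P(text | author)
    "Kelsey Piper": 0.02,
    "random blogger": 0.001,
}

# Bayes' rule: P(author | text) ∝ P(text | author) * P(author)
unnormalized = {a: likelihoods[a] * priors[a] for a in priors}
evidence = sum(unnormalized.values())  # P(text), restricted to these candidates
posterior = {a: p / evidence for a, p in unnormalized.items()}

for author, p in posterior.items():
    print(author, round(p, 6))
```

With these toy numbers the non-uniform prior dominates: the well-known author absorbs essentially all of the posterior mass.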
astrange 7 hours ago||
Maybe it just says all writing is Kelsey Piper.
atleastoptimal 9 hours ago||
One should assume that models will be good enough in the nearish future that privacy will be a thing of the past. Every anonymous post you made online can be traced back to you. However at that point AI will be good enough at fabrication that nobody will believe anything.
SOLAR_FIELDS 9 hours ago||
Yes, as long as a large enough corpus of your writing exists attached to your name, posting on a public forum in your own stylistic voice can no longer be considered anonymous. To your point though, perhaps it's possible to confound such systems defensively as well. Though IMO destroying your tone kind of destroys how you actually communicate with people, and I wouldn't find interacting with people like that appealing.

To be fair though, this was already happening before LLMs, at a much more limited scale. Someone made a tool for HN several years ago that lets you put in your HN username and identifies the other users who write most similarly to you. I find that interesting from the perspective of being able to interact with and discover people who think the same way. It could be an interesting discovery feature of a well-managed social network. Sadly, there will probably be many more negative impacts of this ability than positive ones.

Retr0id 8 hours ago|||
One "solution" would be to have an AI rewrite your posts into a neutral style (I hate the idea of this though...)
thaumasiotes 5 hours ago||
The traditional thing to do would be to publish your writing in a language you don't speak as a native. That will really quash your individual style.

Probably not worth the effort.

unD 51 minutes ago||
Wouldn't that make it easier, though? Genuine question. I once sent one of my writings for proofreading to a native speaker (I'm not), and he consistently flagged the same errors—e.g., comma placement. I would guess that, if recurrent patterns are what give away your style, an unfamiliar language would make them even more obvious. But possibly more generic?
pstuart 6 hours ago||
I assume that there will be tools to refactor text to communicate the same intent but scramble the style. Using an LLM of course...
jdthedisciple 2 hours ago||
I tried this on GPT 5.5 with a private unpublished personal excerpt and it correctly guessed: "The most likely author is you".

I suspect this is what's going on in most of these cases.

nolanl 4 hours ago||
Welp, I fed it the first 3 paragraphs of an unpublished blog post I wrote a few years ago, and Opus 4.7 guessed right. ChatGPT guessed wrong though.

My wife also got the same result, so I'm guessing it wasn't just because I was using my personal Claude account. Spooky stuff.

jjmarr 2 hours ago||
Couldn't replicate this. I comment on HN with my real name. I put in my most recent "long" comments.

https://kagi.com/assistant/dba310d2-b7fa-4d30-8223-53dadc2a8...

For this comment on economics in the British Empire, I got:

> names that might fit the genre include rayiner, JumpCrisscross, or AnimalMuppet

https://kagi.com/assistant/69bd863b-7b5c-4b56-a720-6dfb4f120...

For my comment on C++:

> If I had to throw out names of HN commenters known for writing about Rust/C++ ABI topics, candidates might include steveklabnik, pcwalton, kibwen, dralley, or pjmlp — but this is essentially a shot in the dark, and I'd likely be wrong.

I am flattered to be associated with these commenters but I don't think I'm close to their level of skill.

Extropy_ 8 hours ago||
Someone ought to try feeding the BTC whitepaper in and share what comes out
smeej 3 hours ago||
It's a hard stylometric challenge, just because of its format. The forum posts are probably better for comparison, but what I wish people would do, and don't see them doing, is compare what the different Satoshi suspects have written since the forum posts and whitepaper.

Everybody's going to get more similar in terms of topic. Bitcoin actually exists now. There's more to say about it than there was at launch. But does anyone still sound like Satoshi? Or sound more like Satoshi than they did before?

The slight wrench in the works is that it's hard to do this with my personal favorite Satoshi candidate. He stopped writing altogether in 2014, and he'd been losing capacity since shortly after the whitepaper came out; he was writing with his eyes by the time his head was frozen.

He's also the only candidate who seems more likely to me over time, though. The longer things go, the less likely a living person stays tight-lipped.

daemonologist 6 hours ago|||
Problem is that it's been heavily contaminated with people speculating about who the author is. It would probably be difficult to get an unbiased answer out of it (although who knows - it's crazy that it can do this at all).
brcmthrowaway 6 hours ago||
So train on a pre-2009 mailing list archive. Surely someone must be doing this.
SJMG 6 hours ago|||
This is very clever. You should pass the idea along to the guys at https://talkie-lm.com/introducing-talkie
ur-whale 2 hours ago|||
Much better: train on the cypherpunks mailing list archive, or on anyone discussing e-cash on crypto forums or Usenet from the '80s to the early 2010s.
layer8 8 hours ago||
The whitepaper states the author, so…
block_dagger 8 hours ago||||
Pseudonymously
NamlchakKhandro 5 hours ago|||
welcome to the internet. you must be new.
eptcyka 9 hours ago||
Can't wait to have to exchange stylometric encoders with my loved ones so that we can exchange truly private messages without losing our human touch.
portly 3 hours ago||
So the people who use LLMs to write their blogs were thinking two moves ahead!
littlestymaar 1 hour ago|
Stylometry has existed for decades, and there's no way an LLM is stronger at that job than a specialized piece of software (it's no more realistic than expecting Opus to beat Stockfish at chess).

In practice, you've never been anonymous while posting on the internet and AI isn't changing anything on that front. Or rather: if anything, AI can help you become more anonymous than before, since it can be used to hide your identity from stylometry by rewriting your prose before publishing.
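For reference, a classic pre-LLM stylometric method is Burrows' Delta, which compares z-scored frequencies of common function words across candidate authors. A minimal sketch with toy corpora (the word list and texts are made up; not a production implementation):

```python
from collections import Counter
import math

def word_freqs(text, vocab):
    """Relative frequencies of the tracked function words."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in vocab]

def burrows_delta(unknown, candidates, vocab):
    """Burrows' Delta: mean absolute difference of z-scored function-word
    frequencies. Lower delta = stylistically closer candidate."""
    profiles = {name: word_freqs(t, vocab) for name, t in candidates.items()}
    n = len(profiles)
    means = [sum(p[i] for p in profiles.values()) / n
             for i in range(len(vocab))]
    # `or 1.0` avoids division by zero when a feature has no variance.
    stds = [math.sqrt(sum((p[i] - means[i]) ** 2
                          for p in profiles.values()) / n) or 1.0
            for i in range(len(vocab))]
    u = word_freqs(unknown, vocab)
    return {name: sum(abs((u[i] - means[i]) / stds[i]
                          - (p[i] - means[i]) / stds[i])
                      for i in range(len(vocab))) / len(vocab)
            for name, p in profiles.items()}

# Toy corpora: author A overuses "the", author B overuses "and"/"to".
vocab = ["the", "of", "and", "to", "in"]
candidates = {
    "A": "the the the of of and to in",
    "B": "of and and and to to the in in",
}
deltas = burrows_delta("the the of and to the in", candidates, vocab)
print(deltas)  # "A" should score the lower (closer) delta
```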
