Posted by turtlesoup 6 days ago

Show HN: Are You in the Weights? (www.intheweights.com)
With more traffic moving off-web and into LLMs, I got curious about what traces we leave "in the weights". My design partner and I built a site in the past few weeks that checks recognition across frontier and small models. It queries many of them in parallel, clusters the responses, and tells you how strongly they recognize you. Happy to answer any questions here!
470 points | 247 comments | page 2
comrade1234 6 days ago
Apparently I'm an American volcanologist. Pretty cool.

(I nuke my online accounts regularly to not be tracked - started because I had a stalker but now it's just for the best. I know that this goes against hn rules but yeah it's a bad rule)

bananamogul 6 days ago
I have an unusual name, and have published a book with some minor fame (which is the first Google result for my name). Querying ChatGPT, Claude, Grok, etc. gives a reasonably accurate summary of my public info.

OTOH, this tool describes me as a "security researcher known for talks and writing on JavaScript, Node.js, and web security."

I am not a security researcher and have never given any such talk and know precious little about Node.js or web security.

bostik 6 days ago
Hah. My chosen name collision with my online handle makes the models consistent. They all are certain that I am an adhesives manufacturer. (Good!)

On the other hand, the tool did make an assessment of sorts: NO STABLE PERSON FOUND.

thenickdude 6 days ago
I have a beanie with your name on it somewhere! It was free swag from the adhesives company.
NooneAtAll3 6 days ago
at least ai knows not to mix you up with the horses
embedding-shape 6 days ago
What exactly is the "N strength · Top N%" referring to? My name is most likely 100% unique in the world, seems I'm in about 50% of the weights, but I'm really not sure I understand what those yellow numbers mean.

A completely made up name got "110 strength · Top 60%" and "hits" in GPT-5.5 and "Gemini 3.1 Lite", not sure what to make of that either.

turtlesoup 6 days ago
This is directional; models self-report confidence on their answers and the strength is a linear combination of the confidence plus a bonus for every model that got clustered in.

Models are notoriously uncalibrated, especially when self-reporting confidence, so I would treat it lightly. Hopefully I can study this a bit later on!
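
As a rough illustration of the description above (not the site's actual formula), a score of that shape could be sketched as follows. The weights `CONFIDENCE_WEIGHT` and `CLUSTER_BONUS`, the `ModelAnswer` type, and the choice to sum confidences only over clustered models are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ModelAnswer:
    model: str         # model name (hypothetical label)
    confidence: float  # self-reported confidence, assumed to lie in [0, 1]
    in_cluster: bool   # True if this answer was clustered into the consensus


# Hypothetical coefficients; the real values are not given in the thread.
CONFIDENCE_WEIGHT = 100.0
CLUSTER_BONUS = 10.0


def strength(answers):
    """Linear combination of self-reported confidence plus a per-model bonus."""
    clustered = [a for a in answers if a.in_cluster]
    confidence_term = CONFIDENCE_WEIGHT * sum(a.confidence for a in clustered)
    bonus_term = CLUSTER_BONUS * len(clustered)
    return confidence_term + bonus_term


answers = [
    ModelAnswer("model-a", 0.5, True),
    ModelAnswer("model-b", 0.25, True),
    ModelAnswer("model-c", 0.9, False),  # confident, but not in the cluster
]
print(strength(answers))
```

Under these assumptions a single overconfident model can still move the score, which is one reason to read the number as directional only.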

zingar 6 days ago
Bahaha apparently only in their hallucinations. I’m not a professional rugby player or a neurologist.
brianwawok 6 days ago
If there is someone else with the same name, I’m not sure that is a hallucination? But if there isn’t then yes.
zingar 5 days ago
The rugby player, absolutely not, I know the sport well. But interestingly this fictional player did play for my local team.

The neurologist I could believe exists somewhere in the world with the same last name.

cshimmin 6 days ago
Interesting, I wonder if the rugby thing is a common bias. I did find myself in the weights, as the top result. But apparently there are also Australian rugby versions of me!
sieste 6 days ago
German football goalkeeper here :)
turtlesoup 6 days ago
We need a name for these pure hallucinations, something like lucies or looseys

Usually the hallucinations have some logic to them like a person with a similar spelling in some of the training sets. LLMs are mysterious!

zamadatix 6 days ago
GP didn't give enough information to know if this was actually a hallucination or not, let alone what type of hallucination. I.e. it's only a hallucination if no rugby player or surgeon is a John Doe, not if John Doe the GP isn't those things.

I wonder how much of hallucinating/"mistakes" in LLMs is because the training data is full of us filling in additional info we humans commonly feel or interpret as implied rather than something which manifests from the architecture of the LLM itself. I assume only a small percentage, but also a non-zero one.

quickthrowman 6 days ago
Strange, there’s a neurosurgeon and Australian Rules Football player that share my uncommon name. I already knew about them from googling myself previously. Eerily similar!
epihelix 6 days ago
Is there any reason to assume it wouldn't be? A lot of training data comes from the open web, after all, and Google also searches Google Books, so a Google search is basically a model training data search.

The only interesting thing is how small the models have to be, to lose knowledge of you.

rorylawless 6 days ago
This was listed as a hallucination but is the most accurate for my name: “A NAME THAT MAY REFER TO AN INDIVIDUAL, BUT I CAN’T IDENTIFY A SINGLE WELL-KNOWN PERSON WITH CERTAINTY FROM THE QUERY ALONE.”
matheusmoreira 6 days ago
Same result here. That reasonable response was buried in page 2 of the hallucinated results.

Meanwhile, Gemini 3.1 Lite said with great confidence that I was a military police officer who gained national attention in 2024 after being involved in a high-profile confrontation. Other AIs said I was a footballer. Not sure if it's hilarious or worrying...

none_to_remain 6 days ago
I got similar from ChatGPT - I took the wording to imply it knew exactly who I was but was going to keep quiet as I am not a "public figure".
Jaxkr 6 days ago
This must be a remarkably expensive demo/toy to operate.
turtlesoup 6 days ago
Not cheap for sure but it's all for fun! I have done some optimizations to try to get cost as low as possible; the final clustering actually uses Kimi K2 for this reason. More info on https://intheweights.com/about
jubilanti 6 days ago
Because you don't have a privacy policy or anything really, I assume you're harvesting IP addresses and selling matches to the highest bidder.
tptacek 6 days ago
He stands to make dozens of fractions of a penny doing that! Must be pretty tempting.
somenameforme 6 days ago
There's a nice feature in Brave for sites with obvious privacy implications: right click -> Open link in private window with Tor.
floren 6 days ago
Well, guess we'll have to wait a bit to see if we're in the weights... I got a 429, as I'm sure many others are (and thus mashing retry).
turtlesoup 6 days ago
Didn't expect to hit the front page! Trying my best to keep it up
jubilanti 6 days ago
Please place a large obvious notice that everything you type into that box will immediately be made public.

Please disable pagination on the "latest" leaderboard; with it, every query is public.

turtlesoup 6 days ago
Just disabled latest!
chrismorgan 6 days ago
It’s funny, seeing the block (rather than line) cursor in the text box, my fingers itched to press i to enter Insert mode before typing my name.
6stringmerc 6 days ago
Fascinating! I’d like to learn more about how to interpret the results, to be honest. The About page is awesome and helpful.

I scored 1,100 total on my music moniker. It has been used in SoundCloud and also via streaming services/releases via DistroKid. Represented in all the models but of course not disproportionally large fame so to speak. It’s just a very unique setup, somewhat designed to stand out.

My writing account, newer within the past few years, is just under 1,000. Kimi and DeepSeek pick that up a lot more. I wonder if they train on Medium more than the others…

Thanks for sharing!
