Posted by ilamont 1 day ago
I suspect this is what's going on in most of these cases.
https://kagi.com/assistant/dba310d2-b7fa-4d30-8223-53dadc2a8...
For this comment on economics in the British Empire, I got:
> names that might fit the genre include rayiner, JumpCrisscross, or AnimalMuppet
https://kagi.com/assistant/69bd863b-7b5c-4b56-a720-6dfb4f120...
For my comment on C++:
> If I had to throw out names of HN commenters known for writing about Rust/C++ ABI topics, candidates might include steveklabnik, pcwalton, kibwen, dralley, or pjmlp — but this is essentially a shot in the dark, and I'd likely be wrong.
I am flattered to be associated with these commenters, but I don't think I'm close to their level of skill.
(Like TFA, I found Opus’s explanations/rationales implausible.)
You can get an intuitive sense of what I mean if I ask you to pick a neuron in your brain and tell me when it fires. You can't pick a neuron in your brain at all. You can't even tell whether a broad section of your brain is firing. It is only through scientific examination that we have any idea what parts of the brain are doing what; we certainly have no direct access to that information. There are entire cultures who thought the seat of cognition was the heart or the gut. That's how bad our access to our own neural processes is.
So "why" explanations always need to be taken with a grain of salt when a neural net (again, yes, fully including humans) tries to "explain" what it is doing.
Contrast this with a symbolic reasoner, which has nothing but the "why" of a claim: if it yields the full logic train as its answer rather than a bare "yes"/"no", there is no pathway for any other kind of information to emerge.
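To make that contrast concrete, here's a minimal sketch (mine, not from the thread) of a toy forward-chaining prover over made-up Horn-clause rules; its entire answer is the inference trace, so the "why" is the output itself rather than a post-hoc rationale:

```python
# Toy forward-chaining reasoner over Horn clauses. The facts and rules are
# hypothetical placeholders; the point is that the derivation *is* the answer.

FACTS = {"rainy", "have_umbrella"}

RULES = [
    ({"rainy"}, "ground_wet"),                      # rainy -> ground_wet
    ({"ground_wet", "have_umbrella"}, "stay_dry"),  # ground_wet & have_umbrella -> stay_dry
]

def prove(goal: str, facts: set) -> list:
    """Derive new facts until `goal` appears or no rule fires; return the trace."""
    derived, trace = set(facts), []
    changed = True
    while changed and goal not in derived:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                trace.append(f"{' & '.join(sorted(premises))} => {conclusion}")
                derived.add(conclusion)
                changed = True
    return trace if goal in derived else []

for step in prove("stay_dry", FACTS):
    print(step)
# Prints the full logic train:
#   rainy => ground_wet
#   ground_wet & have_umbrella => stay_dry
```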
I pasted in a number of passages from books on my bookshelf. Predictably, stuff that I read for my English degree in university is largely in the training data and easily identifiable. Stuff from regional authors, or that is slightly adjacent to the cultural mainstream, makes no impression.
The article here isn't about the LLM recognizing works that were in the training data, e.g. The Old Man and the Sea off the shelf. It's about pegging the author of novel texts: say, some letter written by Hemingway that gets discovered next week and was never before digitized.
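For a sense of how attribution on never-seen text can work at all, the classical non-LLM baseline is stylometry. Here is a minimal sketch, assuming nothing about the article's actual method; the candidate texts and author names are hypothetical stand-ins:

```python
# Crude stylometric attribution: character-trigram frequency profiles compared
# by cosine similarity. All samples below are invented placeholders.

from collections import Counter
import math

def ngram_profile(text: str, n: int = 3) -> Counter:
    """Frequency profile of character n-grams, a rough stylistic fingerprint."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Known writing samples per candidate author (hypothetical).
known = {
    "author_a": "The sea was calm that morning, and the old man rowed out alone.",
    "author_b": "Per the spec, the allocator must not reorder destructor calls.",
}

# A "newly discovered" text that was never in any corpus verbatim.
unseen = "He rowed past the reef at dawn, the old lines coiled at his feet."

profiles = {name: ngram_profile(sample) for name, sample in known.items()}
target = ngram_profile(unseen)
best = max(profiles, key=lambda name: cosine(profiles[name], target))
print(best)  # expected: author_a, the stylistically closer candidate
```

An LLM presumably does something far richer than trigram counting, but the principle is the same: style leaks identity even when the exact text is novel.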
But I'm sure the scanning operations will start scouring the earth even harder for slop-free books containing niche knowledge and text, so that their models have an edge over the ones trained only on pirate collections and the Internet.
I wonder if secondhand bookshops and deceased estates are seeing bulk buyers of their stock suddenly appearing. Maybe broke governments/municipalities will start selling them entire libraries and archives to ingest.
Is now the best and easiest time to leave something behind "forever"? Even after many generations of models, a model may still surface a set of "memories" that know you and what you wrote.
Exciting and concerning.