Posted by PaulHoule 6 hours ago
Psychological instruments and concepts (like MBTI) are constructed from the semantics of everyday language. Personality models (being based on self-report, not actual behaviour) are not models of actual personality, but of the correlation patterns in the language used to discuss things semantically related to "personality". It would thus be extremely surprising if LLM output patterns (trained on people's discussions and thinking about personality) did not also pick up similar correlational patterns (and thus produce similar patterns of responses when prompted with questions from personality inventories).
The real and more interesting part of the paper is the use of statistical techniques to isolate sub-networks which can then be used to emit outputs more consistent with some desired personality configuration. There is no obvious reason to me that this couldn't be extended to other types of concepts, and it kind of reads to me like a way of doing a very cheap, training-free sort of "fine-tuning".
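To make that concrete, here's a minimal sketch of what training-free steering via activations could look like. To be clear, this is my own illustration, not the paper's method: the layer index, the scaling factor, and the idea of adding a pre-computed "persona direction" to the residual stream are all assumptions.

    # Hypothetical sketch of training-free steering (not the paper's method).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tok = AutoTokenizer.from_pretrained("gpt2")

    LAYER, ALPHA = 6, 4.0  # which block to hook, and how hard to steer

    # Stand-in for a direction isolated statistically, e.g. the mean
    # activation on "high trait X" prompts minus the mean on "low trait X".
    direction = torch.randn(model.config.hidden_size)
    direction = direction / direction.norm()

    def steer(module, inputs, output):
        hidden = output[0]  # (batch, seq, hidden) residual-stream states
        return (hidden + ALPHA * direction,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(steer)
    ids = tok("When someone disagrees with me, I", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
    handle.remove()  # drop the hook and the base model is untouched

The appealing part of this kind of approach is the last line: nothing about the weights changes, so removing the hook gives you the original model back for free.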
Could be very TARS like, lol.
It'd also be interesting to do a similar rolling record of episodic memory, so your agent has a more human-like memory of interactions with you.
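One toy way to picture such a rolling record (every name here is hypothetical, and a real agent framework would look different): keep a bounded deque of timestamped episodes, let the oldest fade off the end, and prepend the recent ones to the prompt each turn.

    # Toy sketch of a rolling episodic memory; all names are made up.
    from collections import deque
    from datetime import datetime

    class EpisodicMemory:
        """Rolling record of interactions; the oldest episodes fade out."""

        def __init__(self, max_episodes=100):
            self.episodes = deque(maxlen=max_episodes)

        def record(self, user_msg, agent_msg):
            ts = datetime.now().isoformat(timespec="minutes")
            self.episodes.append((ts, user_msg, agent_msg))

        def as_context(self, last_n=10):
            # Prepend this to the agent's prompt each turn.
            return "\n".join(
                f"[{ts}] user: {u} | agent: {a}"
                for ts, u, a in list(self.episodes)[-last_n:])

    memory = EpisodicMemory()
    memory.record("My dog is named Rex.", "Nice to meet Rex!")
    prompt = memory.as_context() + "\nuser: What's my dog's name?"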
Another thing to consider about LLMs: the nature of the training, and the core capability of transformers, is to mimic the function of the processes by which the training data was produced. By training on human output, these LLMs are in many cases implicitly modeling the neural processes in human brains that produced the data. Lots of hacks, shortcuts, and low-resolution "good enough" approximations, but in some cases they uncover precisely the same functions that we use in processing and producing information.
I would argue this is deeply false, my classic go-to examples being that neural networks have almost no real relation to any aspects of actual brains [1] and that modeling even a single cortical neuron requires an entire, fairly deep neural network [2]. Neural nets really have nothing to do with brains, although brains may have loosely inspired the earliest MLPs. Really, NNs are just very powerful and sophisticated curve (manifold) fitters.
> Could be very TARS like, lol.
I just rewatched Interstellar recently and this is such a lovely thought in response to the paper!
[1] https://en.wikipedia.org/wiki/Biological_neuron_model
[2] https://www.sciencedirect.com/science/article/pii/S089662732...
Everything in a model is a correlation of behavior with context and context with behavior.
"Mind set" is a factor across the continuum of scales.
Are we solving a math problem or deciding on entertainment? We become entirely "different brains" in those different contexts, as we configure our behavior and reasoning patterns accordingly.
The study is still interesting. The representation, clustering, and bifurcations of roles may simply be one end of a continuum, but they are still meaningful things to specifically investigate.
It's not surprising to find clustered sentiment in a slice of statistically correlated language. I wouldn't call this a "personality" any more than I would say the front grille of a car has a "face".
Deterministically isolating these clusters, however, could prove to be an incredibly useful technique for both using and evaluating language models.
Studies that do find correlations between self-reported personality and actual behaviour tend to find them in a range of something like 0.0 to 0.3 or so, maybe 0.4 if you are really lucky. Which means "personality" measured this way explains something like 16% of the variance in behaviour, at most (variance explained is the square of the correlation).
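A quick worked version of that arithmetic, just to spell it out (the r values are simply the range mentioned above):

    # Variance explained is the square of the correlation coefficient.
    for r in (0.0, 0.1, 0.2, 0.3, 0.4):
        print(f"r = {r:.1f} -> explains {r * r:.0%} of the variance")
    # r = 0.4 -> explains 16% of the variance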
On top of that, a confounding issue is that human nature is to anthropomorphize things. And what is more likely to be anthropomorphized than a construct of written language, now the primary method of knowledge transfer between humans? I can't help but feel that this wishful bias contributes to skipping the due diligence of choosing an appropriate metric to measure with.
More useful framing: how do these subnetworks produce outputs that observers evaluate as personality-consistent? Personality isn't an internal property; it's a judgment made by people watching behavior.
Partly, yes, but personality is also an internal property, or at least the concept is coherent enough to say that it has internal aspects. That is, a person's personality is the set of (relatively) stable and difficult-to-change patterns that manifest in their behaviour across broad contexts, and these patterns are almost certainly encoded internally in the brain in some form. It is not much different from saying a person's intelligence / IQ is partly internal.
Otherwise, I do agree with your more careful framing, and I wish people thought and spoke more carefully about these things, and doubly so for LLMs.
See also: https://en.wikipedia.org/wiki/Newspeak
Usually it results in an "equal and opposite backlash". Once they started calling children "Special" in school, "Special" became the ultimate insult.
EDIT: For a neuroscience reference that also argues why the general perspective is obviously false: https://pmc.ncbi.nlm.nih.gov/articles/PMC4874898/. But really, these things ought to be obvious from introspection.
There was a fad called "structuralism" that liked to imagine that such and such is "structured like a language". But when we finally got a paradigm for language, it was one of those "normal science" paradigms that Kuhn warned you about: you could write papers grounded in Chomsky's theory for a lifetime, but it wouldn't help you learn to read Chinese more quickly, speak German without an accent, or program a computer to parse tweets. That is, the structure of language is absolutely useless except for writing papers about linguistics -- and the "language instinct" becomes some peripheral that grafts onto an animal, but you need the rest of the animal for it to work.
Now, LLMs may not be a model for how we do it, but they are certainly going to bring back structuralist and "wordcel" positions, because they do seem to show, somehow, that "language is all you need" to accomplish whatever it is LLMs accomplish.
People will try to bring back these obviously false models of cognition, but, so far, the dismal performance of LLMs on e.g. SpatialBench [1] (and almost certainly ARC-AGI-3), or the kind of data and effort required to get something like V-JEPA-2 [2], are strong counter-examples to this. And, yeah, obviously animal cognition, especially smart animals like birds, or the crazy stuff we see in chimp and gorilla ethology (border patrols, genocides, humor, theory of mind, bla bla bla).
Agents who only speak Rust have no conception of what runtime errors are, for instance. Fascists won't understand concepts like "universal human rights" as in their worldview there is nothing universal about humanity as a whole.
It's the opposite. People make up new concepts all the time for which they have no words, and then give them a name. Language is composable; words and names are just a means to improve communication, to make it faster and more efficient.
> Agents who only speak Rust have no conception of what runtime errors are, for instance.
Agents don't really learn. They have a fixed set of data and everything new has to be pressed into the prompt. This is unrelated to language.
This is also sort of a wordcel take, in that it neglects that there are plenty of mental structures that are not solely linguistic: e.g. visuo-spatial models, auditory models, kinaesthetic, proprioceptive, emotional, gustatory, or even maybe intuitive models, and symbolic models (which have both linguistic and visuo-spatial aspects). Yes, your models constrain your perception of reality, but it is not clear how important language really is to many of those models (and there is strong evidence it may not matter at all to a lot of cognition [3]).
[1] https://en.wikipedia.org/wiki/Linguistic_relativity
[2] https://plato.stanford.edu/archives/sum2015/entries/relativi...
> evidence from neuroimaging and neurological patients
Has "neuroimaging" successfully modelled those "universal human rights" the OP was mentioning? If yes, how did it look?
More generally, positing that all languages are, in the end, interchangeable (because that's what the opponents of something similar to Sapir-Whorf are saying) is very reactionary and limited in itself. And it's telling that me calling those anti-Sapir-Whorf people "reactionaries" will for sure tickle something in them that wouldn't have happened had I used a different "neuroimaged" concept which, supposedly, should have meant the same thing to them (but it doesn't).
See any of my links, but especially the third. Animal cognition and human neuroscience studies strongly disprove the importance of language to cognition. Conflating language and thought is so obviously false in 2026 it is extraordinary that people still think like this.
I was ignoring the comment about fascists because it is simplistic and low-quality, and will similarly not be responding to whatever you (incorrectly) think I was claiming about universal human rights. I only wanted to correct the extremely false (or at least hugely overstated) assumptions about language and perception of reality.