Posted by PaulHoule 6 hours ago
Psychological instruments and concepts (like MBTI) are constructed from the semantics of everyday language. Personality models (being based on self-report, not actual behaviour) are not models of actual personality, but of the correlation patterns in the language used to discuss things semantically related to "personality". It would thus be extremely surprising if LLM output patterns (trained on people's discussions and thinking about personality) did not also pick up similar correlational patterns (and thus produce similar patterns of responses when prompted with questions from personality inventories).
The real and more interesting part of the paper is the use of statistical techniques to isolate sub-networks which can then be used to emit outputs more consistent with some desired personality configuration. There is no obvious reason to me that this couldn't be extended to other types of concepts, and it kind of reads to me like a way of doing a very cheap, training-free sort of "fine-tuning".
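To make that concrete, here's a minimal sketch of what training-free steering via activations could look like. To be clear, this is my own illustration, not the paper's method: the layer index, the scaling factor, and the idea of adding a pre-computed "persona direction" to the residual stream are all assumptions.

    # Hypothetical sketch of training-free steering (not the paper's method).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tok = AutoTokenizer.from_pretrained("gpt2")

    LAYER, ALPHA = 6, 4.0  # which block to hook, and how hard to steer

    # Stand-in for a direction isolated statistically, e.g. the mean
    # activation on "high trait X" prompts minus the mean on "low trait X".
    direction = torch.randn(model.config.hidden_size)
    direction = direction / direction.norm()

    def steer(module, inputs, output):
        hidden = output[0]  # (batch, seq, hidden) residual-stream states
        return (hidden + ALPHA * direction,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(steer)
    ids = tok("When someone disagrees with me, I", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
    handle.remove()  # drop the hook and the base model is untouched

The appealing part of this kind of approach is the last line: nothing about the weights changes, so removing the hook gives you the original model back for free.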
Could be very TARS like, lol.
It'd also be interesting to do a similar rolling record of episodic memory, so your agent has a more human-like memory of interactions with you.
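One toy way to picture such a rolling record (every name here is hypothetical, and a real agent framework would look different): keep a bounded deque of timestamped episodes, let the oldest fade off the end, and prepend the recent ones to the prompt each turn.

    # Toy sketch of a rolling episodic memory; all names are made up.
    from collections import deque
    from datetime import datetime

    class EpisodicMemory:
        """Rolling record of interactions; the oldest episodes fade out."""

        def __init__(self, max_episodes=100):
            self.episodes = deque(maxlen=max_episodes)

        def record(self, user_msg, agent_msg):
            ts = datetime.now().isoformat(timespec="minutes")
            self.episodes.append((ts, user_msg, agent_msg))

        def as_context(self, last_n=10):
            # Prepend this to the agent's prompt each turn.
            return "\n".join(
                f"[{ts}] user: {u} | agent: {a}"
                for ts, u, a in list(self.episodes)[-last_n:])

    memory = EpisodicMemory()
    memory.record("My dog is named Rex.", "Nice to meet Rex!")
    prompt = memory.as_context() + "\nuser: What's my dog's name?"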
Another thing to consider about LLMs: the nature of the training, and the core capability of transformers, is to mimic the function of the processes by which the training data was produced. By training on human output, these LLMs are in many cases implicitly modeling the neural processes in human brains that produced the data. Lots of hacks, shortcuts, and low-resolution "good enough" approximations, but in some cases they uncover precisely the same functions that we use in processing and producing information.
I would argue this is deeply false, my classic go-to examples being that neural networks have almost no real relation to any aspects of actual brains [1] and that modeling even a single cortical neuron requires an entire, fairly deep neural network [2]. Neural nets really have nothing to do with brains, although brains may have loosely inspired the earliest MLPs. Really, NNs are just very powerful and sophisticated curve (manifold) fitters.
> Could be very TARS like, lol.
I just rewatched Interstellar recently and this is such a lovely thought in response to the paper!
[1] https://en.wikipedia.org/wiki/Biological_neuron_model
[2] https://www.sciencedirect.com/science/article/pii/S089662732...
Everything in a model is a correlation of behavior with context and context with behavior.
"Mind set" is a factor across the continuum of scales.
Are we solving a math problem or deciding on entertainment? We become entirely "different brains" in those different contexts, as we configure our behavior and reasoning patterns accordingly.
The study is still interesting. The representation, clustering, and bifurcations of roles may simply be one end of a continuum, but they are still meaningful things to specifically investigate.
It's not surprising to find clustered sentiment in a slice of statistically correlated language. I wouldn't call this a "personality" any more than I would say the front grille of a car has a "face".
Deterministically isolating these clusters, however, could prove to be an incredibly useful technique for both using and evaluating language models.
Studies that do find correlations between self-reported personality and actual behaviour tend to find them in a range of something like 0.0 to 0.3 or so, maybe 0.4 if you are really lucky. Which means "personality" measured this way explains something like 16% of the variance in behaviour, at most (variance explained is the square of the correlation).
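A quick worked version of that arithmetic, just to spell it out (the r values are simply the range mentioned above):

    # Variance explained is the square of the correlation coefficient.
    for r in (0.0, 0.1, 0.2, 0.3, 0.4):
        print(f"r = {r:.1f} -> explains {r * r:.0%} of the variance")
    # r = 0.4 -> explains 16% of the variance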
On top of that, a confounding issue is that human nature is to anthropomorphize things. And what is more likely to be anthropomorphized than a construct of written language, now the primary method of knowledge transfer between humans? I can't help but feel that this wishful bias contributes to skipping the due diligence of choosing an appropriate metric to measure with.
More useful framing: how do these subnetworks produce outputs that observers evaluate as personality-consistent? Personality isn't an internal property; it's a judgment made by people watching behavior.
Partly, yes, but personality is also an internal property, or at least the concept is coherent enough to say that it has internal aspects. That is, a person's personality is the set of (relatively) stable and difficult-to-change patterns that manifest in their behaviour across broad contexts, and these patterns are almost certainly encoded internally in the brain in some form. It is not much different from saying a person's intelligence / IQ is partly internal.
Otherwise, I do agree with your more careful framing, and I wish people thought and spoke more carefully about these things, and doubly so for LLMs.
See also: https://en.wikipedia.org/wiki/Newspeak
Usually it results in an "equal and opposite backlash". Once they started calling children "Special" in school, "Special" became the ultimate insult.
EDIT: For a neuroscience reference that also argues why the general perspective is obviously false: https://pmc.ncbi.nlm.nih.gov/articles/PMC4874898/. But really, these things ought to be obvious from introspection.
There was a fad called "structuralism" that liked to imagine that such and such is "structured like a language". But when we finally got a paradigm for language, it was one of those "normal science" paradigms that Kuhn warned you about: you could write papers grounded in Chomsky's theory for a lifetime, but it wouldn't help you learn to read Chinese more quickly, speak German without an accent, or program a computer to parse tweets. That is, the structure of language is absolutely useless except for writing papers about linguistics -- and the "language instinct" becomes some peripheral that grafts onto an animal, but you need the rest of the animal for it to work.
Now, LLMs may not be a model for how we do it, but they are certainly going to bring back structuralist and "wordcel" positions, because they do seem to show, somehow, that "language is all you need" to accomplish whatever it is LLMs accomplish.
People will try to bring back these obviously false models of cognition, but, so far, the dismal performance of LLMs on e.g. SpatialBench [1] (and almost certainly ARC-AGI-3), or the kind of data and effort required to get something like V-JEPA-2 [2], are strong counter-examples to this. And, yeah, obviously animal cognition, especially smart animals like birds, or the crazy stuff we see in chimp and gorilla ethology (border patrols, genocides, humor, theory of mind, bla bla bla).
Agents who only speak Rust have no conception of what runtime errors are, for instance. Fascists won't understand concepts like "universal human rights" as in their worldview there is nothing universal about humanity as a whole.
It's the opposite. People make up new concepts all the time for which they have no words, and then give them a name. Language is composable; words and names are just a means to improve communication, to make it faster and more efficient.
> Agents who only speak Rust have no conception of what runtime errors are, for instance.
Agents don't really learn. They have a fixed set of data and everything new has to be pressed into the prompt. This is unrelated to language.
This is also sort of a wordcel take, in that it neglects that there are plenty of mental structures that are not solely linguistic: e.g. visuo-spatial models, auditory models, kinaesthetic, proprioceptive, emotional, gustatory, or even maybe intuitive models, and symbolic models (which have both linguistic and visuo-spatial aspects). Yes, your models constrain your perception of reality, but it is not clear how important language really is to many of those models (and there is strong evidence it may not matter at all to a lot of cognition [3]).
[1] https://en.wikipedia.org/wiki/Linguistic_relativity
[2] https://plato.stanford.edu/archives/sum2015/entries/relativi...
> evidence from neuroimaging and neurological patients
Has "neuroimaging" successfully modelled those "universal human rights" the OP was mentioning? If yes, how did it look?
More generally, positing that all languages are, in the end, interchangeable (because that's what the opponents of something similar to Sapir-Whorf are saying) is very reactionary and limited in itself. And it's telling that me calling those anti-Sapir-Whorf people "reactionaries" will for sure tickle something in them that wouldn't have happened had I used a different "neuroimaged" concept which, supposedly, should have meant the same thing to them (but it doesn't).
See any of my links, but especially the third. Animal cognition and human neuroscience studies strongly disprove the importance of language to cognition. Conflating language and thought is so obviously false in 2026 it is extraordinary that people still think like this.
I was ignoring the comment about fascists because it is simplistic and low-quality, and will similarly not be responding to whatever you (incorrectly) think I was claiming about universal human rights. I only wanted to correct the extremely false (or at least hugely overstated) assumptions about language and perception of reality.