
Posted by vinhnx 10/29/2025

When models manipulate manifolds: The geometry of a counting task(transformer-circuits.pub)
98 points | 17 comments
Rygian 11/3/2025|
> The task we study is linebreaking in fixed-width text.

I wonder why they focused specifically on a task that is already solved algorithmically. The paper does not seem to address this, and the references do not include any mentions of non-LLM approaches to the line-breaking problem.
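For context, the classical non-LLM solution is trivial: a greedy pass that fills each line until the next word would overflow. A minimal sketch (function name and sample text are my own, not from the paper):

```python
# Greedy fixed-width line breaking -- the standard algorithmic baseline
# for the task the paper studies in an LLM.
def greedy_linebreak(text: str, width: int) -> list[str]:
    lines, current = [], ""
    for word in text.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word  # word still fits on this line
        else:
            lines.append(current)  # line full: break before this word
            current = word
    if current:
        lines.append(current)
    return lines

print(greedy_linebreak("the quick brown fox jumps over the lazy dog", 15))
# → ['the quick brown', 'fox jumps over', 'the lazy dog']
```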

Legend2440 11/3/2025||
They study it because it already has a known solution.

The point is to see how LLMs implement algorithms internally, starting with this simple easily understood algorithm.

catgary 11/3/2025|||
I think this is an interesting direction, but I think that step 2 of this would be to formulate some conjectures about the geometry of other LLMs, or testable hypotheses about how information flows wrt character counting. Even checking some intermediate training weights of Haiku would be interesting, so they’d still be working off of the same architecture.

The biology metaphor they make is interesting, because I think a biologist would be the first to tell you that you need more than one datapoint.

Rygian 11/3/2025|||
That makes sense; however, it does not seem like they check the LLM outputs against the known solution. Maybe I missed that in the article.
omnicognate 11/3/2025|||
There's also a lot of analogising of this to visual/spatial reasoning, even to the point of talking about "visual illusions", when it's clearly a counting task, as the title says.

It makes it tedious to figure out what they actually did (which sounds interesting) when it's couched in such terms and presented in such an LLMified style.

dist-epoch 11/3/2025||
it's not strictly a counting task: the LLM sees same-sized tokens, but a token corresponds to a variable number of characters (which is not directly fed into the model)

like the difference between Unicode code points and UTF-8 bytes: you can't just count UTF-8 bytes to know how many code points you have
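A quick illustration of that mismatch (sample string is mine):

```python
# Counting UTF-8 bytes over-counts code points whenever a character
# needs more than one byte -- analogous to tokens vs. characters.
s = "café"             # 4 code points
b = s.encode("utf-8")  # 'é' encodes as 2 bytes, so 5 bytes total
print(len(s), len(b))  # → 4 5
```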

omnicognate 11/3/2025||
There's an aspect of figuring out what to count, but that doesn't make this task visual/spatial in any sense I can make out.
djoldman 11/3/2025||
A superior LLM for line length optimization:

https://www.youtube.com/watch?v=Y65FRxE7uMc

lccerina 11/3/2025|
Utter disrespect for using the term "biology" in relation to LLMs. No one would call the analysis of a mechanical engine "car biology". It's an artificial system; call it system analysis.
lewtun 11/3/2025|||
The analogy stems from the notion that neural nets are "grown" rather than "engineered". Chris Olah has an old, but good post with some specific examples: https://colah.github.io/notes/bio-analogies/
UltraSane 11/3/2025||
It makes sense if you define "biology" as "incredibly complicated system not designed by humans that we kind of poke at to try to understand it."
lccerina 11/4/2025|||
"Not designed by humans"? Since when? Unless you count cortical organoids/wetware (grown in some instrumented petri dish), every artificial neural network, no matter how complicated, is designed by humans. With equations and rules designed by humans. Backpropagation, optimization algorithms, genetic selection, etc.: all designed by humans.

There is no biology here, and there are so many other words that describe perfectly what they are doing here, without twisting the meaning of another word.

UltraSane 11/4/2025||
By "not designed" I'm talking about the synaptic weights.
lccerina 11/5/2025||
Still designed by humans. The loss function, backpropagation, and all the other mechanisms didn't just appear magically in the neural network. Someone decided which loss function to use, which architecture, which optimization techniques. Just because it takes a big GPU a lot of number crunching to assign those weights doesn't mean it's biological.

In the same way, a weather forecast model using a lot of complicated differential equations is not biological. A finite element model analyzing some complicated electromagnetic field, or the aerodynamics of a car is not biological. Just because someone around 70-75 years ago called them 'perceptrons' or 'neurons' instead of thingamajigs does not make them biology.

UltraSane 11/5/2025||
"Still designed by humans." No they are not. They are learned via backpropagation. This is the entire reason why neural networks work so well and why we have no idea how they work when they get big.
lccerina 11/7/2025||
And who designed backpropagation? It is not a magical property of artificial neurons, some law of nature, or god's miracle. A bunch of mathematicians banged their heads on the problem of backpropagation, tossed it to a computer, and voilà, neural networks made sense. Neural networks work so well because someone chooses the right loss function for the right problem. Wrong loss function -> wrong results. It's not magic. Nor is it biology.
addaon 11/3/2025|||
Sure, but it makes no sense at all if you define biology as “the smell of a freshly opened can of tennis balls.” The original comment is probably better understood using a standard definition of the words it used, rather than either of our definitions.