In the 1980s, structural editors were quite popular (e.g. the BASIC editor on the ZX81). Using these, it is impossible for the programmer to create text that is not a valid program.
Has the author written anything on how this applies to transformer architectures specifically? The attention mechanism seems like a place where a richer type theory would be genuinely useful.
https://cybercat.institute/2025/05/07/neural-alchemy/
https://cybercat.institute/2026/02/20/categorical-semantics-...
https://cybercat.institute/2025/10/16/dependent-optics-ii/
> The reason I put off starting the series for so long is one of the same reasons blocking the writing of the paper: some of the introductory material is some of the most difficult to write. It has been such a long time that I no longer know how to adequately explain why the problem is so difficult.
My sympathies to Jules
Then it switches to using an encoding that would be more semantic, but I think the argument is a bit flimsy: it compares chess to the plethora of languages that LLMs can spout somewhat-correct code for (which is what's behind the success of this generally incorrect approach). What I found more dubious is that it brushes off syntactic differences with "yeah, but they're all semantically equivalent". That, it seems to me, is kind of the main problem here: basically any proof is an equivalence of two things, but it can be arbitrarily complicated to see the equivalence. If we consider this problem solved, then sure, we can get better things...
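To make the "arbitrarily complicated" point concrete, a toy Haskell example of my own (not one from the post): two definitions of list reversal that are semantically equivalent, but where seeing the equivalence already takes an induction proof rather than anything visible in the syntax:

    rev1 :: [a] -> [a]
    rev1 []     = []
    rev1 (x:xs) = rev1 xs ++ [x]   -- quadratic, structural recursion

    rev2 :: [a] -> [a]
    rev2 = go []                   -- linear, accumulator-passing
      where
        go acc []     = acc
        go acc (x:xs) = go (x:acc) xs

The standard proof that rev1 = rev2 needs the auxiliary lemma go acc xs = rev1 xs ++ acc, proved by induction on xs.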
I think that without, say, a Haskell PoC showing great results, these methods will have a hard time getting traction.
Please correct any inaccuracies or misunderstandings in this comment!
On existing techniques - the Type-Constrained Generation paper is discussed in the blog post (under Constrained Decoding), and I'd group typed holes in the same bucket.
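(For anyone unfamiliar with typed holes, this is the GHC feature I mean: leave an underscore in a program and the compiler reports the hole's type and suggests terms that fit it. Error text abridged from memory.)

    f :: [a] -> [a]
    f = _
    -- * Found hole: _ :: [a] -> [a]
    --   Valid hole fits include: id, reverse, cycle, ...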
The problem with those methods is that they're inference-time: they don't update the weights. Constrained decoding prevents the model from saying certain things without changing what the model wants to say. This gets more problematic the more complex your type system is, and that's before taking into account that type inference is undecidable for many of these systems.
Meaning: if I give you a starting string, then in the presence of polymorphism and lambdas you can't always tell whether it completes to a term of a particular type.
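For concreteness, here's roughly what inference-time masking amounts to. A minimal Haskell sketch, where scores stands in for the frozen model's next-token logits and mayComplete is a hypothetical type-checker oracle (the undecidability point above is exactly about this oracle; both definitions here are toy stand-ins):

    import Data.List (maximumBy)
    import Data.Ord (comparing)

    type Token = String

    -- Stand-in for the frozen LM's next-token scores.
    scores :: [Token] -> [(Token, Double)]
    scores _ = [("\\x ->", 0.5), ("(", 0.3), ("+", 0.2)]

    -- Hypothetical oracle: can this prefix still complete to a
    -- well-typed term? Undecidable in general for rich type systems.
    mayComplete :: [Token] -> Bool
    mayComplete = not . null

    -- One greedy decoding step: mask the oracle-rejected tokens and
    -- keep the best survivor. The weights are never touched; we only
    -- filter what the model already wanted to say.
    step :: [Token] -> Token
    step p = fst $ maximumBy (comparing snd)
                 [ ts | ts@(t, _) <- scores p, mayComplete (p ++ [t]) ]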
On the syntactic difference, I'd gently reframe: the question isn't whether syntactically different programs are semantically equivalent; it's that regardless of which form you pick, the existing methods don't let the model learn the constructor choice.
That's what the next section is about.
> The problem with those methods is that they're inference-time
I agree; I just thought it was missing some prior art (not affiliated with these papers :-P)
What's not clear to me at all: is this the draft of a research idea, or is there already some implementation coming in a later post?
It seems to me that such an idea would be workable for a given language with a given type system, but that there would be a black-magic step in training a model that works in a language-agnostic manner. Could you clarify?