Posted by unignorant 7 hours ago

Vera: a programming language designed for machines to write (github.com)
67 points | 59 comments
hybrid_study 5 hours ago|
I love the ## Why README section! Every repo should have one :-)
2001zhaozhao 5 hours ago||
> There are no variable names. @Int.0 is the most recent Int binding; @Int.1 is the one before.

You already lost me here. There's a reason variable names are a thing in programming, and that's to semantically convey meaning. This matters no matter whether a human is writing the code or a LLM.
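For anyone unfamiliar with the scheme being quoted: the README's `@Int.0` / `@Int.1` notation is a type-indexed positional reference, counting backward from the point of use. This is a hypothetical Python sketch of that lookup rule, not Vera's actual implementation; the names `bind` and `resolve` are invented for illustration:

```python
# Minimal sketch of type-indexed positional references like "@Int.0":
# each reference names a type and how many bindings of that type to
# count back from the point of use (0 = most recent).
bindings = []  # each entry: (type_name, value), in binding order

def bind(type_name, value):
    """Record a new binding (what a named language would call 'x = value')."""
    bindings.append((type_name, value))

def resolve(type_name, index):
    """@Type.index -> the (index+1)-th most recent binding of that type."""
    matches = [v for t, v in bindings if t == type_name]
    return matches[-1 - index]  # matches[-1] is the most recent binding

bind("Int", 3)            # a named language might write: x = 3
bind("Int", 7)            # a named language might write: y = 7
print(resolve("Int", 0))  # @Int.0 -> 7, the most recent Int
print(resolve("Int", 1))  # @Int.1 -> 3, the one before
```

Note that the same value answers to different indices at different points in the program, which is exactly what the replies below object to.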

kgeist 3 hours ago||
> The short answer is that variable names are one of the things that confuses LLMs rather than helps them. Unlike with humans, names undermine a model's efforts to keep track of state over larger scales. Models confuse similarly named variables in different parts of the codebase easily

So I wonder, doesn't this apply to function names too, which the author keeps in? I've seen LLMs use wrong functions/classes as well.

I think a proper harness, LSP, and tests already solve everything Vera is trying to solve. They mostly cite research from 2021, before coding harnesses and agentic loops were a thing, back when people were basically trying to one-shot with relatively weak models (by modern standards).

onlyrealcuzzo 3 hours ago|||
> You already lost me here.

Agreed.

I'm working on a language designed for machines to write and humans to understand and review.

It doesn't seem worthwhile to have code nobody can understand.

foltik 3 hours ago|||
So there are variable names; they're just inscrutable, context-dependent numbers.
ycombinatornews 4 hours ago||
Same here. This reminds me of JIRA's field_17190 showing up in MCP responses instead of the description (and the same thing happens in similar Excel-like systems).

Good luck managing hallucinations on that context

DonHopkins 5 hours ago||
This is exactly the wrong approach. LLMs are good at writing programming languages they already know, ones well represented in the training data, not languages they have never seen before, which force you to include the entire language manual and lots of example code in every prompt.
atgreen 3 hours ago|
This is not my experience. I've been experimenting with something very similar to Vera, except that my language transpiles into multiple languages (Java, TypeScript, Common Lisp, Rust, C++, Python, C#, and Swift). The transpiler is written in the language itself (there's a separate bootstrap transpiler written in Common Lisp). My point is that Claude, at least, is extremely capable of writing decent code in my new language with barely any prompting: just minimal guidance on the language itself and no examples.
sas41 5 hours ago||
I find the claims regarding LLMs and their mistake-prone nature around variable names very confusing.

It appears that the creator and I have had vastly different experiences with LLMs and their capabilities with complex codebases and complicated business logic.

My observations point to LLMs being much more successful when variables and methods have explicit, detailed names; it's the best way to keep them on track and minimize the chance of confusion, the next closest thing being explicit comments and inline documentation.

Poorly named and poorly documented things in a codebase only cause the model to reason more about what they could be, often reaching a (wrong) conclusion, wasting tokens and wasting time.

Perhaps this divergence in philosophy is due to fundamental differences in how we view the tool at hand.

I do not trust the machine, so I review its output, and if the variables lacked names, that would be significantly harder. But if I had a "Jesus, take the wheel!" attitude, perhaps I'd care far less.

ginko 6 hours ago||
Is there any evidence that using structural references rather than names allows large language models to generate better code? This bit just feels like obfuscation for obfuscation's sake.
Dragon-Hatcher 6 hours ago|
I've read the FAQ (https://github.com/aallan/vera/blob/main/FAQ.md) that provides the justification for this, and it is, IMO, fairly weak. The main argument is that misleading names can confuse models. I have no problem believing this, but I'm not sure why we should assume code will have misleading names. In fact, the same document says that in tests they've had LLMs mix up the indices, which is exactly the problem I would foresee. It seems especially messy that the name for the same variable changes in different places in the code. The utility of De Bruijn indices is easy substitutability of expressions, which seems like totally the wrong thing to optimize for in a programming language.

Edit: the more I think about it, the more this seems like a really bad idea. Three more issues come to mind: 1) it becomes impossible to grep for a variable, which I know agents do all the time. 2) editing code at the top of a function, say by introducing a new variable, can require editing all the code in the rest of the function, even if it is semantically unchanged! 3) they say it is less context for the LLM to track, but now, instead of just having to know the name of one variable, you have to keep track of the position of every other variable in the function.
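Point (2) above is easy to make concrete. Under positional indexing, inserting one new binding between existing references and the bindings they point at forces every affected index to be rewritten, even though nothing referenced has changed. A rough Python sketch of that renumbering (the helper `shift_refs` is invented for illustration, not anything from Vera):

```python
# Illustrates the renumbering problem with positional references:
# inserting a new binding of the same type "between" a reference and
# its target bumps every index that reaches past the insertion point.
def shift_refs(refs, inserted_at):
    """refs: positional indices (0 = most recent binding of the type).
    A new binding inserted at depth `inserted_at` pushes every binding
    at or below that depth one step further away, so those references
    must all be incremented."""
    return [r + 1 if r >= inserted_at else r for r in refs]

refs = [0, 1]                      # code referring to @Int.0 and @Int.1
refs = shift_refs(refs, 0)         # a new Int binding is added above them
print(refs)                        # every downstream reference changed
```

So a one-line insertion at the top of a function can dirty every later line that mentions the same type, which is miserable for diffs and for models editing code in place.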

firebot 4 hours ago|
Why not Prolog or one of the other logic languages? It's really old, so there should be lots of good training data for it, and its declarative nature would seem to be a great fit for LLMs.