Vera: a programming language designed for machines to write

Posted by unignorant 6 hours ago

Vera: a programming language designed for machines to write(github.com)

67 points | 59 comments

danpalmer 4 hours ago|

> The empirical literature shows that models are particularly vulnerable to naming-related errors like choosing misleading names, reusing names incorrectly, and losing track of which name refers to which value.

I think Vera might be missing something here. In my experience, LLMs code better the less of a mental model the language requires and the more is spelled out in text on the page.

Go – very little hidden, everything in text on the page, LLMs are great. Java, similar. But writing Haskell, it's pretty bad, Erlang, not wonderful. You need much more of a mental model for those languages.

For Vera, not having names removes key information that the model would have, and replaces it with mental modelling of the stack of arguments.

drob518 1 hour ago||

My Spidey sense was tingling when I saw that, too. An additional issue is how humans are supposed to read the code at all so that they can provide help to the LLM if it’s off track. If the code is only usable by models, the models need to be good enough to deal with binary feedback (“Code doesn’t work.”). The human won’t be able to read the code and steer the model. Given the levels of steering required today, that makes me quite nervous.

mannykannot 1 hour ago|||

This will serve as an interesting empirical test, then: will LLMs do better with Vera than with Go or other languages? The testing so far seems inconclusive (https://github.com/aallan/vera-bench), but the authors make this interesting observation:

"No LLM has ever been trained on Vera. There are no Vera examples on GitHub, no Stack Overflow answers, no tutorials — the language was created after these models' training cutoffs. Every token of Vera code in these results was written by a model that learned the language entirely from a single document (SKILL.md [https://veralang.dev/SKILL.md]) provided in the prompt at evaluation time."

If LLMs do much better with Vera (or something like it) than with traditional languages, we may be entering a time when most machine-written code will be difficult for humans to review - but maybe that ship has already sailed.

robviren 4 hours ago|||

I too have found the models do well with Go. I will say that, despite the backwards compatibility guarantee, library API changes, shifting views on what counts as "good" patterns, and new language additions do add some friction to the experience. It almost always works, but it can be a bit inconsistent in how the code shows up.

rapind 4 hours ago|||

> But writing Haskell, it's pretty bad,

I’m surprised by this. Most likely significant white space is a big part of the problem (LLMs seem horrible at white space). Functional with types has been a win for me with Gleam.

drob518 1 hour ago||

But LLMs do Python quite well, so white space isn’t necessarily a problem.

mannykannot 1 hour ago||

Yes - a point supported by the Vera benchmark: https://github.com/aallan/vera-bench

Animats 1 hour ago|||

The same logic applies to comments. No comments are better than wrong comments.

sornaensis 4 hours ago|||

I'm curious what issues you had with Haskell? I have had the opposite experience and find them dreadful at Java et al.

Surely, denser languages should be better for LLMs?

hgoel 3 hours ago|||

The context window also limits how deeply the model can "think", and it does this in natural language. So a language suited to LLMs would have balanced density: if it's too dense, the model spends many tokens working through the logic; if it's too sparse, it spends many tokens reading and writing the code.

I think in the context of already trained LLMs, the languages most suited to LLMs are also the ones most suited to humans. Besides those languages having the most code to train on, humans face similar limitations: if the language is too dense, they have to be very careful in considering how to do something; if it's too sparse, the code becomes a pain to maintain.

cjbgkagh 1 hour ago||

I generally agree that humans and LLMs benefit similarly from programming language features. I would tweak that a bit and suggest that the LLMs’ ability floor is higher than the human lowest common denominator, so I would skew towards the more advanced human programming languages. There are many typing / analyzer features that would be frustrating for humans to use because they make type checking slower. This is much less of a problem for LLMs, in that they’re very patient and are much better at internalizing the type system, so they don’t need to trigger it anywhere near as often.

danpalmer 4 hours ago||||

Density is a double-edged sword. On the one hand you want to minimise context usage, but on the other hand more text on the page means more that the LLM can work with.

zem 3 hours ago|||

my (uninformed) speculation is that you want resilience and error correction, which implies some level of redundancy rather than pure density.

still_grokking 2 hours ago||

Why would anybody use a vibe-coded and vibe-designed language which effectively does not exist yet instead of an established one with such features, like Scala?

https://arxiv.org/html/2510.11151v1

davidw 2 hours ago|

Also isn't it an advantage for LLM coding to use an existing language that has a lot of code that LLMs have already stol... I mean ingested?

fragmede 2 hours ago|||

Depends. A professor told me AI is really good at writing bad pandas code because it's seen a lot of bad pandas code, so starting from scratch isn't necessarily the worst thing.

still_grokking 2 hours ago|||

Exactly! Completely new languages without large amounts of reference material are terrible for LLMs.

rs545837 1 hour ago||

I agree 100% with this thinking approach, I've been working in this domain for quite a few months now.

The right granularity for agents isn't files or lines, it's entities: functions, classes, methods. That's how both humans and agents actually think about code.

We built sem (Ataraxy-Labs/sem), which extracts entities from 30+ languages via tree-sitter and builds a cross-file dependency graph, as a basis for semantic version control and semantic diff. weave (same org) takes it further and does git merges at the entity level: it matches functions by name and merges their bodies independently.

The dependency graph also answers questions LLMs can't. I love the analysis based on ASTs.
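
A rough illustration of the entity-level idea above, not sem's actual implementation: this sketch uses Python's standard `ast` module instead of tree-sitter, handles a single file rather than a cross-file graph, and only follows direct calls by bare name. The sample `SOURCE` and the helper names `extract_entities` and `dependency_graph` are made up for the example.

```python
import ast

# Hypothetical sample module used only for this illustration.
SOURCE = '''
def parse(text):
    return text.split()

def count(text):
    return len(parse(text))

class Report:
    def render(self, text):
        return f"{count(text)} words"
'''

def extract_entities(source: str) -> dict[str, ast.AST]:
    """Map qualified names (e.g. 'Report.render') to their AST nodes."""
    entities: dict[str, ast.AST] = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = f"{prefix}{child.name}"
                entities[name] = child
                visit(child, name + ".")  # pick up methods and nested definitions

    visit(ast.parse(source), "")
    return entities

def dependency_graph(source: str) -> dict[str, list[str]]:
    """For each entity, list the other known entities it calls by bare name."""
    entities = extract_entities(source)
    graph = {}
    for name, node in entities.items():
        called = {
            call.func.id
            for call in ast.walk(node)
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
        }
        graph[name] = sorted((called & set(entities)) - {name})
    return graph

for entity, deps in dependency_graph(SOURCE).items():
    print(entity, "->", deps)
```

Even this toy version answers "what breaks if I change `parse`?" without reading any function bodies, which is the kind of question the graph is meant for.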

rtpg 4 hours ago||

The lack of naming seems to indicate a fundamental misunderstanding of how LLM coding agents are successful, and just makes me doubt anything about this project being useful and workable.

svachalek 4 hours ago|

Yeah it seems based on 2023 research which is ancient, back when we didn't have coding agents at all, and on some 1980s sci fi concepts of "how machines think" (beedeeboop) rather than the all too human coding agents we have.

If I had to design one of these, I'd go for:

1. Token minimization (which may be circular, I'm sure tokens are selected for these models at least in part based on syntax of popular languages)

2. As many compile time checks as possible (good for humans, even better for machines with limited context)

3. Maximum locality. That is, a feature can largely be written in one file, rather than bits and pieces all over the codebase. Because of how context and attention work. This is the one I don't see much in commercially popular languages. It's more of a declarative thing, "configuration driven development".

gavmor 2 hours ago|||

Features written in one file, rather than "cohesive" modules with a single "responsibility" in one file?

So, orthogonal to the accepted, common code organization idiom (no matter how infrequently adhered to)?

Fascinating! Just the other day I decomposed a massive Demeter violation into stepwise proxying "message passing." I was concerned that implementing this entire feature (well, at least a solid chunk of it) as a single, feature-scoped module would cause the next developer's eyes to glaze over upon encountering such a ball-of-mud, such a dense vortex of spaghetti.

But, as I drove home that evening, I couldn't help but wonder if I hadn't, instead, merely buried the Gordian lede behind so many ribbons of silk.

Octoth0rpe 3 hours ago||||

> That is, a feature can largely be written in one file, rather than bits and pieces all over the codebase.

This seems to be at odds with the goal of token minimization. Lots of small files that are narrowly scoped means less has to be loaded into context when making a change, right?

Throwing out another idea: I wonder if we could see some kind of equivalent of C header files for more modern languages, so that an LLM just has to read the equivalent of a .h file to start using a library.

preommr 2 hours ago|||

> This seems to be at odds with the goal of token minimization. Lots of small files that are narrowly scoped means less has to be loaded into context when making a change, right?

My solution (as someone who's building something tangential) is to use granular levels of scope: there should be an implicit single file that gets generated from a package at a certain phase of the static tool processing. But the package is still split into files for flexibility and DevEx (developer experience). File/folder organization is super useful for humans. For tooling, the package can be collected together and taken as a single unit, but still decomposed based on things like namespaces and top-level definitions (classes, specifications, etc.). That way the tooling has control over how much context to pass in.

lesam 2 hours ago||||

I think AST aware code reading is criminally underused by agents - you don't need a header file if you can see a listing of all the functions in a library.

Similarly, I don't read the whole file a function is in while editing it in an IDE, why should a coding agent get the whole file polluting its context by default?
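
As a minimal sketch of what AST-aware reading could look like (assuming Python source and the standard `ast` module; the sample `SOURCE` and the helpers `outline` and `function_source` are hypothetical names, not any agent's real tooling): one function produces a header-like listing of signatures, the other returns only the function being edited.

```python
import ast

# Hypothetical library source used only for this illustration.
SOURCE = '''
def load(path, encoding="utf-8"):
    """Read a file and return its text."""
    with open(path, encoding=encoding) as handle:
        return handle.read()

def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())
'''

def outline(source: str) -> list[str]:
    """One line per top-level function: signature plus first docstring line."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node) or ""
            summary = doc.splitlines()[0] if doc else ""
            lines.append(f"def {node.name}({ast.unparse(node.args)})  # {summary}")
    return lines

def function_source(source: str, name: str) -> "str | None":
    """Return the source text of a single top-level function, or None."""
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None

print("\n".join(outline(SOURCE)))        # the "header file" view
print(function_source(SOURCE, "word_count"))  # only the function under edit
```

The outline costs a handful of tokens per function, and the second helper keeps the rest of the file out of context until it is actually needed.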

gavmor 2 hours ago||

Check out Ataraxy-Labs/weave for AST-aware git merges.

But, I wonder, do AST-aware tools cleave to the LLM training manifold the way coding-tutorial slop does?

still_grokking 2 hours ago|||

Why would you need "header files" when an LSP server can give you just the outline of some file?

drivingmenuts 3 hours ago|||

> all too human coding agents

There is no actual thought occurring. Arguably, we can say the same about a lot of humans at any given moment, but with machines there never is. It's all statistics.

offbyone42 1 hour ago||

I feel like this misses how LLMs work.

Yes, you’re adding this layer of verification, but LLMs don’t think in ASTs or use formal logic.

They are statistical predictors, just predicting what the next token will be.

There is a reason they perform best with TS/PY and not Haskell: the difference in size of the code corpus for each language.

The premise behind this seems to ignore all of that.

solomonb 4 hours ago||

I think Hindley Milner (for decidability) + Linear Types (for resource management) + Refinement Types (for lightly asserting invariants) + Delimited Continuation based Effects (for tracking effectful code) + Unison style Content Addressability (for corralling code changes, documentation, and tests) would make a really nice language for an LLM.

still_grokking 2 hours ago|

That's in large part Scala.

It doesn't have Hindley-Milner type inference, but it has very strong type inference.

We will get linearity soon thanks to and as part of the Capybara[1] effort.

Refinement types have long been a reality.

The whole new effect tracking thing is based on delimited continuations.

The Unison style content addressability comes up now and then; maybe it will become a reality at some point. It's mostly a build system thing rather than a language thing, though.

Scala is already great for LLMs for other reasons too:

https://arxiv.org/html/2510.11151v1

[1] https://2025.workshop.scala-lang.org/details/scala-2025/6/Sy...

rickcarlino 40 minutes ago||

Reminds me of http://cobra-language.com/

hahahacorn 56 minutes ago||

I think the best language for LLMs is going to be as close to English as you can get with the compiler guarantees offered by Vera (or something similar).

Seemingly opposing forces.

unignorant 5 hours ago||

This isn't my project, but I shared it here because it has a few important ideas I've been thinking about in my own work. Effect type systems in particular are a really good fit for LLMs because they allow you to reason very precisely about a program's capabilities before runtime (basically, using the type system for capability proofs). This helps you trust agent-created code (for example, you know it can't do IO), or, if the code does require certain capabilities, run it in a sandbox (e.g., mock network or filesystem). This kind of language design also provides a safer foundation for complex meta-systems of agents-that-create-agents, depending on how the runtime is implemented, though Vera may be somewhat limited in that particular respect.

The major design decision I'm a little skeptical about is removing variable names; it would be interesting to see empirical data on that as it seems a bit unintuitive. I would expect almost the opposite, that variable names give LLMs some useful local semantics.
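
To make the capability idea above concrete, here is a loose sketch in capability-passing style. It is only an approximation: Python does not enforce any of this, and nothing stops a function from calling `open()` directly, whereas an effect type system would reject that at compile time. The names `FileSystem`, `FakeFileSystem`, `word_count`, and `count_words_in_file` are hypothetical and not taken from Vera.

```python
from dataclasses import dataclass, field
from typing import Protocol

class FileSystem(Protocol):
    """The only door to file IO that effectful code is given (assumed interface)."""
    def read_text(self, path: str) -> str: ...

@dataclass
class FakeFileSystem:
    """Sandbox stand-in: serves in-memory files and records every access."""
    files: dict[str, str] = field(default_factory=dict)
    reads: list[str] = field(default_factory=list)

    def read_text(self, path: str) -> str:
        self.reads.append(path)
        return self.files[path]

def word_count(text: str) -> int:
    """Pure: receives no capability, so its signature promises no IO."""
    return len(text.split())

def count_words_in_file(fs: FileSystem, path: str) -> int:
    """Effectful: may touch files, but only through the capability it was handed."""
    return word_count(fs.read_text(path))

# Run agent-written code against a mock filesystem instead of the real one.
sandbox = FakeFileSystem(files={"/notes.txt": "effects as capabilities"})
result = count_words_in_file(sandbox, "/notes.txt")
print(result, sandbox.reads)
```

The signature tells a reviewer what the function can do, and the fake filesystem gives an audit trail of what it actually did.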

still_grokking 2 hours ago|

You're looking for Scala… ;-)

https://news.ycombinator.com/item?id=47957121

hyperhello 5 hours ago|

> Division by zero is not a runtime error — it is a type error. The compiler checks every call site to prove the divisor is non-zero.

Elaborate a little here.

hgoel 3 hours ago|

Presumably an analyzer that makes it an error to not have an immediately traceable zero check.

C# can do something similar with null references. It can require you to indicate which arguments and variables are capable of being null, and then raise a compiler error/warning if you pass one to something that expects a non-null reference without a null check.

hyperhello 3 hours ago||

But that’s because null is a static type. Zero isn’t a static type. How can I know if a calculation produces zero if I can’t predict the result of it at compile time?

cjbgkagh 1 hour ago|||

Post type check analyzers can work with more than just the type information, you can really do whatever you want at this stage. The normal highly optimized type checker handles the bulk of the checking and the post type check analyzers can work on the residual. You wouldn’t type check a file that doesn’t parse, and you wouldn’t run the analyzers on code that doesn’t type check.

The problem is these checks can be rather slow and people don’t want to wait a long time for their type checking and analyzers to finish. But LLMs can both wait longer and, by internalizing the logic, reduce the number of times they need to trigger them.

Edit: I’ll need to examine this project to know where (or if) they draw the distinction between normal type checking and a post type check analyzer. If they blend the two and throw the whole thing into Z3 it’ll work but it’ll be needlessly slow.

Edit: What I’m calling a post type check analyzer they’re calling a contract verifier, and it’s a distinct stage: ‘check’ (type check), then ‘verify’ (Z3).

hgoel 2 hours ago|||

I think it's about if there's a possibility of it being zero. Of course there's no way to tell at compile time that a value will definitely be zero.

So, in pseudocode

int div(int a, int b): return a / b;

Would probably be a compile time error, but

int div(int a, int b): return b == 0 ? ERR : (a / b);

Would not, or at least that's what I'd expect.
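
A toy version of such an analyzer, purely as a guess at the general shape and not how Vera's verifier works (the thread says that one goes through Z3): it walks Python source with the standard `ast` module and flags any division whose divisor is neither a non-zero literal nor a name guarded by an enclosing `== 0` / `!= 0` test. It ignores reassignment, early returns, and anything beyond simple names; `nonzero_names` and `unguarded_divisions` are made-up helper names.

```python
import ast

def nonzero_names(test: ast.expr) -> tuple[set[str], set[str]]:
    """Names known non-zero when `test` is true, and when it is false."""
    if (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.left, ast.Name)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == 0):
        name = test.left.id
        if isinstance(test.ops[0], ast.NotEq):   # `b != 0`: safe in the true branch
            return {name}, set()
        if isinstance(test.ops[0], ast.Eq):      # `b == 0`: safe in the false branch
            return set(), {name}
    return set(), set()

def unguarded_divisions(source: str) -> list[int]:
    """Line numbers of divisions whose divisor is not shown to be non-zero."""
    findings: list[int] = []

    def check(node: ast.AST, safe: frozenset) -> None:
        if isinstance(node, ast.IfExp):          # `x if test else y`
            when_true, when_false = nonzero_names(node.test)
            check(node.test, safe)
            check(node.body, safe | when_true)
            check(node.orelse, safe | when_false)
            return
        if isinstance(node, ast.If):             # `if test: ... else: ...`
            when_true, when_false = nonzero_names(node.test)
            check(node.test, safe)
            for stmt in node.body:
                check(stmt, safe | when_true)
            for stmt in node.orelse:
                check(stmt, safe | when_false)
            return
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Div, ast.FloorDiv, ast.Mod)):
            divisor = node.right
            literal_ok = isinstance(divisor, ast.Constant) and divisor.value != 0
            name_ok = isinstance(divisor, ast.Name) and divisor.id in safe
            if not (literal_ok or name_ok):
                findings.append(node.lineno)
        for child in ast.iter_child_nodes(node):
            check(child, safe)

    check(ast.parse(source), frozenset())
    return findings

# Python renderings of the two pseudocode functions above.
UNSAFE = "def div(a, b):\n    return a / b\n"
SAFE = "def div(a, b):\n    return None if b == 0 else a / b\n"
print(unguarded_divisions(UNSAFE), unguarded_divisions(SAFE))
```

The first function is reported and the second is not, matching the expectation in the pseudocode: the check is about whether a zero divisor is ruled out on that path, not about predicting the value.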

rdevilla 1 hour ago|||

> Of course there's no way to tell at compile time that a value will definitely be zero.

Yes there is. Dependently typed languages like Idris can inspect terms at the value-level during compile time. Rather, instead of proving that the divisor will be zero, you must instead statically prove that the divisor cannot be zero; otherwise the code will not typecheck.
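
Python has no dependent types, so the following is only a runtime approximation of that idea, sometimes called a smart constructor: the zero check happens once, where a `NonZero` value is created, and `divide` accepts nothing else. A static checker such as mypy would then reject `divide(a, some_int)`, though unlike Idris nothing prevents constructing `NonZero(0)` by hand. All names here (`NonZero`, `divide`, `safe_ratio`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class NonZero:
    """An int that was checked to be non-zero when it was parsed."""
    value: int

    @staticmethod
    def parse(n: int) -> NonZero | None:
        # The single place where the zero check lives.
        return NonZero(n) if n != 0 else None

def divide(a: int, b: NonZero) -> int:
    # No zero check here: the parameter type carries the evidence.
    return a // b.value

def safe_ratio(a: int, b: int) -> int | None:
    divisor = NonZero.parse(b)
    if divisor is None:
        return None  # the caller is forced to handle the zero case explicitly
    return divide(a, divisor)

print(safe_ratio(7, 2), safe_ratio(7, 0))
```

The difference from a plain runtime check inside `divide` is where the obligation sits: every call site has to produce the evidence before the call typechecks, which is the "prove the divisor cannot be zero" requirement in miniature.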

hyperhello 1 hour ago||

Okay,

int integer_division(int a, int b) { if (b!=0) return a/b; raise(SIGFPE); }

Great.

rdevilla 1 hour ago||

You don't appear to understand the difference between runtime and static analysis/compile time, or term-level and type-level.

hyperhello 1 hour ago||

Great! Explain it to us while I read to my kid!

rdevilla 55 minutes ago|||

Don't get mad because you're too lazy to even ask the AI. You are first to be replaced in the workforce.

Or maybe it's over your head and you should just stick to reading children's fiction after all. Want some colouring books too?

hyperhello 14 minutes ago||

Yes! We can always use more books and toys here!

cjbgkagh 58 minutes ago|||

The ‘let me google that for you’ is set to be replaced with ‘let me ask ChatGPT for you’.

still_grokking 2 hours ago|||

Or it's just some AI brain fart…

The whole thing looks vibe-coded, and vibe-designed.
