
Posted by indigodaddy 14 hours ago

If AI writes your code, why use Python?(medium.com)
532 points | 580 comments
pshirshov 9 minutes ago|
No reason, unless the project is simple. The more you can offload onto your compiler/typer, the shorter the feedback loop and the better agents work.

Lack of strictly enforced static typing makes agents fail much sooner with Python. In my opinion, Rust and Scala are the best targets for agentic flows, and, coincidentally, they have the most advanced typers among mainstream languages.

But any statically typed language behaves better than any dynamically/duck typed language. When I say "better" I mean delivery time and the amount of shipped defects.
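
As a toy illustration of that shortened feedback loop (the names here are made up), a static checker such as mypy can reject a malformed call from an agent before anything runs:

```python
from typing import TypedDict

class User(TypedDict):
    name: str
    age: int

def greet(user: User) -> str:
    # A checker like mypy flags greet({"name": "Ada"}) statically:
    # the missing "age" key is an error before the code ever runs,
    # which is the short feedback loop agents benefit from.
    return f"{user['name']} ({user['age']})"
```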

Another thing that helps (though it's not generally applicable): ask your agent to verify critical protocols with formal proofs in TLA+/Lean/Coq. Agents are bad at formal proofs, but they're generally much better at them than most humans.
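
For flavor, here is a trivially small Lean statement of the kind an agent can be asked to discharge (a real protocol proof would state invariants over system traces; this only shows the shape of the workflow):

```lean
-- minimal example: hand the agent a statement, let it produce the proof term
theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```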

elcritch 3 minutes ago||
For me it's all about Nim + LLMs. I'm greedy and want both fast-to-ship and fast-to-run. Readability comparable to Python, but with strict static typing that LLMs can't "cheat" around.

I actually (mostly) enjoy reading the code that the LLMs create in Nim. It's quick to read and scan for refactors or cleanups. Compile times are in seconds, so the LLM is usually the slow piece. It's fun and productive. With Python + LLMs I'm seeing them just create ever more layers of unmanageable cruft.

Recently I wanted "magic" behavior to get OpenAPI types and swagger.json along with auto parsing my rest APIs for me. I had Codex make a library for me using compile time reflection and a sprinkling of macros. Done, simple.

bryanrasmussen 3 hours ago||
One obvious reason is Python's extreme readability; it has often been described as being as close to executable pseudo-code as one can get.

If you're using an LLM to write code I think the rules would be

1. Use a language you know really well so you can read it easily, and add to it as needed.

2. Use a language that has a large training set so the LLM can be most efficient.

3. Use a language that is easy to read.

If your language has a small training set, or you don't intend to do much addition, or you don't really know any language that well, or you're restricted from using choice 1 for some reason, then 2 and 3 move up. And Python has a large training set and is easy to read.

simonask 3 hours ago||
Python is locally readable. Reasoning about larger systems in Python is where things get really hard, because you have to describe how many small individually readable things interact with each other in a very limited vocabulary.
bazoom42 3 hours ago|||
For larger systems you create your own modules and abstractions, so comprehensibility at higher level does not depend so much on the language.
sundarurfriend 2 hours ago||
The tools the language gives you to create those abstractions make a lot of difference, however.
mbreese 44 minutes ago||
But every abstraction that an LLM has to write is a choice. Your way of writing Python may not match that choice. The next run of the agent might not choose the same way.

Because the language gives you many different tools, an LLM generated codebase can get inconsistent and overly complicated quickly. The flexibility of Python is a downside when you’re having an LLM generate the code. If you’re working in an existing codebase, it’s great - those choices were already made and it can match your style.

When an LLM has to derive its own style is when things can devolve into a jumbled mess.

bryanrasmussen 3 hours ago||||
hmm, yeah, given LLMs' ability to churn out lots of code quickly and be overly verbose in that code, that is a potential downside: a quick one-time edit could create so much intellectual overhead that Python becomes the wrong language for understanding what is going on.

What language do you feel is easier to reason about in the large?

hiAndrewQuinn 3 hours ago|||
Haskell would be my vote, and Rust too, actually, both because of their very strong type systems. The type system lets you very quickly figure out what something is before you figure out what something does, and it turns out that separating those two concerns as hard as those two languages do often results in doing the whole one-two punch faster.
lukan 1 hour ago||
Haskell does not qualify for a large training set, though. (Nor for readability in my opinion)

I think I have never seen Haskell software made with LLMs, but then, aside from university, I have not seen Haskell code at all. (Also, I would associate Haskell purists with people who avoid LLMs.)

I would rather go with Rust given these choices.

But I have good results with TypeScript (or JavaScript for simpler things). Really large set of examples, tools optimized for it, and agents debugging in the browser works almost out of the box. And, well, an elaborate type system.

yakshaving_jgt 26 minutes ago||
[dead]
jimmaswell 1 hour ago||||
C# is as close to an ideal language as you can get for most things IMO. I find AI does a great job with it.
pjerem 56 minutes ago||
I do agree. C# is a hidden gem for AI. There are not that many different ways to get somewhere, so the model has probably been trained on the framework and libraries everybody uses (the Microsoft ones).

Compared to most languages, including Java, C# will have a hard time letting you compile incoherent code.

You barely need any dependencies other than aspnetcore and efcore for most applications and your AI knows them well.

It’s easy to do TDD with it, so it’s easy to keep your AI from hallucinating.

harperlee 2 hours ago|||
I'd say Java, because it has a massive footprint amenable to training, and a strong type system (it does not have sum types, though, and those are trendy).

You'd have to steer the LLM to use the style you want and not massively overarchitect things, but that's going to be an issue regardless of language.

scared_together 2 hours ago||||
I’m curious about the design space of languages & frameworks which are lower level than LLM prompts but higher level than Python, Ruby and Common Lisp.

Do you have any recommendations for systems where reasoning about large systems is easier than in python?

skydhash 23 minutes ago||
You have to go into live programming: coding inside a running system and saving images. Readability is no longer a factor; what you want is easy access to documentation, quick navigation, and a playground.
ant6n 3 hours ago|||
That’s true. Once you have APIs and want to use classes to create larger structures, the language is full of warts.
cturner 7 minutes ago||
I have built large systems in Python that use classes, for more than ten years. I came to it from ten years of Java.

As a rule, I avoid implementation inheritance. Occasionally I need to facade a library that assumes implementation inheritance to avoid it spreading into my codebase.

When the codebase hits a certain size, I hand-roll some decorators to create functionality like java interfaces. With that done, and a suite of acceptance tests, I find it scales up well.
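
A minimal sketch of what such a hand-rolled "interface" decorator might look like (the decorator name and the example methods are made up for illustration):

```python
def implements(*required):
    """Class decorator: fail at import time if any named method is missing."""
    def check(cls):
        missing = [m for m in required if not callable(getattr(cls, m, None))]
        if missing:
            raise TypeError(f"{cls.__name__} does not implement: {', '.join(missing)}")
        return cls
    return check

@implements("read", "close")
class FileSource:
    def read(self):
        return "data"

    def close(self):
        pass
```

Because the check runs when the module is imported, a missing method surfaces immediately rather than deep inside a call chain, which is roughly what Java interfaces buy you.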

ashishb 2 hours ago|||
Python is amazing for scripting.

Python is terrible for writing big systems.

Projects whose V1 is written in Go/Rust/C++ don't normally go out and re-write V2 in Python.

The reverse is really common.

Even many famous Python packages are now Python wrappers.

https://ashishb.net/programming/python-in-production/

quietbritishjim 4 minutes ago|||
> Projects whose V1 is written in Go/Rust/C++ don't normally go out and re-write V2 in Python.

That's because you would usually rewrite your Python program in something like C++ if you realise that it's too slow and you need the speed of a compiled language, despite the enormous extra complexity to create and maintain it that way.

You wouldn't go back the other way because it's very rare to go to all that extra effort writing in a more efficient language only to realise that the slower performance of Python would've been adequate after all. And, thanks to the sunk cost fallacy, even someone who does realise it is unlikely to make the switch back.

There's no way you could convince me that C++ is easier to code in than Python, even for a very large system. C# maybe.

> Even many famous Python packages are now Python wrappers.

Of course! That's precisely because Python is much simpler to code in. If your Python libraries are wrappers around native code then you get the speed benefit without having to drop into those languages. (Plus they can release the GIL, allowing true multithreaded Python.)

If native coding languages were good enough then there would be no need for Python wrappers - you'd just call into the library directly.

marliechiller 1 hour ago||||
I don't know if the reasoning for a rewrite is purely maintainability, though. I've used Python at scale and it's fine if you have reasonably good code hygiene. The reason I'd want to rewrite in any of those languages is that they're significantly faster _and_ maintainable at scale.
kurtis_reed 2 hours ago|||
Python is faster to write so obviously you'll see things built in Python first more often than the reverse. What's that quote -- "Better to remain silent and be thought a fool..."
ashishb 1 hour ago||
Indeed. Python is faster to write and harder to maintain over the long run.

The "faster to write" advantage becomes less relevant if most code is going to be auto-generated.

The "harder to maintain" might still remain more relevant.

DaanDL 3 hours ago|||
I never really understood what exactly is so readable about Python. I've been developing in Python for 8 years now, and before that I was a C# developer, and I don't find Python to be that much more readable.

Sure there's less ceremony, and yes, you can have your project going with just a single file, but other than that...?

bazoom42 3 hours ago|||
C# is also a great language, but notice how it has been moving closer to Python-style syntax. E.g. now you can initialize a list like [a, b, c]. They wouldn't add that syntax if they didn't think it was an improvement.

Less ceremony and boilerplate means more readable code.

sundarurfriend 2 hours ago||||
"whitespace, not brackets" from a sibling comment touches on it, but a lot of people, beginners especially (but not uniquely), are put off by symbols when reading code. Python is less symbol-heavy than most languages, by using whitespace and syntax and words (eg. `and` not `&&`, explicit `lambda x:` rather than `x =>`) in their place. It doesn't go so far as COBOL as to be cumbersome, but far enough to make a difference to a lot of people.
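
A small side-by-side of that point (toy data, made up for illustration):

```python
people = [("Ada", 36, True), ("Bob", 15, True)]

# word operators instead of symbols: `and` rather than `&&`
adults = [name for (name, age, active) in people if age >= 18 and active]

# an explicit keyword rather than the terser `x => x * 2` of other languages
double = lambda x: x * 2
```
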
dust-jacket 1 hour ago||||
Reaaaally?

I think a lot of the readability of Python is in the fact that you don't need to be recently familiar with it to pick up what it's doing most of the time.

Over my career I've dipped in and out of rust, typescript, perl, swift, etc codebases. I'm no expert in any of these, but every single time I have to look something up to understand what this set of arcane symbols or syntax means.

When I dip into Python I just ... read it.

(None of this is to say I prefer Python, just that I really do get the readable thing)

trashb 1 hour ago||||
I agree; very "pythonic" structures, if overly shortened, are hard to decipher, especially if you don't use or read Python on a regular basis.

Oftentimes when I am reading an intermediate or advanced Python codebase, I need to look into the function definitions and operator documentation to understand what is supposed to be returned. Whereas with C-like languages I feel it is easier to build that context, because there is more context written out and less tricky syntactic sugar.

bazoom42 1 hour ago||
> if overly shortened are hard to decipher especially if you don't use or read python on a regular basis.

Sure, but this is the case for any language.

lukan 35 minutes ago||||
" and before that I was a C# developer"

So... you were already trained in reading abstract code.

A beginner, on the other hand, sees lots of intimidating {} everywhere in C-family languages. Python does not need them, and less is usually better in design.

fragmede 3 hours ago||||
The "other than that" is whitespace, not brackets. Whether that's a big deal is up to you, but the knock-on effect is that the code is indented the way the control flow interprets it, so there are no bugs from misplaced braces. (Plenty of other bugs for other reasons, unfortunately.)
strangegecko 3 hours ago|||
I find brackets help me understand structure from a distance much better than whitespace.

Misplaced brackets seem like a thing from the past to me when we didn't have IDEs. I don't remember ever having a bug due to that.

zahlman 2 hours ago|||
> I find brackets help me understand structure from a distance much better than whitespace.

I can't imagine how. Whitespace physically lays out the block structure on the screen; braces expect you to count and balance matching symbols, and possibly scan for them within other line noise.

pmontra 44 minutes ago|||
Nevertheless, it happens that while moving code around, one wonders at what indentation level the code should go. Undo, undo, or git show the original code, look at it, retry more carefully.

Brackets would allow the editor to autoindent the pasted code.

No choice is perfect.

polytely 1 hour ago|||
Working in C#, I feel I basically still read code structure by the visual block structure / indentation. I don't think I've ever counted braces in my professional life. The IDE makes sure everything is formatted correctly, and ambiguity is basically impossible.
bazoom42 1 hour ago|||
So you would find bracketed code without any use of indentation easier to read than python?
pmontra 40 minutes ago||
It's no longer 1990, when Python was born. Editors have been automatically indenting bracketed code for a long while. Probably Notepad doesn't, or maybe plain vanilla vim.
johncearls 1 hour ago||||
Whitespace forcing proper indentation practices has always been one of my favorite aspects of python. I TA'd a data structures in C++ class and the lack of proper indentation making code unreadable was my biggest pet peeve. I always made the student fix their indentation before I would help them debug it.

I know that is mainly a beginner coding issue, but never having to deal with that issue was always one of the biggest advantages of python.

That said, I believe a lot of the stuff that was added in 3 and beyond (to make it more typesafe, accounting for unicode, etc) has made it a lot less readable over time. You can argue that it has made Python a better and safer language, but the pseudocode aspect has gotten worse. I kinda miss that.

jeltz 2 hours ago|||
Python and C are the only languages in which I have experienced that class of bugs: in C because of if statements without brackets, and in Python because meaningful indentation gets accidentally messed up when refactoring.

And today, with autoformatters, I think only Python is still vulnerable.

zahlman 2 hours ago||
If you are messing up indentation accidentally during refactoring there is either something wrong with your tooling (including your text editor) or you are letting things get too far out of hand before starting the refactoring.
bjourne 1 hour ago||||
Other than that? Exactly that!
huflungdung 1 hour ago|||
[dead]
theshrike79 1 hour ago|||
My preferences are always Go first and Python if there are specific libraries that make my life easier.

Go is a simple target for LLMs as the language has changed very little and with the Jetbrains go-modern-guidelines[0] skill the LLM can use the handful of recent additions effectively

And with Python there are things like ruff and pydantic that can enforce contracts in the code.

[0] https://github.com/JetBrains/go-modern-guidelines

LukaD 1 hour ago|||
> 2. Use a language that has a large training set so the LLM can be most efficient.

I seriously doubt this is really the case. From my experience, coding agents just love writing bad Python code. They always need explicit instructions, for example, to use uv instead of raw-dogging pip. There is a lot of Python code out there because it is taught as a beginner language, and because of that there is necessarily a lot of Python code written by beginners. That's my explanation, at least, for bad LLM-generated Python code.

javier123454321 1 hour ago|||
No, I think the argument from the article is pretty good. Use a language that has a lot of guard rails built in.
nicman23 56 minutes ago||
or a compiler that makes the llm sad
teleforce 2 hours ago|||
I think this is where the D language makes an excellent alternative to Python for AI-assisted coding [1].

1) It's a very consistent language, even compared to the other popular languages, namely Python, Rust, C++ and Go. Try implementing a doubly linked list in each and compare them all [1].

2) It's probably the most "Pythonic" among the compiled languages, according to Walter.

3) It utilizes GC by default, you can also manage your own memory and you can hybrid.

4) It compiles fast and runs fast; heck, it even has a built-in REPL ecosystem.

5) Regarding the small training set: with recent self-distillation fine-tuning approaches it should be good enough; D (actually the D2 version) has been around for more than a decade [2].

[1] Looking for a Simple Doubly Linked List Implementation:

https://forum.dlang.org/thread/osmecwfnpqahoytdqpkr@forum.dl...

[2] Awesome D:

https://github.com/dlang-community/awesome-d

PunchyHamster 2 hours ago|||
> One obvious reason is Python's extreme readability, it has often been described as being as close to executable pseudo-code as one can get.

But it's LLMs that read it, not humans. At least that's the trend.

> Use a language that has a large training set so the LLM can be most efficient.

It's pretty efficient with Rust.

subscribed 2 hours ago||
But plenty of humans like to be able to read the generated code and understand / edit that.
slifin 3 hours ago|||
I would assume it's important to know what's in that training set too

Because I get reliable generation out of "niche" languages already

Is it code with lots of SQL injections used in a different domain to your own?

It's maybe not good to conflate quantity with quality

fragmede 3 hours ago||
This is dated, but a professor told me that LLMs are really, really good at generating bad pandas code because they've been trained on so much of it!
nicman23 57 minutes ago|||
C LLM code is more readable, as it was probably trained on better code
psychoslave 57 minutes ago|||
Disagree. It's not just verbose; it's full of needlessly verbose stuff, uses underscores for everything and then some, and has other opaque conventions. Not that any other dev ecosystem is free of these issues, but Python just doesn't shine much on them. If anything, in terms of scripting languages, Ruby provides a far more solid ground for compact and readable expression of ideas through something close to prose.
moffkalast 3 hours ago|||
So in short, use Javascript /s
fennecfoxy 2 hours ago||
I think that pseudocode aspect is what makes it hard/frustrating to read for me.

I'm more of a c++/TS/etc user, so I miss braces a lot. I think a basic Python script sure it's easy to read through, but a large project starts to get quite ugh.

I am very jealous of Python's numerous built-ins, though. I was looking for a JS sum function the other day and was surprised to see Node.js still doesn't have one built in, and you still cannot reference operator functions.
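
For contrast, the Python side of that complaint: `sum` is built in, and the standard `operator` module makes operators referenceable as plain functions:

```python
import operator
from functools import reduce

nums = [1, 2, 3, 4]
total = sum(nums)                      # built-in summation
product = reduce(operator.mul, nums)   # `*` as a first-class function
```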

FartinMowler 1 hour ago|||
But at least JS now has a built-in leftpad function ;) (called padStart).
fennecfoxy 1 hour ago|||
Lmao are people really -ing me because I don't like Python. Tribalism is present in all areas of human life I suppose.

You people should grow up. Programming languages are tools, not pets.

_boffin_ 10 hours ago||
Read the first few comments and was surprised I didn't see it: training data. The voluminous amount of Python in the training data.

I could write in Brainfuck with AI, but I presume I wouldn't get the same results as with Python.

My follow up question: with AI now, why care about a lang until you need to?

gertlabs 9 hours ago||
Surprisingly, LLMs are actually much worse at reasoning in Python than other common programming languages for agentic coding tasks.

Data here: https://gertlabs.com/rankings?mode=agentic_coding

BariumBlue 8 hours ago|||
Hah, I was just thinking that Python likely has a vast ocean of training data, but it's likely of lower quality, since much of it is written by beginners and those who aren't primarily programmers.
kraf 1 hour ago|||
That's what I'm thinking too. There is a lot of noise and I know teams where the majority of the people writing Python just have no idea what they're doing.

I'm working with Clojure which is used mostly by senior engineers and it still blows my mind how well Claude writes software in it even though it's a fringe language. It's even able to pick up in-house DSLs written with macros.

smoe 43 minutes ago||||
Having used Python on and off for 20 years, my experience with LLMs writing Python has been mixed. I don’t think that’s necessarily because of a low-quality dataset, but rather because Python’s applications are so broad and the language has gone through several paradigm shifts over time: sync vs. async, typed vs. untyped, scientific Python looking very different from web application code, some people really wishing it were an FP language, and others doing the clean-architecture OOP onion soup. It has gotten so fragmented.

Recently, I had a more pleasant experience using LLMs with Go. It reminds me a bit of Python 2.x, when the community seemed, in my view, more focused on embracing a stupid simple language, with everyone trying to write roughly similar "Pythonic" code.

dariusj18 7 hours ago||||
That was the hardest part of learning PHP: all the code examples online were just awful.
andai 2 hours ago||
Worked on a PHP project once. Every time I asked why something was done a certain way the answer was "dunno, we copy pasted this code snippet."

Certain popular PHP codebases appear to use a similar methodology.

librasteve 2 hours ago||||
I was (pleasantly) surprised by Claude Code doing Raku, also with a limited training set (~2,000 Stack Overflow posts, a bunch of Rosetta Code, 2,500 modules). I put this down to the quality of the code from the core community, who are all frankly uber-gremlins.
polytely 1 hour ago||
Yeah Raku feels so expressive and lovely to me with the help of an AI assistant. I've only done toy programs and scripts with it but it is actually so nice.
stefanfisk 7 hours ago||||
Reminds me of the time I asked Claude to write some Wordpress code for me. The results were…rough.
FireBeyond 6 hours ago||||
All my vibe coded projects (personal) are Go backend services, with Typescript/React frontend. And my thoughts were based on similar things. Like why I wouldn't use PHP for that, either.
topham 8 hours ago|||
There's a broken idea that AIs know Python because they're written in Python.

Not how any of it works.

gertlabs 7 hours ago|||
While recent models are capable of generalizing to any language at this point, I do think there are weights from their pretraining corpus that still leak through into how they create their responses. We observed similar language performance patterns across models from different providers, btw.
dasyatidprime 4 hours ago|||
Not what anyone was talking about. Training corpus ≠ inference engine.
stingraycharles 5 hours ago||||
I’m super surprised that C++ scores so high; this does not match our experience at all, and for anything performance critical it always drops the ball completely.

I also don’t understand how these “games” map to real world complex problems. How are you measuring success? How does “adversarial customer service” map to “this LLM is better at C++ than the other” ? How are you sure you’re not just benchmarking language suitability for a problem ?

I have so many questions about this…

gertlabs 4 hours ago||
- The majority of the environments can be played where the agent writes code to work the environment towards a goal. So the model is problem solving, and it has to do so in a particular language, and some languages outperform others. We have a lot of data to back up the improved compiled language performance, but note these are for successful code submissions (failures are counted in a different metric). With the Languages chart we're moreso measuring how good the ideas they came up with were, once they already compiled/didn't fail basic environment rules.

- You need to run evals at scale to converge on this kind of behavior: these benchmarks run samples across a pool of hundreds of different types of environments

- Some games are too open-ended to support code play. The customer service game is an example of that, where models are called on every tick of the environment to make a decision (that's the 'decision making' part of the evals which is weighted lowest). Very interesting results but not testing coding ability, just general reasoning.

Not sure what issues you have with models writing C++ vs other languages, but I can imagine all sorts of C++ specific bottlenecks not directly related to the model's ability to reason in the language, like the dependencies, verbosity, extra effort to manage memory, etc. I have only done a little C/embedded work since agentic coding happened but I was pleasantly surprised.

stingraycharles 2 hours ago|||
I think my problem is that I’m not sure I understand whether you evals are testing language abilities or reasoning abilities.

It seems to present results as if they’re testing language abilities, but the problems seem to be reasoning problems.

bdamm 4 hours ago|||
I've found the current cream of the crop to be quite good at resource management. I've sic'd Opus on some very gnarly lambda context bugs and it has directly improved the stability of the product I'm working on right now in a very substantial way. It couldn't quite do it entirely by itself, but with the right nudges here and there, it has absolutely accelerated the debugging work. It is particularly good at analyzing crashes and piecing together the detective work of what preconditions must exist for certain crashes to occur.
isityettime 8 hours ago||||
I would love to see how they do with functional languages and especially Lisps here. I've noticed pretty good performance with Emacs Lisp relative to overall model strength, but I haven't used LLMs to write application code in any such languages.

It would also be interesting to see how Python compares to other languages in its niche (Ruby, Perl, Raku).

Thanks for putting this together! It's interesting.

regularfry 2 hours ago|||
I've noticed that with clojure(script) unless you specifically instruct them to keep nesting levels low, they can hit a point where they make a paren placement error and can't debug their way out of it. Although in my case while one model made the error then couldn't find what it had done, a second model that I switched to was then able to identify it and back it out. So I suspect this is a transient weakness in today's models, not something fundamental.
librasteve 2 hours ago||||
I just did a side-by-side with Claude Code Python vs. Raku for DSL use ... https://slangify.org if you are interested.
gertlabs 7 hours ago|||
That's a good idea. Would you rather see Lisp or Scala? Any interest in Prolog? We are trying to be selective to keep the data concentrated, but we will eventually add a couple more, most likely to sample different programming paradigms.
isityettime 40 minutes ago|||
I think Clojure would probably make for a more interesting comparison because its syntax is more different from the other languages currently on there and it's less multi-paradigm than Scala is (it doesn't support OOP, it's more explicitly immutable-first). I think Scala is a lovely and cool language, but I'd be more interested in the Clojure comparison here.

Prolog might be interesting because I bet nobody is trying to train very hard on it, but I'm less directly interested in model performance with Prolog.

1659447091 4 hours ago||||
If you are taking request, I was hoping to see clojure on there.
andai 2 hours ago||
My spider sense tells me the immutable-ness would help with correctness, but I'm not sure how much difference it would make in practice. Would love to see some numbers.

A relative lack of training data might have a bigger effect though.

phillc73 5 hours ago|||
Just last night I was going down the rabbit hole of "what's the best programming language to use for vibe coding." I came to a short list of:

a) Typed Racket

b) OCaml

c) Julia

I would love to see those three added to your benchmarks. And Mistral Medium 3.5 added to the LLM list, please.

gertlabs 5 hours ago|||
Thanks for the recs, we will look into adding some of these, maybe OCaml for variety. I'm not familiar with Racket.

Mistral Medium 3.5 is on there, but you will have to scroll down pretty far to find it (does not perform well): https://gertlabs.com/rankings?mode=oneshot_coding

isityettime 15 minutes ago|||
Racket is a variety of Scheme that grew up as a teaching language, but now also has a few other notable niches as well.

Typed Racket is to Racket as TypeScript is to JavaScript: it adds some additional static checks to an otherwise dynamic language via gradual typing. This pair of languages might help begin to answer the question "does gradual typing generally help LLMs, or does TypeScript outperform JavaScript for incidental reasons?".

Among Lisps, I'm most interested in seeing Clojure because it's a language I can see myself using with LLMs at work. But Typed Racket and Racket could make an especially interesting pair because of the gradual typing thing.

I'm not sure whether you want to include them in your project. The kind of selectivity you describe yourself as going for is hard for me, especially since I'm not the one doing the work. :)

PS: Aside from this benchmarking and comparison project: Racket is an interesting language and seems like a good place to start if you want to explore classic Scheme texts (Structure and Interpretation of Computer Programs, The Little Schemer, How to Design Programs) or newer ones that try to teach newer or more specialized ideas (e.g., The Little Typer). You may have to tweak the language a bit to stay faithful to some of those books, but that's something Racket is good at and there are already sources noting relevant differences online.

When a non-programmer in my life expressed curiosity about programming, we ended up starting HtDP together and it's been fun. I think Racket was a good choice for that.

phillc73 4 hours ago||||
Thanks for that, I hadn't scrolled down far enough.

Just want to be sure I'm reading the results correctly... When I compare GPT-5.5 with Mistral Medium 3.5, I see in the tables:

a) Mistral beats GPT in Java and C++

b) It's close for Rust

c) GPT-5.5 easily wins for Go, Javascript, Python and Typescript

Model choice really does appear to be language dependent (assuming I'm reading the results correctly).

gertlabs 4 hours ago||
The deeper you go into the filters (single models, cross-correlated by specific languages), the smaller your sample sizes. A known limitation; tbh I doubt Mistral is better than GPT 5.5 at programming in any specific language, and we probably hit a few lower-quality generations from GPT 5.5 by chance (but I could be wrong! We're always adding more samples, so the data improves over time. We always prioritize the largest sample counts for near-frontier models first).
regularfry 2 hours ago|||
What's going on with Qwen3.6 27b? Filtered to Python it comes out at the top of the list, which seems... well, unlikely.
2ndorderthought 43 minutes ago||
Qwen3.6 27b is a really strong model.
andai 2 hours ago|||
Those are some fine languages, but how did you pick them? What was the criterion?
phillc73 2 hours ago||
The initial criteria were strongly typed and functional-first. Using an LLM for answers, of course, that returned me a list that looked like:

- Haskell

- OCaml

- F#

- Scala

- Gleam

- Purescript

- Grain

- Idris

Then I asked if there were any Schemes or Lisps that met the initial requirements, which added a bunch more options (Typed Racket, Typol, Elm, ReScript etc).

Then I asked about Julia specifically, as it's a language I'm already reasonably familiar with and knew that it's possible to write it with static annotations.

Next I started filtering the list based on additional criteria: I didn't want a JS compilation target, and I weighed performance, size of package ecosystem, tooling, community, and learning curve (I do want to review and understand the output).

There were a bunch of follow-up questions over a few hours of prompting, reading and a couple of beers. All this resulted in the shortlist of OCaml, Typed Racket and Julia.

Julia pretty much remains in there, even though it doesn't really meet the strongly typed initial criteria, based on my familiarity, the ecosystem especially for AI/ML tasks and performance factors.

I know zero about OCaml and find the thought of learning it a bit daunting. Typed Racket seems more approachable anyway.

fulafel 6 hours ago||||
What would comparing rates across languages tell in the context of this benchmark? Are the tasks the same or robustly difficulty-normalized across the languages?

Also somehow the 2 language comparison graphs (avg percentile and success rate) rank Python in dramatically different positions, with Python outranking Rust and Java in the success rate. What does the avg percentile mean in this context?

gertlabs 3 hours ago||
Success rate measures the share of code submissions that played the game/environment without failing (compilation, breaking game rules, violating sandbox, etc.), so it makes sense Python would do better there.

Percentile compares only the submissions that didn't hard-fail. So they are a bit different, and we incorporate them both into the combined score.
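Roughly, the two metrics work like this (an illustrative sketch; field names are invented, not gertlabs' actual schema):

```python
# Illustrative sketch of the two metrics described above. The
# "hard_fail" / "percentile" field names are made up for this example.
def success_rate(submissions):
    """Fraction of submissions that ran without a hard failure
    (compile error, rule violation, sandbox breach)."""
    ok = [s for s in submissions if not s["hard_fail"]]
    return len(ok) / len(submissions)

def avg_percentile(submissions):
    """Mean percentile, computed only over submissions that didn't hard-fail."""
    ok = [s["percentile"] for s in submissions if not s["hard_fail"]]
    return sum(ok) / len(ok)

subs = [
    {"hard_fail": False, "percentile": 90.0},
    {"hard_fail": False, "percentile": 50.0},
    {"hard_fail": True,  "percentile": 0.0},
]
# A language can rank high on success_rate (2/3 here) while its
# avg_percentile reflects only the surviving runs (70.0 here).
```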

robot-wrangler 5 hours ago||||
> Data here: https://gertlabs.com/rankings?mode=agentic_coding

Oh wow, we got "tribal domination", "market simulator" and "adversarial customer service". I don't know what those are but it sure sounds like big torment nexus milestones

Maybe we could at least play nicer games like hackenbush and act surprised when there's some wicked use-case that's isomorphic.

EDIT: Ok fine. I like "Rubik's Cube Chess" a lot. Never heard of it, is this analyzed formally at all? Hard to search for since there's tons of collisions

js8 6 hours ago||||
The LLMs are generally still pretty bad at (deductive) reasoning. IME they go along more with things like variable names and comments than with the actual program logic (it would be an interesting experiment to compare an LLM's understanding of three identical programs with different identifiers: one with normal identifiers, one with obfuscated identifiers, and one with deliberately misleading identifiers). I also think this particular comparison comes down to typing, which helps keep the LLM's reasoning from going astray.

When we reason we need to typically propagate the constraints to arrive at a solution to these constraints. I think the best language to reason in could be something like Lean, which allows both constraints and actual code to be expressed at the same time. Although this might not be the case for current LLMs, as I explain above.
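The identifier experiment described above is easy to set up; a minimal sketch (function and parameter names chosen here purely for illustration):

```python
# Three behaviorally identical functions; only the identifiers differ.
# An LLM that truly follows the program logic should describe all three
# the same way; one leaning on names will be misled by the third.

def median_of_three(a, b, c):                # normal identifiers
    return sorted([a, b, c])[1]

def f(x1, x2, x3):                           # obfuscated identifiers
    return sorted([x1, x2, x3])[1]

def compute_total_sum(count, limit, offset): # misleading identifiers
    return sorted([count, limit, offset])[1]
```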

by364 4 hours ago||
wait till you look inside a neural network and realize they're incapable of deductive reasoning! amazing how many devs that talk about "AI" would probably have a hard time telling apart deductive and inductive reasoning.
js8 4 hours ago|||
That's actually untrue. Yes, training a neural network is mostly inductive reasoning process. However, the ability of LLMs to reason deductively (as a chain of thought, although it's probably not the only mechanism) is an emergent phenomenon, rising up from the training it on data and problems that exhibit deductive reasoning.

But of course, because the deductive reasoning is inductively taught, there might be various shortcuts which compromise its soundness. Hence my claim: LLMs are not as good at it as other algorithms, although they have many other strengths that make up for it.

kelseyfrog 4 hours ago|||
How so?
riedel 4 hours ago||||
My feeling is that for agentic tasks it is not only language design but also LSPs, error messages, and static analysis capabilities that dominate the benchmarks. It would IMHO be interesting to look into better subsets of Python and style/rewrite techniques, as well as alternative linters and their effects on performance.
kevinautumn 3 hours ago|||
A strict compiler is basically a free feedback loop for the LLM.
andai 2 hours ago||
Also the human. (I like being told about my bugs when I write them, instead of at some generally much more unpleasant moment in the future.)
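That feedback loop is cheap to approximate even without a compiler; a minimal sketch, using Python's built-in parser as a stand-in for a stricter checker (a real harness would run mypy or an actual compiler):

```python
def check_patch(source: str) -> str:
    """Return a diagnostic string for a proposed code patch, or "" if it
    parses cleanly. In an agent harness, a non-empty result would be fed
    back into the next LLM turn -- the "free feedback loop"."""
    try:
        compile(source, "<llm_patch>", "exec")
        return ""
    except SyntaxError as e:
        return f"line {e.lineno}: {e.msg}"
```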
andai 2 hours ago||||
But then why does JS score 50% better? (Almost identical to TypeScript.)

Actually, JS can get a surprising amount of "intellisense" as well. Not sure if that was used here though.

gtrealejandro 4 hours ago|||
[dead]
bushbaba 8 hours ago||||
Cool to see my hunch be backed by data. Python is a scripting language with OOP bolted on. That means there's not really the styling consistency other languages have; things tend to look like PHP: a collection of various scripts that invoke one another.
toxik 5 hours ago|||
Python was designed with objects in mind from day one.
regularfry 2 hours ago||
"Designed" is doing a lot of work here. There are clearly bits that are just bolted on because they didn't want to change the syntax.
nsbk 4 hours ago|||
EVERYTHING in Python is an object. I’m not sure how that could have been bolted onto the language
w0m 7 hours ago||||
Huh. This surprises me. Digging in, it looks like it comes down to interpreted + dynamically typed vs compiled and statically typed.

TIL. If I were to start a truly vibe project, Go would have a significant leg up.

dnautics 7 hours ago||
and yet dynamically typed elixir wipes the floor with go.

https://github.com/Tencent-Hunyuan/AutoCodeBenchmark/blob/ma...

bontaq 5 hours ago||
LLMs get ridiculous with elixir, especially with the repl, runtime, and ability to hot reload / directly test functions. It's really surprising to me it hasn't caught on more but I guess you have to see it to believe it.
cultofmetatron 2 hours ago||
built my startup in elixir and can concur. elixir has a relatively consistent syntax that makes for a pretty good target for llms.

In my opinion, the only thing holding elixir back as an llm deliverable is that there's not as much training data for llms to work with.

Of course if we had a new AI that could be trained on a minimum of existing training data, common lisp would absolutely beat out everything else. everything you mentioned about elixir (repl, runtime, and ability to hot reload / directly test functions) are possible and were invented in lisp with an AST instead of a syntactic language as the ultimate build artifact. CL lets you recover from exceptions and rewind the stack before reloading your fixes and continuing. I can't even fathom the workloads an LLM could conceive of working with that.

hooloovoo_zoo 4 hours ago||||
Mm, the code is constrained to run inside a game 'tick'?
andai 2 hours ago||||
I thought it might have to do with the type system, but JavaScript type system is atrocious and it scores about 50% higher. So my theory does not make much sense.
rossjudson 8 hours ago||||
My standard joke here:

Q: Say, what does this Python code do?

A: Nobody f&%^ing knows.

thfuran 8 hours ago||
That’s Perl.
altmanaltman 7 hours ago||||
Hey they said it had a lot of training data, not necessarily high-quality python code training data.
ricardo_lien 7 hours ago||||
This surprised me, but I can understand it - Python sucks in many ways lol.
goodmattg 7 hours ago|||
[dead]
dillon 4 hours ago|||
I had an itch to give Perl another go after a 5 year hiatus. I wanted a super simple way to spawn a proxy I was building in Go, along with writing various integration tests. I used Claude Code to write the bulk of it and found Claude to be remarkably good at Perl. I told Claude to only use what’s built into Perl’s standard library rather than reaching for anything in CPAN. Turns out everything from HTTP clients, TLS and JSON are all builtin which makes it a very stable and easy way to replace what I would normally have implemented in shell scripts. My theory is that because Perl hasn’t changed all that much and has a ton of training data, Claude is actually quite good at Perl for cases where you might think to write shell scripts.
hiAndrewQuinn 3 hours ago|||
Many are saying this! https://til.andrew-quinn.me/posts/llms-make-perl-great-again...
bensyverson 9 hours ago|||
Just use Go. LLMs have seen a ton of it, they write it well, it compiles practically instantly, and it has all the advantages of a typed compiled language.

I created a big Python codebase using AI, and the LLM constantly guesses arguments or dictionary formats wrong. Unit tests and stuff like pydantic help, but it's better to avoid that whole class of runtime errors altogether.

mbreese 9 hours ago|||
That’s what I’ve settled on. Python is so flexible that there are a million ways to organize code, pass arguments, etc. If you already have a code base to work from, an LLM can make new code in the style of the old code. But a fresh project? Once you get to a certain level of complexity it quickly can turn into write once, read never code (even if the code is passing tests).

This is where I’ve found that a compiled, strongly typed language (any one really) works well with an LLM. With the little bits of friction that is part of writing a language like Go, the LLM can produce pretty decent (and readable) code.

isityettime 8 hours ago||
TIMTOWTDI strikes back.
shepherdjerred 8 hours ago||||
Why use Go when you can use Rust?
wiseowise 3 hours ago|||
So I can test my feature today instead of waiting until it finishes compiling tomorrow.
baq 1 hour ago||
this is the top reason for a reasonably complex project, but it can be worked around by preplanning crates.

the other reason is if you really want async as is in vogue nowadays, function coloring - but this is rapidly becoming irrelevant, see article.

wiseowise 34 minutes ago||
> but it can be worked around by preplanning crates.

Maybe if you're working alone.

bfung 8 hours ago||||
1. Amount of Rust training data isn’t as much as Go.

2. Golang syntax and style are verbose yet simple. There aren’t as many options, nor as much language-to-domain mapping needed, as in Rust. This means a less sophisticated LLM can spit out Golang successfully and efficiently more often than Rust.

adastra22 5 hours ago||
This must really depend on your niche. I assume you do web stuff or something? Good luck finding any golang examples in a lot of other fields. Rust, on the other hand, is taking over the world in systems programming.
krilcebre 3 hours ago|||
Been reading and drinking that kool-aid for some time until I realized it's just an internet bubble mumbo jumbo. Majority of systems are still written in C and C++, and will be for unforeseeable future.
coldtea 3 hours ago|||
>Good luck finding any golang examples in a lot of other fields.

There are go examples (and full blown programs) for anything, from servers to Kubernetes and Docker.

bensyverson 8 hours ago||||
In short, compile times and a more full-featured stdlib
Aerroon 8 hours ago||||
Doesn't Rust have long compile times? Does Go suffer from the same problem?
bdamm 3 hours ago|||
One of the design goals of Go was to be fast to compile. And they achieved it.
adastra22 5 hours ago|||
Go famously has stupidly fast compile times.
DeathArrow 5 hours ago||||
Because LLMs are better at Go? And because some people understand Go code easier and they might want to look at the code?
Alejandro2026 8 hours ago||||
Why? I have the same question.
bionhoward 8 hours ago||
I’m heavy into rust and never really use golang, but one big benefit of go over rust is compile times are significantly quicker, which could be more fun if you’re running CI checks 50 billion times
coldtea 3 hours ago||
>which could be more fun if you’re running CI checks 50 billion times

Even running them 5 times it's WAY more fun

up2isomorphism 3 hours ago|||
why use Rust when you can use Zig?
2ndorderthought 38 minutes ago||
Why use zig when you can use odin?
morningsam 2 hours ago||||
>the LLM constantly guesses arguments or dictionary formats wrong [...] it's better to avoid that whole class of runtime errors altogether.

Use Mypy in strict mode and run it in the post-turn hook of your LLM harness so the LLM has no choice but to obey it. And don't use overly general dictionary types when the keys are known at development time; use TypedDicts for annotations if you must use dicts at runtime.
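A minimal sketch of the TypedDict suggestion (the type and field names here are invented for illustration):

```python
from typing import TypedDict

class StorageResult(TypedDict):
    ok: bool
    bytes_written: int

def summarize(result: StorageResult) -> str:
    # Under `mypy --strict`, result["byte_written"] (a typo) is rejected
    # at check time instead of raising KeyError at runtime.
    return f"ok={result['ok']} wrote={result['bytes_written']}"

res: StorageResult = {"ok": True, "bytes_written": 128}
```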

mountainriver 8 hours ago||||
Why? Go has a GC, is basically incompatible with C and very limited overall
Alejandro2026 8 hours ago|||
Go's limited syntax is actually a feature here, because it stops the LLM from trying to be too clever
spongebobstoes 7 hours ago||
LLMs use `any` types, `recover`, `init`, and other weird warts of golang

rust is a better language in every way for LLMs: more precise typing, better compiler errors, fewer performance footguns, no data races, clear interface definitions and implementations

golang is easier for humans to quickly get productive, but the language is lacking in helpful features for an LLM

baq 1 hour ago||||
'incompatible with C' isn't a serious problem nowadays and won't be a problem at all in a couple years.
badc0ffee 6 hours ago|||
CGO exists.
trimbo 8 hours ago||||
Yup, adopting Go is exactly what I've done too.

Typed, garbage collected, fast to compile and run, stdlib that includes just enough to work out of the box. I really don't like writing it by hand but for the LLM it's perfect.

hirvi74 9 hours ago||||
But what is the selling point for Go? I get that it is allegedly hailed to be a simple language with basically no batteries included, but why is that a selling point? Does Go excel at anything no other language does?
sly010 6 hours ago|||
No batteries!? Go has a huge stable standard library no other language even comes close to. Built in tooling for unit testing, performance testing, debugging, code formatting, package management, etc. And most go binaries can be compiled statically so libc is not even a dependency. Golang is the definition of batteries included.
coldtea 3 hours ago|||
>Go has a huge stable standard library no other language even comes close to

Well, Java and Python do.

walthamstow 1 hour ago||
Yet the first thing most people do before making an HTTP request is pip install requests
coldtea 1 hour ago||
Yet, a nicer request wrapper is not the be-all and end-all of batteries, and Python covers a huge spread of libs
wiseowise 3 hours ago|||
> Go has a huge stable standard library no other language even comes close to.

Java, C#, Python, Node.

intelVISA 1 hour ago||||
I really don't like the lang itself, but nobody will deny it has a very strong ecosystem and stdlib handling around 95% of the well-solved problems you are likely to encounter.
CodesInChaos 2 hours ago||||
1. It has first-class co-routines, so supports high concurrency without having to deal with async bullshit

2. It produces a dependency-less statically linked binary

3. Duck typed interfaces give you static typing with minimal ceremony. They are implemented even for types outside your own code base, which is a common pain point in Java or C#.

4. It compiles quickly
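For point 3, Python's typing.Protocol is a rough analogue of Go's implicit interface satisfaction; a sketch (class and method names invented for illustration):

```python
from typing import Protocol

class Writer(Protocol):
    def write(self, data: bytes) -> int: ...

class ByteCounter:
    """Never mentions Writer, yet satisfies it structurally -- the same
    way a Go type implements an interface without declaring it."""
    def __init__(self) -> None:
        self.total = 0

    def write(self, data: bytes) -> int:
        self.total += len(data)
        return len(data)

def dump(w: Writer, payload: bytes) -> int:
    # Accepts anything with a matching write() method, checked statically.
    return w.write(payload)
```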

coldtea 3 hours ago||||
Go has a very full featured standard library.

It's simple (do you really ask why that's a selling point?)

It's fast to compile.

It's fast to run.

It's good with parallelism.

It has myriads of examples, and LLMs can pick it up well too.

It has good backing.

It has good tooling.

It's fun.

It statically compiles to a trivially deployable binary.

It's excellent at cross compiling.

It has good adoption.

clintonb 6 hours ago||||
I picked Go because it tends to use fewer resources than Node.js, and startup time is quite fast.
enneff 8 hours ago||||
For one thing it’s statically typed and has many fewer foot guns than Python, so the llm-produced code is more likely to do what you expect.
wiseowise 3 hours ago|||
Python has a much better type system than Go, I don’t know what you’re on. With Trio it has better async capabilities too.
shepherdjerred 8 hours ago|||
Go is statically typed but the type system leaves much to be desired.

Go’s benefit are primarily around simplicity, readability, and concurrency.

coldtea 3 hours ago||
>Go is statically typed but the type system leaves much to be desired.

Not that much. Looking at Rust or Haskell complexity, I don't really desire it.

pylotlight 8 hours ago||||
Performance? Second only to rust and other lower level langs. Surely you don't need this spelled out for you...
nvader 8 hours ago|||
Not just performance, but static typing and prevalent in the training data/easy for LLMs to reason about.

Of course, your response admits, "second to Rust", which I am guessing is an unspoken question in the grandparent's mind.

za3faran 7 hours ago||||
Java and C# are there and faster.
DeathArrow 5 hours ago||
Yes, but kids these days only consider JS, Python, Rust and Go.
hirvi74 8 hours ago|||
If performance is the main difference, whatever that means, then basically Go should be reserved for when Rust and other lower level langs cannot be used due to some other constraint? Are we mainly talking about performant Web backends?

Say I am building some app that I know will be CPU-bound, why choose Go over say... Swift?

coldtea 3 hours ago|||
>If performance is the main difference, whatever that means, then basically Go should be reserved for when Rust and other lower level langs cannot be used due to some other constraint?

Or when performance is the main but not the only difference, and there are many other benefits.

>Say I am building some app that I know will be CPU-bound, why choose Go over say... Swift?

Because unless you're building for macOS/iOS, Swift is really a no-go, with lackluster support for other platforms. Plus slow to build and convoluted.

overfeed 5 hours ago|||
> why choose Go over say... Swift?

Language religious wars are silly: you should choose a language based on your constraints and personal tastes. If there's no clear advantage of one language over another for a given task - then all the options are viable, pick one and get on with solving the problem.

DeathArrow 5 hours ago||||
>I get that it is allegedly hailed to be a simple language

That might be its core feature if you do agentic coding.

chickenman_98 9 hours ago|||
I think that’s sort of the selling point, no? It’s really boring. It has like ~25 keywords, compiles insanely fast, and has a concurrency model that’s easy to use and read. LLMs are great at using Go tooling to sanity check along the way. It’s easy to write shitty Go but it’s really pleasant to work with if you find those things compelling.
khimaros 8 hours ago||
don't you worry about garbage collection?
camdenreslink 8 hours ago|||
If you were using Python, then probably not.
bensyverson 7 hours ago||
haha exactly. I’m coming from Swift, and I don’t want to go back to manually releasing objects like I used to in ObjC, let alone reason about lifetimes.
mellow_observer 4 hours ago||||
What's the big issue with GC nowadays? It has mattered to me exactly once in decades and it was still manageable anyway by using a more low level style in a hot loop. I see very few usecases where GC actually matters and for those rare few cases it was not like you were using python beforehand anyway
coldtea 3 hours ago|||
Why the hell would he "worry about garbage collection"? That kind of thing is a cargo-cult fear.

Garbage collection is not an issue for 99% of programs. And for those where it is, there are ways to mitigate it (e.g. there are extremely high-performance trading systems written in Java, where every last sub-millisecond counts).

Blanket fear of GC reminds me of when new programmers learned that assembly is lower level and can be faster, and wondered why everything isn't written in assembly.

DeathArrow 5 hours ago||||
>Just use Go. LLMs have seen a ton of it, they write it well, it compiles practically instantly, and it has all the advantages of a typed compiled language.

Or any of the faster typed languages you are most comfortable with, as you might need to look at the code some times. LLMs are great at writing and understanding C# and Java.

arw0n 4 hours ago||
Also there are still considerations like domain, team expertise, org ecosystem etc. to consider. I love to use Rust for most things, but now I'm working with an org that primarily has expertise in Java, and I'm not going to rock the boat for barely any reason. Python is also still useful for most ML stuff, and Django is quite a pleasure to work with (although it wouldn't be my first choice).

The great thing about LLM-assisted coding is that an experienced software engineer can acquire decent familiarity with a language quite quickly. And then has a useful sparring partner for understanding and using the quirks and features of a new language.

gmueckl 10 hours ago|||
Training data can't be the whole answer. LLMs are really good at translating to different programming languages. This makes sense, given that they are derived from text translation systems. I'm getting great results in languages with comparatively small bodies of freely available code. The bigger hurdle is usually that LLMs tend to copy common idioms in the target language and if it is an "enterprise-y" language like Java or C#, the amount of useless boilerplate can skyrocket immediately, which creates a real danger that the result grows beyond the usable context window size and the quality suffers.
dnautics 7 hours ago|||
> Training data can't be the whole answer.

Absolutely correct. Anthropic showed that 250 examples can "poison" an LLM -- independent of model size.

lanyard-textile 10 hours ago||||
Very true.

I have to steer models hard for C++. They constantly suggest std::variant :P

socalgal2 9 hours ago||
is that bad?

Godbolt got a 2x speed improvement switching from what he thought was a good fast impl to std::variant

https://www.youtube.com/watch?v=gg4pLJNCV9I

jryio 9 hours ago||||
In higher dimensional vector space, yes it can.

Dimensionality gets bizarre in 1000-D space. Similarity and orthogonality express themselves in strange ways and each dimension codes different semantic meaning.

Therefore, if the training data is highly consistent you are by definition reducing some complexity and/or encoding better similarity.

In Go the statement

    result, err := Storage.write(...)

Is almost always going to be followed by

    if err != nil { ... }
In a highly dynamic language you may not get

   try { Storage.write() } catch (error) { ... }
Unless explicitly asked for.
dnautics 7 hours ago|||
It's a little bit old, but challenge you opinions about what matters for LLM agentic coding:

https://github.com/Tencent-Hunyuan/AutoCodeBenchmark/blob/ma...

za3faran 7 hours ago|||
> In a highly dynamic language you may not get

Being dynamic is secondary. A language that uses exceptions for errors does not need to surround every call with a try/catch; you can have a top-level handler that catches everything.

chromacity 10 hours ago|||
> LLMs are really good at translating to different programming languages.

...for which ample training data is available.

> This makes sense, given that they are derived from text translation systems.

...for languages with ample training data available.

Yes, LLMs can combine information in novel ways. They are wonderful in many respects. But they make far more mistakes if they can't lean on copious amounts of training data. Invent a toy language, write a spec, and ask them to use it. They will, but they will have a hard time.

mbreese 9 hours ago|||
I have a language I wrote for processing data pipelines. I’ve used it for years, but I can count the number of users on one hand. I wrote it partially to learn about writing a scripting language, partially because Nextflow didn’t exist yet. I still use it now because it works much better for my way of processing data on HPC clusters.

The only code that exists on the internet for this is test data and a few docs in the github repo. It’s not wildly different from most scripting languages, from a syntax point of view, but it is definitely niche.

Both Codex and Claude figured it out real fast from an example script I was debugging. I was amazed at how well they picked up the minor differences between my script and others. This is basically on next to zero training data.

Would I ask it to produce anything super complex? Definitely not. But I’ve been impressed with how well it handles novel languages for small tasks.

lmm 10 hours ago||||
That might be an argument for not using a novel homebrew programming language. But it's not an argument against, like, any top-100 or even top-1000 programming language, which will be adequately represented in the training data.
ambicapter 9 hours ago||
It is if more training data results in better performance. In which case, GP will continue to use the language that is likely to have the most training data available.
lmm 9 hours ago||
> It is if more training data results in better performance.

Sure. But given the relation with translation systems, it seems far more likely that there are diminishing returns to larger volumes of training data.

agentultra 9 hours ago|||
They are also good at generating plausible code, the kind that has no obvious bugs in it. I wouldn’t be surprised if humans in the loop over-report success with these tools. Combined with decision fatigue… it’s not a good recipe for humans making good decisions.

An experienced Rust developer is going to be in a better position to drive an agent to generate useful Rust code than a Python programmer with little or no Rust experience. Not sure I agree with the author that everyone should just generate reams of Rust now.

At least if your get paged at 3am to fix the 300k AI-generated Django blog you’ll have a chance at figuring things out. Good luck to you if Claude is down at the same time. But still better than if it was in Rust if you have no experience with that language.

not2b 10 hours ago|||
That would matter if we were asking the AI to generate code open-loop: someone probably already wrote something close to what you asked for in Python. But if the agent generates code, tries to compile it, sees the detailed error messages and acts on those messages to refine the code, it's going to produce a higher quality result. rustc produces really good diagnostics. And there's a lot of Rust code online now, even if there's so much more Python and Javascript/Typescript.
ambicapter 9 hours ago|||
LLMs don't actually semantically parse the error messages. They will generate the most likely sequence resulting from the error message based on their training data, so you're back to the training data argument.
not2b 5 hours ago|||
They process those error messages in the same way that they process your instructions about what code to generate. It is just more commands.
neutronicus 9 hours ago||||
Perhaps the training data about what compiler diagnostics mean is particularly semantically rich training data.
Tarq0n 7 hours ago|||
Of course they do, error messages get tokenized and put into the context window just like anything else. This isn't a Markov chain.
hansvm 5 hours ago|||
Except the presence of errors, mistakes, contradictions, and doubling-back causes LLMs to have substantially worse output, especially without dedicated sub-agents who have been instructed about that deficiency and know to process that kind of crap into better prompts to insert into a different LLM with pristine, error-free context. Without hard numbers we're both just pissing into the wind, but it's entirely plausible that the higher rate of errors matters more than the fact that those errors are more ergonomic. Anecdotally, my LLM work is a _lot_ more productive when I have it draft the thing in Python and translate it into Rust since it wastes so much time on the tiniest of syntactic mistakes.
onlyrealcuzzo 9 hours ago|||
I built a programming language, and LLMs can code phenomenally well in it.

I don't think the training set matters that much, since there's no way they have my language in their training set!

Programming languages have a lot in common. Python is kind of odd when it comes to languages.

zuminator 7 hours ago||
If the training data is basically irrelevant, then an LLM should be able to iteratively improve the programming language it uses, resulting in a custom language optimally designed to maximize its own coding ability. The source code might not even be human readable natively, just translated into pseudocode on an as-needed basis.
onlyrealcuzzo 6 hours ago||
> If the training data is basically irrelevant, then an LLM should be able to iteratively improve the programming language it uses, resulting in a custom language optimally designed to maximize its own coding ability.

I won't be surprised if one day they do.

At least in their current form, I don't think they can independently design a language that is so much better than other available ones that it makes sense for them to use it.

There's a very good language for almost every use case already, designing one better than the ones already available is a VERY tall order.

It's almost like these languages aren't designed by morons, but built by teams of geniuses over a decade instead.

It's taken me 6 months of heavily steering an LLM to build a language that is not yet even ready for production use.

Maybe I'm the one slowing the LLM down. But it certainly does not seem that way.

The key to a good language for them - from my experience - is maximum expression plus minimum global complexity.

Anything that makes you manage memory lifetimes & memory safety is inherently unfriendly to LLMs - that's globally complex.

All scripting languages allow spaghetti aliases that let you hack your way into oblivion - and LLMs gladly ride that gravy train to hell.

Rust excels here, because it prevents the worst and is WAY more expressive than most people think.

Go has arguably the best runtime ever built, but it's not very expressive, and it still has a lot of problems from C and scripting languages - I don't think these types of languages will be the ones people chose to write code with for LLMs in the future.

impulser_ 7 hours ago|||
People really need to stop assuming that more training data is always better. That is not how it works. LLMs thrive on consistency.

Go, for example, has significantly less training data than Python, but LLMs are the best at it. Why? Go is often written the same way. You go from project to project and the code all looks the same. There are only a very few ways to write Go.

btown 10 hours ago|||
Also, every single interpreter error has an entire corpus of StackOverflow-esque fix suggestions alongside it, and the model has been fine-tuned to minimize such errors on the first try. This hasn't been done for more obscure languages. You'll likely take more turns, on average, to get a working output, even if your problem is fully verifiable via test input/outputs - and if it's not verifiable, you don't want the "attention" of the model focused on syntax rather than the solution.
ruszki 8 hours ago||
There is no "entire corpus of StackOverflow-esque fix suggestions" about anything which is newer than a few years. I'm using cutting edge Android frameworks all the time. Yet, LLMs fix problems even when Google/Kagi has zero answers, which happens more often than not. We are way over this requirement.

I especially found that there is no difference between languages on that basis. The architecture of all generated code is terrible if you don't actively maintain it all the time - unless you already have a few tens of thousands of lines of finely architected code in your codebase from which the models can understand how it should really be done. And the reason, I think, is quite simple: the average code on the internet - regardless of the market penetration of the given language - is simply bad.

krzyk 2 hours ago|||
With AI it is important to catch errors/hallucinations early, static typing helps with that.

So languages with dynamic typing might hide some errors until runtime, while statically typed ones can catch them during compilation.

With dynamic ones you need way more tests to cover the scenarios that the compiler covers for you.

And there is a significant amount of code written "for ages" in languages that have been around longer, like C, C++, and Java (yes, I know Python is quite old - older than Java, dating to 1991).
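krzyk's compile-time-vs-runtime point can be sketched in Python. A minimal, hypothetical example (the function and values are made up for illustration): annotations are not enforced at runtime, so a call that a static checker like mypy would flag immediately only fails once the bad code path actually executes:

```python
def total_ms(durations: list[float]) -> float:
    """Sum a list of durations in seconds and convert to milliseconds."""
    return sum(durations) * 1000

# A correctly typed call works as expected.
assert total_ms([0.5, 0.25]) == 750.0

# Annotations are not enforced at runtime: passing strings is accepted
# at the call site and only blows up inside sum(), when this path runs.
# A static checker (e.g. mypy) would flag this line before the code ever ran.
try:
    total_ms(["0.5", "0.25"])
    hidden_bug_caught = False
except TypeError:
    hidden_bug_caught = True
```

With dynamic typing a test has to exercise this exact path to find the bug; with static typing the mismatch is reported at check time, before anything runs.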

robot-wrangler 9 hours ago|||
> I could write in brainfuck with ai, but I presume, wouldn’t get the same results than if going with python.

https://esolang-bench.vercel.app/

Tarq0n 6 hours ago|||
The conclusions seem overly broad. Just because these languages are Turing complete doesn't mean they aren't massively hampered by expressiveness and amount of batteries included. To attribute all of this to training data memorization is premature.
robot-wrangler 6 hours ago||
Oh this is a very damning paper. Using simple languages from their definitions alone is a great proxy for studying truly out-of-distribution reasoning. Also just for following simple rules/instructions correctly, because a simple enough language is practically just a grammar. This paper is terrible for anyone who wants to make the case that models can do those things well.

To the extent today's AI can reason, add this to the pile of evidence that you definitely need a harness. Counter to what you hear.. that seems true for SOTA and frontier, not just toy models. Lots of people were saying many years ago someone should test exactly this, because it's obvious. Someone at megacorp probably did try and decided not to publish because they thought it was bad optics.

_boffin_ 9 hours ago|||
and this sums it up right here.
tengbretson 9 hours ago|||
Admittedly, I have very little experience with LLM-assisted Python. However, based on the severe degradation in output quality I have seen from an LLM working with plain JavaScript as opposed to TypeScript, I can't imagine choosing to start a project in Python at the moment.
fwip 7 hours ago||
It does seem like LLMs write better Python when told to use type annotations, especially when coupled with a linter.
aix1 6 hours ago||
I've been coding professionally in Python for about twenty years (alongside, at different times, a dozen or so other languages).

I find that Claude can write great modern Python more or less out of the box, with minimal style guidance from me. I do have to nudge it from time to time to not do silly things, but overall it's really rather good.

jryio 10 hours ago|||
I wrote about the meta thesis of programming languages in the training data here

https://jry.io/writing/use-boring-languages-with-llms/

_boffin_ 10 hours ago||
Please distill instead of having me navigate off site. Include link for additional info.

edit: side -> site

ocschwar 8 hours ago|||
Seems to me these LLMs have a critical mass of Python training data and Rust training data, so there's no advantage for Python there.

So as the article points out, an iterative process that catches the mistakes at compile time is much more suited for an AI than one that catches them at runtime.

aaa_aaa 2 hours ago|||
For some people reducing infra costs matter. Python is very very slow, even if it uses native libs.
Eridrus 9 hours ago|||
The LLMs are actually worse at generating Python than other langs, hypothesized due to quality of training data lol.

I still read the generated code, so I'm not quite willing to give up on Python yet though.

imron 3 hours ago|||
Large volumes of training data is a blessing and a curse, especially when you consider who wrote it.
mountainriver 8 hours ago|||
I moved from writing all my code with LLMs in Python to Rust. I've seen absolutely no difference; most of the time I couldn't even tell you which it's writing in.

My programs are faster and more reliable than they’ve ever been.

osigurdson 8 hours ago|||
I wouldn't say I get worse results with Go than I do with Python.
markboo 8 hours ago|||
That's right, we don't need to care about a language, the same way we don't need to care about the map once FSD promises it's already end-to-end optimal.
bluegatty 9 hours ago|||
There's enough training data on the other langs.
te_chris 3 hours ago|||
1) the models do generalise so concepts translate 2) languages with more opinionated semantics and a better, more coherent community seem to be better. Python is a broad shitshow with multiple ways to achieve the same thing. Elixir is tight and focused. Claude is much better at elixir.
bmitc 8 hours ago|||
> Read the first few comments and surprised I didn’t see it, but training data. The voluminous amount of Python in the training data.

That's actually part of the point. Almost no one writes types for Python and has complete type compliance. So all that training data is people just yoloing Python, writing a bunch of poor code in it.

I honestly can't believe any experienced software engineer would decide to build systems in Python these days.

th1sisoldnews 9 hours ago|||
[dead]
faangguyindia 10 hours ago|||
No. If that mattered, you'd write everything in HTML and CSS, because those have way more training data.
weird-eye-issue 10 hours ago||
Those are not programming languages.
goatlover 7 hours ago||
WASM then.
weird-eye-issue 6 hours ago||
That's more of a compilation target than a programming language and I don't really see the relevancy...
gerdesj 9 hours ago||
"I could write in brainfuck with ai"

Well, go on and do the experiment! Perhaps LLMs can write code as well in BF as in Python, but I don't recommend it because hallucinations are really hard to notice in BF.

If you are going to worry about high level computer languages and AI, you are going to have to start with getting to grips with machine code and assemblers and that. Once you know how say some Python code ends up being processed by your laptop CPU(s), then you will know when BF might be best!

_boffin_ 9 hours ago||
> Frontier models score ~90% on Python but only 3.8% on esoteric languages, exposing how current code generation relies on training data memorization rather than genuine programming reasoning.

https://news.ycombinator.com/item?id=48100433#48102985

bob1029 43 minutes ago||
Python might still be the best option if your goal is to perfectly one shot the solution and minimize token usage as much as possible.

However, if you are willing to stub your toes, retry, and pay more money, an entire new world opens up. Languages like python seem to fall apart faster in extremely large projects.

I've got a collection of interdependent .NET codebases with about 50 megs of raw source between them. Having C# be strongly typed seems like an essential backbone for keeping everything on rails in my agentic scenarios. The code edits have been flawless for several months now. I've got successful apply_patch usages that touch 20 files at a time. LLM code editing performance might be mostly language agnostic once we compensate for the strictness of the type system. More specifically, how much useful information is returned at compile time.

Compile time errors and warnings are probably the most powerful alignment mechanism available. Some ecosystems allow for you to specify your own classes of errors and warnings. I think tools like Roslyn Analyzers might be more powerful than unit tests in this application. Domain-specific compilation feedback feels like the holy grail to me.

https://learn.microsoft.com/en-us/visualstudio/code-quality/...

xnorswap 29 minutes ago|
Yes, roslyn is like a super-power for agentic coding.

At work we have a custom disposable data provider that gets into trouble if you use async/await inside it.

Traditionally this was enforced through oral history, but with agents this needed addressing.

It was actually really easy to write a custom analyzer which can pick up whether `await` is ever called within the scope of this provider and fail the compilation.

The only thing you have to be careful of, is making sure the LLM doesn't sneak in some "ignore Rule CUST001" pragma blocks, but it's mostly good about not doing that, unless it thinks you're "prototyping", in which case it seems to treat errors as inconveniences to be worked-around.
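xnorswap's custom Roslyn analyzer has a rough analog in Python using the standard `ast` module. A toy sketch (the source snippet, the `provider.open()` API, and the rule itself are hypothetical stand-ins for the real C# rule): walk the syntax tree and flag any `await` that appears lexically inside a `with` block:

```python
import ast

SOURCE = """
async def handler(provider):
    with provider.open() as p:   # hypothetical disposable data provider
        x = await fetch(p)       # forbidden: await inside the provider scope
"""

class AwaitInProviderCheck(ast.NodeVisitor):
    """Toy analog of a custom analyzer rule: record the line numbers of
    `await` expressions that occur inside a `with` block."""

    def __init__(self):
        self.violations = []
        self._with_depth = 0

    def visit_With(self, node):
        # Track how deep we are inside `with` blocks while visiting children.
        self._with_depth += 1
        self.generic_visit(node)
        self._with_depth -= 1

    def visit_Await(self, node):
        if self._with_depth:
            self.violations.append(node.lineno)
        self.generic_visit(node)

checker = AwaitInProviderCheck()
checker.visit(ast.parse(SOURCE))
# checker.violations now holds [4]: the line of the offending await.
```

A real version would match the specific provider type and run as a lint/CI gate rather than a script, but the principle is the same: an agent can't argue with a failed build, so turning a tribal rule into a checker gives the model the same hard feedback a type error would.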

fbrncci 8 hours ago||
Why Python? Because I have written it for 10+ years, know how to debug it, and can smell within 10 seconds of the agent writing code whether it's doing something that is going to end in a huge foot gun. With any other language, not so much; I would need to relearn a lot. So I'm going to prefer Python, where even at the speed AI crams out code, I still feel somewhat in control. If I did this with Go or Rust, it would feel more like "vibecoding" than AI-assisted programming - just yolo the whole product.
_waqas_ali_ 4 hours ago||
I started writing rust in this agentic era and all my prior experience with other languages still carries over and helps me spot code smell and bad architecture.

I had to learn the memory safety bits because I had no idea "what's right", but the rest of it was smooth.

Syntax fades away, you get to focus on higher level stuff and end up exploring new pathways; give it a try, you might be pleasantly surprised how much of your experience is transferable.

bambax 4 hours ago|||
Exactly that. Plus I need to be able to make adjustments here and there without the whole thing collapsing on me.

If you know Rust inside and out (if, as one example in TFA, you co-wrote The Rust Programming Language!) then sure, why not Rust?

But if not, it would be unwise.

That said, I use AI to write small C utilities that compile and run on any Windows version starting with Vista (which neither Go nor Rust can do). Yet I'm not a C programmer; but I can read and adjust it when needed, and the whole thing does work.

do_anh_tu 7 hours ago||
This is what I experienced as well. I can smell BS from AI-generated code within the first few lines it writes in Python, so that's why I keep using Python for most of my projects.
oxag3n 10 hours ago||
If AI writes your articles, why use brain?
lbrito 8 hours ago|||
You sneer but the models are much better now than last month and token costs are down! LLMs are just like compilers for the brain!

/s

abalashov 10 hours ago|||
[flagged]
senko 3 hours ago||
And a non-sequitur.
th1sisoldnews 9 hours ago||
[flagged]
niek_pas 13 hours ago||
Bit off topic but why in the world are people still posting on medium? The reading experience is abhorrent; I couldn’t even finish reading this article before a full screen popup literally blocked the sentence I was reading.

Is there some incentive I’m not seeing?

xrd 12 hours ago||
They have made an honest attempt to pay writers. It's a different model than substack, but that's why.

I look at it the same way I look at pay walls for newspapers. I don't like them but I understand why they are there.

raincole 3 hours ago||
Which is why it failed though. It turns out people won't pay one dollar to read an article like "If AI writes your code, why use Python?"

The situation is very unfortunate. We had perhaps once-in-a-lifetime chance to solve micropayment but we fucked up (crypto).

iLemming 11 hours ago|||
> The reading experience is abhorrent

Nothing you read in the browser can provide a hands-down best reading experience for everybody - the modern web model is inherently at odds with that. A plain HTML page with no CSS is a near-perfect reading experience. The problem is that almost nobody ships that, because the web also became a publishing platform where authors compete for attention. A plain-text protocol under user control is closer to "best reading experience for everybody". The web could be that. It mostly isn't.

I stopped trying to read long articles in the browser. Why would I do that, if I can easily extract all the relevant, plain text (and even structured one) and read it in my editor instead? Where I have control over fonts, colors, navigation, etc. The browser is a delivery mechanism, not a reading environment. Treating it as one is a habit, not a necessity.

Long ago I stopped trying to type anything longer than three words anywhere but my editor. Of course, why wouldn't I? It already has everything I need - spellchecking, thesaurus, etymology lookup, translation, access to all my notes, LLM integration, etc. Try it one day - it's an enormously liberating experience. And then maybe you'd stop reading long texts in the browser as well.

autoexec 9 hours ago|||
> A plain HTML page with no CSS is a near-perfect reading experience. The problem is that almost nobody ships that, because the web also became a publishing platform where authors compete for attention.

They don't ship it because of greed. They only want your attention because of greed. They only infest their website with ads because of greed.

> The browser is a delivery mechanism,

http is a delivery mechanism. The browser is a user agent. It's supposed to display content according to the preferences of the user. If your browser isn't doing that for you it's time to find a new browser or beat the one you have into submission until it behaves. "reader mode" is a useful compromise.

iLemming 9 hours ago||
> It's supposed to display content according to the preferences of the user.

That's right, the original idea was exactly about that, but like I said - in practice that is no longer a thing.

Using the editor for reading any content is enormously underrated. Check this out - this entire thread opens in my editor as an outline with nested structure. Meaning that all the regular outline operations are available to me - folding, imenu (interactive TOC), narrowing, quick search, contextual search, pattern-based search, sparse-tree search.

Extracting all the URLs on the page while ignoring HN-internal ones is a single keypress for me - there's a link to a YT video - I can watch it, controlling the playback directly from my editor, I can extract transcript and summarize it with an LLM request - all without opening new tabs, without switching focus.

I can narrow on the sub-thread, or select a region and export only that part to a pdf, gfm, html or LaTeX. The possibilities are virtually unlimited. A web browser - even with three hundred different extensions won't let me have complete and utter control over plain text - it's just not designed for anything like that.

polaris64 3 hours ago|||
I'm assuming you use Emacs? Are you using a special "hacker news mode" or something more generic?
uxcolumbo 3 hours ago|||
Can you share your setup how to achieve what you described? I'm curious.
someguyiguess 9 hours ago|||
> Why would I do that, if I can easily extract all the relevant, plain text (and even structured one) and read it in my editor instead?

Because that’s an enormous pain in the ass. Not scalable at all.

iLemming 9 hours ago||
I beg to differ. You clearly misinterpret what I'm talking about. Please expand on "scalable", what do you mean by that?
odie5533 2 hours ago|||
It's a free, permanent host for your blog articles with a built-in community and monetization layer. There's only so many free hosts out there that I'd be confident will be around in 5 years, and Medium is one of them.
nickff 13 hours ago|||
It seems like it's just the latest evolution of the writer-friendly blogging platform; easier than Wordpress to package into a newsletter, and also easier to monetize with a paid tier.
ciupicri 12 hours ago||
But don't we have AI to deal with the complexity of Wordpress? :-)
DonHopkins 11 hours ago||
Insofar as AI is great at accidentally deleting your production and backup Wordpress databases, and forcing you to start from scratch with something else.
chneu 13 hours ago|||
My best guess is momentum. Some people are very, very brand loyal and have to do things in relation to what/how others do things.

In reality it doesn't matter where something is posted, just give us a url, but some people don't operate that way.

kelvinjps10 9 hours ago|||
Check out Scribe, an alternative Medium frontend that's way better: https://scribe.rawbit.ninja/@NMitchem/if-ai-writes-your-code...

https://sr.ht/~edwardloveall/Scribe/ https://libredirect.github.io/

dsmurrell 13 hours ago||
Yep, Medium was free and everyone donated content... then it put up reading paywalls and conned everyone. I'm also surprised when I see people writing on there.
p4bl0 4 hours ago||
Not just for LLMs, but in general if code is produced automatically by a tool and isn't going to be a hundred percent proofread and tested by humans who could have written it manually, it's always better to use the safest possible language so that the compiler can catch most of the errors. So yeah, Rust or OCaml are good candidates. Performance is also a good point but it's a secondary issue in my opinion.
kgeist 1 hour ago|
Lots of comments here already, just my two cents. I work in R&D and I prefer prototyping things in Python with AI (although we're a 100% Go shop) because:

1) Python is expressive and has packages for everything => faster iteration times because of far fewer tokens

2) It doesn't require a compilation step, so when I'm quickly iterating on something, especially if my laptop doesn't have the target hardware, the flow "copy the sources to the target machine and restart" is superfast (a couple of milliseconds)

3) Python most likely represents the largest share of training data, so almost all LLMs can one-shot almost everything

And when my prototype is ready, and we want to go to production, I can ask the LLM to port it to Go with all the necessary conventions/ceremonies and all.

More comments...