Posted by tortilla 1 day ago
Go also seems surprisingly low, given how constrained and straightforward the language is.
It doesn't align with my anecdotal experience.
> Section 3.3:
> Besides, since we use the moderately capable DeepSeek-Coder-V2-Lite to filter simple problems, the Pass@1 scores of top models on popular languages are relatively low. However, these models perform significantly better on low-resource languages. This indicates that the performance gap between models of different sizes is more pronounced on low-resource languages, likely because DeepSeek-Coder-V2-Lite struggles to filter out simple problems in these scenarios due to its limited capability in handling low-resource languages.
At the same time, I have used Claude Code on an Elixir codebase and it's done a great job. But for me, it's unclear whether it would have done a worse job if I had picked any other stack.
The features of Elixir that lead to good software are amplified by LLMs.
One thing that I would perhaps add to the article (or emphasise) is the clarity and quality of error messages in Elixir. In my opinion, it has some of the best error logging in the game. The vast majority of the time, the error gives enough information to fix the problem very quickly.
My takeaway was that these models excel at popular languages where there’s ample training material, but struggle where the languages change rapidly or are relatively “niche.” I’m sure they’ve since gotten better, so perhaps my perception is already out of date.
OTP fits agents like a glove.