Kotlin creator's new language: a formal way to talk to LLMs instead of English

Posted by souvlakee 5 hours ago

Kotlin creator's new language: a formal way to talk to LLMs instead of English(codespeak.dev)

229 points | 188 comments

cube2222 3 minutes ago|

This is actually... pretty cool?

Definitely won't use it for prod ofc but may try it out for a side-project.

It seems that this is more or less:

  - instead of modules, write specs for your modules
  - on the first go it generates the code (which you review)
  - later, diffs in the spec are translated into diffs in the code (the code is *not* fully regenerated)

this actually sounds pretty usable, esp. if someone likes writing. And wherever you want to dive deep, you can delve down into the code and do "microoptimizations" by rolling something on your own (with what seems to be called here "mixed projects").

That said, not sure if I need a separate tool for this, tbh. Instead of just having markdown files and telling cause to see the md diff and adjust the code accordingly.

lifis 4 hours ago||

As far as I can tell it's not a new language, but rather an alternative workflow for LLM-based development along with a tool that implements it.

The idea, IIUC, seems to be that instead of directly telling an LLM agent how to change the code, you keep markdown "spec" files describing what the code does and then the "codespeak" tool runs a diff on the spec files and tells the agent to make those changes; then you check the code and commit both updated specs and code.

It has the advantage that the prompts are all saved along with the source rather than lost, and in a format that lets you also look at the whole current specification.

The limitation seems to be that you can't modify the code yourself if you want the spec to reflect it (and also can't do LLM-driven changes that refer to the actual code), and also that in general it's not guaranteed that the spec actually reflects all important things about the program, so the code does also potentially contain "source" information (for example, maybe your want the background of a GUI to be white and it is so because the LLM happened to choose that, but it's not written in the spec).

The latter can maybe be mitigated by doing multiple generations and checking them all, but that multiplies LLM and verification costs.

Also it seems that the tool severely limits the configurability of the agentic generation process, although that's just a limitation of the specific tool.

souvlakee 4 hours ago||

As far as I can tell C is not a new language, but rather an alternative workflow for assembly development along with a tool that implements it.

abreslav 4 hours ago||

I second that :)

abreslav 3 hours ago|||

> The limitation seems to be that you can't modify the code yourself if you want the spec to reflect it

Eventually, we'll end up in a world where humans don't need to touch code, but we are not there yet. We are looking into ways to "catch up" the specs with whatever changes happen in the code not through CodeSpeak (agents or manual changes or whatever). It's an interesting exercise. In the case of agents, it's very helpful to look at the prompts users gave them (we are experimenting with inspecting the sessions from ~/.claude).

More generally, `codespeak takeover` [1] is a tool to convert code into specs, and we are teaching it to take prompts from agent sessions into account. Seems very helpful, actually.

I think it's a valid use case to start something in vibe coding mode and then switch to CodeSpeak if you want long-term maintainability. From "sprint mode" to "marathon mode", so to speak

[1] https://codespeak.dev/blog/codespeak-takeover-20260223

ferguess_k 13 minutes ago|||

Why are we eliminating our own job and maybe hobby so eagerly? Whatever. It is done.

newsoftheday 3 hours ago|||

> Eventually, we'll end up in a world where humans don't need to touch code, but we are not there yet.

Will we though? Wouldn't AI need to reach a stage where it is a tool, like a compiler, which is 100% deterministic?

intrasight 3 hours ago|||

We will and soon because it does not have to be deterministic like a compiler. It only has to pass all tests.

my_throwaway23 2 hours ago||

Who is writing the tests?

justonceokay 23 minutes ago||

In the future users will write the tests

vbezhenar 3 hours ago|||

Compiler is not 100% deterministic. Its output can change when you upgrade its version, its output can change when you change optimization options. Using profile-guided optimization can also change between runs.

CWIZO 1 hour ago|||

If you change inputs then obviously you will get a different output. Crucially using the same inputs, however, produces the same output. So compilers are actually deterministic.

boznz 1 hour ago|||

Also a bit formal. Maybe something like this will be the output of the prompt to let me know what the AI is going to generate in the binary, but I doubt I will be writing code like this in 5 years, English will probably be fine at my level.

lifis 3 hours ago|||

Also they seem to want to run this as a business, which seems absurd to me since I don't see how they can possibly charge money, and anyway the idea is so simple that it can be reimplemented in less than a week (less than a day for a basic version) and those alternative implementations may turn out to be better.

It also seems to be closed-source, which means that unless they open the source very soon it will very likely be immediately replaced in popularity by an open source version if it turns out to gain traction.

abreslav 3 hours ago|||

> Also it seems that the tool severely limits the configurability of the agentic generation process, although that's just a limitation of the specific tool.

Working on that as well. We need to be a lot more flexible and configurable

oofbaroomf 29 minutes ago||

Ugh, I just wish there was a deterministic and formal way to tell a computer what I want...

the_duke 4 hours ago||

This doesn't make too much sense to me.

* This isn't a language, it's some tooling to map specs to code and re-generate

* Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and let it just recommend changes)

* Models are evolving rapidly, this months flavour of Codex/Sonnet/etc would very likely generate different code from last months

* Text specifications are always under-specified, lossy and tend to gloss over a huge amount of details that the code has to make concrete - this is fine in a small example, but in a larger code base?

* Every non-trivial codebase would be made up of of hundreds of specs that interact and influence each other - very hard (and context - heavy) to read all specs that impact functionality and keep it coherent

I do think there are opportunities in this space, but what I'd like to see is:

* write text specifications

* model transforms text into a *formal* specification

* then the formal spec is translated into code which can be verified against the spec

2 and three could be merged into one if there were practical/popular languages that also support verification, in the vain of ADA/Spark.

But you can also get there by generating tests from the formal specification that validate the implementation.

onion2k 4 hours ago||

Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and let it just recommend changes)

If the result is always provably correct it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinity more important than the code itself.

sensanaty 2 hours ago|||

That if at the beginning of your sentence is doing a whole lot of work. Indeed, if we could formally and provably (another extremely loaded word) generate good code that'd be one thing, but proving correctness is one of those basically impossible tasks.

xpe 33 minutes ago||

> but proving correctness is one of those basically impossible tasks.

To aim for a meeting of the minds... Would you help me out and unpack what you mean so there is less ambiguity? This might be minor terminological confusion. It is possible we have different takes, though -- that's what I'm trying to figure out.

There are at least two senses of 'correctness' that people sometimes mean: (a) correctness relative to a formal spec: this is expensive but doable*; (b) confidence that a spec matches human intent: IMO, usually a messy decision involving governance, organizational priorities, and resource constraints.

Sometimes people refer to software correctness problems in a very general sense, but I find it hard to parse those. I'm familiar with particular theoretical results such as Rice's theorem and the halting problem that pertain to arbitrary programs.

* With tools like {Lean, Dafny, Verus, Coq} and in projects like {CompCert, sel4}.

tomtomtom777 3 hours ago||||

> If the result is always provably correct it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinity more important than the code itself.

If the spec is so complete that it covers everything, you might as well write the code.

The benefit of writing a spec and having the LLM code it, is that the LLM will fill in a lot of blanks. And it is this filling in of blanks that is non-deterministic.

pjmlp 2 hours ago||

> If the spec is so complete that it covers everything, you might as well write the code.

Welcome to the usual offshoring experience.

dsr_ 3 hours ago||||

Let's rephrase:

Since nobody involved actually cares whether the code works or not, it doesn't matter whether it's a different wrong thing each time.

brabel 1 hour ago||

You got it completely backwards. The claim is that if the code does exactly what the spec says (which generated tests are supposed to "prove") then the actual code does not matter, even if it's different each time.

ModernMech 11 minutes ago||

The point they are making is the tests are neither necessary nor sufficient alone to prove the code does exactly what the spec says. Looking at the tests isn't enough to prove anything; as an extreme example, if no one involved looks at the code, then the tests can just be static always passing and you wouldn't know either way whether or not the code matches the spec or not.

If anyone cared enough they could look at the code and see the problem immediately and with little effort, but we're encouraging a world where no one cares enough to put even that baseline effort because *gestures at* the tests are passing. Who cares how wrong the code is and in what ways if all the lights are green?

SpaceNoodled 4 hours ago||||

That's a huge "if."

gentooflux 3 hours ago||

I usually invert those to reduce nesting

FrankRay78 3 hours ago||||

Sure, but where are the formal acceptance tests to validate against?

0-_-0 2 hours ago||||

Besides, you can deterministically generate bad code, and not deterministically generate good code.

__loam 3 hours ago||||

The code is what the code does.

kennywinker 3 hours ago||

The shoe is what the shoe does.

Except one shoe is made by children in a fire-trap sweatshop with no breaks, and the other was made by a well paid adult in good working conditions.

The ends don’t justify the means. The process of making impacts the output in ways that are subtle and important, but even holding the output as a fixed thing - the process of making still matters, at least to the people making it.

raw_anon_1111 3 hours ago|||

The end is whether the code meets the functional and non functional requirements.

And guess how much shoe companies make who manufacture shoes in sweatshop conditions versus the ones who make artisanal handcrafted shoes?

kennywinker 3 hours ago|||

Ah yes - we should all strive to maximize shareholder value - triangle shirtwaist be damnned.

Btw in my metaphor, we - the programmers - are the kids in the sweatshop.

raw_anon_1111 2 hours ago||

If you are a “programmer” you are going to be the kids in the sweatshop. On the enterprise dev side where most developers work, it’s been headed in that direction for at least a decade where it was easy enough to become a “good enough” generic full stack/mobile/web etc dev.

Even on the BigTech side being able to reverse a btree on the whiteboard and having on your resume that you were a mid level developer isn’t enough either anymore

If you look at the comp on that side, it’s also stagnated for decade. AI has just accelerated that trend.

While my job has been at various percentages to produce code for 30 years, it’s been well over a decade since I had to sell myself on “I codez real gud”. I sell myself as a “software engineer” who can go from ambiguous business and technical requirements, deal with politics, XYProblems, etc

pjmlp 2 hours ago||

What do you think programmers in offshoring consulting shops are? Sadly.

raw_anon_1111 2 hours ago||

Exactly. I work in a consulting company as a customer facing staff consultant - highest level - specializing in cloud + app dev. We don’t hire anyone less than staff in the US. Anything lower is hired out of the country.

That’s exactly my point. “Programming” was clearly becoming commoditized a decade ago.

uoaei 1 hour ago|||

Functional requirements are known knowns.

Out of bounds behavior is sometimes a known unknown, but in the era of generated code is exclusively unknown unknowns.

Good luck speccing out all the unanticipated side effects and undefined behaviors. Perhaps you can prompt the agent in a loop a bnumber of times but it's hard to believe that the brute-force throw-more-tokens-at-it approach has the same level of return as a more attentive audit by human eyeballs.

raw_anon_1111 1 hour ago||

Are you as a developer 100% able to trust that you didn’t miss anything? Your team if you are a team lead who delegates tasks to other developers? If you outsource non business things like Salesforce integrations etc do you know all of the code they wrote? Your library dependencies? Your infrastructure providers?

xpe 21 minutes ago||

It seems like ^ and ^^ agree to me. Am I missing something?

pjmlp 2 hours ago|||

Yet the people voting with their wallets seem to go with cheaper option, regardless of what hides behind it.

Being shoes, offshoring, Webwidgets or AI generated code.

Copyrightest 3 hours ago||||

[dead]

jrm4 3 hours ago|||

I would be very comfortable with - re-run 100 times with different seeds. If the outcome is the same every time, you're reliably good to go.

pron 2 hours ago|||

If what you're after is determinism, then your solution doesn't offer it. Both the formal specification and the code generated from it would be different each time. Formal specifications are useful when they're succinct, which is possible when they specify at a higher level of abstraction than code, which admits many different implemementations.

vidarh 2 hours ago||

The point would presumably be to formalise it, then verify that the formal version matches what you actually meant. At which point you can't/shouldn't regenerate it, but you can request changes (which you'd need to verify and approve).

pron 2 hours ago||

But the code produced from the formal spec would still be nondeterministic. And I believe CodeSpeak doesn't wish to regenerate the entire program with each spec change, but apply code changes based on the changes to the spec. Maybe there could be other benefits to formalisation in this case, but determinism isn't one of them.

vidarh 2 hours ago||

It doesn't matter if the code is different if the spec is formal enough to validate the software against it.

I have no idea about codespeak - I was responding to the comments above, not about codespeak.

pron 54 minutes ago||

Validating programs against a formal spec is very, very hard for foundational computational complexity reasons. There's a reason why the largest programs whose code was fully verified against a formal spec, and at an enormous cost, were ~10KLOC. If you want to do it using proofs, then lines of proof outnumber lines of code 10-1000 to 1, and the work is far harder than for proofs in mathematics (that are typically much shorter). There are less absolute ways of checking spec conformance at some useful level of confidence, and they can be worthwhile, but they require expertise and care (I'm very much in favour of using them, but the thought that AI can "just" prove conformance to a formal spec ignores the computational complexity results in that field).

vidarh 30 minutes ago||

For most cases we don't need nearly that comprehensive verification. This is expecting more off AI written code than we ever bother to subject most human written code to. There's a vast chasm there we only need to even slightly start to bridge to get to far higher confidence levels than the typical human dev team achieves.

DrJokepu 3 hours ago|||

> Models aren't deterministic

Is that really true? I haven’t tried to do my own inference since the first Llama models came out years ago, but I am pretty sure it was deterministic: if you fixed the seed and the input was the same, the output of the inference was always exactly the same.

bigwheels 3 hours ago||

LLMs are not deterministic:

1.) There is typically a temperature setting (even when not exposed, most major providers have stopped exposing it [esp in the TUIs]).

2.) Then, even with the temperature set to 0, it will be almost deterministic but you'll still observe small variations due to the limited precision of float numbers.

Edit: thanks for the corrections

dwohnitmok 3 hours ago|||

> but you'll still observe small variations due to the limited precision of float numbers

No. Floating number arithmetic is deterministic. You don't get different answers for the same operations on the same machine just because of limited precision. There are reasons why it can be difficult to make sure that floating point operations agree across machines, but that is more of a (very annoying and difficult to make consistent) configuration thing than determinism.

(In general it is mildly frustrating to me to see software developers treat floating point as some sort of magic and ascribe all sorts of non-deterministic qualities to it. Yes floating point configuration for consistent results across machines can be absurdly annoying and nigh-impossible if you use transcendental functions and different binaries. No this does not mean if your program is giving different results for the same input on the same machine that this is a floating point issue).

In theory parallel execution combined with non-associativity can cause LLM inference to be non-deterministic. In practice that is not the case. LLM forward passes rarely use non-deterministic kernels (and these are usually explicitly marked as such e.g. in PyTorch).

You may be thinking of non-determinism caused by batching where different batch sizes can cause variations in output. This is not strictly speaking non-determinism from the perspective of the LLM, but is effectively non-determinism from the perspective of the end user, because generally the end user has no control over how a request is slotted into a batch.

comboy 3 hours ago||||

Limited precision of float numbers is deterministic. But there's whole parallelism and how things are wired together, your generation may end up on a different hardware etc.

And models I work with (claude,gemini etc) have the temperature parameter when you are using API.

the_duke 1 hour ago|||

You shouldn't be downvoted - LLMs could in theory be deterministic, but they currently are not, due to how models are implemented.

otabdeveloper4 30 minutes ago||

All my self-hosted inference has temperature zero and no randomness.

It is absolutely workable, current inference engines are just lazy and dumb.

(I use a Zobrist hash to track and prune loops.)

davedx 4 hours ago|||

My process has organically evolved towards something similar but less strictly defined:

- I bootstrap AGENTS.md with my basic way of working and occasionally one or two project specific pieces

- I then write a DESIGN.md. How detailed or well specified it is varies from project to project: the other day I wrote a very complete DESIGN.md for a time tracking, invoice management and accounting system I wanted for my freelance biz. Because it was quite complete, the agent almost one-shot the whole thing

- I often also write a TECHNICAL-SPEC.md of some kind. Again how detailed varies.

- Finally I link to those two from the AGENTS. I also usually put in AGENTS that the agent should maintain the docs and keep them in sync with newer decisions I make along the way.

This system works well for me, but it's still very ad hoc and definitely doesn't follow any kind of formally defined spec standard. And I don't think it should, really? IMO, technically strict specs should be in your automated tests not your design docs.

the_duke 4 hours ago|||

I think many have adopted "spec driven development" in the way you describe.

I found it works very well in once-off scenarios, but the specs often drift from the implementation. Even if you let the model update the spec at the end, the next few work items will make parts of it obsolete.

Maybe that's exactly the goal that "codespeak" is trying to solve, but I'm skeptical this will work well without more formal specifications in the mix.

intrasight 3 hours ago||

> specs often drift from the implementation > Maybe that's exactly the goal that "codespeak" is trying to solve

Yes and yes. I think it's an important direction in software engineering. It's something that people were trying to do a couple decades ago but agentic implementation of the spec makes it much more practical.

jbonatakis 3 hours ago||||

I have been building this in my free time and it might be relevant to you: https://github.com/jbonatakis/blackbird

I have the same basic workflow as you outlined, then I feed the docs into blackbird, which generates a structured plan with task and sub tasks. Then you can have it execute tasks in dependency order, with options to pause for review after each task or an automated review when all child task for a given parents are complete.

It’s definitely still got some rough edges but it has been working pretty well for me.

rebolek 3 hours ago|||

AGENTS.md is nice but I still need to remind models that it exists and they should read it and not reinvent the wheel every time.

allthetime 3 hours ago||

There should be a setting to include specific files in every prompt/context. I’m using zed and when you fire up an agent / chat it explicitly states that the file(s) are included.

wenc 1 hour ago|||

Rehashing my comment from before:

I use Kiro IDE (≠ Kiro CLI) primarily as a spec generator. In my experience, it's high-quality for creating and iterating on specs. Tools like Cursor are optimized for human-driven vibing -- they have great autocomplete, etc. Kiro, by contrast, is optimized around spec, which ironically has been the most effective approach I've found for driving agents.

I'd argue that Cursor, Antigravity, and similar tools are optimized for human steering, which explains their popularity, while Kiro is optimized for agent harnesses. That's also why it’s underused: it's quite opinionated, but very effective. Vibe-coding culture isn't sold on spec driven development (they think it's waterfall and summarily dismiss it -- even Yegge has this bias), so people tend to underrate it.

Kiro writes specs using structured formats like EARS and INCOSE (which is the spc format used in places like Boeing for engineering reqs). It performs automated reasoning to check for consistency, then generates a design document and task list from the spec -- similar to what Beads does. I usually spend a significant amount of time pressure-testing the spec before implementing (often hours to days), and it pays off. Writing a good, consistent spec is essentially the computer equivalent of "writing as a tool of thought" in practice.

Once the spec is tight, implementation tends to follow it closely. Kiro also generates property-based tests (PBTs) using Hypothesis in Python, inspired by Haskell's QuickCheck. These tests sweep the input domain and, when combined with traditional scenario-based unit tests, tend to produce code that adheres closely to the spec. I also add a small instruction "do red/green TDD" (I learned this from Simon Willison) and that one line alone improved the quality of all my tests. Kiro can technically implement the task list itself, but this is where agents come in. With the spec in hand, I use multiple headless CLI agents in tmux (e.g., Kiro CLI, Claude Code) for implementation. The results have been very good. With a solid Kiro spec and task list, agents usually implement everything end-to-end without stopping -- I haven’t found a need for Ralph loops. (agents sometimes tend to stop mid way on Claude plans, but I've never had that happen with Kiro, not sure why, maybe it's the checklist, which includes PBT tests as gates).

didn't have the strongest start, but the Kiro IDE is one of the best spec generators I've used, and it integrates extremely well with agent-driven workflows.

jnpnj 1 hour ago|||

Maybe we're entering the non-deterministic applications too. No more mechanical predictable thing.. more like 90% regular and then weird.

Slightly sarcastic but not sure this couldn't become a thing.

rco8786 2 hours ago|||

How is your 2 step process not susceptible to all the exact same pitfalls you listed above?

dist-epoch 3 hours ago|||

> Models aren't deterministic - every time you would try to re-apply you'd likely get different output

So like when you give the same spec to 2 different programmers.

rco8786 2 hours ago|||

Yes, if you had each programmer rewrite the code from scratch each time you updated the spec.

orbital-decay 2 hours ago||

In reality you give the same programmer an update to the existing spec, and they change the code to implement the difference. Which is exactly what the thing in OP is doing, and exactly what should be done. There's simply no reason to regenerate the result.

The entire thing about determinism is a red herring, because 1) it's not determinism but prompt instability, and 2) prompt instability doesn't matter because of the above. Intelligence (both human and machine) is not a formal domain, your inputs lack formal syntax, and that's fine. For some reason this basic concept creates endless confusion everywhere.

dboreham 2 hours ago||||

Also like this: https://codeassociates.github.io/conversations-with-claude/c...

kennywinker 3 hours ago|||

Except each time you compile your spec you’re re-writing it from scratch with a different programmer.

pessimizer 4 hours ago|||

I think your objections miss the point. My informal specs to a program are user-focused. I want to dictate what benefits the program will give to the person who is using it, which may include requirements for a transport layer, a philosophy of user interaction, or any number of things. When I know what I want out of a program, I go through the agony of translating that into a spec with database schemas, menu options, specific encryption schemes, etc., then finally I turn that into a formal spec within which whether I use an underscore or a dash somewhere becomes a thing that has to be consistent throughout the document.

You're telling me that I should be doing the agonizing parts in order for the LLM to do the routine part (transforming a description of a program into a formal description of a program.) Your list of things that "make no sense" are exactly the things that I want the LLMs to do. I want to be able to run the same spec again and see the LLM add a feature that I never expected (and wasn't in the last version run from the same spec) or modify tactics to accomplish user goals based on changes in technology or availability of new standards/vendors.

I want to see specs that move away from describing the specific functionality of programs altogether, and more into describing a usefulness or the convenience of a program that doesn't exist. I want to be able to feed the LLM requirements of what I want a program to be able to accomplish, and let the LLM research and implement the how. I only want to have to describe constraints i.e. it must enable me to be able to do A, B, and C, it must prevent X,Y, and Z; I want it to feel free to solve those constraints in the way it sees fit; and when I find myself unsatisfied with the output, I'll deliver it more constraints and ask it to regenerate.

ptak_dev 5 minutes ago|||

This resonates. I built an AI career consultant (CareerCraft AI) with exactly this philosophy — describe what the user needs to accomplish, not how to implement it.

The spec was essentially: "A user shares their professional background and a target job posting. The AI has a conversation to understand their experience, then generates a tailored resume and cover letter as downloadable PDFs. It remembers their profile within the session so they can generate documents for multiple roles."

That's it. No formal grammar, no structured templates for the conversation flow. The LLM handles the "how" — when to ask clarifying questions, how to reframe experience for different roles, what to emphasize based on the job description.

The interesting finding: natural language specs work remarkably well for applications where the output is also natural language (documents, advice, analysis). The formal spec approach makes more sense when the output needs to be deterministic code.

Try it if you're curious: https://super.myninja.ai/apps/6de082c7-a05f-4fc5-a7d3-ab56cc...

jbritton 17 minutes ago||||

I tried this recently with what I thought was a simple layout, but probably uncommon for CSS. It took an extremely long back and forth to nail it down. It seemingly had no understanding how to achieve what I wanted. A couple sentences would have been clear to a person. Sometimes LLMs are fantastic and sometimes they are brain dead.

darkwater 4 hours ago|||

> I want to be able to run the same spec again and see the LLM add a feature that I never expected (and wasn't in the last version run from the same spec) or modify tactics to accomplish user goals based on changes in technology or availability of new standards/vendors.

Be careful what you wish for. This sounds great in theory but in practice it will probably mean a migration path for the users (UX changes, small details changed, cost dynamics and a large etc.)

fnord77 3 hours ago|||

[delete]

koolala 2 hours ago|||

It isn't a formal language, look at the goose example:

https://codespeak.dev/blog/greenfield-project-tutorial-20260...

It is a formal "way" aka like using json or xml like tons of people are already doing.

dist-epoch 3 hours ago|||

Software products specifications are written in real language, not in first order logic.

hkonte 4 hours ago||

[dead]

kleiba 4 hours ago||

I cannot read light on black. I don't know, maybe it's a condition, or simply just part of getting old. But my eyes physically hurt, and when I look up from reading a light-on-black screen, even when I looked at only for a short moment, my eyes need seconds to adjust again.

I know dark mode is really popular with the youngens but I regularly have to reach for reader mode for dark web pages, or else I simply cannot stand reading the contents.

Unfortunately, this site does not have an obvious way of reading it black-on-white, short of looking at the HTML source (CTRL+U), which - in fact - I sometimes do.

jbritton 5 minutes ago||

I find dark mode much easier to read and far less eye strain. I guess it just shows that users should be the ones to set the preference. There are studies on monkeys showing light mode leading to myopia. Although lately I have come to learn there are lots of poorly done studies.

newsoftheday 3 hours ago|||

Same for me, has been my whole life. I complain about it all the time. It's well documented that people can read black on light far better and with less eye strain than light on black; yet there seems to be a whole generation of developers determined to force us all to try and read it. Even the media sites like Netflix, Prime, etc. force it. At least Tubi's is somewhat more readable.

Sometimes a site will include a button or other UI element to choose a light theme but I find it odd that so many sites which are presumed to be designed by technically competent people, completely ignore accessibility concerns.

DoctorOW 1 hour ago||

The most common mistake I see (on this website at least) is the assumption that one's programming competence is equal to their competence in other things.

cambrianentropy 1 hour ago|||

Yeah I am the same.

Definitely in the minority on this one as dark mode is really popular these days.

Really hard to describe how it is literally physically painful for my eyes. Very strange.

embedding-shape 4 hours ago|||

Do you sit in a bright room? Right now, during the night, I see your comment like this: https://i.imgur.com/c7fmBns.png, but during the day when the room is bright, I also see everything with light themes/background colors, otherwise it is indeed hard to see properly.

kleiba 4 hours ago|||

Unfortunately, in my case, it's not a matter of lighting conditions.

skydhash 3 hours ago|||

When it’s dark (I can’t stand bright rooms at night), I lower the brightness of my screens instead of going for dark mode. I have astigmatism and any tiny bright spot is hard to focus on. It’s easier when the bright part is large and the dark parts are small (black on white is best).

lainproliant 2 hours ago||

I feel your pain. For me it is the opposite: I get headaches from bright backgrounds because I'm light-sensitive.

haolez 8 minutes ago||

Feels like writing tests before writing code, but with LLMs :)

le-mark 4 hours ago||

This concept is assuming a formalized language would make things easier somehow for an llm. That’s making some big assumptions about the neuro anatomy if llms. This [1] from the other day suggests surprising things about how llms are internally structured; specifically that encoding and decoding are distinct phases with other stuff in between. Suggesting language once trained isn’t that important.

[1] https://news.ycombinator.com/item?id=47322887

abreslav 3 hours ago|

We are not trying to make things easier for LLMs. LLMs will be fine. CodeSpeak is built for humans, because we benefit from some structure, knowing how to express what we want, etc.

niam 1 hour ago||

The title writer might be doing the project a disservice by using the term "formal" to describe it, given that the project talks a lot about "specs". I mistook it to imply something about formal specification.

My quick understanding is that isn't really trying to utilize any formal specification but is instead trying to more-clearly map the relationship between, say, an individual human-language requirement you have of your application, and the code which implements that requirement.

tonipotato 4 hours ago||

The problem with formal prompting languages is they assume the bottleneck is ambiguity in the prompt. In my experience building agents, the bottleneck is actually the model's context understanding. Same precise prompt, wildly different results depending on what else is in the context window. Formalizing the prompt doesn't help if the model builds the wrong internal representation of your codebase. That said curious to see where this goes.

slfnflctd 4 hours ago|

Two pieces of advice I keep seeing over & over in these discussions-- 1) start with a fresh/baseline context regularly, and 2) give agents unix-like tools and files which can be interacted with via simple pseudo-English commands such as bash, where they can invoke e.g. "--help" to learn how to use them.

I'm not sure adding a more formal language interface makes sense, as these models are optimized for conversational fluency. It makes more sense to me for them to be given instructions for using more formal interfaces as needed.

seanmcdirmid 2 hours ago|

I've done something similar for queries. Comments:

* Yes, this is a language, no its not a programming language you are used to, but a restricted/embellished natural language that (might) make things easier to express to an LLM, and provides a framework for humans who want to write specifications to get the AI to write code.

* Models aren't deterministic, but they are persistent (never gonna give up!). If you generate tests from your specification as well as code, you can use differential testing to get some measure (although not perfect) of correctness. Never delete the code that was generated before, if you change the spec, have your model fix the existing code rather than generate new code.

* Specifications can actually be analyzed by models to determine if they are fully grounded or not. An ungrounded specification is going to not be a good experience, so ask the model if it thinks your specification is grounded.

* Use something like a build system if you have many specs in your code repository and you need to keep them in sync. Spec changes -> update the tests and code (for example).

More comments...