
Posted by horseradish 1 day ago

We should revisit literate programming in the agent era (silly.business)
288 points | 243 comments
cmontella 1 day ago|
I agree with this. I've been a fan of literate programming for a long time; I just think it is a really nice mode of development. But since its inception it hasn't lived up to its promise because the tooling around the concept is lacking. Two of the biggest issues have been 1) having to learn a whole new toolchain outside of the compiler to generate the documents, and 2) the prose and code can "drift": as the codebase evolves, the prose no longer describes what the code actually does, and vice versa. Better language and tooling design can solve the first problem, but I think AI potentially solves the second.

Here's the current version of my literate programming ideas, Mechdown: https://mech-lang.org/post/2025-11-12-mechdown/

It's a literate coding tool that is co-designed with the host language Mech, so the prose can co-exist in the program AST. The plan is to make the whole document queryable and available at runtime.

As a live coding environment, you would co-write the program with AI, and it would have access to your whole document tree, as well as live type information and values (even intermediate ones) for your whole program. This rich context should help it make better decisions about the code it writes, hopefully leading to better synthesized programs.

You could send the AI a prompt, then it could generate the code using live type information; execute it live within the context of your program in a safe environment to make sure it type checks, runs, and produces the expected values; and then you can integrate it into your codebase with a reference to the AI conversation that generated it, which itself is a valid Mechdown document.
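That loop (prompt, generate with live type info, execute in a safe environment, check the result, integrate with provenance) can be sketched in miniature. Everything here is hypothetical: `ask_model` is a stand-in for whatever LLM call you use, and Python's `exec` in a separate namespace is not a real sandbox, just an isolation gesture.

```python
# Hypothetical sketch of the generate/execute/check/integrate loop.
# `ask_model` is NOT a real API; a real sandbox would need process isolation.

def ask_model(prompt: str, type_context: dict) -> str:
    # Placeholder: a real system would send the prompt plus live type
    # information from the running program to the model.
    return "def double(x: int) -> int:\n    return x * 2"

def try_candidate(source: str, check) -> bool:
    """Execute candidate code in an isolated namespace and run a check."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # nothing leaks into the live program
    except Exception:
        return False
    return check(namespace)

code = ask_model("write double(x)", type_context={"x": int})
if try_candidate(code, lambda ns: ns["double"](21) == 42):
    # Keep the code along with a reference to the conversation that made it.
    accepted = {"source": code, "conversation": "chat-2025-11-12.mechdown"}
```

The interesting part is the last step: the provenance record is itself a document, so the "why" travels with the "what".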

That's the current work anyway -- the basis of this is the literate programming environment, which is already done.

The docs show off some more examples of the code, which I anticipate will be mostly written by AIs in the future: https://docs.mech-lang.org/getting-started/introduction.html

catlifeonmars 1 day ago|
We actually have had literate programming for a while, it just doesn’t look exactly how it was envisioned: nowadays it’s common for many libraries to have extensive documentation, including hyperlinks and testable examples, directly inline in the form of comments. There’s usually a well-defined convention for these comments to be converted into HTML, and some of them link directly back to the relevant source code.

This isn’t to say they’re exactly what is meant by literate programming, but I gotta say we’re pretty damn close. Probably not much more than a pull request away for your preferred language’s blessed documentation generator, in fact.

(The two examples I’m using to draw my conclusions are Rust and Go).
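Python's doctest is a third example of the same convergence: the prose, the usage example, and the test are one artifact, and the toolchain keeps them from drifting by actually running the examples.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Constrain value to the closed interval [low, high].

    The examples below double as tests; run them with
    `python -m doctest thisfile.py`:

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(99, 0, 10)
    10
    """
    return max(low, min(value, high))
```

If the implementation drifts away from the documented behavior, the doc examples fail, which is exactly the "drift" problem literate programming never solved on its own.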

cmontella 1 day ago||
I think that's exactly what is meant, and it's a great example. The two places where literate programming has shined most are 1) documentation, because it's a natural fit there and you can get away with having little programs rather than focusing on the book-length narrative Knuth had originally proposed it for; but also 2) notebook programming environments, especially Jupyter and Org mode. I think programs structured in these notebooks really are perfectly situated for LLM analysis and extension, which is where the opportunity lies today.
threethirtytwo 1 day ago||
Should be extremely low effort to try this out with an agent.

The thing is, I feel an agent can read code as if it were English. It doesn't treat one as hard and the other as much more readable, the way we do. So literate programming could end up just increasing the token burn needed to get through a program, because the agent has to read the literate part as well as the actual code.

jasfi 1 day ago||
I wrote something similar where you specify the intent in Markdown at the file level. That can also be done by an AI agent. Each intent file compiles to a source file.

It works, but needs improvement. Any feedback is welcome!

https://intentcode.dev

https://github.com/jfilby/intentcode

ajkjk 1 day ago||
I've had the same thought, maybe more grandiosely. The idea is that LLM prompts are code -- after all they are text that gets 'compiled' (by the LLM) into a lower-level language (the actual code). The compile process is more involved because it might involve some back-and-forth, but on the other hand it is much higher level. The goal is to have a web of prompts become the source of truth for the software: sort of like the flowchart that describes the codebase 'is' the codebase.
hkonte 2 hours ago||
The "prompts are code" framing is right, and the compile analogy holds further than people think. Real code has structure: typed parameters, return types, separated concerns. A raw prose prompt is more like a shell one-liner with everything inlined. It works, but it breaks when you try to reuse or modify it.

If you take the compile idea seriously, the next step is to give prompts the same structure code has: separate the role from the context from the constraints from the output spec. Then compile that into XML for the model.
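A minimal sketch of that separation, with block names invented for illustration (this is not any particular tool's format):

```python
from xml.sax.saxutils import escape

def compile_prompt(blocks: dict[str, str]) -> str:
    """Compile typed prompt blocks into structured XML for the model."""
    parts = []
    for tag, body in blocks.items():
        parts.append(f"<{tag}>{escape(body)}</{tag}>")
    return "<prompt>" + "".join(parts) + "</prompt>"

xml = compile_prompt({
    "role": "senior reviewer",
    "constraints": "no new dependencies",
    "output_format": "unified diff",
})
# xml == "<prompt><role>senior reviewer</role>..."
```

The point is not the XML itself but that each concern becomes a named, reusable, independently editable unit, the same reason code has parameters instead of inlined constants.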

I built flompt (https://github.com/Nyrok/flompt) as a tool for this. Canvas where you place typed blocks (role, objective, constraints, output format, etc.) and compile to structured XML. Basically an IDE for prompts, not a text editor. A star would help a lot if this resonates.

sarchertech 1 day ago|||
No it doesn’t get compiled. Compilation is a translation from one formal language to another that can be rigorously modeled and is generally reproducible.

Translating from a natural language spec to code involves a truly massive amount of decision making because it’s ambiguous. For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences.
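A toy illustration of that ambiguity, assuming the natural language spec is just "remove duplicates from a list": both functions below satisfy it, yet callers can observe the difference.

```python
def dedupe_a(items):
    # Keeps the first occurrence of each element; preserves input order.
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def dedupe_b(items):
    # Also "removes duplicates", but sorts as a side effect of using set().
    return sorted(set(items))

data = [3, 1, 3, 2]
# dedupe_a(data) -> [3, 1, 2]; dedupe_b(data) -> [1, 2, 3]
# Any caller that depends on ordering sees the two "compiles" diverge.
```

Multiply that by every unstated decision in a real program and you get thousands of observable differences between two honest implementations of the same prose.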

Where we are today, with agents requiring guardrails to keep from spinning out, there is no way to let them work on code autonomously or constantly recompile specs without all of those observable differences constantly shifting, resulting in unusable software.

Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand. The only solution to this problem is that LLMs get better.

Personally I think at the point they can pull this off, they can do any white collar job, and there's no point in planning for that future because it results in either Mad Max or Star Trek.

ajkjk 1 day ago||
well you have to expand your definition of "compile" a bit. There is clearly a similarity, whether or not you want to call it the same word. Maybe it needs a neologism akin to 'transpiled'.

other than that you seem to be arguing against someone other than me. I certainly agree that agents / existing options would be chaotic hell to use this way. But I think the high-level idea has some potential, independent of that.

sarchertech 1 day ago||
I fundamentally don’t think the higher level idea has any potential because of the ambiguity of natural language. And I certainly don’t think it has anything in common with compilation unless you want to stretch the definition so far as to say that engineers are compilers. It’s delegation not abstraction.

I think we’ll either get to the point where AI is so advanced it replaces the manager, the PM, the engineer, the designer, and the CEO, or we’ll keep using formal languages to specify how computers should work.

Copyrightest 1 day ago||
One problem with this is that there isn't really a "current prompt" that completely describes the current source code; each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.
1718627440 1 day ago|||
> each source file is accompanied by a full chat log, including false starts and misunderstandings. It's sort of like reading a git history instead of the actual file.

My Git history contains links between the false starts and misunderstandings and the corrections, which then also include a paragraph on why this was a misunderstanding or false start. It is a lot better than just a single linear log from LLMs.

ajkjk 1 day ago|||
true, but that just means that's the problem to solve. probably the ideal architecture isn't possible right now. But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it, so that eventually it becomes a full 'spec'.

And maybe there is a way to trim the parts out of it that are not needed... like to automatically produce an initial prompt which is equivalent to the results of a longer session, but is precise enough so as to not need clarification upon reprocessing it. Something like that? I'm not sure if that's something that already exists.

sarchertech 20 hours ago||
> But I sorta imagine that you could later on take the full transcript of that conversation and expect any LLM to implement more or less the same thing based on it

Why would you think this though? There are an infinite number of programs that can satisfy any non-trivial spec.

We have theoretical solutions to LLM non-determinism, but we have no theoretical solutions to prompt instability, especially when we can't even measure what correct is.

ajkjk 14 hours ago||
yeah but all of the infinite programs are valid if they satisfy the spec (well, within reason). That's kinda the point. Implementation details like how the code is structured or what language it's in are swept under the rug, akin to how today you don't really care what register layout the compiler chooses for some code.
sarchertech 13 hours ago||
There has never been a non trivial program in the history of the world that could just “sweep all the implementation details under the rug”.

Compilers use rigorous modeling to guarantee semantic equality and that is only possible because they are translating between formal languages.

A natural language spec can never be precise enough to specify all possible observable behaviors, so your bot swarm trying to satisfy the spec is guaranteed to constantly change observable behaviors.

This gets exposed to users as churn, jank, and workflow-breaking bugs.

yuppiemephisto 1 day ago||
I do a form of literate programming for code review to help read AI code. I use [Lean 4](https://lean-lang.org) and its doc tool [Verso](https://github.com/leanprover/verso/) and have it explain the code through a literate essay. It is integrated with Lean and gets proper typechecking etc., which I find helpful.
Arubis 1 day ago||
Anecdotally, Claude Opus is at least okay at literate emacs. Sometimes takes a few rounds to fix its own syntax errors, but it gets the idea. Requiring it to TDD its way in with Buttercup helps.
gwbas1c 1 day ago||
One thing I've discovered with an LLM is that I can ask it to search through my codebase and explain things to me. It saves a lot of time when I need to understand concepts that would otherwise require a few hours of reading and digging.
fhub 1 day ago||
We were taught Literate Programming and xtUML at university. In both courses, the lecturers (independently) tried to convince us that these technologies were the future. I also did an AI/ML course. That lecturer lamented that the golden era was in the past.
ontouchstart 22 hours ago||
Literate programming in the sense of Donald Knuth is more about the chain of thoughts of the programmer than documenting code with comments or doc strings.
avatardeejay 1 day ago|
Something in this realm covers my practice. I just keep a master prompt for the whole program, and sparsely documented code. When it's time to use LLMs in the dev process, they always get a copy of both, and it makes the whole process something like 10x as coherent and continuous. Obviously when a change is made that deviates from or greatly expands on the spec, I update the spec.
grapheneposter 1 day ago|
I do something similar with quality gates. I have a bunch of markdown files at the ready to point agents to for various purposes. It lets me leverage LLMs at any stage of the dev process and my clients get docs in their format without much maintenance from myself. As you said once you get it down it becomes a very coherent process that can be iterated on in its own right.

I am currently fighting the recursive improvement loop part of working with agents.
