Yon – a topos-oriented language with a content-addressed lattice heap

Posted by amenn 2 days ago

Hello everyone. In the last two years I spent, as a dev, part of my free time stretching the limits of my knowledge. Not being a mathematician myself, I discovered that formalizing concepts in mathematical language could nonetheless be useful to improve symbolic reasoning about the concepts themselves. I made use of both books and AI, and I followed the development of the latter, mainly with a critical eye. I have several open projects, and from some observations and explorations on one of them I started asking myself what the current limits of reasoning, of logic, of mathematics itself are.

So I explored categories, and topoi, above all starting from Mazzola's theory of music. I asked myself whether this could influence type theory in programming, and I ran some experiments. Out of this came this programming language, Yon, inspired by Yoneda and by morphisms. From another project I drew observations on the Leech lattice; from yet another, some experiments with mmap and coordinate-based allocation in a structure that would be advantageous, again, in a topological sense.

The language certainly has mistakes here and there and I wrote the documentation in a hurry; the work took 3 weeks in total. It compiles to LLVM for performance reasons, and for now I preferred to avoid a VM and a GC. It contains unusual data structures that turn out to be performant. It's worth a look, and I hope it will win some converts, and that someone will want to help me with its development. I'd love for it to bring fresh stimuli to programming and maybe open a few new frontiers.

A few concrete details, for those who want to look under the hood. The compiler is a real pipeline, not an interpreter: an OCaml frontend takes .yon source into a custom MLIR dialect I called "topos", where the categorical constructs live as first-class operations; its lowering passes take everything down to LLVM IR and from there to a native executable. A single command, yonc, drives the whole chain, and you can stop at any intermediate stage to see what a categorical construct actually becomes on its way to silicon.

The runtime is where the Leech lattice observations ended up. The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address. That buys three things I would now find hard to give up: equality is a single machine comparison no matter how big the value is (string equality benches flat at ~17 ns up to 32,768-character strings, because it compares handles, never bytes); deduplication is global and automatic, with no interning logic in user code; and giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move.

Concurrency I kept deliberately simple-minded: no threads, no shared mutable state. A program splits into isolated "Spaces" (separate processes, isolation enforced by the MMU) that talk over shared-memory channels with explicit failure semantics.

About what is verified and what is just hope: the ground truth is a regression suite of 112 examples plus a cross-Space scenario suite, with exit codes identical on Linux x86-64 and macOS Apple Silicon (Intel Macs: untested). The book on the site, 21 chapters plus appendices, had every snippet compiled and run before being written down. The benchmarks appendix declares its environment and method; I tried not to publish any number without one. The limits of 1.0 are written down as well, in a baseline document that lists every fixed pool (256 heaps per chain, 64 Spaces, 16 concurrent RPC sessions, and so on), with the rationale that a hard limit that fails loudly is a specification, while a soft limit that degrades silently is a bug.

For the license I went with the GCC model: compiler and toolchain are AGPLv3, the runtime is AGPLv3 with an explicit linking exception, so the language itself stays free, and the programs you write in it are entirely yours, under any license you choose.
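
For readers unfamiliar with content addressing, here is a minimal Python sketch of the general idea (often called hash-consing). It is not Yon's implementation: it uses SHA-256 where the post describes lattice points, and `Store`, `put`, and `get` are hypothetical names chosen for illustration.

```python
import hashlib


class Store:
    """Toy content-addressed store: equal content always gets the same handle."""

    def __init__(self):
        self._slot_by_digest = {}  # content digest -> slot index
        self._cells = []           # slot index -> immutable bytes

    def put(self, content: bytes) -> int:
        # The cost of looking at every byte is paid once, at insertion.
        digest = hashlib.sha256(content).digest()
        slot = self._slot_by_digest.get(digest)
        if slot is None:
            slot = len(self._cells)
            self._cells.append(content)
            self._slot_by_digest[digest] = slot
        return slot

    def get(self, handle: int) -> bytes:
        return self._cells[handle]


store = Store()
a = store.put(b"x" * 32768)
b = store.put(b"x" * 32768)  # same content, so the same slot is returned
c = store.put(b"y")
```

Under these assumptions, `a == b` is a comparison of two small integers regardless of how long the stored string is, and the second `put` stores nothing new. The sketch says nothing about how cells are ever reclaimed.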

Site + book: https://yon-lang.org Repo: https://github.com/yon-language/yon (tag v1.0.0)

Happy to answer anything: the topos dialect, why a lattice rather than a hash, what the categorical constructs lower to, what broke along the way.

43 points | 48 comments

mccoyb 5 hours ago|

I'm not sure where or how to convey this, because I've seen several of these languages designed with AI, documentation created using AI, etc -- posted on Hacker News in the last month or so, and I've responded to each one with roughly the same feedback (and I'm assuming good faith: that the intent is that the poster wishes to grow as a language designer).

Your audience, or whoever you aim your work at, should be treated with respect. Otherwise, why should they give you the time of day? Why would you expect them to respond positively to effort alone when effort (in code and in shit prose) is extremely cheap right now? Their time is not cheap ...

When I read the documentation, and it is extremely clear that you haven't taken the time to clarify your ideas, when much of it is LLM prose, when much of the content introduces highfalutin ideas without motivation, blending categorical concepts (which, by the way, should never be mixed with vague prose claims about the language), violating my reader context model, preventing me from understanding what problem exactly your language design is solving (where is that problem stated clearly?), it is a waste of my time.

> The work took 3 weeks in total ... it's worth a look, and I hope it will win some converts, and that someone will want to help me with its development.

You've gone too fast, too much is vague, nothing is clear.

I'd delete everything, start over, and try and explain just one of the ideas clearly. Seriously. This sounds harsh, but it's honestly the correct approach to something as subtle and nuanced as programming language design.

VoidWarranty 56 minutes ago||

This reads to me like someone's mania project. I wish OP the best and hope they can get some rest.

skulk 3 hours ago|||

> Your audience, or whoever you aim your work at, should be treated with respect.

I just want to amplify this point. As I was reading this, the LLMisms kept jumping out at me and each one felt like the author looking at me and deciding that my time spent reading this prose wasn't actually worth anything to them.

OP: I want YOUR thoughts, not the next token predictions of a gigantic pile of matrix multiplications. I want your awkward sentences, grammar mistakes, half-baked thoughts, self-doubt, silly jokes. I don't want this pile of grandiose mechanical slop completely devoid of humanity.

solomonb 48 minutes ago||

Personally I don't want to read the codebase AND book of someone 3 weeks into a mania focused on a subject it is unclear they have any prior experience with. It's disrespectful for someone to think they can produce something worthy of consuming another human's time under those constraints.

nathan_compton 59 minutes ago|||

I have to second this. I find the AI written documentation extremely loathsome, hard to read, and somehow both pretentious and lazy.

Please, I beg everyone, stop posting AI slop.

TimorousBestie 52 minutes ago||

Regrettably, the beatings are going to continue until morale improves.

danieltanfh95 1 hour ago||

This isn’t about whether the writer uses an LLM or not at all, nor is it about respect. The core novelty it tries to introduce is not hard to understand (even if it is not really that novel). If you don’t want to spend time thinking about what interesting idea it is exploring, that is fine, but pretending or insinuating that it is an LLM problem is just lazy.

mccoyb 1 hour ago|||

Explain away then my friend: surely your clear explanation will benefit many other readers who came away with similar confusion?

Retr0id 1 hour ago|||

What is the core novelty?

jrmg 4 hours ago||

Just a comment: this sounds a lot like when someone I knew mildly succumbed to AI psychosis, and thought he, with Gemini, had made some physics/metaphysics breakthrough. If you’re losing sleep and feeling distressed or euphoric, maybe lay off for a few days, no matter how hard that is. Talk to friends and/or family about unimportant things. Get outside for a while. Go back to old hobbies (reading, hiking, just going to coffee shops or thrift stores - whatever) and then reassess.

This language looks interesting, but I don’t understand the concepts. Does this stuff make sense to other people?

> The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address.

What is ‘Λ₂₄’? What is a ‘lattice point’?

> giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move

This kind of sounds like you’re saying that there’s nothing to free, which implies that nothing takes up memory, which I presume is not the case. Do you mean everything is immutable and content-addressed (like Git)? Doesn’t stuff still need to be freed somehow when the program’s done with it, otherwise memory will grow forever?
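
To make the question concrete, here is a small Python sketch of one conventional answer, reference counting over a content-addressed store. It assumes nothing about how Yon actually reclaims memory; `RcStore`, `acquire`, `release`, and `live_cells` are hypothetical names.

```python
class RcStore:
    """Toy content-addressed store that frees a cell when its last user lets go."""

    def __init__(self):
        self._refcounts = {}  # content -> number of live references

    def acquire(self, content: bytes) -> bytes:
        # Identical content shares one cell; only the count goes up.
        self._refcounts[content] = self._refcounts.get(content, 0) + 1
        return content

    def release(self, content: bytes) -> None:
        remaining = self._refcounts[content] - 1
        if remaining == 0:
            del self._refcounts[content]  # the cell is actually freed here
        else:
            self._refcounts[content] = remaining

    def live_cells(self) -> int:
        return len(self._refcounts)


rc = RcStore()
h1 = rc.acquire(b"hello")
h2 = rc.acquire(b"hello")
cells_while_shared = rc.live_cells()
rc.release(h1)
cells_after_one_release = rc.live_cells()
rc.release(h2)
cells_after_both = rc.live_cells()
```

Immutable, deduplicated cells never need to be moved or updated in place, but in this sketch something still has to decide when a cell is dead. Without the `release` path the store only ever grows.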

leecommamichael 3 hours ago||

Agreed. Everything is a weird mixture of poetry and mathematics jargon. Basically every page of the book contains some esotericism which makes empty claims. It's completely divorced from reality.

dirkt 2 hours ago|||

> Does this stuff make sense to other people?

Nope, and I actually learned about applications of category theory to programming languages in university.

I tried to get an idea about the main points, and then stumbled over

> a thing is what you can observe of it.
>
> [...]
>
> Content addressing is extensionality made physical (chapter 11): two values indistinguishable by observation are not merely equal, they are the same slot

That only works in a category because you have enough functions (a countably or uncountably infinite number) that you can compose and "test", so you don't need (or don't care about) the "value" itself.

But on a real computer that doesn't work, because you can't go beyond a countable number, and even then you run into the halting problem pretty soon. So equality in this model is not computable. Which is sort of bad if you want to somehow store values "in the same slot" just based on observability. It might work for string literals, and even for concatenated strings, but not in general.
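
A short Python sketch of the gap described above, under the assumption that a function is addressed by hashing its compiled representation. `code_address` is a hypothetical helper, and nothing here reflects how Yon treats functions.

```python
import hashlib


def double_by_add(x: int) -> int:
    return x + x


def double_by_mul(x: int) -> int:
    return 2 * x


def code_address(fn) -> str:
    """Address a function by a hash of its bytecode and constants."""
    code = fn.__code__
    material = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(material).hexdigest()


# Observation over a finite sample cannot tell the two functions apart...
same_behaviour = all(double_by_add(n) == double_by_mul(n) for n in range(-1000, 1000))

# ...but an address computed from their representation can.
same_address = code_address(double_by_add) == code_address(double_by_mul)
```

The two functions agree on every tested input and still land at different addresses. The finite test does not prove they are equal either; it only fails to distinguish them, which is the computability problem in miniature.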

Picking some random lattice (a lattice is a partially ordered structure with some extra conditions) as a base of addressing doesn't help...

So yes, crackpot AI slop. The words sort of make sense, but there's nothing solid behind it, and as soon as you look at details it falls apart.

canyp 2 hours ago||

I didn't even get that far; I found the syntax annoying.

esafak 3 hours ago|||

https://en.wikipedia.org/wiki/Leech_lattice

jrmg 3 hours ago||

Maybe I just don’t have the mathematics knowledge to understand it, but that doesn’t really tell me how you could represent one in memory, or use one as a backing store for a hash-addressed data structure.

danieltanfh95 1 hour ago||

There is nothing physics/metaphysics about this. If you don’t understand the terms, don’t pretend you do and write slop as a comment; it is really not that different from using an LLM to generate slop.

itishappy 18 minutes ago|||

The parent comment is not suggesting that Yon is about physics/metaphysics.

Understanding is important for readers. Demonstrating understanding is important for writers of both technical documentation and internet comments, and of critical importance in the era of AI.

nvme0n1p1 1 hour ago||||

What if it's pure nonsense, and therefore impossible for anyone to understand? Does that mean all criticism is "slop" and nobody's allowed to comment on it?

ModernMech 1 hour ago|||

"If you don’t understand the terms, don’t pretend you do"

The comment you're replying to explicitly says "This language looks interesting, but I don’t understand the concepts." so I'm not sure what you're trying to say. Their note about physics/metaphysics was about "someone [they] knew", not TFA.

Chinjut 2 hours ago||

I have a PhD in category theory and know what the Leech lattice is and I still don't understand what is going on here. What is the value of using the Leech lattice to store memory?

cjs_ac 4 hours ago||

The documentation is a work of art. Every time I try to work out what just one of the unexplained ideas is, it just introduces new unexplained ideas. I don't know where these ideas came from, how they fit together, or why putting them together is useful. I certainly don't know why I would want to write a program in this language, as opposed to any other language I already know.

skrebbel 4 hours ago||

Sibling comment suggests maybe it’s AI psychosis and that would clarify a lot.

ModernMech 1 hour ago||

Reminds me a lot of Urbit docs in that sense.

canyp 2 hours ago||

> Content addressing is extensionality made physical (chapter 11)

Actually, that's in chapter 12; 11 is the standard library. Maybe the LLM got confused because the chapters are 0-indexed.

I was curious about that topic but it seems over my head. I don't think it works outside of mathematics? In programming, one can have two objects that are identical in both structure and value but have different identities. It's why lisp has eq, eql, equal, etc. How'd you get around that other than adding an identity property?
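
The identity-versus-equality distinction is easy to show in Python, where `==` plays roughly the role of Lisp's `equal` and `is` the role of `eq`. This is only an illustration of the question, not a description of Yon's semantics.

```python
first = [1, 2, 3]
second = [1, 2, 3]

structurally_equal = first == second  # same structure and values
same_object = first is second         # same identity

# Mutation is what makes identity observable: changing one list
# does not change the other, so they cannot share a slot.
first.append(4)
still_equal = first == second
```

If values can never be mutated, no program can observe the difference between two equal objects and one shared object, which is presumably why the post insists that cells are immutable.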

Also:

> A handle, what your variables actually hold for strings, sections, lists, trees, is that slot index, carried as an f64

Why does the handle need floating point?

TimorousBestie 49 minutes ago|

> Why does the handle need floating point?

I don’t know if Yon does this (the documentation is gibberish) but it’s possible to use f64 NaNs to hold convenient metadata. I had a professor who wrote a bespoke teaching language (roughly based on Scheme) that did that.

ModernMech 18 minutes ago||

Here's an implementation of such: https://docs.rs/nanval/latest/nanval/
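
A minimal Python sketch of the NaN-boxing technique mentioned above, for readers who have not seen it. Whether Yon does anything like this is not established in this thread; `box` and `unbox` are hypothetical names, and the layout (a quiet NaN with a 51-bit payload) is one common choice among several.

```python
import math
import struct

QUIET_NAN = 0x7FF8_0000_0000_0000  # exponent all ones plus the quiet bit
PAYLOAD_MASK = (1 << 51) - 1       # the 51 mantissa bits below the quiet bit


def box(payload: int) -> float:
    """Hide an integer payload inside the mantissa of a quiet NaN."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    bits = QUIET_NAN | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def unbox(value: float) -> int:
    """Recover the payload from a boxed NaN."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & PAYLOAD_MASK


handle = box(123456)
```

The boxed value is a NaN as far as floating-point arithmetic is concerned, yet the slot index survives a round trip through the `f64`. Note that a payload of 0 is indistinguishable from an ordinary quiet NaN in this layout, so real implementations usually reserve tag bits to tell the cases apart.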

iterateoften 3 hours ago||

I noticed since 5.5 GPT has been adding "lattice" to a lot of things. Not sure if it is the new Gremlins.

solomonb 52 minutes ago||

As someone genuinely interested in programming language design, type theory, and category theory this sort of thing really saddens me. There is so much passion and rigor that has gone into developing these fields. Chucking all their jargon into an ai slop blender, imo, is actually incredibly disrespectful to those who have worked so hard.

Imagine someone honestly interested in learning about category theory but not yet knowing where to start. Projects like this only serve to muddy the waters obscuring paths to actual learning and giving the impression that the subject is a joke.

danieltanfh95 1 hour ago||

As I understand it, content addressing function content is problematic because it does not actually "normalise" the content of functions into something interchangeable. A function of input A and output B with performance signature X can still be very different in terms of actual code, but the actual comparison between both is hard to specify.
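
A rough Python sketch of what partial normalisation looks like, assuming functions are hashed by their syntax tree with the function name and parameter names replaced by placeholders. `structural_hash` and `_ParamRenamer` are hypothetical, the renaming ignores shadowing and nested scopes, and this is not a description of how Unison or Yon work.

```python
import ast
import hashlib


class _ParamRenamer(ast.NodeTransformer):
    """Replace the function name and parameter names with placeholders."""

    def __init__(self):
        self.names = {}

    def visit_FunctionDef(self, node):
        node.name = "f"
        self.generic_visit(node)  # visits parameters before the body
        return node

    def visit_arg(self, node):
        node.arg = self.names.setdefault(node.arg, f"v{len(self.names)}")
        return node

    def visit_Name(self, node):
        # Only parameters are renamed; free names such as builtins are kept.
        node.id = self.names.get(node.id, node.id)
        return node


def structural_hash(source: str) -> str:
    tree = _ParamRenamer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


src_a = "def add(a, b):\n    return a + b\n"
src_b = "def plus(x, y):  # renamed and commented\n    return x + y\n"
src_c = "def add(a, b):\n    return b + a\n"

hash_a, hash_b, hash_c = (structural_hash(s) for s in (src_a, src_b, src_c))
```

Renaming and comments are normalised away, so `src_a` and `src_b` hash the same. `src_c` computes the same result for numbers but hashes differently, because no syntactic normalisation knows that `+` commutes here. That residue is the part that is hard to specify.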

I was exploring this as a means of solving the open source, or rather the GitHub, conundrum. The problem with sharing code socially is that we need a canonical source, and this is sociologically driven rather than performance driven and, as it turns out, has devastating consequences for FOSS funding. I wanted to explore sharing code "interchangeably" in some sense to avoid this problem, but ultimately this seems unsolvable, even with the exploration done by Unison etc.

GreenSalem 3 hours ago||

Advanced AI psychosis.

Professional help might be necessary.

swiftcoder 3 hours ago|

Honestly, as someone who has at least a moderate tolerance for PL jargon, most of this is completely impenetrable. It's like someone put the whole field of PL in an LLM-powered blender.
