Posted by tamnd 2 days ago
https://www.nature.com/articles/s41586-024-07566-y
If you've spent any time using LLMs to write documentation you'll see this for yourself: the compounding just ends up rewriting valid information as less terse information.
I find it concerning Karpathy doesn't see this. But I'm not surprised, because AI maximalists seem to find it really difficult to be... "normal"?
Rule of thumb: if you find yourself needing to broadcast the special LLM sauce you came up with instead of what it helped you produce, ask yourself why.
It is unlikely you are accurately perceiving some limitation that Karpathy does not.
Everything is derived from them and backlinks into them, which makes it necessary to stay vigilant about staleness, correctness, drift, and more. Just like in a human-built knowledge base.
I'll spare you most of the slop but.. "The Case That I Am Abulafia: The parallel is uncomfortable and precise. [...]"
Yeah... It's fun though.
————
Wow, I respect karpathy so much and have learned a ton from him. But WTF is the sibling comment he wrote as a response to you? Just pasting a Claude-written slop retort… it’s sad.
Maybe we need to update that old maxim about “if you don’t have something nice to say, don’t say it” to “if you don’t have something human to say, don’t say it.”
So many really smart people I know have seen the ‘ghost in the machine’ and as a result have slowly lost their human faculties. Ezra Klein, of all people, had a great article about this recently titled “I Saw Something New in San Francisco” (gift link if you want to read it): https://www.nytimes.com/2026/03/29/opinion/ai-claude-chatgpt...
It's weird how some people cover the whole range: sometimes putting out really good stuff, other times the complete opposite.
Feels as if they were two different people ... or three, or four.
Appreciate the gift link, I’ll give it a read!
> Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions. They will think of mechanisms, procedures, and models. They will remember that such-and-such a person did some possibly relevant work on a topic of interest back in 1947, or at any rate shortly after World War II, and they will have an idea in what journals it might have been published. In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.
> In addition, men will handle the very-low-probability situations when such situations do actually arise. (In current man-machine systems, that is one of the human operator's most important functions. The sum of the probabilities of very-low-probability alternatives is often much too large to neglect.) Men will fill in the gaps, either in the problem solution or in the computer program, when the computer has no mode or routine that is applicable in a particular circumstance.
> The information-processing equipment, for its part, will convert hypotheses into testable models and then test the models against data (which the human operator may designate roughly and identify as relevant when the computer presents them for his approval). The equipment will answer questions. It will simulate the mechanisms and models, carry out the procedures, and display the results to the operator. It will transform data, plot graphs ("cutting the cake" in whatever way the human operator specifies, or in several alternative ways if the human operator is not sure what he wants). The equipment will interpolate, extrapolate, and transform. It will convert static equations or logical statements into dynamic models so the human operator can examine their behavior. In general, it will carry out the routinizable, clerical operations that fill the intervals between decisions.
https://www.organism.earth/library/document/man-computer-sym...
e.g. (amongst many others) Desk-Surface Display and Control: Certainly, for effective man-computer interaction, it will be necessary for the man and the computer to draw graphs and pictures and to write notes and equations to each other on the same display surface. The man should be able to present a function to the computer, in a rough but rapid fashion, by drawing a graph. The computer should read the man's writing, perhaps on the condition that it be in clear block capitals, and it should immediately post, at the location of each hand-drawn symbol, the corresponding character as interpreted and put into precise type-face.
On a sidenote, I've been building an AI powered knowledge base (yes, it uses RAG) that has wiki synthesis and similar ideas, take a look at https://github.com/kenforthewin/atomic
The retrieval part can be grep if you don't care about semantic search.
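To make that concrete, here's a minimal sketch of grep-style retrieval: plain keyword matching over markdown files standing in for the retriever (the function name and scoring are mine, not from atomic):

```python
import re
from pathlib import Path

def grep_retrieve(query: str, root: str, top_k: int = 3) -> list[str]:
    """Rank markdown files under `root` by how often query terms appear."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scored = []
    for path in Path(root).rglob("*.md"):
        text = path.read_text(encoding="utf-8").lower()
        score = sum(text.count(t) for t in terms)
        if score:  # skip files with no hits at all
            scored.append((score, str(path)))
    scored.sort(reverse=True)
    return [p for _, p in scored[:top_k]]
```

Swap this for a vector store later if you need semantic matching; the interface stays the same.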
I’ve been thinking about something along the lines of an LLM-wiki for a while now, which could truly act as a wingman-executive-assistant-second-brain, but OP has gone deeper than my ADHD thoughts could have possibly gone.
Looking forward to seeing this come to fruition
Fully retrieve all diagram and chart info from PPT and Excel files, then leverage native AI agents (e.g. Codex) to conduct agentic RAG
More to the point, this is how LLM assistants like GitHub Copilot use their custom instructions file, aka copilot-instructions.md
https://docs.github.com/en/copilot/how-tos/configure-custom-...
also the linting pass is doing something genuinely different - auditing inconsistencies, imputing missing data, suggesting connections. that's closer to an assistant maintaining a zettelkasten than a search engine returning top-k chunks
cool project btw will check it out
OP's example isn't something new or incredibly thoughtful at all - in fact this pattern gets "discovered" every other day here, reddit or social media in general by people that don't have the foresight to just look around and see what other people are doing.
What I'm pushing back on specifically is the insistence that the core loop - retrieving the most relevant pieces of knowledge for wiki synthesis - is not RAG. In order for the LLM to do a good job at this, it needs some way to retrieve the most relevant info. Whether that's via vector DB queries or a structured index/filesystem approach, that fundamental problem - retrieving the best data for the LLM's context - is RAG. It's a problem that has been studied and evaluated for years now.
thanks for checking it out
> but the LLM is rediscovering knowledge from scratch on every question
Unless the wiki stays fully in context, the LLM now has to re-read the wiki instead of re-reading the source files. Also, this will introduce and accumulate subtle errors as we start to regurgitate second-order information.
I totally get the idea but I think next gen models with 10M context and/or 1000tps will make this obsolete.
We've already got 1m context, 800k context, and they still start "forgetting" things around the 200k - 300k mark.
What use is 10M context if degradation starts at 200k - 300k?
It’s not what this person is describing though. A self referential layer like this that’s entirely autonomous does feel completely valueless - because what is it actually solving? Making itself more efficient? The frontier model providers will be here in 3 weeks doing it better than you on that front. The real value is having a system that supports a human coming in and saying “this is how the system should actually behave”, and having the system be reasonably responsive to that.
I feel like a lot of the exercises like OP's are interesting but ultimately futile. You will not have the money these frontier providers do, and you do not have remotely the amount of information they do on how to squeeze the most efficiency out of how they work. Best bet is to just stick with the vanilla shit until the firehose of innovation slows down to something manageable, because otherwise the abstraction you build is gonna be completely irrelevant in two months
now you get to condense the findings that interest you from a handful of papers
in the future it solves for condensing your interests in a whole field to a handful of papers or less
As such I've taken to delegating substantial parts of architecture and discovery to multi-agent workflows that always refer back to a wiki-like castle of markdown files that I've built over time with them, fronted by Obsidian so I can peek in efficiently often enough.
Now I'm certainly doing something wrong, but the gaps are just too many to count. If anything, this creates a weird new type of tech debt. Almost like a persistent brain gap. I miss thinking harder and I think it would get me out of this one for sure. But the wiki workflow is just too addictive to stop.
Me too, and I wonder where this will take us; I worry about losing the ability to think hard.
People who do not understand that will continue to not understand that it also applies to AI right now. Maybe at some point in the future it won't, not sure. But my impression is that systems grow in complexity far past the point where the system is gummed up and no-one can do anything, unless it's actively managed.
If a human can understand 10 units of complexity and their LLM can do 20, then they might just build a system that's 30 complex and not understand the failure modes until it's too late.
I think this is mostly a matter of expectation management. AIs are being positioned as being able to develop software independently, and that’s certainly the end goal.
So then people come in with the expectation that the AI is able to manage that, and it fails. Spectacularly.
The LLM certainly cannot manage any non-local complexity right now, and it succeeds at increasing technical debt and complexity faster than ever before.
Even if you can get an LLM to output good artifacts that don't eventually evolve into slop, which is questionable, it's really not that useful, especially not for a personal wiki.
It seems rather silly to me, as _creating_ those flashcards is what helps you learn, with the studying after, cementing that knowledge in your brain.
I don't know if I'd ever be comfortable with that, hopefully I'll just be retired or dead when that takes off.
Maybe my recent prompts reflect how badly up to speed I am at a given time? I don't know. On a slightly related note - I recently heard the term AI de-skilling; this treads close to it imo.
Talk about back to school feelings (!)
It isn't different. This just tries to reinvent the wheel that all mainstream coding assistants have been providing for over a year.
Even ChatGPT rolled out chat memory in their free tier.
I don't think you are very familiar with the state of LLM assistants. Prompt files were a thing since ever, and all mainstream coding assistants support storing your personal notes in your own files in multiple ways. Prompt files, instruction files, AGENTS.md, etc.
The AGENTS.md approach papers over this by teaching the LLM the folder conventions. It works until the data gets complex, and it degrades after many iterations.
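For anyone unfamiliar, this is roughly what that approach looks like (a made-up AGENTS.md excerpt, not from any real repo):

```markdown
# AGENTS.md

## Where things live
- `notes/` - one markdown file per topic; the filename is the slug
- `notes/index.md` - table of contents; update it whenever you add a note
- `archive/` - stale notes; never cite these as current

## Conventions
- Link related notes with relative paths, e.g. `[caching](./caching.md)`
- Prefer editing an existing note over creating a near-duplicate
```

The conventions live only in prose, so nothing enforces them - which is exactly why it breaks down as the data grows.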
Both are needed: files that open in any editor, and a structured interface the agent can actually query. Been building toward that with Binder (github.com/mpazik/binder), a local knowledge platform. Data lives in a structured DB but renders to plain markdown with bi-directional sync. LSP gives editors autocomplete and validation. Agents and scripts get the same data through CLI or MCP.
I would be interested in trying to make the models go into more of a research mode and organize their knowledge inside it, but I've found this turns into something like LLM soup.
For coding projects, the best experience I have had is clear requirements and a lot of refinement followed through with well documented code and modules. And only a few big 'memories' to keep the overall vision in scope. Once I go beyond that, the impact goes down a lot, and the models seem to make more mistakes than I would expect.
Start with short text context, and flow through DAGs choose-your-own-adventure style. We already breached context limits. Now's the time to let LLMs build their contexts through decision trees and prune dead ends.
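A rough sketch of what I mean: context fragments linked into a tree/DAG, where the model (a plain predicate here, all names invented) decides which branches to follow, prunes the rest, and stops at a budget:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    """One fragment of context plus links to follow-up fragments."""
    text: str
    children: list["Node"] = field(default_factory=list)

def build_context(root: Node, keep: Callable[[Node], bool], budget: int) -> list[str]:
    """Walk the graph choose-your-own-adventure style: keep a fragment only
    if the predicate (in practice, the LLM) deems it relevant, prune dead
    ends with their whole subtree, and stop once the budget is spent."""
    picked, frontier = [], [root]
    while frontier and budget > 0:
        node = frontier.pop(0)
        if not keep(node):
            continue  # dead end: this branch and its children are pruned
        picked.append(node.text)
        budget -= len(node.text)
        frontier.extend(node.children)
    return picked
```

The budget here is characters for simplicity; in practice you'd count tokens.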
I've handed my local agents a bunch of integrated command line tools (kinda like an office suite for LLMs), including a wiki (https://github.com/triblespace/playground/blob/main/facultie... ) and linkage really helps drastically reduce context bloat because they can pull in fragment by fragment incrementally.
You'd then alias these disambiguated terms and they'd still trigger the correct token autocomplete, but it would reduce the overlap that causes misdirection.