Posted by tamnd 2 days ago

LLM Wiki – example of an "idea file" (gist.github.com)
https://x.com/karpathy/status/2040470801506541998

https://xcancel.com/karpathy/status/2040470801506541998

293 points | 92 comments
devnullbrain 2 days ago|
I don't see why this wouldn't just lead to model collapse:

https://www.nature.com/articles/s41586-024-07566-y

If you've spent any time using LLMs to write documentation you'll see this for yourself: each compounding pass just rewrites valid information less tersely.

I find it concerning Karpathy doesn't see this. But I'm not surprised, because AI maximalists seem to find it really difficult to be... "normal"?

Rule of thumb: if you find yourself needing to broadcast the special LLM sauce you came up with instead of what it helped you produce, ask yourself why.

gojomo 2 days ago||
Here in 2026, many forms of training LLMs on (well-chosen) outputs of themselves, or other LLMs, have delivered gigantic wins. So 2024 & earlier fears of 'model collapse' will lead your intuition astray about what's productive.

It is unlikely you are accurately perceiving some limitation that Karpathy does not.

ChrisGreenHeur 2 days ago|||
The article is not about training LLMs; it is about using LLMs to write a wiki for personal use. The article assumes a fully trained LLM such as ChatGPT or Claude already exists to be used.
hombre_fatal 1 day ago|||
Also, TFA prescribes putting ground truth source files into a /raw directory.

Everything is derived from them and backlinks into them, which is what lets you stay vigilant about staleness, correctness, drift, and more. Just like in a human-built knowledge base.
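(As an illustrative sketch of what that vigilance could look like, assuming derived notes carry plain markdown links into raw/ — the directory layout and link convention here are assumptions, not TFA's exact spec:)

```python
import re
from pathlib import Path

LINK = re.compile(r"\]\((raw/[^)]+)\)")  # markdown links pointing into raw/

def lint_derived(wiki: Path) -> None:
    """Flag derived notes whose raw/ sources are missing or newer."""
    for note in wiki.glob("derived/**/*.md"):
        sources = LINK.findall(note.read_text())
        if not sources:
            print(f"{note}: no backlink to a raw/ source")
        for src in sources:
            src_path = wiki / src
            if not src_path.exists():
                print(f"{note}: dangling source {src}")
            elif src_path.stat().st_mtime > note.stat().st_mtime:
                print(f"{note}: stale, {src} changed after this note")

lint_derived(Path("."))
```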

khalic 1 day ago|||
Don't even try; after vibe coding, people seem to be adopting vibe thinking. "Model Collapse sounds cool, I'm gonna use it without looking it up."
mikkupikku 1 day ago||
Vibe thinking... that's an interesting premise. I'll have to build up my new llm-wiki before I'll know what to think about "vibe thinking."
mikkupikku 1 day ago||
I was joking but also not joking, this llm-wiki idea is fun. I fed it its own llm-wiki.md, Foucault's Pendulum, randomly collected published papers about the philosophy of GiTS, several CCRU essays, and Manufacturing Consent. It drew fun red yarn between all of them, about the topic of red yarn (e.g. schizos drawing connections out of nothing, particularly through the use of computers, and how this relates to itself doing literally this as it does it).

I'll spare you most of the slop but.. "The Case That I Am Abulafia: The parallel is uncomfortable and precise. [...]"

Yeah... It's fun though.

sebmellen 2 days ago|||
Edit for context: the sibling comment from karpathy is gone after being flagged to oblivion. Not sure if he deleted it or if it was just removed based on the number of flags? He had copy-pasted a few snarky responses from Claude and essentially said “Claude has this to say to you:” followed by a super long run-on paragraph of slop.

————

Wow, I respect karpathy so much and have learned a ton from him. But WTF is the sibling comment he wrote as a response to you? Just pasting a Claude-written slop retort… it’s sad.

Maybe we need to update that old maxim about “if you don’t have something nice to say, don’t say it” to “if you don’t have something human to say, don’t say it.”

So many really smart people I know have seen the ‘ghost in the machine’ and as a result have slowly lost their human faculties. Ezra Klein, of all people, had a great article about this recently titled “I Saw Something New in San Francisco” (gift link if you want to read it): https://www.nytimes.com/2026/03/29/opinion/ai-claude-chatgpt...

prodigycorp 2 days ago|||
It's not sad. He's a person like you and me. devnullbrain's comment is snarky: he invoked model collapse, which has nothing to do with the topic of a wiki/KB, he wrote that karpathy is not normal, and then he seemed to imply that the idea was useless. I'd be pretty in my feels too, and the fact that he wrote it and then deleted it seems like a +1 normal-guy thing.
sebmellen 1 day ago||
Yeah. I know you didn’t see it, but it was truly a substance-free response. Glad to see he deleted it and I know I’ve been guilty of the same kind of knee-jerk response before.
prodigycorp 1 day ago||
I saw it. It sucked, I agree. But like you said, we all get one (or a few) of those.
moralestapia 2 days ago||||
Lol at that.

It's weird how some people cover the whole range: sometimes putting out really good stuff, other times the complete opposite.

Feels as if they were two different people ... or three, or four.

iamflimflam1 1 day ago||
Emotional state, tiredness, drunkenness, a good night's sleep… the number of factors that drive our responses is ridiculous.
girvo 2 days ago||||
Eh, he’s just a person. I’m not surprised he posted a rude comment haha, and it got rightfully flagged off the site for being AI slop.

Appreciate the gift link, I’ll give it a read!

devnullbrain 1 day ago|||
Aw, I missed it
jahala 1 day ago|||
I did a proof of concept for self-updating HTML files (polyglot bash/HTML) some weeks ago. It actually works quite well; with simple prompting it seems not to just go in circles (https://github.com/jahala/o-o)
kwar13 2 days ago|||
Also my experience. It can't even keep up with a simple claude.md, let alone a whole wiki...
Vetch 2 days ago||
This sounds very much like Licklider's 1960 essay on intelligence amplification, "Man-Computer Symbiosis":

> Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions. They will think of mechanisms, procedures, and models. They will remember that such-and-such a person did some possibly relevant work on a topic of interest back in 1947, or at any rate shortly after World War II, and they will have an idea in what journals it might have been published. In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.

> In addition, men will handle the very-low-probability situations when such situations do actually arise. (In current man-machine systems, that is one of the human operator's most important functions. The sum of the probabilities of very-low-probability alternatives is often much too large to neglect. ) Men will fill in the gaps, either in the problem solution or in the computer program, when the computer has no mode or routine that is applicable in a particular circumstance.

> The information-processing equipment, for its part, will convert hypotheses into testable models and then test the models against data (which the human operator may designate roughly and identify as relevant when the computer presents them for his approval). The equipment will answer questions. It will simulate the mechanisms and models, carry out the procedures, and display the results to the operator. It will transform data, plot graphs ("cutting the cake" in whatever way the human operator specifies, or in several alternative ways if the human operator is not sure what he wants). The equipment will interpolate, extrapolate, and transform. It will convert static equations or logical statements into dynamic models so the human operator can examine their behavior. In general, it will carry out the routinizable, clerical operations that fill the intervals between decisions.

https://www.organism.earth/library/document/man-computer-sym...

ramoz 2 days ago|
Wow, fascinating insights he had.

e.g. (amongst many others): "Desk-Surface Display and Control: Certainly, for effective man-computer interaction, it will be necessary for the man and the computer to draw graphs and pictures and to write notes and equations to each other on the same display surface. The man should be able to present a function to the computer, in a rough but rapid fashion, by drawing a graph. The computer should read the man's writing, perhaps on the condition that it be in clear block capitals, and it should immediately post, at the location of each hand-drawn symbol, the corresponding character as interpreted and put into precise type-face."

kenforthewin 2 days ago||
This is just RAG. Yes, it's not using a vector database, but it's building an index file of semantic connections and constructing hierarchical semantic structures in the filesystem to aid retrieval... this is RAG.

On a side note, I've been building an AI-powered knowledge base (yes, it uses RAG) that has wiki synthesis and similar ideas; take a look at https://github.com/kenforthewin/atomic

panarky 2 days ago||
There's nothing about RAG that requires embeddings.

The retrieval part can be grep if you don't care about semantic search.
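(A minimal sketch of that grep-as-retrieval point — no embeddings, no vector DB, just matching files stuffed into the prompt; the wiki layout and prompt shape are invented for illustration:)

```python
import subprocess

def retrieve(query: str, wiki_dir: str = "wiki", k: int = 5) -> list[str]:
    """'Retrieval' is literally grep: paths of files mentioning the query."""
    out = subprocess.run(
        ["grep", "-ril", query, wiki_dir],  # recursive, case-insensitive, filenames only
        capture_output=True, text=True,
    )
    return out.stdout.splitlines()[:k]

def build_prompt(question: str) -> str:
    """Classic RAG shape: retrieved context + question, no embeddings anywhere."""
    chunks = []
    for path in retrieve(question):
        with open(path) as f:
            chunks.append(f"--- {path} ---\n{f.read()}")
    return "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {question}"
```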

alfiedotwtf 2 days ago|||
You should have started your comment with “I have a few qualms with this app”.

I’ve been thinking about something along the lines of an LLM-WIKI for a while now, one which could truly act as a wingman-executive-assistant-second-brain, but OP has gone deeper than my ADHD thoughts could have possibly gone.

Looking forward to seeing this come to fruition.

Jet_Xu 2 days ago|||
I believe a multimodal KB plus agentic RAG is a suitable solution for a personal KB. Imagine you have tons of office docs and want to dig into some complex topics within them. You could try https://github.com/JetXu-LLM/DocMason

It fully retrieves all diagram and chart info from PPTs and Excel files, and then leverages native AI agents (e.g. Codex) to conduct agentic RAG.

locknitpicker 2 days ago|||
> This is just RAG.

More to the point, this is how LLM assistants like GitHub Copilot use their custom instructions file, aka copilot-instructions.md

https://docs.github.com/en/copilot/how-tos/configure-custom-...

darkhanakh 2 days ago||
eh, i'd push back on "just RAG". like yes, the retrieval-generation loop is RAG-shaped, no one's arguing that. but the interesting bit here is the write loop - the LLM is authoring and maintaining the wiki itself, building backlinks, filing its own outputs back in. that's not retrieval, that's knowledge synthesis. in vanilla RAG your corpus is static; here it isn't

also the linting pass is doing something genuinely different - auditing inconsistencies, imputing missing data, suggesting connections. that's closer to an assistant maintaining a zettelkasten than a search engine returning top-k chunks

cool project btw will check it out
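(A toy sketch of that write loop, for the curious — the wiki layout and [[backlink]] convention are assumptions, not anything from TFA:)

```python
from datetime import date
from pathlib import Path

def file_answer(topic: str, answer: str, sources: list[str], wiki: Path) -> None:
    """The write half of the loop: persist the synthesis and cross-link it."""
    note = wiki / f"{topic}.md"
    links = "\n".join(f"- [[{s}]]" for s in sources)
    note.write_text(f"# {topic}\n_{date.today()}_\n\n{answer}\n\n## Sources\n{links}\n")
    # backlink pass: every cited note gets a pointer back to the new synthesis
    for s in sources:
        src = wiki / f"{s}.md"
        if src.exists() and f"[[{topic}]]" not in src.read_text():
            with src.open("a") as f:
                f.write(f"\n- linked from [[{topic}]]\n")
```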

devmor 2 days ago|||
This is just persistent memory RAG. I have had a setup like this since about a day after I started using copilot, except it's an MCP server that uses sqlite-vec and has recall endpoints to contextually load the proper data instead of a bunch of extra files polluting context.

OP's example isn't something new or incredibly thoughtful at all - in fact this pattern gets "discovered" every other day here, on reddit, or on social media in general, by people who don't have the foresight to just look around and see what other people are doing.
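(For anyone curious what a sqlite-vec recall setup roughly looks like, a sketch — the table names, the 384-dim embedding, and the embed() stub are illustrative guesses, not devmor's actual code:)

```python
import sqlite3
import sqlite_vec  # pip install sqlite-vec

def embed(text: str) -> list[float]:
    """Stub: in a real setup this calls whatever embedding model you use."""
    raise NotImplementedError

db = sqlite3.connect("memory.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories USING vec0(embedding float[384])")
db.execute("CREATE TABLE IF NOT EXISTS memory_text(rowid INTEGER PRIMARY KEY, body TEXT)")

def remember(body: str) -> None:
    cur = db.execute("INSERT INTO memory_text(body) VALUES (?)", (body,))
    db.execute("INSERT INTO memories(rowid, embedding) VALUES (?, ?)",
               (cur.lastrowid, sqlite_vec.serialize_float32(embed(body))))

def recall(query: str, k: int = 5) -> list[str]:
    """The 'recall endpoint': k nearest stored memories to the query."""
    hits = db.execute(
        "SELECT rowid FROM memories WHERE embedding MATCH ? AND k = ? ORDER BY distance",
        (sqlite_vec.serialize_float32(embed(query)), k),
    ).fetchall()
    ids = [r[0] for r in hits]
    if not ids:
        return []
    rows = db.execute(
        f"SELECT body FROM memory_text WHERE rowid IN ({','.join('?' * len(ids))})", ids
    ).fetchall()
    return [r[0] for r in rows]
```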

kenforthewin 2 days ago||||
I agree with you, the linting pass seems valuable and it's something I'm thinking about adding - it's a great idea.

What I'm pushing back on specifically is the insistence that the core loop - retrieving the most relevant pieces of knowledge for wiki synthesis - is not RAG. In order for the LLM to do a good job at this, it needs some way to retrieve the most relevant info. Whether that's via vector DB queries or a structured index/filesystem approach, that fundamental problem - retrieving the best data for the LLM's context - is RAG. It's a problem that has been studied and evaluated for years now.

thanks for checking it out

Covenant0028 2 days ago|||
I'm curious how this linting step scales with larger wikis. Looking for an inconsistency across N files requires N*N comparisons, and that's assuming each file contains a single idea.
ChrisGreenHeur 2 days ago||
Presumably, randomness and only looking at a limited subset will semi-ensure that, over time, most contradictions surface. Alternatively, how large do you really expect this kind of thing to get? There is a limit to the number of facts from Warhammer 40k worth saving in a wiki.
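(That sampling idea as a sketch: check a random budget of file pairs per lint pass instead of all N*N; ask_llm is a stand-in for whatever model call you use:)

```python
import itertools
import random
from pathlib import Path

def ask_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    raise NotImplementedError

def lint_pass(wiki: Path, budget: int = 20) -> None:
    """Check a random subset of pairs instead of all N*(N-1)/2 of them."""
    files = list(wiki.glob("**/*.md"))
    pairs = list(itertools.combinations(files, 2))  # fine for a sketch; O(N^2) memory
    for a, b in random.sample(pairs, min(budget, len(pairs))):
        verdict = ask_llm(
            f"Do these two notes contradict each other? Answer YES or NO first.\n\n"
            f"--- {a.name} ---\n{a.read_text()}\n--- {b.name} ---\n{b.read_text()}"
        )
        if verdict.strip().upper().startswith("YES"):
            print(f"possible contradiction: {a} <-> {b}")
```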
Imanari 2 days ago||
Isn’t this just kicking the can down the road?

> but the LLM is rediscovering knowledge from scratch on every question

Unless the wiki stays fully in context, the LLM now has to re-read the wiki instead of re-reading the source files. Also, this will introduce and accumulate subtle errors as we start to regurgitate second-order information.

I totally get the idea but I think next gen models with 10M context and/or 1000tps will make this obsolete.

lelanthran 1 day ago||
> I totally get the idea but I think next gen models with 10M context and/or 1000tps will make this obsolete.

We've already got 1m context, 800k context, and they still start "forgetting" things around the 200k - 300k mark.

What use is 10M context if degradation starts at 200k - 300k?

SOLAR_FIELDS 2 days ago|||
I use a home-baked system based on Obsidian that is essentially just “Obsidian but with a structured format on top, with schemas”, and I deploy this in multiple places with a range of end users. It is more valuable than you think. The intermediary layer is great for capturing the intent of a design and determining when the implementation diverges from it. There will always be a divergence between the intent of a system and how it actually behaves, and the code itself doesn’t capture that. The intermediate layer is lossy, it’s messy, it goes out of date, but it’s highly effective.

It’s not what this person is describing though. A self referential layer like this that’s entirely autonomous does feel completely valueless - because what is it actually solving? Making itself more efficient? The frontier model providers will be here in 3 weeks doing it better than you on that front. The real value is having a system that supports a human coming in and saying “this is how the system should actually behave”, and having the system be reasonably responsive to that.

I feel like a lot of exercises like OP’s are interesting but ultimately futile. You will not have the money these frontier providers do, and you do not have remotely the amount of information that they do on how to squeeze out the most efficiency. Best bet is to just stick with the vanilla shit until the firehose of innovation slows down to something manageable, because otherwise the abstraction you build is gonna be completely irrelevant in two months.

ctxc 1 day ago|||
Interesting, I'd love to know more. Are parts of it public?
SOLAR_FIELDS 1 day ago||
Indeed, I have it open source, but want to preserve my anonymity here. The main gist of it is Quartz as a static site frontend bundle, backed by Decap as an editor, so that non-technical users can edit documents. The validation is twofold: frontmatter is validated by a typical YAML validator library, and then I created markdown body validation using some popular markdown AST libraries, so there are two sets of schemas - one for the frontmatter, one for the body - and documents must conform via CI. I ship it with a basic CLI that essentially does validation and has a few other utilities. Not really that much magic, maybe 500 lines of code or so in the CLI and another few hundred lines doing validation and the other utilities. It's all in TypeScript, so I use the same validation in Decap when people do edits.
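(Their actual stack is TypeScript; here is the shape of that two-schema check compressed into a Python sketch, with made-up schema rules:)

```python
import yaml  # pip install pyyaml

REQUIRED_FRONTMATTER = {"title", "tags", "status"}  # illustrative schema, not theirs

def validate(path: str) -> list[str]:
    """Two-layer check: YAML frontmatter fields, then markdown body structure."""
    errors = []
    text = open(path).read()
    if not text.startswith("---\n"):
        return [f"{path}: missing frontmatter block"]
    fm_raw, _, body = text[4:].partition("\n---\n")
    fm = yaml.safe_load(fm_raw) or {}
    for field in REQUIRED_FRONTMATTER - fm.keys():
        errors.append(f"{path}: frontmatter missing '{field}'")
    # body schema: the real thing walks a markdown AST; as a stand-in,
    # demand exactly one top-level heading
    if sum(1 for ln in body.splitlines() if ln.startswith("# ")) != 1:
        errors.append(f"{path}: body must have exactly one top-level heading")
    return errors
```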
dennisy 1 day ago|||
The “next gen of models” argument is a valid one, and one I think of often, but if you truly bought it, you would stop creating anything - since the next gen of models could make it obsolete.
khalic 1 day ago|||
The goal isn’t to keep the context every time, it’s to make the memory queryable. Like a data lake, but for your ideas and decisions.
0123456789ABCDE 1 day ago|||
This solves for now, and this solves for the future.

Now, you get to condense the findings that interest you from a handful of papers.

In the future, it solves for condensing your interest in a whole field down to a handful of papers or fewer.

nidnogg 1 day ago||
It is how I feel when I do it. And it certainly shows over time.
nidnogg 1 day ago||
I've recently lazied out big time on a company project, going down a similar rabbit hole. After having a burnout episode and dealing with sole-caregiver woes in the family for the past year, I've had less and less energy to piece together intense, correct thought sequences at work.

As such, I've taken to delegating substantial parts of architecture and discovery to multi-agent workflows that always refer back to a wiki-like castle of markdown files I've built over time with them, fronted by Obsidian so I can peep at it efficiently often enough.

Now I'm certainly doing something wrong, but the gaps are just too many to count. If anything, this creates a weird new type of tech debt. Almost like a persistent brain gap. I miss thinking harder and I think it would get me out of this one for sure. But the wiki workflow is just too addictive to stop.

stingraycharles 1 day ago||
> I miss thinking harder

Me too, and I wonder where this will take us; I worry about losing the ability to think hard.

AlecSchueler 1 day ago|||
We can still think hard about other things? I like to do creative writing now, and I'm learning the concertina, which is a real mental workout.
jareklupinski 1 day ago|||
I'm hoping to re-use the newly garbage-collected memory available to me to rediscover "play hard"
kubb 1 day ago|||
You’re not doing anything wrong. This isn’t a bulletproof idea. It can work, and this is what a lot of people end up with to manage complexity, but there’s a critical point beyond which things collapse: the agent can’t keep the wiki up to date anymore, the developer can’t grok it anymore.
kaashif 1 day ago||
Managing complexity, modularity, and separation of concerns was already critical for ensuring humans could hold enough of the system in their brains to do something useful.

People who do not understand that will continue to not understand that it also applies to AI right now. Maybe at some point in the future it won't, not sure. But my impression is that systems grow in complexity far past the point where the system is gummed up and no-one can do anything, unless it's actively managed.

If a human can understand 10 units of complexity and their LLM can do 20, then they might just build a system that's 30 complex and not understand the failure modes until it's too late.

stingraycharles 1 day ago||
> People who do not understand that will continue to not understand that it also applies to AI right now.

I think this is mostly a matter of expectation management. AIs are being positioned as being able to develop software independently, and that’s certainly the end goal.

So then people come in with the expectation that the AI is able to manage that, and it fails. Spectacularly.

The LLM certainly cannot manage any non-local complexity right now, and it succeeds in increasing technical debt and complexity faster than ever before.

loveparade 1 day ago|||
That has been my experience as well. Most of the value of writing docs or a wiki is not in the final artifacts, it's that the process of writing docs updates your own mental models and knowledge so that you can make better decisions down the road.

Even if you can get an LLM to output good artifacts that don't eventually evolve into slop, which is questionable, it's really not that useful, especially not for a personal wiki.

kilroy123 1 day ago|||
Makes me think of all these tools that use AI to make fancy flashcards for you to study.

It seems rather silly to me, as _creating_ those flashcards is what helps you learn; the studying afterward cements that knowledge in your brain.

kaashif 1 day ago||
The ultimate connecting up of the dots would be brain implants that just give you knowledge with zero effort.

I don't know if I'd ever be comfortable with that, hopefully I'll just be retired or dead when that takes off.

nidnogg 1 day ago||||
And what happens when the bucket of knowledge gets too big and starts to overflow? I feel as if, by delegating that process of building knowledge too much, I end up accruing knowledge gaps of my own. Funnily enough it mirrors the LLM/agent's performance.

Maybe my recent prompts reflect how badly up to speed I am at a given time? I don't know. On a slightly related note: I recently heard the term "AI de-skilling", and this treads close to it imo.

nidnogg 1 day ago|||
The worst part to me, by far, is having nothing more than a bunch of "smart" markdown files to show as my deliverables for the day. Sometimes this stacks for many days on end. Usually the bigger the knowledge gaps are, the more I procrastinate on real work.

Talk about back to school feelings (!)

mikkupikku 1 day ago||
Just w.r.t having time to think harder, have you considered getting a hobby that forces you to go offline and do something repetitive so your mind can wander? I do this with walks (phone left at home) and sometimes swimming laps. Physical exercise may not seem appealing if you're in burnout territory, but I think it's worth trying because for me at least it's a different, mostly orthogonal, kind of fatigue.
voidhorse 2 days ago||
This makes me feel like karpathy is a tad behind the times. Many agent users I know already do precisely this as part of "agentic" development. If you use a harness, the harness is already empowered to do much of this under the hood; no fancy instruction file required. Just ask the agent to update some knowledge directory at the end of each convo - done. If you really need to automate it, write some scheduling tool that tells the agent to read past convos and summarize. It really is that easy.
nurettin 2 days ago||
He really wants to shine, but how is this different from Claude memory or skills? When I encounter something it had difficulty doing, or it consistently starts off with incorrect assumptions, I solve for it and tell it to remember. If it goes on a long trial-and-error loop to accomplish something, once it works I tell it to create a skill.
locknitpicker 2 days ago|
> He really wants to shine, but how is this different from Claude memory or skills?

It isn't different. This just tries to reinvent the wheel that all mainstream coding assistants have been providing for over a year.

Even ChatGPT rolled out chat memory in their free tier.

TeMPOraL 1 day ago||
The difference obviously being, his way you own the memories; in what's currently deployed, it's the platform that owns them.
nurettin 1 day ago|||
Skills and memory are .md files on your system.
locknitpicker 1 day ago|||
> The difference obviously being, his way you own the memories; in what's currently deployed, it's the platform that owns them.

I don't think you are very familiar with the state of LLM assistants. Prompt files have been a thing since forever, and all mainstream coding assistants support storing your personal notes in your own files in multiple ways: prompt files, instruction files, AGENTS.md, etc.

mpazik 2 days ago||
Happy to see this getting attention. The friction shows up once you mix docs with structured things like work items or ADRs. Flat markdown doesn't query well and gets inconsistent. You can read TASKS.md fine. The agent can't ask "show me open tasks blocking this epic" without scanning prose or maintaining a parallel index.

The AGENTS.md approach papers over this by teaching the LLM the folder conventions. Works until the data gets complex but gets worse after many iterations.

Both are needed: files that open in any editor, and a structured interface the agent can actually query. Been building toward that with Binder (github.com/mpazik/binder), a local knowledge platform. Data lives in a structured DB but renders to plain markdown with bi-directional sync. LSP gives editors autocomplete and validation. Agents and scripts get the same data through CLI or MCP.
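(The kind of question flat markdown can't answer but a structured layer can, as a sketch — the schema here is invented for illustration, not Binder's actual data model:)

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tasks(id INTEGER PRIMARY KEY, title TEXT, status TEXT, epic TEXT);
CREATE TABLE blocks(blocker INTEGER, blocked INTEGER);  -- task A blocks task B
""")

def open_blockers(epic: str) -> list[tuple]:
    """'Show me open tasks blocking this epic' as one query, not a prose scan."""
    return db.execute("""
        SELECT t.id, t.title
        FROM tasks t
        JOIN blocks b ON b.blocker = t.id
        JOIN tasks blocked ON blocked.id = b.blocked
        WHERE t.status = 'open' AND blocked.epic = ?
    """, (epic,)).fetchall()
```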

sornaensis 1 day ago||
I've been working on my own thing with more of a 'management' angle to it. It lets me connect memories to tasks and projects across all of my workspaces, and gives me a live SPA to view and edit everything, which in my experience makes controlling what the models are doing a lot easier, in a way that suits how I think vs other project management or markdown systems: https://github.com/Sornaensis/hmem

I would be interested in trying to make the models go into more of a research mode and organize their knowledge inside it, but I've found this turns into something like LLM soup.

For coding projects, the best experience I have had is clear requirements and a lot of refinement, followed through with well-documented code and modules, and only a few big 'memories' to keep the overall vision in scope. Once I go beyond that, the impact goes down a lot, and the models seem to make more mistakes than I would expect.

cyanydeez 2 days ago|
Too much context pollution.

Start with a short text context, and flow through DAGs, choose-your-own-adventure style. We already breached context limits. Now's the time to let LLMs build their contexts through decision trees and prune dead ends.

j-pb 2 days ago|
In my experience a wiki can actually drastically reduce the amount of dead context.

I've handed my local agents a bunch of integrated command-line tools (kinda like an office suite for LLMs), including a wiki (https://github.com/triblespace/playground/blob/main/facultie... ), and linkage really helps reduce context bloat because they can pull things in fragment by fragment, incrementally.

cyanydeez 2 days ago||
I was also thinking: to disambiguate context where you wish to express a token's function (e.g., top) as distinct from another sense, one could use a unique prefix (e.g., ∆top) to avoid pollution between the English word and the Linux binary.

You'd then alias these disambiguated terms; they'd still trigger the correct token autocomplete, but would reduce the overlap that causes misdirection.
