
Posted by tamnd 2 days ago

LLM Wiki – example of an "idea file" (gist.github.com)
https://x.com/karpathy/status/2040470801506541998

https://xcancel.com/karpathy/status/2040470801506541998

293 points | 92 comments
mbreese 2 days ago|
I’ve been doing something similar with a RAG system where in addition to storing the documents, we use an LLM to pull out “facts”. We’re using the LLM to look for relationships between different entities. This is then also returned when we query the database.
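A minimal sketch of that shape (all names hypothetical; the actual LLM extraction call is elided): documents are stored alongside (subject, relation, object) triples the LLM pulled out of them, and a query returns matching chunks and matching facts together.

```python
# Hypothetical sketch of a RAG store that keeps LLM-extracted "facts"
# (entity-relation triples) next to the raw documents and returns both
# at query time. The LLM extraction step itself is stubbed out: whatever
# the model produces is passed in as a list of (subject, relation, object).
from dataclasses import dataclass, field

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    source_doc: str

@dataclass
class FactStore:
    docs: dict = field(default_factory=dict)   # doc_id -> full text
    facts: list = field(default_factory=list)  # Fact triples

    def add_document(self, doc_id: str, text: str, extracted: list) -> None:
        """'extracted' is whatever triples the LLM pulled out of this doc."""
        self.docs[doc_id] = text
        for subj, rel, obj in extracted:
            self.facts.append(Fact(subj, rel, obj, doc_id))

    def query(self, term: str) -> dict:
        """Return matching document text AND matching facts together."""
        t = term.lower()
        return {
            "chunks": [d for d in self.docs.values() if t in d.lower()],
            "facts": [f for f in self.facts
                      if t in f.subject.lower() or t in f.obj.lower()],
        }
```

In a real system the substring match would be a vector search, but the point is the query result carries both raw text and the structured relationships.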

But I like the idea of an LLM generated/maintained wiki. That might be a useful addition to allow for more interactive exploration of a document database.

mememememememo 2 days ago||
This sounds like compaction for RAG.
jdthedisciple 2 days ago||
The challenge to me seems to be quality assurance:

I'd rather have it source the original document every time than an LLM-generated wiki, which I most likely wouldn't have the time to fact-check and review myself.

argee 2 days ago||
This is what Semiont is trying to do, to some extent [0].

Doesn't really feel that useful in practice.

[0] https://github.com/The-AI-Alliance/semiont

vbarsoum 14 hours ago||

I built an implementation of this and tested it on 3 Alex Hormozi books (~155K words, 68 source files). Some data for the skeptics:

The naive version (each book as 1 file) produced exactly the slop people are describing here. But splitting into chapter-level files and recompiling changed the output categorically. Same model, same prompts — the only variable was source granularity.

The compiler produced 210 concept pages with 4,597 cross-references (19.2 avg links per page). 20+ concepts were synthesized across all 3 books unprompted — one pulled from 11 source files and found a genuine contradiction between two books that neither makes explicit. 173K words of output from 155K of input. It's not compression — it's synthesis.

The thing I think the "this is just RAG" comments are missing: a vector database is only useful to machines. You can't open a .faiss file and browse it. A wiki is useful to both. I open these files in Obsidian, browse the graph, follow links, read concept pages — no AI needed. But when I do ask the AI a question, it reads the same wiki pages I do, and the answers are better than RAG because the knowledge is already structured and cross-referenced instead of retrieved as raw chunks.

That's the key insight in Karpathy's idea. The compiled wiki is the interface for humans AND the knowledge layer for AI. Same artifact, two audiences.

Cost: ~12M tokens, ~10-15 min. Repo: https://github.com/vbarsoum1/llm-wiki-compiler
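The chapter-level splitting step described above could look something like this. This is a hypothetical sketch, not code from the linked repo: it assumes the books are markdown files with level-1 headings marking chapters, and the file-naming convention is made up.

```python
# Hypothetical sketch of "chapter-level source granularity": split a
# markdown book into one file per chapter before feeding it to a wiki
# compiler. Assumes chapters start at level-1 headings ("# ...").
import re
from pathlib import Path

def split_into_chapters(book_path: str, out_dir: str) -> list:
    """Split a markdown book at level-1 headings into chapter files."""
    text = Path(book_path).read_text(encoding="utf-8")
    # Zero-width split: keep each heading attached to the chapter body
    # that follows it.
    parts = re.split(r"(?m)^(?=# )", text)
    chapters = [p for p in parts if p.strip()]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, chapter in enumerate(chapters, start=1):
        dest = out / f"{Path(book_path).stem}-ch{i:02d}.md"
        dest.write_text(chapter, encoding="utf-8")
        written.append(dest)
    return written
```

The interesting claim above is that this one preprocessing change, with the model and prompts held fixed, was the difference between slop and usable synthesis.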
serendipty01 2 days ago||
Similar: https://v-it.org/doctrine/
Lockal 2 days ago||
This thing has already existed for multiple years; see https://deepwiki.com/ (it's ~99% autonomous, but it can be manually structured; see https://docs.devin.ai/work-with-devin/deepwiki#steering-deep...). There have also been multiple attempts to replicate it with local LLMs.

The problem is that it is still slop: not only does it add a lot of noise ("architecture" diagrams based on cherry-picked filenames, incomplete data tables, hyperfocus on strange things), it also hallucinates, adding factually incorrect information (while direct questions to the LLM yield correct answers).

0123456789ABCDE 2 days ago||
this is so validating

https://grimoire-pt5.sprites.app/

john_minsk 2 days ago|
what is this?
qaadika 2 days ago||
> You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time.

I'm not sure how you can get any closer to "turning your thinking over to machines." These tasks may be "grunt work," but it's while doing these things that new ideas pop in, or you decide on a particular or novel way to organize or frame information. Many of my insights in my (analog? vanilla? my human-written) Obsidian vault (that I consider my "personal wiki") have been made or expanded on because I happened to see one note after another in doing the "grunt work", or just by opening one note and seeing its title right beside a previously forgotten one.

There's nothing "personal" about a knowledge base you filled by asking AI questions. It's the AI's database, you just ask it to write stuff. Learn how to learn and answer your own damn questions.

Soon pedagogy will be a piece of paper that says "Ask AI."

I hate this idea that a result is all that matters, and the quicker you can get the result the better, at any cost (mental or financial, short-term or long-term).

If we optimized showers to be 20 seconds, we'd stop having shower thoughts. I like my shower thoughts. And so too my grunt-work thoughts.

---

As an aside, I'm not totally against AI writing in a personal knowledgebase. I include it at times in my own. But since I started my current obsidian vault in 2023 (now 4100 self-written notes, including maybe up to 5% Web Clipper notes), I've had a Templater (Obsidian plugin) template I wrap around anything AI-written to 'quarantine' it from my own words:

==BEGIN AI-GENERATED CONTENT==

<% tp.file.cursor(1) %>

==END AI-GENERATED CONTENT==

I've used this consistently and it's helped me keep (and develop) my own writing voice apart from any of my AI usage. It actually motivates me to write more, because I know I could always take the easy route and chunk whatever I'm thinking into the AI, but I'm choosing not to by writing it myself, with my own vocabulary, in my own voice, with my own framing. I trick myself into writing because my pride tells me I can express my knowledge better than the AI can.

I also manually copy and paste from wherever I'm using AI into my notes. Nothing automated. The friction keeps me from sliding into the happy path of turning my brain off.

mold_aid 2 days ago|
Since you're a fellow Obsidian user, you likely remember the early days of back-linking note-taking software like Roam and such. I remember just seeing pictures of the graph being the primary visual symbol representing the depth of learning. I thought "ok well people just want to accumulate stuff." AI tools certainly help with creating a mass of notes.

There's a comment above about how this is reminiscent of Licklider's work, but it reminds me of the early print culture era, when books were a consumer item and people just purchased a lot of them to put on shelves built to display them.

qaadika 2 days ago||
I actually never got into note-taking before I found Obsidian. I used Google Drive all throughout college and up to 2023, so any knowledge I had written down was sequestered by an ad-hoc folder structure that was mostly chronological by year, or in my physical notebooks by subject. It was also limited by what I felt was worth writing down enough that it merited a Doc, or what I could write in one session before getting distracted and never touching it again. And mentally I limited myself by always wanting to write something down "right", all spell-checked and grammatically correct and sensically organized, which led to often not writing anything down at all. Now I dump the words down and come back to it later when I want to "garden."

My brother showed me stuff like Trilium circa 2017, but I hadn't the thinking process I do now to even know where to start racking my brain for stuff to write down.

When I read Chernow's biography of Alexander Hamilton, I was in awe of his ability to write so damn much. I never thought I'd be able to do that. Turns out the secret is just three things: have stuff you're passionate about, be able to recall it, and write so damn much about it. When your thinking process is based around "how would I phrase/word/frame this to write it down," if you have the right process and organization to be able to access it later, it's even easier than talking. Some people can keep everything they know in their head. I'm not one of them, but I can write everything down, and in writing it down I end up remembering it better. My professors were right all along.

And if one looks at the actual written letters people like Hamilton [1] and co. wrote, or even back to Isaaci Newtoni [2][3], they're riddled with spelling errors, strange latinesque grammar, and informal formalities. Yet they're revered for their ability to write. Because it really is the thought that countts [4], not the words.

(Very little of these comments is new thoughts I'm having now. Most of it is thoughts I had and documented when I was super into PKM in 2023 and since, and now comes back up as those neurons fire again and I consider the new idea of "should AI be my PKM?" after reading the post.)

---

Yeah, the graphs are cool for a little bit. But only post-facto, once one has an amount of data points where it might become useful. If the AI is doing the organization then any personal significance is lost. Or rather it was never there to begin with.

Wikilinks is the feature I use most often, outside of my folder organization (PARA). Now when I have a thought, it goes down a chain of "do I have a note already this can go into," to "no, it should be a new note. are there any notes I should wikilink in this one, or link this one in?"

I think I made a good decision early on when I was inspired by the Emacs documentation to add a basic "Related: " line before the first header (and after the YAML). There I dump any wikilink I think might possibly be something I want to reference, or find this note via a backlink, without having to think about where to put it in the body.

E.g.

{YAML header}

Related: [[Artemis]], [[Artemis II]], [[NASA Engineering]], [[Space MOC]], [[NASA CAPCOM]],

# Artemis II Mission Timeline and Notes

{body and rest of note, my own record of things that happened as I watch the stream}

---

> AI tools certainly help with creating a mass of notes.

Agreed. Presuming the implication is that it creates a mass of notes, but of generic information stated generically. I'm really proud of my 4100 notes, because I know (aside from a few categories like web clipping) even if they're a mess, they're my mess. I definitely could have gone the last three years without having found Obsidian, but I wouldn't have as clear a record of them as I do now. Or of the rest of my life, as I slowly add stuff about my past, or migrate old writing into it. I also definitely repeat myself by saying the same information in different places, but in different ways. It's not 'efficient' information-density-wise, but it is designed for a human to read and see the human behind the writing.

I also believe I think clearer, as often when I'm recalling information I'm actually recalling my note in my head on that subject. I write so much that in conversation "I was thinking the other day" is analogous to "I wrote down in my notes".

I might be crazy but I would put my vault in my will as something to be passed on, because there's so much of me in them. My yearly journals in /02 Areas/Journals/ are the most obvious ones, but I have a /02 Areas/Writing/ folder that's just notes I consider "writing", which is distinct from the contents of the /03 Resources/ folder that's the "general knowledge" knowledgebase.

---

Anyway, I guess my tl;dr is that AI can never write about thinking as well as a human can, and in my opinion it's the thinking that's important, not the writing. The writing, or the words, is merely a tool in thinking. Karpathy mistakes the words for the goal, rather than the thinking that caused the words.

---

One last thing: I just re-read the HN guidelines out of curiosity, and I noticed they recently added "Don't post generated comments or AI-edited comments."

I could copy and paste almost anything from my vault into an HN comment without violating this rule. Anybody creating a PKM with this system could not. They would have to rewrite it in their own words. So one might as well just write it themselves in the first place if they ever think they might want to reuse it in a place like HN.

---

[1] https://outhistory.org/exhibits/show/rev/hamilton-laurens-le...

[2] https://www.newtonproject.ox.ac.uk/view/texts/diplomatic/MIN...

[3] A while back, while in a Newton phase, I decided arbitrarily to refer to him as "Isaaci Newtoni," as that's how he called himself. I reinforced that by using that name for him in my notes. Now I call him that instinctually, not consciously.

[4] Intentional.

ansc 1 day ago|
The comments in the gist are depressing.