Posted by robbie-c 1 day ago
I think democratization of intelligence is going to be interesting. You could say the same about the internet. I think it is part of evolution. Maybe intelligence or expertise is not what makes us special. Maybe it is that we are ingenious and creative with tools, and that's how we evolve.
I'm not trying to be pedantic; I think this is an interesting topic and there's a worthwhile distinction to make here. It isn't really being democratized for a couple reasons (at least).
One, access to information isn't truly knowledge in and of itself. People allowing information from LLMs to pass through their brains are not necessarily retaining any of it, and their ability to synthesize and utilize disparate information from LLMs isn't inherently improved by this technology. So the premise of knowledge isn't very sturdy in my mind.
Two, LLMs function across very broad fields of capability, accuracy, content, and so on, and the best models are not accessible to many people. I find people tend to mean the technology is widely available and accessible when they say 'democratization', but that's not necessarily true nor what that word means to begin with.
True democratization would mean something more like "everyone participates in, shapes, regulates, and grows this technology with their own inputs". I don't think that's what happens at all, and in fact, it has been quite the inversion of that so far.
I mention all of this because I agree that it will be interesting to watch what happens, but I don't agree that it will be for the same reasons. I worry about it specifically because there is not an egalitarian distribution of knowledge, and it is not democratically built or shared.
That means the chain of thought “brain volume decreased, so individuals must have gotten less intelligent. Yet, societies grew smarter, so there must be herd intelligence” breaks at “so individuals must have gotten less intelligent”.
I think/guess that argument may have merit when replacing brain volume by number of neurons (https://en.wikipedia.org/wiki/List_of_animals_by_number_of_n...)
Einstein’s brain reportedly was below average size.
That’s an n = 1 example, but there is also a 50/50 example: men vs. women. On average, the brains of men are about 10% larger than those of women (https://en.wikipedia.org/wiki/Neuroscience_of_sex_difference...). That doesn’t show up in intelligence differences (https://en.wikipedia.org/wiki/Sex_differences_in_intelligenc...).
Even only looking at males or females, I don’t think larger (fe)males tend to be more intelligent.
And every time you use the AI to be ingenious or creative, that will be added to the training data. Then someday the AI can be ingenious and creative without you! (It might take a few more breakthroughs. But investors will literally spend trillions chasing those breakthroughs.)
The endgame here is to replace all human intelligence and labor with machines that are smarter and work cheaper. But who controls the machines?
We as humans have always outsmarted the tools.
I'm not about to say that there's nothing new under the sun, but parsers are a really well-understood problem where 99.9% of people don't need frontier knowledge and wouldn't be in a position to use it anyway.
And I don't think that people doing research on parsers would ever rely on LLMs for precisely that reason. But we're not parser researchers, right?
So if you were lazily copying the first blog result in Google, getting the first answer from an LLM is equivalent, but the output is actually likely to be better.
If you wanted to do your research on various techniques and evaluate alternatives, LLMs can amplify your capacity to research and to have specific considerations for your specific problem.
LLMs aren't going to solve people's natural inclination towards laziness.
Additionally, while it's true that people may read and learn less about the "lower" levels of software plumbing, it enables enormous possibilities of higher level thinking that before were limited by the amount of manpower you needed.
For example, with LLMs I can try different test sharding strategies or trivially change from factories to fixtures in large test suites. This would have been busywork or drudgery; now I can evaluate several architectural solutions which would not have been possible before.
“Whoa slow down with this ‘writing’ technology. No one will ever remember anything if they can just write it down.”
> Instead, we use ANTLR, a state-of-the-art, open source parser generator.
I don't agree with this (pre-AI-coding) take. Hand-rolled parsers are much easier to write well and maintain than people think. They also tend to be much faster and produce much better errors than parser generators. I guess if the language you're trying to parse is, say, C++, then you're going to have a miserable time (probably no matter what). But an SQL parser is very doable. (I say this as the author and maintainer of an in-house SQL dialect thingy at work.)
What makes building and maintaining a hand-written parser such a tractable task is:
- The code size can be large, but you can start with a core of a few well-chosen abstractions and then you add lots of parsing code for various language constructs but it's all kind of orthogonal and doesn't add compounding complexity as you go.
- It's just about the most testable kind of code there is. You can cover all the various corner cases with tests and really lock in the behavior so that you can very confidently make changes. One approach I like is to make zillions of tiny test files in the target language accompanied by some golden representation of the AST.
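To make the golden-test idea above concrete, here is a minimal sketch in Python. It is not the commenter's parser: the toy expression grammar, the names `tokenize`, `Parser`, `parse`, `check_golden`, and the in-memory `GOLDEN` table (standing in for tiny test files on disk) are all assumptions made for illustration.

```python
import json


def tokenize(src):
    """Split source text into identifier/number tokens and single-character operators."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        elif ch in "+*()":
            tokens.append(ch)
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens


class Parser:
    """Hand-written recursive descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_sum(self):
        # sum := product ('+' product)*   (left-associative)
        node = self.parse_product()
        while self.peek() == "+":
            self.next()
            node = ["+", node, self.parse_product()]
        return node

    def parse_product(self):
        # product := atom ('*' atom)*   (binds tighter than '+')
        node = self.parse_atom()
        while self.peek() == "*":
            self.next()
            node = ["*", node, self.parse_atom()]
        return node

    def parse_atom(self):
        # atom := NAME | NUMBER | '(' sum ')'
        tok = self.next()
        if tok == "(":
            node = self.parse_sum()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in {"+", "*", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def parse(src):
    parser = Parser(tokenize(src))
    node = parser.parse_sum()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return node


# Hypothetical golden table: source snippet -> expected AST serialized as JSON.
# In a real suite each entry would be a small file next to its golden output.
GOLDEN = {
    "a + b * c": '["+", "a", ["*", "b", "c"]]',
    "(a + b) * c": '["*", ["+", "a", "b"], "c"]',
    "1 + 2 + 3": '["+", ["+", "1", "2"], "3"]',
}


def check_golden(cases):
    """Return the source snippets whose parsed AST differs from the golden AST."""
    return [src for src, expected in cases.items()
            if parse(src) != json.loads(expected)]
```

Adding a language construct then means adding one parse method plus a handful of golden entries; `check_golden(GOLDEN)` returning an empty list is the "behavior is locked in" signal.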
And of course, as the author found out, these properties make writing a parser a really good task for AI coding, too. These tools are very, very good at generating a bunch of new code based on existing abstractions and covering it with lots of test cases.
So I agree with where they ended up, just not where they started :)
The whole notion that grammars are hard is just wrong. They are not only powerful, but in fact super simple. So are basic regexps, if one cares to spend a focused afternoon understanding them. Probably even less time if working with a decent teacher.
Makes me think of all the algorithms we specify in proof languages and then hand-implement in production languages - this setup could maybe let you just specify the proof of an algorithm and then let LLMs derive efficient implementations with the (slow) proof as an oracle
If you have an oracle, and your problem is largely just a pure function, it's pretty good at generating something that both works and is fast.
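A rough sketch of that oracle loop, in Python. This swaps in a brute-force reference implementation as the oracle rather than a machine-checked proof, and the problem (maximum subarray sum) and all function names are assumptions chosen only to keep the example small.

```python
import random


def max_subarray_slow(xs):
    """Oracle: try every non-empty contiguous slice. Slow, but easy to trust."""
    return max(sum(xs[i:j])
               for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))


def max_subarray_fast(xs):
    """Candidate: Kadane's algorithm, a single linear pass."""
    best = current = xs[0]
    for x in xs[1:]:
        # Either extend the running slice or start a new one at x.
        current = max(x, current + x)
        best = max(best, current)
    return best


def find_disagreement(oracle, candidate, trials=500, seed=0):
    """Return the first random input where the two functions differ, else None."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        if oracle(xs) != candidate(xs):
            return xs
    return None
```

The generated (or hand-written) fast version is accepted only when `find_disagreement(max_subarray_slow, max_subarray_fast)` returns `None`; any returned list is a ready-made counterexample to hand back to the model.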
I have a tool I'm making as a data plane to a graph engine, and it uses Cap'n Proto to help (and SQLite as a sort of IPC option). One of the biggest issues I have is that I know I am not testing all of it to completion. I am not even really fuzzing yet.
Thanks for sharing!
Perhaps the next target for a 100x improvement
So it's technically vibe-coding in the sense you don't really look at the code, you just look at the results and "go by the vibes"... except now you're working to rigorously quantify and enforce those vibes. (Philosophical aside: once vibes are rigorously enforced are they "vibes" anymore?)
Recently I was messing around with parquet files in Python and ended up needing to ship the results on Windows, without a Windows machine to test on.
Shipping Python to end users is half mad already, and doing it on Windows is exactly the kind of thing I don't want to spend my life maintaining.
So I figured I'd rewrite it in Go. But that meant embedding a DLL, and how would I test it? I could spin up a VM, sure. But GitHub Actions already has a Windows environment, and there was my loop: let the agent push to the repo, run tests in GHA, rinse and repeat.
In under an hour it had a full rewrite of my Python, passing every test and producing row-for-row copies of my Parquet output. And it does work on the user machine!
Spotting a loop like that is as satisfying as noticing you can walk your chess opponent into a smothered mate. Truly empowering.
Also, Windows used to have a free VHD with a trial license you could download (and convert to a different format with qemu-img).
I didn't think of checking, but I've now learned there's an extension for DuckDB, though it's C++ and also embeds the same DLL: [0] https://github.com/flozer/duckdb-firebird