Posted by danielfalbo 3 days ago
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
https://lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-lis...
Yup, this will absolutely be a big driver of gains in AI for coding in the near future. We actually built a benchmark based on this exact principle: https://algotune.io/
Denoising diffusion models benefited a lot from the U-Net, which is a pretty simple network (compared to a transformer) and very well adapted to the denoising task. Plus, diffusion on images is great to research because it's very easy to visualize, and therefore to wrap your head around.
Doing diffusion on text is a great idea, but my intuition is it will prove more challenging, and will probably take a while before we get something working.
If you know labs / researchers working on the topic, I'd love to read their pages / papers.
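For intuition, here is a minimal sketch of the standard DDPM-style denoising training step, with a tiny conv net standing in for the U-Net; everything below (shapes, schedule values, the placeholder network) is illustrative, not a reference implementation:

    # Sketch of a DDPM-style denoising training step (PyTorch).
    # The small conv net is only a stand-in for a real U-Net.
    import torch
    import torch.nn as nn

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative noise schedule

    denoiser = nn.Sequential(                        # placeholder for a U-Net
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1),
    )
    opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

    x0 = torch.randn(8, 1, 28, 28)                   # batch of "clean" images (random here)
    t = torch.randint(0, T, (8,))                    # random timestep per sample
    eps = torch.randn_like(x0)                       # the noise the network must predict
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # noised input at timestep t

    pred_eps = denoiser(x_t)                         # a real U-Net would also be conditioned on t
    loss = nn.functional.mse_loss(pred_eps, eps)     # learn to predict the added noise
    opt.zero_grad(); loss.backward(); opt.step()

The appeal for images is visible here: you can render x_t and pred_eps at any point and see the denoising happen, which is much harder with discrete text tokens.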
But did any AI researchers actually claim there was no representation of meaning? I thought generally, the criticism of LLMs was that while they do abstract from their corpus - ie, you can regard them as having a representation of "meaning" - it's tightly and inextricably tied to the surface level representation, it isn't grounded in models of the external world, and LLMs have poor ability to transfer that knowledge to other surface encodings.
I don't know who the "certain AI researchers" are supposed to be. But the "stochastic parrot" paper by Bender et al [1] says:
> Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind.
That's a very different objection to the one antirez describes - I think he's erecting a straw man. But I'd be happy to be corrected by anyone more familiar with the research.
This means exactly that no representation of what the model wants to say should exist in the activation states, and that only single-token probabilistic inference should be at play.
Their model also requires the converse: that the model does not know, semantically, what the query really means.
"Stochastic parrot" has a precise scientific meaning, and just from observing how the models behave it is quite evident that they were very wrong. But now we also have strong evidence (via probing) that the sentence you quoted is not correct: the model knows, in general terms, the idea it is going to express, and features for things it will say much later activate many tokens earlier, including conceptual features that only become relevant later in the sentence / concept being expressed.
You are making the common error of stretching the stochastic parrot into something that is no longer scientifically falsifiable: a model that can be enlarged enough to accommodate any evidence arriving from new generations of models. The stochastic parrot does not understand the query and is not trying to reply to you in any way; it just exploits a probabilistic link between the context window and the next word. This link can be more complex than a Markov chain, but it must be of the same kind: lacking any understanding whatsoever and any communicative intent (no representation of the concepts / sentences required to reply correctly). How is it possible to believe this today? And check for yourself what the top AI scientists believe today about the correctness of the stochastic parrot hypothesis.
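For context, "probing" here means training a small classifier on the model's hidden activations to test whether some concept is linearly decodable from them. A toy sketch of the idea, with random placeholder data standing in for real activations (nothing here comes from an actual model):

    # Toy sketch of an activation probe: can a linear classifier recover a concept
    # from hidden states? Random placeholder data stands in for real activations.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(2000, 768))    # pretend residual-stream activations
    concept = rng.integers(0, 2, size=2000)  # pretend label: "will mention concept X later"

    # Plant a weak signal so the probe has something to find (purely illustrative).
    hidden[:, 0] += 1.5 * concept

    probe = LogisticRegression(max_iter=1000).fit(hidden[:1500], concept[:1500])
    print("probe accuracy:", probe.score(hidden[1500:], concept[1500:]))

If a simple probe like this can predict, from the activations at an early token, a concept that only surfaces much later in the output, that is the kind of evidence being referred to.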
> This means exactly that no representation of what the model wants to say should exist in the activation states, and that only single-token probabilistic inference should be at play.
That's not correct. It's clear from the surrounding paragraphs what Bender et al mean by this phrase. They mean that LLMs lack the capacity to form intentions.
> You are making the common error of stretching the stochastic parrot into something that is no longer scientifically falsifiable: a model that can be enlarged enough to accommodate any evidence arriving from new generations of models.
No, I'm not. I haven't, in fact, made any claims about the "stochastic parrot". Rather, I've asked whether your characterisation of AI researchers' views is accurate, and suggested some reasons why it may not be.
Around the world, people ask an LLM and get a response.
Just grouping and analysing these questions, solving each one once centrally, and then making the solution available again is huge.
Working linearly through the most-asked questions, then the next, then the next, will make whatever system is behind it smarter every day.
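A minimal sketch of that idea, assuming a hypothetical answer_with_llm() call and crude exact-match normalisation (a real system would merge near-duplicate questions, e.g. by embedding similarity):

    # Sketch: answer each distinct question once, then serve the cached solution.
    # answer_with_llm() is a hypothetical stand-in for any LLM API call.
    from collections import Counter

    cache = {}              # normalised question -> answer
    frequency = Counter()   # how often each question is asked

    def normalise(question: str) -> str:
        # Crude grouping; a real system would use embeddings to merge paraphrases.
        return " ".join(question.lower().split())

    def answer_with_llm(question: str) -> str:
        return f"(expensively computed answer to: {question})"

    def ask(question: str) -> str:
        key = normalise(question)
        frequency[key] += 1
        if key not in cache:                 # solve it once, centrally
            cache[key] = answer_with_llm(key)
        return cache[key]                    # later askers get the stored solution

    print(ask("How do I reverse a list in Python?"))
    print(ask("how do I reverse a list in Python?"))  # served from the cache
    print(frequency.most_common(1))          # the questions most worth solving well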
I wonder how a "programmers + AI" self-improving loop is different from an "AI only" one.
AGI will also be generic.
LLMs are already very impressive, though.
The two main limitations of the Transformer that it helps with are:
1) A Transformer is just a fixed-size stack of layers, with a one-way flow of data through the layers from input to output. The fixed number of layers equates to how many "thought" steps the LLM can put into generating each word of output, but good responses to harder questions may require many more steps and iterative thinking...
The idea of "think step by step", aka chain of thought, is to have the model break it's response down into a sequence of steps, each building on what came before, so that the scope of each step is withing the capability of the fixed number of layers of the transformer.
2) A Transformer has extremely limited internal memory from one generated word to the next, so telling the model to go one step at a time, feeding its own output back in as input, in effect makes the model's output a kind of memory that makes up for this.
So, chain of thought prompting ultimately gives the model more thinking steps (more words generated), together with memory of what it is thinking, in order to be able to generate a better response.
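A minimal sketch of that feedback loop, with a hypothetical call_llm() standing in for any real model API: each generated step is appended to the prompt, so the model's own output becomes the "memory" it reads on the next pass.

    # Sketch: chain of thought as a prompt-feedback loop.
    # call_llm() is a hypothetical placeholder for any LLM completion API.
    def call_llm(prompt: str) -> str:
        # In a real system this would send `prompt` to a model and return its next step.
        return "Step: (model's next reasoning step would appear here)"

    def answer_step_by_step(question: str, max_steps: int = 5) -> str:
        prompt = f"{question}\nLet's think step by step.\n"
        for _ in range(max_steps):
            step = call_llm(prompt)     # each step uses the same fixed stack of layers
            prompt += step + "\n"       # feed the output back in: external working memory
            if step.strip().lower().startswith("answer"):
                break
        return prompt

    print(answer_step_by_step("What is 17 * 24?"))

The loop makes both points above concrete: more passes through the network means more "thought" steps, and the growing prompt is the memory the transformer itself lacks between words.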
Does your clairvoyance go any further than 2027?
If you assume that we're only one breakthrough away (or zero breakthroughs - just need to train harder), then the step could happen any time. If we're more than one away, though, then where are they? Are they all going to happen in the next two years?
But everybody's guessing. We don't know right now whether AGI is possible at current hardware levels. If it is N breakthroughs away, we all have our own guesses of approximately what N is.
My guess is that we are more than one breakthrough away. Therefore, one can look at the current state of affairs and say that we are unlikely to get to AGI by 2027.
why are you so sensitive?
This reminded me of the movie Don't Look Up, where they basically gambled with humanity's extinction.
> Chain of thought is now a fundamental way to improve LLM output.
That kinda proves _that LLMs back then were pretty much stochastic parrots indeed_, and the skeptics were right at the time. Today, enthusiasts agree with what they previously said: without CoT, the AI feels underwhelming, repetitive and dumb and it's obvious that something more was needed.
Just search past discussions about it: people were saying the problem would be solved with "larger models" (just repeating marketing stuff) and were oblivious to the possibility of other kinds of innovation.
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
That is a low-key sick burn on whoever believes AI will be economically viable short-term. And I have to agree.