Posted by bigwheels 1/26/2026
> I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media
It has arrived. GitHub will be most affected, thanks to the git-terrorists at Apna College refusing to take down that stupid tutorial. IYKYK.
He ran Tesla's ML division, but still doesn't know what a simple Kalman filter is (in the sense that he claimed lidar would be hard to integrate with cameras).
I'd guess that cameras on a self-driving car are trying to estimate something much more complex, something like 3D surfaces labeled with categories ("person", "traffic light", etc.). It's not obvious to me how estimates of such things from multiple sensors and predictions can be sensibly and efficiently combined to produce a better estimate. For example, what if there is a near red object in front of a distant red background, so that the camera estimates just a single object, but the lidar sees two?
The Kalman filter's basic concept is essentially this (a minimal code sketch follows the list):
1. Make a prediction about the next state of some measurable n-dimensional quantity, and estimate the covariance matrix across those n dimensions, which essentially describes how strongly the i-th dimension tends to increase (or decrease) together with the j-th dimension, where i and j are indices between 1 and n.
2. Gather sensor data (which can be noisy), and reconcile the predicted measurement with the measured one to get the best guess. The covariance matrix acts as a kind of weight for each of the elements.
3. Update the covariance matrix based on the measurements from the previous step.
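To make that concrete, here is a minimal sketch of the predict/update cycle in Python (numpy only; the function and variable names are mine, not from any particular library):

    import numpy as np

    def kf_predict(x, P, F, Q):
        # Step 1: predict the next state and grow the uncertainty.
        x_pred = F @ x            # predicted state
        P_pred = F @ P @ F.T + Q  # predicted covariance, inflated by process noise Q
        return x_pred, P_pred

    def kf_update(x_pred, P_pred, z, H, R):
        # Steps 2-3: reconcile the prediction with a noisy measurement z.
        y = z - H @ x_pred                     # innovation: how wrong the prediction was
        S = H @ P_pred @ H.T + R               # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain: the "weight" mentioned above
        x = x_pred + K @ y                     # best guess of the state
        P = (np.eye(len(x)) - K @ H) @ P_pred  # updated covariance
        return x, P

    # Example: track position+velocity from noisy position-only readings.
    dt = 0.1
    F = np.array([[1, dt], [0, 1]])  # constant-velocity motion model
    H = np.array([[1.0, 0.0]])       # we can only measure position
    Q = 1e-3 * np.eye(2)             # process noise
    R = np.array([[0.25]])           # sensor noise
    x, P = np.zeros(2), np.eye(2)
    for z in [1.1, 1.9, 3.2]:        # noisy position readings
        x, P = kf_predict(x, P, F, Q)
        x, P = kf_update(x, P, np.array([z]), H, R)

A noisier sensor (larger R) automatically gets less weight through K, which is why plugging in extra sensors "just works".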
You can do this for any vector of numbers. For example, instead of tracking individual objects, you can have a grid where each element represents a physical obstacle the car should not drive into, with a value representing the certainty of that obstacle being there. Then when you combine sensor readings, you can still use your vision model, but it gets enhanced by what the lidar detects, both in terms of seeing things the camera doesn't pick up and rejecting things that aren't there.
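A hedged sketch of what that grid fusion could look like, using log-odds so that evidence from independent sensors simply adds (the probabilities and cell indices here are made up for illustration):

    import numpy as np

    grid = np.zeros((200, 200))  # log-odds that each cell is occupied; 0 = unknown

    def fuse(grid, cells, p_occupied):
        # Add one sensor's evidence for the given cells.
        log_odds = np.log(p_occupied / (1.0 - p_occupied))
        for i, j in cells:
            grid[i, j] += log_odds  # evidence accumulates across sensors
        return grid

    # Camera thinks cell (50, 60) is occupied with p=0.7; lidar agrees with p=0.9.
    grid = fuse(grid, [(50, 60)], 0.7)
    grid = fuse(grid, [(50, 60)], 0.9)

    prob = 1.0 / (1.0 + np.exp(-grid))  # back to probabilities
    occupied = prob > 0.5

Two weak agreements compound into a strong belief (about 0.95 here), and a sensor that contradicts the others subtracts evidence instead, which is the "rejecting things that aren't there" part.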
And the concept is generic enough that you can set up a system to plug in any additional sensor with its own noise, and it all works out in the end. This is used all the time in robotics. You can even extend the concept past Gaussian noise and linearity; there are a number of other filters that deal with that (extended and unscented Kalman filters, particle filters), broadly under the umbrella of sensor fusion.
The problem is that Karpathy is more of a computer scientist, so he is on his Software 2.0 train of having ML models do everything. I don't know if he is like that himself or if Musk's "I'm smarter than everyone who came before me" attitude rubbed off.
And of course, when you think like that, it's going to be difficult to integrate lidar into the model. But the problem with that thinking is that a forward-inference LLM is not AI, and it will never be able to drive a car as well as a true "reasoning" AI with feedback loops.
Somewhere, there are GPUs/NPUs running hot. You send all the necessary data, including information that you would never otherwise share. And you most likely do not pay the actual costs. It might become cheaper or it might not, because reasoning is a sticking plaster on the accuracy problem. You and your business become dependent on this major gatekeeper. It may seem like a good trade-off today. However, the personal, professional, political and societal issues will become increasingly difficult to overlook.
Then even if you do catch it, AI: "ah, now I see exactly the problem. just insert a few more coins and I'll fix it for real this time, I promise!"
If it does not, this is going to be the first technology in the history of mankind that has not become cheaper.
(But anyway, it already costs half compared to last year)
Check out whether clocks have gotten cheaper in general. The answer is that they have.
There is no economy of scale in repairing a single clock. It's not relevant to bring it up here.
Getting a bespoke flintstone axe is also pretty expensive, and has absolutely no relevance to modern life.
These discussions must, if they are to be useful, center on population-level experience, not unique personal moments.
You could not have bought Claude Opus 4.5 at any price one year ago, I'm quite certain. The things that were available a year ago now cost half of what they did, and there are new things available. These are both true.
I'm agreeing with you, to be clear.
There are two pieces I expect to continue: inference for existing models will continue to get cheaper. Models will continue to get better.
Three things, actually.
The "hitting a wall" / "plateau" people will continue to be loud and wrong. Just as they have been since 2018[0].
[0]: https://blog.irvingwb.com/blog/2018/09/a-critical-appraisal-...
This is harmless when it comes to tech opinions but causes real damage in politics and activism.
People get really attached to ideals and ideas, and keep sticking to those after they fail to work again and again.
This is accounting for the fact that more tokens are used [1].
[1]: https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...
In the ChatGPT product this is not immediately obvious and many people would strongly argue their preference for 4. However, once you introduce several complex tools and make tool calling mandatory, the difference becomes stark.
I've got an agent loop that will fail nearly every time on GPT-4. It works sometimes, but definitely not enough to go to production. GPT-5 with reasoning set to minimal works 100% of the time. $200 worth of tokens and it still hasn't failed to select the proper sequence of tools. It sometimes gets the arguments to the tools incorrect, but it's always holding the right ones now.
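For context, the loop is roughly shaped like this (a simplified sketch; `llm` and the tool-call fields are placeholders for whatever client you use, not a real API):

    def run_agent(llm, tools, task, max_steps=10):
        # tools: dict mapping tool name -> callable
        history = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            step = llm(history, tools=tools, tool_choice="required")  # must pick a tool
            if step.tool_name == "finish":
                return step.arguments
            result = tools[step.tool_name](**step.arguments)  # wrong args fail here
            history.append({"role": "tool", "name": step.tool_name, "content": result})
        raise RuntimeError("agent exceeded step budget")

One wrong tool selection early on derails every later step, so small per-step accuracy gains compound into the stark difference described above.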
I was very skeptical based upon prior experience but flipping between the models makes it clear there has been recent stepwise progress.
I'll probably be $500 deep in tokens before the end of the month. I could barely go $20 before I called bullshit on this stuff last time.
Until you struggle to review it as well. A simple exercise to prove it: ask an LLM to write a function in a familiar programming language, but in an area you haven't invested in learning and coding yourself. Try reviewing some code involving embeddings/SIMD/FPGAs without learning the domain first.
No one has ever learned a skill just by reading/observing.
The bits left unsaid:
1. Burning tokens, which we charge you for
2. My CPU does this when I tell it to run bogosort on a million 32-bit integers; it doesn't mean it's a good thing.
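For anyone who hasn't met it, bogosort is the canonical "maximum CPU effort, minimum progress" algorithm: shuffle until sorted, with expected factorial runtime:

    import random

    def bogosort(xs):
        # Shuffle until the list happens to be sorted. The CPU runs hot
        # the entire time, which is exactly the point of the analogy.
        while any(a > b for a, b in zip(xs, xs[1:])):
            random.shuffle(xs)
        return xs

Activity is not the same thing as accomplishment.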
I've been working in the mobile space since 2009, though primarily as a designer and then product manager. I work in kinda a hybrid engineering/PM job now, and have never been a particularly strong programmer. I definitely wouldn't have thought I could make something with that polish, let alone in 3 months.
That code base is ~98% Claude code.
Not sure if it's an American pronunciation thing, but I had to stare at that long and hard to see the problem and even after seeing it couldn't think of how you could possibly spell the correct word otherwise.
Trying to incorporate it into existing codebases (especially when the end user is a support interaction or more away) is still folly, except for closely reviewed and/or non-business-logic modifications.
That said, it is quite impressive to set up a simple architecture, or just list the filenames, and tell some agents to go crazy to implement what you want the application to do. But once it crosses a certain complexity, I find you need to prompt closer and closer to the weeds to see real results. I imagine a non-technical prompter cannot proceed past a certain prototype fidelity threshold, let alone make meaningful contributions to a mature codebase via LLM without a human engineer to guide and review.
It's been especially helpful in explaining and understanding arcane bits of legacy code behavior my users ask about. I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.
I never paid any attention to different models, because they all felt roughly equal to me. But Opus 4.5 is really and truly different. It's not a qualitative difference, it's more like it just finally hit that quantitative edge that allows me to lean much more heavily on it for routine work.
I highly suggest trying it out, alongside a well-built coding agent like the one offered by Claude Code, Cursor, or OpenCode. I'm using it on a fairly complex monorepo and my impressions are much the same as Karpathy's.
Which is to say you have to learn to use the tools. I've only just started, and cannot claim to be an expert. I'll keep using them - in part because everyone is demanding I do - but to use them you clearly need to know how to do it yourself.
I also find pointing it to an existing folder full of code that conforms to certain standards can work really well.
If you're using plain vanilla ChatGPT, you're woefully, woefully out of touch. Heck, even plain Claude Code is now outdated.
After you tried it, come back.
I tried a website that offered the Opus model in its agentic workflow, and I guess I felt something different too.
Currently trying out Kimi Code (using their recent Kimi 2.5). It's the first time I've bought any AI product, because I got it for about $1.49 per month. It does feel a bit less powerful than Claude Code, but monetarily I feel it's worth it.
Y'know, you have to, like, bargain with an AI model to reduce its pricing, which I found really curious. The psychology behind it is fascinating: even as a frugal person, I already felt invested enough in the model that it became my sunk cost fallacy.
A shame for me personally, because they use it as a hook to get people using their tool and then charge $19 the next month (still cheaper than Claude Code for the most part, but steep compared to $1.49).
Does anybody have any info on what he is actually working on besides all the vibe-coding tweets?
There seems to be zero output from the guy for the past 2 years (except tweets).
Well, he made Nanochat public recently and has been improving it regularly [1]. This doesn't preclude that he might be working on other projects that aren't public yet (as part of his work at Eureka Labs).
More broadly though: someone with his track record [2] sharing firsthand observations about agentic coding shouldn't need to justify it by listing current projects. The observations either hold up or they don't.
[1] https://x.com/EurekaLabsAI
[2] PhD in DL, early OpenAI, founding head of AI at Tesla
More often than not, though, someone is just building a monolithic construction that will never be looked at again. For example, someone found that the HuggingFace dataloader was slow for some combination of file size and disk type. What does this warrant? A 300,000+ line unreviewed repo to fix the issue. Not a 200-line PR to HuggingFace; no, you need to generate 20% of the existing repo and then slap your thing on top.
For me this is puzzling, because what is this for? Who is this for? Usually people build these things for practice, but now it's generated, so it's not for practice, because you put very little effort into it. The only thing I can see is that it's some type of competence signaling, but again, if the engineer/manager looking at it knows it's generated, it doesn't carry the value such signaling would normally have. Either I'm naive and people still look at these repos and go "whoa, this is amazing", or it's some kind of induced ego trip/delusion where the LLM has convinced you that you are the best builder.
Starcraft and Factorio are exactly what it is not. Starcraft has a loooot of micro involved at any level beyond mid level play, despite all the "pro macros and beats gold league with mass queens" meme videos. I guess it could be like Factorio if you're playing it by plugging together blueprint books from other people but I don't think that's how most people play.
At that level of abstraction, it's more like grand strategy if you're to compare it to any video game? You're controlling high level pushes and then the units "do stuff" and then you react to the results.
In StarCraft, you decide whether an SCV should mine minerals, build a bunker, or repair a depot.
Imagine if instead it were a UI over agents that you could assign to different tasks, build new pipelines.
I don't think stutter-stepping marines to min/max combat was the metaphor he was going for.
I have a professor who has researched auto generated code for decades and about six months ago he told me he didn't think AI would make humans obsolete but that it was like other incremental tools over the years and it would just make good coders even better than other coders. He also said it would probably come with its share of disappointments and never be fully autonomous. Some of what he said was a critique of AI and some of it was just pointing out that it's very difficult to have perfect code/specs.
Billionaire coder: a person who has "written" a billion lines.
Ordinary coders: people with only a couple of thousand to their git blame.