Posted by gainsurier 2 hours ago
This is the normal way to use computers. They should spend most of their time idle, waiting on us. We shouldn't be waiting for them or spinning more plates to keep them busy.
However, a faster LLM isn't enough. You also need fast compiles and fast tests.
It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.) and when you tell it to actually do those changes it's pretty much done in half an hour
Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.
I believe the outputs are a training coincidence with consequences that are opportunistic for the labs.
https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...
(I should go measure this now, I'm curious)
There can't be many normal use cases where there'd be any cost benefit.
It's a cute toy right now, but you can tell an LLM that it's an HTTP server, and have it respond directly to a web browser hitting it. It generates headers in response, as well as page contents. As 1000 tok/sec becomes the new normal, we will come up with newer ways to use it outside of toy fiction encyclopedias.
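For anyone curious what that toy looks like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the prompt wording, the `respond` helper, and `stub_generate`, which stands in for a real model call so the example runs offline. A real setup would swap the stub for whatever completion API you use and write the bytes to a socket.

```python
from typing import Callable

# Hypothetical system prompt; the exact wording is an assumption.
SYSTEM_PROMPT = (
    "You are an HTTP/1.1 server. Reply to each raw request with a complete "
    "raw HTTP response: status line, headers, a blank line, then the body."
)


def build_prompt(raw_request: str) -> str:
    """Wrap the browser's raw request in the 'you are a server' framing."""
    return f"{SYSTEM_PROMPT}\n\nRequest:\n{raw_request}\n\nResponse:\n"


def respond(raw_request: str, generate: Callable[[str], str]) -> bytes:
    """Ask the model for a raw response and make its header block wire-safe."""
    text = generate(build_prompt(raw_request)).replace("\r\n", "\n")
    # Split headers from body at the first blank line. Only the header block
    # gets CRLF line endings; the body is passed through untouched.
    head, _, body = text.partition("\n\n")
    return (head.replace("\n", "\r\n") + "\r\n\r\n" + body).encode("utf-8")


def stub_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption: returns plain text)."""
    body = "<h1>Hello</h1>"
    return (
        "HTTP/1.1 200 OK\n"
        "Content-Type: text/html\n"
        f"Content-Length: {len(body)}\n"
        "\n"
        f"{body}"
    )


if __name__ == "__main__":
    request = "GET / HTTP/1.1\r\nHost: example\r\n\r\n"
    print(respond(request, stub_generate).decode("utf-8"))
```

Note the sketch trusts whatever the model emits, including the Content-Length header, which is a big part of why this is still a toy.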
I'm not saying there aren't any use cases for super-fast (and super-expensive) generation, but it does seem a bit niche. If it was free then sure faster is better, but what are the mainstream use cases where people might pay 3x more for a faster version of something that is already fast?
I think it would have to be an application where it paid for itself - where the 10x faster response was actually worth more than 3x the cost to you - where the extra speed was worth the extra cost.
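To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 10x speedup and 3x price are the figures from the comment above; the base cost, base latency, and function name are made-up assumptions, so plug in your own.

```python
def break_even_value_per_second(
    base_cost: float,
    base_latency_s: float,
    speedup: float = 10.0,
    price_multiple: float = 3.0,
) -> float:
    """Value of one second of waiting at which the faster tier breaks even.

    The faster tier costs (price_multiple - 1) * base_cost extra per request
    and saves base_latency_s * (1 - 1/speedup) seconds per request.
    """
    extra_cost = base_cost * (price_multiple - 1.0)
    time_saved_s = base_latency_s * (1.0 - 1.0 / speedup)
    return extra_cost / time_saved_s


if __name__ == "__main__":
    # Assumed example: a request that costs $0.10 and takes 60 s on the slow tier.
    per_second = break_even_value_per_second(base_cost=0.10, base_latency_s=60.0)
    print(f"break-even: ${per_second * 3600:.2f} per hour of waiting avoided")
```

With those assumed numbers the extra cost is $0.20 for 54 seconds saved, so the fast tier pays for itself only if the time you'd otherwise spend waiting is worth more than about $13 an hour.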
It will go much faster.
So long as AI lives in server farms, humans will be needed for tasks in the physical world.
It's only if we combine AI with robots that things get really dicey.
So, if anything, I would say it's worse for us. Obviously, it's the complete opposite situation for corporations and executives: they are loving the AI situation so much!
I’m excited for ultrafast AI. It likely means less temptation to multi-thread and deeper flow in single sessions.
First make it write a contract (REQ/ARCH/IMPL documents). Skim through those for any mistakes.
Then based on those ask it to write tests. Again skim through them.
Now you have a context full of guardrails. It’s less likely to surprise you.
If you're treating it like a slot machine you're doing it wrong. It will give you exactly what you ask for if you ask clearly, i.e. write a clear, detailed specification, not just "do X!". The nondeterminism comes from vagueness in specification.
I have a yearly GitHub Copilot subscription. Microsoft recently changed their billing to be token-based. I'm still getting billed per premium request, but GPT 5.4 is now 6x compared to 1x before.
I genuinely don't understand what moat these US model labs have. If they're saying recursive self improvement is just around the corner and Chinese labs are only slightly behind the leading US models, what moat do the US labs have? Are the US models going to recursively self improve better than the Chinese open source ones or something?
I might be completely wrong about this, but if I had money in OpenAI or Anthropic I'd be pulling it all right now. I think the chance of them going to near-zero over the next few years is very significant.
Or Google. I'm working with multiple customers right now that are very pissed at Google for deprecating Gemini 2.5 Flash and canning the GA release of 3.0 Flash, and who now have to decide whether to bite the bullet on the 5x price increase for 3.5 Flash or switch providers. Quite a few of them will likely fully pivot to open models.
For non-subsidized plans? Pretty sure they'd need to put this in the ToS, or lawsuits would have followed by now.
Sometimes Opus just gives me a rubbish session.
Data at https://gertlabs.com/rankings
It is another thing that the BigLabs accuse open weight models of benefiting from distillation & other techniques, and so essentially avoiding higher training costs (which typically bleed into the bills end users pay for inference).
Ex A: https://www.anthropic.com/research/2028-ai-leadership
Ex B: https://www.reuters.com/world/china/openai-accuses-deepseek-...
In this case, at least it’s threatening multimillion dollar salary jobs instead of entire towns of working class people in America or Mexico.
And the Chinese labs actually release their weights. You could call it… open AI.
Discussions about choosing a library with the best syntactic-sugar method naming are just as crazy as suggesting we type in assembly.
This strategy will seem to work really well until the economy that enabled that foundation to form is hollowed out. Then, there will be a reckoning (but we will have no choice but to march forth from there).
I'm not agreeing or disagreeing with you, but my brain cannot comprehend how machines can advance such interconnected systems while keeping humans in focus.
Perhaps I shouldn't have watched the Animatrix again.
There will only be a reckoning if models don't get much better.
If they do get much better you can just have them refactor, fix bugs in, or replace the existing codebase.
The concept of tech debt is sort of meaningless if you anticipate intelligence gains in models to continue.
With software + GenAI, every housewife can now build some app in an evening.
Especially as teams invest in proper agentic harnessing.
We have had a champion in our team that has invested a lot of time into it over the last 4 months, and if anything, quality has improved, not decreased. Architecture is more coherent, codebase has been cleaned up, agents find information quickly, code produced is very solid and my role is more and more checking that the output meets the requirements. But I cannot confidently say that I would've done a better job than AI; more often than not, I have to admit it does a better job than I do.
The mistakes are less and less technical and merely in the domain mapping. And AI is still not as creative as I am at quickly finding solutions to unblock stakeholders' issues. Also, AI is still not as creative as I am at finding the proper solutions for advanced technical problems. But it does a better job than me, even on that front, one-shotting a few solutions in a fraction of the time it would've taken me to test one idea myself.
Mind you, I don't like AI and I think it ruined the job, I don't like working this way, it's exhausting, way more work on one side, way less fun and fiddling with technical parts.
And yet, I have the genuine belief that a few years from now we'll be cloning open source repositories that are already optimized/harnessed and tested for agentic loops and best practices left and right, with software engineers mostly overseeing the domain translation and putting their 2 cents on the non-boilerplatey parts of the product (which, in general, are a small part of the surface).
I think that the next years of my career will be mostly spent in setting up and writing the harnessing and domain mapping part. Then I will move to another sector, not because I necessarily believe I won't have a job, but because I want to vomit thinking that's going to be my job.
"Watching John with the machine, it was suddenly so clear. The terminator would never stop. It would never leave him, and it would never hurt him, never shout at him, or get drunk and hit him, or say it was too busy to spend time with him. It would always be there. And it would die to protect him. Of all the would-be fathers who came and went over the years, this thing, this machine, was the only one who measured up. In an insane world, it was the sanest choice."
As long as you've indicated what you want, the machine will try to do what you ask of it. It won't get tired because "the codebase is too big", or it has gotten bored of the pattern, or it wants to introduce a new technology.
It just does the thing you asked of it. (Note that, yes, I get that as a codebase grows it becomes more difficult to fit into context, but that only applies if it needs to read a large percentage of the project to implement the task, which shouldn't be the case.)
There are good actors, who are empowered by AI to produce positive impact, but often there are N times more bad actors, who push crappy code to close feature requests fast, inflate LoC-like performance metrics, etc.
It's going to skip the code entirely for small businesses and just render UIs straight from context data and prompts at interactive speeds. Kind of like Google's Genie does with games but much more accurately.
> No one cares anymore.
I never cared about this.
I think this captures something that I've been searching for the words for. (Maybe I should have gotten an LLM to write the words for me.) Some of the biggest AI boosters are the kind of dev that would have cared about the new frameworks of the last 3 months. They had a "the framework does all the thinking for me" attitude already, so it is easy for AI to slot into that.
It needs to win in a marketing landscape hyper-overcrowded by thousands of competitors, slop-generated over a weekend.
I have a more hopeful take. As AIs improve and get faster we can more quickly and iteratively improve code which we may have historically avoided due to the work involved.
I know I've made several refactors that would have otherwise been insane lifts. Not only because of the work involved but because sometimes you don't know if it will work, and so you have a sort of double friction; you don't know if it will even succeed. With an AI you can just throw it at the refactor to see if it runs into a problem, all while you're having a coffee break or w/e.
In general AI is going to enable humanity to be more extreme versions of itself. For good and bad. I suspect more bad than good, though.
For a while I was running Cerebras GLM 4.7 for a bunch of tasks. Not a very smart model, but it's fantastic to have a live prototype of a site up and be able to type "make the fonts bigger. No not that big" and see it change in real time. And MiMo 2.5 is a lot more capable than GLM 4.7.
MiMo 2.5 is not the same model as MiMo 2.5 Pro.
GLM 5.1 is z.ai's latest iteration & is one of the popular open weight coding models.
If you've had the chance, how does GLM 5.1 (which is now more expensive than MiMo 2.5 Pro after its recent 70% price drop) compare?
But quite a bit more expensive than MiMo 2.5 Pro. Like 5x to 10x more on my little tests, at least by the API rates.