I have been toying with a 2.5D engine in C on top of raylib, using DeepSeek as a companion along the way.
Its thinking transcripts in OpenCode are transparent, and it's mind-boggling to see what it considers in its thought process. They are very long to read, but none of it is useless or meaningless.
It has happened repeatedly that I made an assumption I didn't think about or that was simply wrong, DeepSeek flagged it in its thought process, and then in the final output it would "align" to my flawed request anyway. I'd tell it: wait, I saw you considered so-and-so too, and that's correct; I made a mistake, let's consider that aspect as well.
For others who are lacking context :-)
Their stated inspiration for this SEO bomb is Chanel perfumes.
Has anyone tested what happens if you try to run this on lower-RAM Macs? It might work and just be a bit slower as it falls back on fetching model layers from storage.
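If the loader mmaps the weight file the way llama.cpp-style engines tend to, that fallback comes for free: the OS pages layers in from the SSD on demand and evicts them under memory pressure, so you get SSD-speed degradation rather than a hard failure. A rough sketch of the mechanism (the function name and assumptions are mine, not ds4's actual loader):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the weight file read-only. Nothing is loaded until a page is
 * touched, and the kernel is free to evict pages under memory
 * pressure and re-read them from disk later. */
void *map_weights(const char *path, size_t *size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping survives the close */
    if (p == MAP_FAILED) return NULL;

    *size_out = (size_t)st.st_size;
    return p;
}
```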
From the GitHub page it seems it only supports Apple and the DGX Spark. I have 128 GB of RAM and a 3090, but it probably won't work.
[1] https://unsloth.ai/docs/models/tutorials/minimax-m27
(Unsloth's deepseek-v4 support is still WIP)
I do wonder, though, if another agent is really needed. I've been driving it with Pi (Claude Code's system prompt is far too heavy given the prefill speeds) and it's been great. OpenCode is another good option. Is there anything else to gain from another similar tool specific to DeepSeek 4?
Also there is a lot more to imagine, TUI side. The problem is that most projects just copy what they have already seen. For instance I just did this in 20 minutes: https://x.com/antirez/status/2055190821373116619 Now that code is cheap, ideas have more value. Are we sure it still makes sense to think in terms of "Is another XYZ needed"? It may be worth it just to explore new ideas. I don't like the JavaScript / Node ecosystem for my code, so if I have to explore a new TUI or agent workflow, I do it with the tools I'm happier to use; the result, the iterations, are different.
Codex CLI is written in Rust, which should give comparable raw performance to C/C++. Of course you can care about the "fewer dependencies" point, but this is somewhat less of a concern on a properly maintained project like Codex. Those aren't so much "wild, out of control" third-party dependencies as the old ideal of proper software componentry.
> Also there is a lot more to imagine, TUI side. The problem is that most projects all copy what they already saw. For instance I just did this in 20 minutes.
This mockup is really nice and the sidebar display gives you a natural way to expose running multiple thinking flows in parallel, at least if you keep them from stepping on each other's toes with code edits (keep them all in read-only "plan" mode or working on completely separate directories/files). That's not so helpful on a 128GB MacBook where a single agentic flow brings you to thermal/power limits already, but it suddenly becomes useful on other hardware (DGX Spark, Strix Halo, lower-RAM machines with SSD offload, multiple nodes with pipeline parallelism) where you have more compute than you could use for single-stream decode.
Once we hit that point, I am curious how much of Anthropic's current business model falls apart. So far it has always been clear that you just pay for the most intelligent model you can get, because it is worth it. It now seems clear to me that there is limited runway on that concept; it is just a question of how long that runway is. I honestly wonder how much of their frantic push to broaden out into enterprise / productivity is because they already see this writing on the wall.
Is that true? I find the smarter models can just be effective when smaller models can't. It isn't a matter of just waiting longer.
Perhaps you'd still turn to hosted models for the hardest tasks, but most tasks go local. It does seem like that would make demand go down significantly.
Of course that's all predicated on model advances plateauing, or at least getting increasingly more expensive for incremental improvements, such that local open-source models can catch up on that speed/quality/cost curve. But there is a fair amount of evidence that's happening. The models are still getting noticeably better, but relative improvement does seem to be slowing, and cost is seemingly only going up.
* local compute isn't scaling as before, so algorithmic improvements are the only way models get meaningfully faster and smarter
* all those same algorithmic improvements would also be true for larger models
* hardware manufacturers have an incentive against local LLMs because cloud LLMs are so much more lucrative (+ corps would buy desktop variants if they were good enough)
So no, it's not clear quality will ever be comparable. It may be good enough for what you want, but there will always be a harder problem that you need to throw more compute and more memory at.
Sure, but if the "good enough for what you want" bucket covers the vast majority of cases, data-center AI becomes just for the extreme edge cases. Like how I can render a 4K video game at 60fps on my home PC, but if Pixar wants to render their next movie they use data-center compute.
> all those same algorithmic improvements would also be true for larger models
Smaller models run faster. If ten runs of a small model get me the same quality result as one run of the big model, and the small model runs 10x faster, then they are functionally the same.
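Back-of-envelope with made-up numbers: if the big model decodes at ~12 tok/s and the small one at ~120 tok/s, ten sequential 2,000-token attempts with the small model take 10 × 2000/120 ≈ 167s, exactly what a single 2,000-token big-model run takes at 2000/12 ≈ 167s. And if you can run the attempts in parallel, the small model wins on wall-clock time outright.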
This is a very nice analogy actually and it impacts the whole story about US vs. Chinese leadership in "frontier AI".
It's always going to be about cost: developer time vs. developer cost vs. AI cost vs. developer productivity.
With 4.6 it's looking like we are at the upper limit of appetite for cost (for "regular" businesses), so the other levers will probably need to change.
It did OK, but scored substantially lower than Opus. It also cost nearly as much, even with the current launch promo pricing for DeepSeek.
That cost is interesting: I've seen similar things with Sonnet vs Opus, and in my own benchmarking there are some models that benchmark well and seem to have a good price, but use so many tokens they cost just as much as "more expensive" models.
[1] https://blog.kilo.ai/p/we-tested-deepseek-v4-pro-and-flash
That depends on where the methodology goes, but more and more it's hands-off. If the trajectory continues it won't matter, because nobody is sitting there waiting and watching the LLM code anyway; it is all happening in the background. We might see hybrid approaches where the weaker / cheaper agent tries to solve it and just "asks for help" from the more expensive agent when it needs it, etc.
My personal experience is that for production-grade code you need to steer the agent more often than not... so yes, at least some of us are watching the LLM code.
> We support the following backends:
>
> * Metal is our primary target, starting from MacBooks with 96GB of RAM.
> * NVIDIA CUDA, with special care for the DGX Spark.
> * AMD ROCm is only supported in the rocm branch. It is kept separate from main since I (antirez) don't have direct hardware access, so the community rebases the branch as needed.
> This project would not exist without llama.cpp and GGML, make sure to read the acknowledgements section, a big thank you to Georgi Gerganov and all the other contributors.

Edit: aww, doesn't seem to support offloading to system RAM[0] (yet)
[0] https://github.com/antirez/ds4/issues/108
Guess I'll have to keep watching the llama.cpp issue[1]
Has anybody tried it? There is a lot of emphasis on MacBook Pro in this thread, but I would like to use it with an AMD Strix Halo with 128GB of unified RAM.
Configured one just now, delivers in 2 weeks
The code seems to be based on llama.cpp and GGML.
I don't fully understand why it is a standalone project. The README discusses this: "DwarfStar 4 is a small native inference engine specific for DeepSeek V4 Flash. It is intentionally narrow: ..."
I think the only big difference between DeepSeek V4 and other models is maybe the type of self-attention. And that leads to the KV cache being a first-class disk citizen.
But I still feel like those changes could have been implemented in some of the other local engines.
I also assume more models will come out, not just from DeepSeek but also from others, and they might share similar self-attention approaches that would benefit from a similar KV cache implementation.
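To make the "first-class disk citizen" idea concrete: a disk-resident KV cache doesn't need much beyond a small header plus the per-layer K/V tensors, so a long session can be resumed without paying the prefill again. A minimal sketch under my own assumptions (the struct and function are hypothetical, not ds4's actual on-disk format):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk layout: a fixed header followed by the raw
 * K and V tensors for every layer. */
typedef struct {
    uint32_t magic;    /* file identifier / version        */
    uint32_t n_layers; /* transformer layers in the cache  */
    uint32_t n_tokens; /* tokens currently cached          */
    uint32_t kv_dim;   /* per-token K (or V) width, floats */
} kv_header;

/* Persist the cache so a later run can read (or mmap) it back and
 * continue decoding from token n_tokens instead of re-prefilling. */
int kv_cache_save(const char *path, const kv_header *h,
                  const float *k, const float *v, size_t elems) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(h, sizeof *h, 1, f) == 1 &&
             fwrite(k, sizeof *k, elems, f) == elems &&
             fwrite(v, sizeof *v, elems, f) == elems;
    fclose(f);
    return ok ? 0 : -1;
}
```

The interesting engineering is less the format than deciding when it's cheaper to reload than to recompute, which is where a model-specific engine can cut corners a general one can't.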
(The UX of ds4 is fantastic too: it's dead easy to get a known-good model with a great quant. With llama.cpp you're much more hacking in the wilderness, with many, many knobs.)
Is it true? We'll see, in a few years.
Antirez explained the dev process when he posted a pure C implementation of the Flux 2 Klein image gen model, at https://news.ycombinator.com/item?id=46670279