
Posted by alattaran 14 hours ago

DeepClaude – Claude Code agent loop with DeepSeek V4 Pro (github.com)
497 points | 200 comments | page 2
_345 13 hours ago|
If you're okay with Sonnet-level performance, this sounds like a straight upgrade. But I find Sonnet messes up often enough that it ends up not being worth cost-optimizing down to it or another Sonnet-level model. Glad to have this as an option though
2ndorderthought 13 hours ago||
A lot of people are having good experiences doing things like using Opus for design and a locally hosted Qwen3.6 for implementation.

I could see a serious cost reduction story by using opus for design and deepseek for implementation.

Personally I would avoid anthropic entirely. But I get why people don't.

girvo 13 hours ago||
Like me: that’s what I do. Either Opus 4.7 or GLM 5.1 for planning, write it out to a markdown file, then farm it out to Qwen 3.6 27B on my DGX Spark-alike using Pi. Works amusingly well all things considered.
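
A minimal sketch of the shape of that workflow, assuming a one-shot plan from the big model that gets handed to a local OpenAI-compatible server (the model name, port, and prompts are illustrative, and Pi obviously does more than a single completion):

    # Stage 1: frontier model writes the plan (claude -p = one non-interactive prompt)
    claude -p "Read this repo and write an implementation plan for the feature in FEATURE.md" > plan.md

    # Stage 2: hand the plan to the locally hosted model behind an
    # OpenAI-compatible endpoint (llama-server, vLLM, etc. on the Spark-alike)
    curl -s http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d "$(jq -n --rawfile plan plan.md \
            '{model: "qwen3.6-27b", messages: [{role: "user", content: ("Implement this plan:\n" + $plan)}]}')"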
brianjking 10 hours ago|||
How are you interacting with GLM 5.1? Via the Claude Code harness? I really wish they'd release a fully multimodal model already.
girvo 1 hour ago||
Through Pi, mostly! Also my own for-fun agent I wrote

Yeah so would I, I do miss having vision tools sadly.

2ndorderthought 13 hours ago||||
How is GLM 5.1? I haven't tried it yet but have been meaning to
girvo 12 hours ago||
It's surprisingly good. Beats MiniMax 2.7 and Qwen 3.5 Plus quite handily in my testing (I haven't tested 3.6 Plus though). It's far better than Sonnet, and often equivalent to Opus for the web development and OCaml tasks I'm using it for. It definitely isn't Opus 4.7, but it's more than good enough to earn its keep and is substantially cheaper.
amunozo 45 minutes ago|||
Did you compare it with Kimi K2.6 and DeepSeek V4 Pro? I feel they're similar but as GLM is more expensive, I am not using it much.
sshine 11 hours ago||||
I agree with this. And also: it uses more thinking time to get there. So while you get a lot of tokens on their plan, the peak 3x token-usage multiplier plus the extra thinking means you run into the rate limit anyway.
girvo 11 hours ago||
True, though with the $20-equivalent plan used for planning only, I don’t hit those limits often, vs Claude where Pro can literally hit limits with a single prompt haha
Alifatisk 6 hours ago|||
I second this, glm-5.1 is incredible.
aftbit 13 hours ago|||
What hardware are you using to power this?
girvo 12 hours ago||
> DGX Spark-alike

Probably wasn't clear enough if you don't know what that is already, apologies

It's an Asus Ascent GX10, which is a little mini PC with 128GB of LPDDR5X as shared memory for an Nvidia GB10 "Blackwell" (kind of, it's a long story) GPU and a MediaTek ARM CPU

sterlind 10 hours ago|||
pulls up chair

could you tell me the long story?

edit: or wait, is it quasi-Blackwell the way all DGX Sparks are quasi-Blackwell? like the actual silicon is different but it's sorta Blackwell-shaped?

girvo 9 hours ago||
Yeah exactly. Shader model 121 is different to SM 120 (consumer Blackwell) and is different again to data centre Blackwell SM100.

The promise of this chip was “write your code locally, then deploy to the same architecture in the data centre!”

Which is nonsense, because the GB10 is better described as “Hopper with Blackwell characteristics” IMO.

Still great hardware, especially for the price and learning. But we are only just starting to get the kernels written to take advantage of it, and mma.sync is sad compared to tcgen05
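
If you want to see what a given box reports, something like this works (the capability numbers in the comment just restate the SM discussion above, so verify against your own machine):

    # Ask the driver what compute capability the GPU reports
    # (consumer Blackwell ~12.0, the GB10 12.1, data-centre Blackwell 10.x)
    nvidia-smi --query-gpu=name,compute_cap --format=csv

    # Ask the CUDA toolkit which compute targets it can generate code for
    nvcc --list-gpu-arch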

aftbit 12 hours ago|||
Ah yeah I saw that, I was just curious which particular mini-PC you were using. I was considering picking up one of the various AI Max 395 boxes before the RAMpocalypse but didn't take the plunge. Thanks for the response!
girvo 11 hours ago||
I heavily considered one of the AMD Strix Halo boxes, but part of the reason I wanted this was to learn CUDA :)
chrsw 12 hours ago|||
I keep re-learning this lesson: I chug along with a lesser model then throw a problem at it that's too complex. Then I try different models until I give up and bring in Opus 4.6 to clean up.
brianwawok 12 hours ago|||
And I keep using Opus to, like, make git commits. Really just need a smart router that is actually smart, vs having to micromanage the model choice
sterlind 10 hours ago||
the problem is managing the contexts. your session might fit in Opus, but will it fit in the smaller model you dispatch the git commit to? even so, will it eat too much on prefill? do you keep compactions around for this, or RAG before dispatch or something? how do you button the response back up?

all doable but all vaguely squishy and nuanced problems operationally. kinda like harness design in general.
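
A very crude version of that "router" is just a wrapper that sends trivial, self-contained tasks to a small model in a fresh context and keeps everything else in the main session. A sketch assuming Claude Code's --model flag and print mode; the model aliases and the task split are illustrative:

    #!/usr/bin/env bash
    # dumb-router.sh: route by task type instead of micromanaging models mid-session
    task="$1"
    case "$task" in
      commit)
        # small model, fresh context: it only needs the staged diff, not the whole session
        git diff --cached | claude --model haiku -p "Write a one-line commit message for this diff"
        ;;
      *)
        # everything non-trivial stays with the big model
        claude --model opus -p "$task"
        ;;
    esac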

energy123 9 hours ago|||
It's not even that much cheaper: GPT 5.5 is only about 2x more expensive per task than DeepSeek V4 Pro once you adjust for its lower token usage, according to Artificial Analysis. Doesn't seem worth it to me.
cpursley 3 hours ago||
Are we talking pay as you go API or vs plans?
energy123 2 hours ago||
Pay as you go API rates.
maxdo 9 hours ago|||
This is the problem: you need the best model, not just a good one, for:
- Good architecture, which requires reading specs, code, etc. — lots of tokens in/out
- Bug fixing — same, plus logs, e.g. Datadog

Once you've found the path, patches are trivial and the savings are tiny unless you're doing refactoring/cleanup.

testing gets more and more complicated. Take a look at opencode go, and you see this:

> Includes GLM-5.1, GLM-5, Kimi K2.5, Kimi K2.6, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2.5-Pro, MiMo-V2.5, Qwen3.5 Plus, Qwen3.6 Plus, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro, and DeepSeek V4 Flash

and now you're on your own with the bugs all of these models can produce at scale. Am I missing anything in this picture? What is the real use of cheaper models?

JSR_FDED 6 hours ago||
I'd argue that you need the model that's good enough, not the best.
Culonavirus 7 hours ago|||
We're not yet at a point of saturation where all the frontier models would be of somewhat comparable "intelligence" and we could decide which to use based on other factors (speed, effective context window, etc.), so I honestly don't see why you (as a company or an employee) would not use the best available model with the highest (or at least second-highest) thinking effort. The fees are not exactly cheap, but not that expensive either.
nyssos 6 hours ago||
Agreed that we're not at saturation, but we don't have a canonical "best" either. For example ChatGPT 5.5 + Codex is, in my experience, vastly superior to Opus 4.7 + Claude Code at sufficiently well-specified Haskell, but equally vastly inferior at correctly inferring my intent. Deepseek may well have its own niche, though I haven't used it enough to guess what it might be.
mohsen1 6 hours ago|||
This has been my experience working on tsz.dev. Only Opus 4.7 and GPT 5.5 can really be productive for the remaining test cases.
willio58 12 hours ago|||
I don’t find this with Sonnet at all. As long as I have a solid CLAUDE.md, periodically review the output, and enforce good code practices via basic CI gates, I’ve rarely found myself having to switch to Opus
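
For what it's worth, the "basic CI gates" don't need to be elaborate. A sketch of the kind of script that keeps a weaker model honest, with the commands as placeholders for whatever your stack actually uses:

    # ci-gate.sh: run after every model-written change; any failure rejects the patch
    set -euo pipefail
    npm run lint        # style / static analysis
    npm run typecheck   # e.g. tsc --noEmit
    npm test            # the real backstop: existing tests must still pass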
2ndorderthought 11 hours ago||
You might be surprised, then, at how well cheaper models solve your problems
zkmon 2 hours ago||
Next Claude news (Trump style): recent versions of Claude Code no longer allow talking to other models, or helping with any code that has the goal of moving away from Anthropic models.
dopeepsreaddocs 11 hours ago||
Did... Did you just ask an AI to one-shot something that normally amounts to no more than setting two env variables?
lukaslalinsky 6 hours ago||
I've been using DeepSeek v4 pro as an alternative to Claude models and for the first time I can see it as a real replacement. With the other Chinese models, I was missing something, but DeepSeek seems good enough for the kind of development I want to do.
jay1996523 6 hours ago||
Claude code can already use the DeepSeek API, so what are the advantages of this tool?
nclin_ 10 hours ago||
Is claude code the best coding harness? Anyone running evals on that?
ahmadyan 10 hours ago||
In my anecdotal experience, it is not. Same model, opus, works better in 3P harnesses such as Factory Droid or Amp.

Claude Code, on the other hand, is the most subsidized one, both for consumers (through the Max subscription) and for enterprises (token discounts). It is also heavily optimized for cost, especially token caching and reduced thinking, at the expense of quality.

DeathArrow 5 hours ago||
Terminal Bench tests the agent harness.

The best two are Codex and Forge Code.

However I am using plugins and skills that are only compatible with Claude Code or work best with Claude Code.

So, for me, Claude Code with plugins like claude-meme, Context Mode, Superpowers and Get Shit Done is better than other tools.

I think everyone should test multiple models and multiple agent harnesses for their specific needs, codebase, and way of working.

alexdns 13 hours ago||
obviously vibe coded (co-authored) + the prices don't even match
2ndorderthought 13 hours ago||
It's going to be real hard to find headlines that weren't vibe coded from here on out unfortunately.
SchemaLoad 13 hours ago|||
Unless I actually know the author I assume everything here is vibeslop and full of mistakes.

Maybe I need to switch to some news publication that actually does real research and writing still. Because public forums like this have been completely destroyed by LLMs.

cyanydeez 13 hours ago|||
welp, pack it in boys, it was nice conceptualizing all of you as real humans on the internet. I guess I'll just have to go touch grass if I want to feel parasocial.
dragontamer 13 hours ago||
I mean, we have the tech and community to actually build in person meetups and sign CRT certificates, right?

If we touch grass in person and swap certificate requests, we can actually rebuild a trust network.

This is a pretty old problem with regards to clubs / secret societies and whatnot. And with certificates / PKI, our modern security tools have solved all the technical problems.

2ndorderthought 12 hours ago||
I wish I could be invited to a secret club of guaranteed humans. Someone hand me a certificate next time you see me! Also don't stab me kthxbye
cyanydeez 12 hours ago||
Unfortunately, a lot of what's happening in the tech world seems to be coming from some super serious AI cults, so not sure going offline like this is any better.
2ndorderthought 12 hours ago||
Yea but we could have fun. Play some dnd. Drink tea or whiskey. Eat pizza pie. Light saber battle. Buy a megaphone and hang out at a street corner telling passersby they are perfectly acceptable and worthy of kindness and love
inciampati 12 hours ago||
poorly vibe coded. machines can check details easily, use them.
shay1607m 3 hours ago||
Interesting setup

do you have any benchmarks on:
- token usage over time
- failures/retry rates

would be great to see how it behaves in production

orliesaurus 13 hours ago||
Is there a way to do this directly using the Claude Code CLI (which I already have installed) and OpenRouter?
vitaflo 13 hours ago||
Yes, Deepseek even documents how:

https://api-docs.deepseek.com/quick_start/agent_integrations...

theanonymousone 13 hours ago|||
Yes, from Claude Code themselves: https://code.claude.com/docs/en/llm-gateway
jubilanti 12 hours ago|||
Here's a one-liner:

   ANTHROPIC_BASE_URL="https://openrouter.ai/api" ANTHROPIC_AUTH_TOKEN="$OPENROUTER_API_KEY" ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek/deepseek-v4-flash" CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 claude
gnat 13 hours ago|||
This repo's README explains how it works, and you can do it yourself: claude looks for environment variables that say which API endpoint to talk to, which key to pass, which model names to use for haiku/sonnet/opus-level workloads, etc.
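
Concretely, the DeepSeek-direct variant looks something like this (the base URL is DeepSeek's Anthropic-compatible endpoint as their quick-start docs describe it, the variable names mirror the one-liner upthread, and the model names are illustrative, so verify against whatever the API actually exposes):

    # point Claude Code's Anthropic client at DeepSeek instead of Anthropic
    export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
    export ANTHROPIC_AUTH_TOKEN="$DEEPSEEK_API_KEY"
    # map each workload tier onto a DeepSeek model
    export ANTHROPIC_DEFAULT_OPUS_MODEL="deepseek-chat"
    export ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek-chat"
    export ANTHROPIC_DEFAULT_HAIKU_MODEL="deepseek-chat"
    claude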
999900000999 9 hours ago|
I just spent half my day getting CUDA and LLAMA to work with my 5070TI.
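
For anyone about to do the same dance, the llama.cpp side of it is roughly this (flags as in llama.cpp's current build docs, so check the README for your version; the model path is a placeholder):

    # build llama.cpp with CUDA support (CUDA toolkit + driver need to be installed first)
    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON
    cmake --build build --config Release -j

    # serve a local model over an OpenAI-compatible API that Roo (or anything else) can point at
    ./build/bin/llama-server -m models/your-model.gguf --port 8080 -ngl 99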

I was able to use it in agent mode with Roo; I stopped after having it write out a plan, but I'll continue when I have more time.

DeepSeek feels less likely to do a straight-up rug pull since you can self-host with enough money, but I'm still more excited about local solutions.

Usually I just need grunt work done. I'm not solving difficult problems.

More comments...