Posted by caust1c 15 hours ago

A few words on DS4 (antirez.com)
352 points | 144 comments
bjconlan 14 hours ago|
This is great! I feel the same way about the deepseek v4 architecture for commodity hardware.

Also have enjoyed playing with https://huggingface.co/HuggingFaceTB/nanowhale-100m-base (but early days for me understanding this space)

kamranjon 13 hours ago|
Very cool! I had no idea that HF was doing this - I really love their small model experiments.
sbinnee 13 hours ago||
It is a big thing for sure to have a competitive local agentic model. I've replaced Gemini 3 Flash preview with DeepSeek V4 Flash for all of my personal use cases: chat apps, language learning, and even hobby coding. For coding, I could never get decent results before, no matter which latest sota models I tried. It's not close to Opus or Codex models; it's a flash model and makes mistakes here and there (I just saw `from opentele while import trace`, new Python syntax!).
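
You can confirm CPython rejects that line outright, if you're curious (purely illustrative):

    # The model's "new syntax" is just a plain SyntaxError:
    compile("from opentele while import trace", "<llm-output>", "exec")
    # raises: SyntaxError: invalid syntax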

But I found its tool calling more reliable than other oss models I tried. I assume that's attributable to interleaved thinking. Its reasoning effort is adjusted automatically based on the query. I enjoy reading the reasoning traces from open models because you can't see them from proprietary ones.

I would love to try DS4, but I don't have a machine for it, so I'll just stick to OpenRouter. I wish I could run a competitive oss model on a 32GB machine in 3 years.

zozbot234 13 hours ago||
> I wish I could run a competitive oss model on a 32GB machine in 3 years.

You could try DS4 on that machine anyway and see how gracefully it degrades (assuming it runs at all and doesn't just OOM immediately). Experimenting with 36GB/48GB/64GB would also be interesting; those setups might claw back some compute throughput by batching multiple sessions together (though obviously at the expense of speed for any single session).
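
A rough way to measure that tradeoff yourself, assuming DS4 exposes an OpenAI-compatible completions endpoint (the URL, model id, and response shape below are placeholders, not DS4's documented interface):

    import time
    from concurrent.futures import ThreadPoolExecutor
    import requests

    URL = "http://localhost:8000/v1/completions"  # hypothetical local endpoint
    PROMPTS = [f"Summarize topic {i} in one line." for i in range(8)]

    def one_request(prompt):
        # One completion request; returns the number of generated tokens.
        r = requests.post(URL, json={
            "model": "deepseek-v4-flash",  # placeholder model id
            "prompt": prompt,
            "max_tokens": 128,
        }, timeout=600)
        r.raise_for_status()
        return r.json()["usage"]["completion_tokens"]

    start = time.time()
    with ThreadPoolExecutor(max_workers=len(PROMPTS)) as pool:
        total = sum(pool.map(one_request, PROMPTS))
    elapsed = time.time() - start

    # Aggregate tok/s should rise with batching even as each session slows down.
    print(f"{total} tokens in {elapsed:.1f}s ({total / elapsed:.1f} tok/s aggregate)")

Run it once with one prompt and once with eight: aggregate throughput going up while per-prompt latency gets worse is the batching effect in action.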

thegeomaster 13 hours ago|||
> `from opentele while import trace`

FYI, this to me points to an inference bug, bad sampling, or a non-native quant. OpenRouter is known to route requests to absolutely terrible, borked implementations. A model like DeepSeek V4 Flash shouldn't be making syntax errors like this.
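
To rule out OpenRouter's routing as the culprit, you can pin a specific provider and disable fallbacks via the provider preference object in its chat completions API; a minimal sketch (the model slug and provider name are assumptions, check the model page for the real ones):

    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
        json={
            "model": "deepseek/deepseek-v4-flash",  # assumed slug
            "messages": [{"role": "user", "content": "Write hello-world in Python."}],
            # Pin one provider and forbid fallbacks, so you know exactly
            # whose implementation generated the tokens you're judging.
            "provider": {"order": ["DeepSeek"], "allow_fallbacks": False},
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])

If the syntax errors disappear with the first-party provider pinned, it was a borked third-party deployment, not the model.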

kristianp 13 hours ago||
> I wish I could run a competitive oss model on a 32GB machine in 3 years.

It's so hard to predict what size the open-weight models will be, even in 6 months time. Will a 96GB machine turn out to be a complete waste of money? Who knows.

sourcecodeplz 4 hours ago||
This project is a week old and already super popular. Guess people really were tired of LM Studio or tuning llama.cpp settings.
zargon 2 hours ago|
llama.cpp (and consequently LM Studio) doesn't support DeepSeek V4. If you want to run V4, this is your only option right now unless you have hardware that can run vLLM.
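
For reference, if you do have vLLM-capable hardware, serving it is roughly this much code; a minimal sketch assuming the HF repo id from the link elsewhere in this thread, with an illustrative GPU count:

    # pip install vllm
    from vllm import LLM, SamplingParams

    # Adjust tensor_parallel_size to however many GPUs you actually have.
    llm = LLM(model="deepseek-ai/DeepSeek-V4-Flash", tensor_parallel_size=2)
    params = SamplingParams(temperature=0.6, max_tokens=256)

    outputs = llm.generate(["Explain what an MoE router does, briefly."], params)
    print(outputs[0].outputs[0].text)

The CLI equivalent is roughly: vllm serve deepseek-ai/DeepSeek-V4-Flash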
karel-3d 3 hours ago||
Oh a local DeepSeek? Nice

> Starting from MacBooks with 96GB of RAM.

... oh. And I thought I bought a lot with 48 GB.

zozbot234 3 hours ago|
96GB is what the author claims will work in a foolproof way for easy production use. But nothing stops you from trying to run it on 48GB; it ought to gracefully fall back to reading model layers from disk.
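
The mechanism there is plain demand paging: memory-map the weights file and the OS pulls in only the pages you touch, re-reading from disk when RAM runs short. A toy illustration (the file name and layer size are made up):

    import numpy as np

    # Map a big weights file read-only; nothing is loaded until first touch.
    weights = np.memmap("model.bin", dtype=np.float16, mode="r")

    LAYER = 1 << 24  # pretend each "layer" is 16M half-precision floats

    def load_layer(i):
        # A cold layer is read from disk; warm layers come from the page
        # cache, which the kernel evicts under memory pressure.
        return np.asarray(weights[i * LAYER:(i + 1) * LAYER])

    x = load_layer(0)

That's why it degrades gracefully instead of OOMing: the cost shows up as latency on cold layers, not as an allocation failure.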
brcmthrowaway 13 hours ago||
This guy is falling deep into Yegge-tier psychosis.
linkregister 12 hours ago||
Empirically, DS4 serves the DeepSeek V4 Flash model with good performance on home hardware. I'm curious how you came to this conclusion.
dakolli 12 hours ago||
"Empirically", have you tested this yourself?
linkregister 10 hours ago||
It's trivial to find reviews and benchmarks of DS4 online. Also, there are benchmarks in the article.

Here's one of the top hits: https://forums.developer.nvidia.com/t/fully-custom-cuda-nati...

Bizarre comment; sounds like "How do you know Porsches are fast? Did you drive one?"

calmingsolitude 9 hours ago|||
Parent is simply pointing out the incorrect usage of "empirically", which should typically only be used when you've tested something yourself.
linkregister 9 hours ago||
I'm having trouble finding dictionaries or other references that add the qualifier that it needs to be self-tested and not relying on the research of others. Can you point me to one?
dakolli 9 hours ago||
I don't think comments on the internet count as "empirical" evidence, but sure.
linkregister 7 hours ago||
If you think antirez's benchmarks in the blog post are false, then make that claim. Instead, you continue to move the goalposts.
dakolli 9 hours ago|||
Are you comparing an LLM running on a laptop to a Porsche?

I just find it really funny people are willing to write things like "empirically speaking, X is obvious" without actually testing it themselves.

I've seen mixed reviews, and the most honest sounding ones have said it has latency issues.

I don't really care that much what the average LLM power user says at this point; they're impressed by anything an LLM does. They're like toddlers entertained by the sound their Velcro shoes make.

You LLM people are going to be like my mom: once she got a maps app she completely gave up on navigating anywhere with her own brain, and now she's lost without her phone.

Except for you LLM people, it's going to be reading, writing, problem solving, and thinking in general. You'll be completely reliant on an LLM to get anything done. Have fun with that. You're cooked, bro.

linkregister 9 hours ago||
It's funny because you make these assertions without any empiricism of your own. They're just speculations.

"You LLM people". Has it occurred to you that individuals have variation within groups?

wren6991 8 hours ago|||
Not even close. "I made this DSP task faster by focusing on exactly one compute graph on one machine instead of a compute graph compiler that runs on every possible machine" is a real engineering approach, and the AI usage is incidental. Things like Gas Town are self-serving turboslop whose only purpose is to generate more slop.
fgfarben 12 hours ago||
Nope.
vrighter 7 hours ago||
Damn it, I was expecting something interesting about the PS4 controller, not more junk about AI. Such a rugpull.
codedokode 13 hours ago|
I thought DeepSeek was closed-weights and proprietary? I wonder how it compares against Western open-weight models. The Hugging Face page contains comparisons only with proprietary models, for some reason.
itishappy 13 hours ago||
DeepSeek has always been open-weight, and the DeepSeek HuggingFace page does not contain any comparisons. Where did you form these opinions?
codedokode 13 hours ago|||
It contains comparisons: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
itishappy 12 hours ago||
Just the first one then...

Apologies. Where did I form my opinions?

zozbot234 13 hours ago||
Nemotron would be a comparable Western open model AIUI.