Posted by redm 10 hours ago
It seems to keep repeating that the water cycle is the main source of energy for all living things on the planet and then citing Jenkins 2010. There are also a ton of sentences beginning with “It also…”
I don’t even think it’s correct. The sun is the main source of energy for most living things but there’s also life near hydrothermal vents etc.
I don’t know who Jenkins is, but this model appears to be very fond of them and the particular fact about water.
I suppose fast and inaccurate is better than slow and inaccurate.
If you have an existing network, making an int4 quant is the better tradeoff. 1.58-bit quants only become interesting when you train the model specifically for them.
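For anyone wondering where the odd number comes from: a ternary weight takes one of three values, so it carries log2(3) ≈ 1.58 bits. Here is a rough back-of-the-envelope sketch of what that means for weight storage. The 8B parameter count is just an assumption for illustration, and the sizes assume ideal packing with no overhead for scales or embeddings.

```python
import math

def bits_per_ternary_weight() -> float:
    # Three possible values {-1, 0, +1} carry log2(3) bits of information.
    return math.log2(3)

def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    # Idealized storage: bits -> bytes -> gigabytes (1 GB = 1e9 bytes).
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical 8B-parameter model, chosen only for illustration.
N_PARAMS = 8e9

for name, bits in [("fp16", 16), ("int4", 4), ("ternary", bits_per_ternary_weight())]:
    print(f"{name:8s} {bits:5.2f} bits/weight  ~{weight_size_gb(N_PARAMS, bits):.2f} GB")
```

Real formats land somewhat above these numbers because they also store per-block scales and usually keep some tensors at higher precision.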
On the other hand, maybe it works much better than expected because Llama 3 is just a terrible baseline.
demo shows a huge love for water, this AI knows its home
There's a lot that you can do when the model size is that small, yet still powerful.
Our next step is that we want to put up a content distribution network for it where people can also share their diffs for their own fine-tuned model. I'll post the project if we finish all the parts.
[1] https://www.youtube.com/live/x791YvPIhFo?is=NfuDFTm9HjvA3nzN
My disappointment is immeasurable and my day is ruined.
That said, I think the comparison to improving GGUF quantization isn't quite apples to apples. Post-training quantization is compressing a model that already learned its representations in high precision. Native ternary training is making an architectural bet that the model can learn equally expressive representations under a much tighter constraint from the start. Those are different propositions with different scaling characteristics. The BitNet papers suggest the native approach wins at small scale, but that could easily be because the quantization baselines they compared against (Llama 3 at 1.58 bits) were just bad. A full-precision model wasn't designed to survive that level of compression.
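To make the post-training side of that concrete, here is a toy sketch of what happens when you squeeze already-trained weights down to ternary versus int4. Everything in it is an assumption for illustration: the weights are random Gaussian rather than from a real model, the ternary rounding follows the absmean scheme as I understand it from the BitNet b1.58 paper, and the int4 path is a simple per-tensor symmetric scheme, not what GGUF actually does.

```python
import random

def absmean_ternary(weights):
    # Scale by the mean absolute value, then round and clip to {-1, 0, +1}.
    scale = sum(abs(w) for w in weights) / len(weights) + 1e-8
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return [qi * scale for qi in q]  # dequantized values

def symmetric_int4(weights):
    # Per-tensor symmetric quantization to integer levels in [-7, 7].
    scale = max(abs(w) for w in weights) / 7 + 1e-8
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return [qi * scale for qi in q]  # dequantized values

def mse(a, b):
    # Mean squared reconstruction error between two equal-length lists.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
# Stand-in for a trained full-precision weight tensor (toy data, not a real model).
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

err_ternary = mse(weights, absmean_ternary(weights))
err_int4 = mse(weights, symmetric_int4(weights))
print(f"ternary MSE: {err_ternary:.3e}")
print(f"int4 MSE:    {err_int4:.3e}")
```

On weights like these, ternary rounding throws away far more information than int4. That only shows the damage to a model trained in high precision. It says nothing about how well a natively ternary model can learn, which is the separate bet described above.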
The real tell will be whether anyone with serious compute (not Microsoft, apparently) decides the potential inference cost savings justify a full training run. The framework existing lowers one barrier, but the more important barrier is that a failed 100B training run is extremely expensive, and right now there's not enough evidence to derisk it. Two years of framework polish without a flagship model is a notable absence.
Can you tell me more about this? It's been about a year since I looked into it, but it looked like performance dropped hard below Q4. I'd love to see more about this.
Also what's a good way to run them? I mostly use Ollama which only goes down to Q4. I think it supports HF urls though?
How to run Qwen 3.5 locally https://news.ycombinator.com/item?id=47292522