Posted by nnx 3 days ago
I also have yet to see any of these at a larger scale. For example, can you try one of these at 100 billion parameters?
I don't see any mWh/token figures in that chart.
If you got that into a couple of gigs, what could you stuff into 20 gigs?
That'll be the real game changer.
Unfortunately my mental model doesn't contain anything to even let me guess whether that's possible; my AI days were on the falling flank of the symbolic era. Funny how one-bit models feel a bit like approaching an approximation of symbolic AI again (until you read about the grouped scale factors, and then the illusion is gone).
One thought that suggests rearranging is not involved, a thought that requires no special knowledge at all: if it did involve rearranging, someone would certainly have added order-by-scale-factor tricks with linear interpolation by address offset to lose even less precision.
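For anyone who hasn't met "grouped scale factors": the usual idea is that weights are stored as ternary/binary codes, with one floating-point scale per small group to recover magnitude. A minimal sketch of group-wise ternary quantization in that spirit (group size and scale rule here are illustrative assumptions, not any particular model's exact scheme):

```python
# Hedged sketch: group-wise ternary quantization with per-group scales.
# Not a specific model's algorithm, just the general shape of the idea.

def quantize_groups(weights, group_size=4):
    """Split weights into groups; store one scale plus {-1, 0, +1} codes per group."""
    out = []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        # Per-group scale: mean absolute value (a common, simple choice).
        scale = sum(abs(w) for w in group) / len(group) or 1.0
        # Round each weight to the nearest of {-1, 0, +1} after scaling.
        codes = [max(-1, min(1, round(w / scale))) for w in group]
        out.append((scale, codes))
    return out

def dequantize_groups(groups):
    """Reconstruct approximate weights: code times its group's scale."""
    return [scale * c for scale, codes in groups for c in codes]

weights = [0.8, -0.05, 0.4, -0.9, 0.1, 0.02, -0.3, 0.25]
groups = quantize_groups(weights)
approx = dequantize_groups(groups)
```

Storage-wise each weight costs under 2 bits plus the amortized scale, which is why the "it's symbolic again" illusion breaks: the scales smuggle real-valued magnitude back in.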
https://proceedings.neurips.cc/paper_files/paper/2024/hash/7...
They train directly in the 1-bit domain, without any floating-point weights. Instead of the classical Newton-Leibniz derivative (which operates on approximations of real numbers) for gradient descent / backpropagation, they invented a binary analogue called "Boolean variation".
I don't know why this paper didn't get more attention.
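To get a feel for what training without real-valued gradients can look like, here's a toy of my own (an illustration, not the paper's actual "Boolean variation" rules): weights live in {-1, +1}, and the update signal for a weight is simply whether flipping that bit would increase the loss.

```python
# Toy bit-flip training on {-1, +1} weights. My own illustrative sketch,
# NOT the Boolean-variation algorithm from the linked paper.

def predict(w, x):
    # Sign of the +/-1 dot product (ties go to +1).
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def loss(w, data):
    # Number of misclassified points.
    return sum(1 for x, y in data if predict(w, x) != y)

def train(w, data, sweeps=10):
    # Greedy bit-flip descent: keep a flip unless it makes the loss worse.
    for _ in range(sweeps):
        for i in range(len(w)):
            before = loss(w, data)
            w[i] = -w[i]                # tentatively flip bit i
            if loss(w, data) > before:  # the Boolean signal says "bad flip"
                w[i] = -w[i]            # undo it
    return w

data = [([1, -1, 1], 1), ([-1, 1, -1], -1), ([1, 1, 1], 1), ([-1, -1, -1], -1)]
w = train([-1, 1, -1], data)  # start from the worst configuration
```

The point is just that a discrete "would flipping help?" signal can drive learning without ever touching a real-valued gradient; the paper's contribution is making that idea scale to backprop through whole networks.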
Nonetheless, the Prism Bonsai models are impressive for their size. Where they fall apart is knowledge: good prose and logic for a tiny model, and fast even on modest hardware, but they hallucinate a lot. Which makes sense, since you can't fit the world's data into a couple of gigabytes. As a base model to fine-tune for use cases where size matters, though, it's probably a great choice.
>> What are some names like Llewelyn?
> Some names like Llewelyn are Llewelyn, Llewelyn, Llewelyn, (repeats several times), and Llewelyn.
Can it be run on browsers with WASM/WebGPU?
Wow, if this is true, I am extremely impressed and excited!
I wonder how much better the KV cache is as well!