Posted by samwho 10 hours ago

Quantization from the Ground Up (ngrok.com)
174 points | 33 comments
fcpk 8 hours ago
Something I have been wondering about is doing regressive, layer-specific quantization based on large test sets, i.e. reducing precision specifically in the layers that don't improve general quality.
qskousen 8 minutes ago
I've experimented with this on diffusion models, using a safetensors - gguf tool I wrote. Even with relatively few sample images (~10k, still enough to keep my 3090 spinning for days straight), the benefits are quite noticeable: a smaller file with overall better results.
woadwarrior01 3 hours ago
This is a very well established idea. It's called dynamic quantization: vary the quantization bit-width (or skip quantization altogether) on a layer-by-layer basis, using a calibration dataset.

EvoPress is the first thing that comes to mind when I think of dynamic quantization.

https://arxiv.org/abs/2410.14649
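
To make the idea concrete, here is a minimal sketch of picking a bit-width per layer from a calibration set. It is a plain greedy search on a toy ReLU network, not EvoPress's method or any library's API. Every name in it (`quantize_matrix`, `choose_bit_widths`, `BIT_OPTIONS`, the tolerance value) is a hypothetical one chosen for illustration.

```python
import random

BIT_OPTIONS = [8, 4, 2]  # assumed candidate bit-widths, highest precision first


def quantize_matrix(W, bits):
    """Symmetric uniform quantization of a matrix (list of rows) to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for row in W for w in row) or 1.0
    scale = max_abs / levels
    return [[round(w / scale) * scale for w in row] for row in W]


def forward(layers, x):
    """Toy model: each layer is a square matrix followed by ReLU."""
    for W in layers:
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]
    return x


def calibration_error(layers, bits_per_layer, calib_inputs):
    """Mean squared error between full-precision and quantized outputs."""
    quantized = [quantize_matrix(W, b) for W, b in zip(layers, bits_per_layer)]
    total = 0.0
    for x in calib_inputs:
        ref = forward(layers, x)
        out = forward(quantized, x)
        total += sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)
    return total / len(calib_inputs)


def choose_bit_widths(layers, calib_inputs, tolerance):
    """Greedily lower one layer at a time while the error stays within tolerance."""
    bits = [BIT_OPTIONS[0]] * len(layers)
    while True:
        best = None
        for i, b in enumerate(bits):
            idx = BIT_OPTIONS.index(b)
            if idx + 1 == len(BIT_OPTIONS):
                continue  # this layer is already at the lowest bit-width
            trial = bits[:i] + [BIT_OPTIONS[idx + 1]] + bits[i + 1:]
            err = calibration_error(layers, trial, calib_inputs)
            if err <= tolerance and (best is None or err < best[0]):
                best = (err, trial)
        if best is None:
            return bits  # no single-layer reduction fits the error budget
        bits = best[1]


# Toy data: three random 4x4 layers and 32 random calibration inputs.
rng = random.Random(0)
layers = [[[rng.gauss(0, 0.5) for _ in range(4)] for _ in range(4)] for _ in range(3)]
calib_inputs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]

print(choose_bit_widths(layers, calib_inputs, tolerance=1e-3))
```

Layers whose quantization barely changes the calibration outputs end up at lower bit-widths, while sensitive layers keep more precision. A real setup would measure task quality (perplexity, image metrics) rather than output MSE.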

buildbot 8 hours ago
This is a thing! For example, https://arxiv.org/abs/2511.06516
fcpk 7 hours ago
That's brilliant. I wonder why we haven't seen much use of it to do very heavy quantization.
maxilevi 4 hours ago
Since when is ngrok doing AI?
srichard16 1 hour ago
https://ngrok.ai/
aeve890 4 hours ago
Oh, _that_ quantization.