Top
Best
New

Posted by todsacerdoti 12/21/2025

Faster practical modular inversion(purplesyringa.moe)
65 points | 3 comments
fn-mote 12/27/2025|
The article is quite interesting. It ranges from a mathematical discussion (it explains Stein’s variant of the GCD algorithm if you make it far enough) to instruction-set-level issues.

For example:

> GCC documents __builtin_clz(0) as having an “undefined result”, so I initially assumed it means an indeterminate value. In reality, GCC maintainers consider it UB and LLVM documents it as UB… but the optimizers seem to model it exactly like an indeterminate value? (e.g. LLVM considers @llvm.cttz(0) to produce poison) This is frankly ridiculous, someone do something about it

There is a very interesting table of raw timings, showing variation from about 30% faster to 10% slower across a variety of bit widths (of the numbers involved) and architectures. This is a VERY thorough level of testing!!

PS It’s not a dupe if no discussion occurred… that’s just the second chance pool.

signa11 12/27/2025|
dupe: https://news.ycombinator.com/item?id=46344548
867-5309 12/27/2025||
anachronism: