Posted by fheinsen 9 hours ago

Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation (arxiv.org)
136 points | 70 comments
physicsguy 4 hours ago|
With this, they've not provided an upper bound on the error of the kernel expanded with N terms, which I think is a big missing piece.
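For what it's worth, a textbook Lagrange-remainder bound for the exponential is the kind of estimate being asked for here (a generic sketch, not the paper's analysis):

    e^{x} = \sum_{n=0}^{N} \frac{x^{n}}{n!} + R_N(x), \qquad |R_N(x)| \le \frac{e^{|x|}\, |x|^{N+1}}{(N+1)!}

Applied to the attention kernel e^{q \cdot k}, the truncation error after N terms decays factorially once N exceeds |q \cdot k|, so any concrete bound hinges on how large the query-key dot products can get.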
andes314 8 hours ago||
Linear-time attention doesn't work, on principle. It's a dead-end pursuit. There's plenty of great research on more efficient quadratic-time inference.
smokel 7 hours ago|
What about n log n?
yanosh_kunsh 8 hours ago||
So does that mean LLM inference could get significantly cheaper and/or context lengths could increase dramatically?
NedCode 7 hours ago||
Reference implementation: https://github.com/glassroom/sata_attention
rvz 8 hours ago|
> Our work enables unbounded token generation at modest fixed cost, substantially reducing the infrastructure and energy demands of large-scale Transformer models. The mathematical techniques we introduce are of independent interest.

Now this is a very interesting paper, which hopefully addresses the chronic lack of efficient methods for reducing AI's computational and energy demands, which are off the charts.

> These factors penalize performance relative to what a fused, hardware-optimized implementation could achieve, and the reported runtime results should therefore be interpreted conservatively.

It's still early and there are several limitations, but wasting billions on GPUs will soon stop making sense.
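To make the "constant cost per token" claim concrete: kernelized-attention decoders keep a fixed-size running state instead of a growing KV cache, so each generated token costs the same amount of work. Below is a minimal NumPy sketch of that general idea; the feature map phi is a generic placeholder, not the paper's symmetry-aware Taylor expansion.

    import numpy as np

    d = 64  # head dimension

    def phi(x):
        # Generic positive feature map standing in for a truncated kernel
        # expansion; NOT the paper's symmetry-aware construction.
        return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])

    # Fixed-size running state; neither array grows with context length.
    S = np.zeros((2 * d, d))  # accumulates outer(phi(k), v)
    z = np.zeros(2 * d)       # accumulates phi(k) for normalization

    def decode_step(q, k, v):
        """Attend to the entire prefix in O(1) time/memory per token."""
        global S, z
        S += np.outer(phi(k), v)   # fold the new key/value into the state
        z += phi(k)
        num = phi(q) @ S           # unnormalized attention output, shape (d,)
        den = phi(q) @ z + 1e-6    # scalar normalizer
        return num / den

    # Each step costs the same no matter how many tokens came before.
    rng = np.random.default_rng(0)
    for _ in range(10_000):
        q, k, v = rng.normal(size=(3, d))
        out = decode_step(q, k, v)

The per-step cost depends only on the head and feature dimensions, which is what lets "unbounded token generation at modest fixed cost" hold in principle; the trade-off is approximation error in the kernel, which is exactly what the error-bound comment upthread is about.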