Posted by Bogdanp 5 days ago
Why does this have an equivalent inner loop, and why is it an important task?
The slowdown has to do with cache misses and CPU cache capacity, which relate to optimisations the CPU performs while executing code.
Granted, a language like Go may have more of the CPU cache used up by various runtime checks.
Basically, I think this analysis is largely language-agnostic.
For example, if you were to benchmark this in Java, the HashMap<> class allocates (twice!) on every insertion. Allocations are a bit cheaper with the GC than they would be via malloc and friends, but we'd still expect to see significant allocator, and hence GC, overhead in this benchmark.
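To make that concrete, here is a minimal sketch (my own, not taken from any benchmark in the article) of what a duplicate-counting inner loop looks like with the standard HashMap; the comments mark the allocation sites in question:

```java
import java.util.HashMap;

class HashMapAllocDemo {
    public static void main(String[] args) {
        long[] input = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};

        HashMap<Long, Integer> counts = new HashMap<>();
        for (long x : input) {
            // Each put typically allocates a boxed Long key (Long.valueOf
            // only caches small values) and, for a previously unseen key,
            // a fresh internal Node entry; table resizes reallocate as well.
            counts.merge(x, 1, Integer::sum);
        }

        long uniques = counts.values().stream().filter(c -> c == 1).count();
        System.out.println("items occurring exactly once: " + uniques);
    }
}
```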
I mean, that type is also awful, so early on in tuning this application you'd pick a different map type, but it is the standard one in the language.
The one time in your career you run into it, the next day your boss will add the requirement that entries are going to be inserted and deleted all the time, and your sorting approach is fucked.
If the entries can change all the time, we can use two hash tables, U and D. U maintains the set of unique items at all times, D maintains duplicates. An item is never in both at the same time. In D, it is associated with a count that is at least 2.
A new item is inserted into U. The first duplicate insertion removes the item from U and adds it into D with a count of 2. Subsequent insertions increment the count for that item in D.
A deletion first tries to decrement a count in D; when the count drops below 2, the item is removed from D and moved back into U, since it is unique again. If the item is not in D at all, it is removed from U.
At any time we can walk through U to visit all the unique items, without having to skip over the non-unique ones.
That has implications for complexity. For instance, suppose that for whatever reason the number of unique items is bounded by sqrt(N). Then iterating over them is O(sqrt(N)), whereas if we had just one hash table, it would take O(N) to iterate over all items and skip the non-unique ones.
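Here is a rough sketch of that two-table scheme in Java (the class and method names are mine, purely illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// U holds items that currently occur exactly once; D maps duplicated
// items to a count that is always >= 2. An item is never in both.
class UniqueTracker<T> {
    private final Set<T> unique = new HashSet<>();             // U
    private final HashMap<T, Integer> dups = new HashMap<>();  // D

    void insert(T item) {
        Integer c = dups.get(item);
        if (c != null) {
            dups.put(item, c + 1);   // already a duplicate: bump the count
        } else if (!unique.add(item)) {
            // first duplicate insertion: move from U to D with a count of 2
            unique.remove(item);
            dups.put(item, 2);
        }
    }

    void delete(T item) {
        Integer c = dups.get(item);
        if (c == null) {
            unique.remove(item);     // last copy gone
        } else if (c > 2) {
            dups.put(item, c - 1);
        } else {
            // count drops to 1: the item is unique again, move it back to U
            dups.remove(item);
            unique.add(item);
        }
    }

    Set<T> uniques() {
        return unique;               // live view of the currently unique items
    }
}
```

Walking uniques() then touches only the items that are currently unique, which is where the O(sqrt(N)) iteration in the example above comes from.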
The usual trie tricks can avoid this problem without letting the worst case happen. But as is often the case, adding the extra logic can mean worse performance for non-worst-case input.