Posted by Bogdanp 5 days ago
Why does this have an equivalent inner loop, and why is it an important task?
The slowdown has to do with cache misses and CPU cache capacity, which relate to optimisations the CPU performs while executing code.
Granted, a language like Go may have more of the CPU cache used up by various runtime checks.
Basically, I think this analysis is largely language-agnostic.
For example, if you were to benchmark this in Java, the HashMap<> class allocates (twice!) on every insertion. Allocations are a bit cheaper with the GC than they would be via malloc and friends, but we'd still expect to see significant allocator, and hence GC, overhead in this benchmark.
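To make that concrete, here is a minimal sketch (my own, not taken from any benchmark in the article) of what a duplicate-counting inner loop looks like with the standard HashMap; the comments mark the allocation sites in question:

```java
import java.util.HashMap;

class HashMapAllocDemo {
    public static void main(String[] args) {
        long[] input = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};

        HashMap<Long, Integer> counts = new HashMap<>();
        for (long x : input) {
            // Each put typically allocates a boxed Long key (Long.valueOf
            // only caches small values) and, for a previously unseen key,
            // a fresh internal Node entry; table resizes reallocate as well.
            counts.merge(x, 1, Integer::sum);
        }

        long uniques = counts.values().stream().filter(c -> c == 1).count();
        System.out.println("items occurring exactly once: " + uniques);
    }
}
```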
I mean, that type is also awful, so early on in tuning this application you'd pick a different map type, but it is the standard one in the language.
The one time in your career you run into it, the next day your boss will add the requirement that entries are going to be inserted and deleted all the time, and your sorting approach is fucked.
If the entries can change all the time, we can use two hash tables, U and D. U maintains the set of unique items at all times, D maintains duplicates. An item is never in both at the same time. In D, it is associated with a count that is at least 2.
A new item is inserted into U. The first duplicate insertion removes the item from U and adds it into D with a count of 2. Subsequent insertions increment the count for that item in D.
A deletion first tries to decrement a count in D; when the count drops below 2, the item is removed from D and moved back into U, since it is unique again. If the item is not in D at all, it is removed from U.
At any time we can walk through U to visit all the unique items, without having to skip over the non-unique ones.
That has implications for complexity. For instance, suppose that for whatever reason the number of unique items is bounded by sqrt(N). Then iterating over them is O(sqrt(N)), whereas if we had just one hash table, it would take O(N) to iterate over all items and skip the non-unique ones.
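Here is a rough sketch of that two-table scheme in Java (the class and method names are mine, purely illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// U holds items that currently occur exactly once; D maps duplicated
// items to a count that is always >= 2. An item is never in both.
class UniqueTracker<T> {
    private final Set<T> unique = new HashSet<>();             // U
    private final HashMap<T, Integer> dups = new HashMap<>();  // D

    void insert(T item) {
        Integer c = dups.get(item);
        if (c != null) {
            dups.put(item, c + 1);   // already a duplicate: bump the count
        } else if (!unique.add(item)) {
            // first duplicate insertion: move from U to D with a count of 2
            unique.remove(item);
            dups.put(item, 2);
        }
    }

    void delete(T item) {
        Integer c = dups.get(item);
        if (c == null) {
            unique.remove(item);     // last copy gone
        } else if (c > 2) {
            dups.put(item, c - 1);
        } else {
            // count drops to 1: the item is unique again, move it back to U
            dups.remove(item);
            unique.add(item);
        }
    }

    Set<T> uniques() {
        return unique;               // live view of the currently unique items
    }
}
```

Walking uniques() then touches only the items that are currently unique, which is where the O(sqrt(N)) iteration in the example above comes from.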
The usual trie tricks can avoid this problem without letting the worst case happen. But as is often the case, adding the extra logic can mean worse performance for non-worst-case input.