The unreasonable effectiveness of modern sort algorithms

(github.com)

30 points | by Voultapher 3 days ago

7 comments

bob1029 38 minutes ago
I find in practice that if the sorting process is too slow, you should begin thinking about different ways to attack the problem. Maintaining a total global order of things tends to only get more expensive over time as the scope of your idea/product/business expands. The computational complexity of the sort algorithm is irrelevant once we get into memory utilization.
This is why we have things like tournament selection. Randomly sampling from the population and running tournaments is way more scalable than scanning and ordering a global list each iteration. You can maintain things like an ELO score with very narrow views into memory. Nothing needs a global view yet you get global effects.
codegladiator 9 minutes ago
Good read. Reminds me of the 1 billion row aggregation challenge, especially the perfect hashing part, all the top entries all used it.
https://github.com/gunnarmorling/1brc
Epa095 2 hours ago
Double jaw-drop. First when the (dynamic) match was slower than the hash map, second when sort_unstable was faster than the hash map!
Cool article. It's clear that all my theoretical algorithm-knowledge comes short when faced with real CPUs.
conradludgate 2 hours ago
The efforts of developing better sorting algorithms like driftsort/ipnsort and better hash functions like foldhash make my life as developer so much simpler. No matter how clever I try to be, most often just using foldhash hashmap or a sort_unstable is the fastest option
DennisL123 2 hours ago
Efficiency, not effectiveness. They are all effective in the sense that they produce sorted results. Even the non-modern sort algorithms are effective in the sense that the results are correct. This should be about the efficiency with which they do it, right?
[-]
- aabhay 1 hour ago
  Agreed. Effectiveness would imply that some algorithms are more likely to sort the list correctly than others, or they sort a higher percentage of elements. Efficiency is about factors external to the correctness
dvh 1 hour ago
Isn't this just another case of premature optimization? Shouldn't you be adjusting sorting algorithms only when customer complains?
[-]
- codegladiator 8 minutes ago
  This is pushing the limits to identify the boundaries