2018.07.15 01:46:49 (1018280689224085504) from Daniel J. Bernstein, replying to "Vlad Krasnov πͺπΊπ¦ (@thecomp1ler)" (1018238983287918592):
Thanks for the reference. There are vectorized sorting networks in the literature as far back as 1987; see https://sorting.cr.yp.to/refs.html. People use these on GPUs. But the conventional wisdom has been that other sorting algorithms are faster on typical CPUs.
2018.07.14 19:11:00 (1018181077448515585) from Daniel J. Bernstein:
"Sorting integer arrays: security, speed, and verification." Slides for first djbsort talk now available: https://cr.yp.to/talks.html#2018.07.11
2018.07.14 23:01:06 (1018238983287918592) from "Vlad Krasnov πͺπΊπ¦ (@thecomp1ler)":
To be fair AVX quick sort and radix sort are not very well optimized for small inputs. However http://www.vldb.org/pvldb/1/1454171.pdf does SIMD merge networks.