2020.06.22 08:52:44 (1274958482379116544) from Daniel J. Bernstein, replying to "damageboy (@damageboy)" (1274637913297498114):
Beware that most benchmarking frameworks don't reliably measure timings of software running on Intel CPUs with Turbo Boost. Cycles/second is variable for the CPU core and for perfmon, while it's constant for rdtsc (the usual cycle counter) and for DRAM. Have to control temps etc.
2020.06.21 03:56:38 (1274521579511308288) from "Travis Downs (@trav_downs)", replying to "damageboy (@damageboy)" (1274520378505924608):
Looks very competitive between this and djbsort. For small inputs, it's probably even more useful to quote in cycles, since I expect it to scale perfectly with frequency; then you can compare results across chips (of the same uarch). Google bench doesn't have it built in, IIRC.
2020.06.21 04:11:09 (1274525230711652354) from "damageboy (@damageboy)", replying to "Travis Downs (@trav_downs)" (1274521579511308288):
Yes, I'm trying to hack cycle counting into googlebench right now.
2020.06.21 04:24:32 (1274528598003761154) from "Travis Downs (@trav_downs)", replying to "damageboy (@damageboy)" (1274525230711652354):
Here's a relatively portable one I use (header file is next to it): https://github.com/travisdowns/zero-fill-bench/blob/master/cycle-timer.c
2020.06.21 11:38:54 (1274637913297498114) from "damageboy (@damageboy)", replying to "Travis Downs (@trav_downs)" (1274528598003761154):
Thanks. Weirdly enough, Google bench has one internally that they don't expose, so I went for that. It's mostly as you expect: simply multiply by 2.1. I'll do another post once I clean up some minor issues. Just copy-pasting djbsort's compiler flags moved it by a small notch.
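
The "multiply by 2.1" remark can be read as a time-to-cycles conversion. The sketch below assumes that 2.1 is the core clock in GHz and that the benchmark reports nanoseconds; the thread states neither, and the function names are hypothetical. The conversion is only meaningful if the core actually ran at that frequency for the whole measurement, which is the Turbo Boost caveat raised earlier in the thread.

```python
def ns_to_cycles(elapsed_ns: float, freq_ghz: float) -> float:
    """Convert elapsed nanoseconds to clock cycles at a fixed frequency.

    Assumption: the core ran at freq_ghz for the whole measurement
    (no turbo, no throttling). 1 GHz is one cycle per nanosecond,
    so cycles = nanoseconds * GHz.
    """
    return elapsed_ns * freq_ghz


def cycles_per_element(elapsed_ns: float, n_elements: int, freq_ghz: float) -> float:
    """Cycles spent per sorted element, for comparing chips of the same uarch."""
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    return ns_to_cycles(elapsed_ns, freq_ghz) / n_elements


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the thread.
    print(cycles_per_element(elapsed_ns=1000.0, n_elements=100, freq_ghz=2.1))
```

Reporting cycles per element rather than raw time is what allows results to be compared across chips of the same microarchitecture running at different clock speeds.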