This is the fourth video in Part 2 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The listings referenced in the video (listings 70, 71, 72, and 73) are all available on GitHub.
Despite multiple attempts, I was not able to make a reasonable introduction to RDTSC that wasn’t around 45 minutes long. It’s not that RDTSC is particularly complex (it’s actually quite a simple instruction), but rather that the way it works has changed over the course of several processor revisions, and that context is required to understand why it does different things on different processors [1].
Because RDTSC now effectively functions as a high-precision wall clock timer, any reasonable introduction unfortunately has to include a discussion of how to calibrate it to some known time interval. Without that, TSC values are effectively meaningless, and there would be no way to intuitively understand what they mean.
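To make the idea of calibration a little more concrete, here is a minimal sketch in C of the arithmetic involved. It assumes you have already counted how many TSC ticks elapsed during a window that was also measured with an OS timer whose tick rate is known. The function and parameter names (`estimate_tsc_freq`, `tsc_to_seconds`, `os_freq`, and so on) are hypothetical and chosen for illustration; they are not necessarily what the course listings use.

```c
#include <stdint.h>

// Hypothetical helper: estimate TSC ticks per second.
//   tsc_elapsed: TSC ticks counted during the measurement window
//   os_elapsed:  OS timer ticks counted during the same window
//   os_freq:     OS timer ticks per second (assumed to be known)
// Note: os_freq * tsc_elapsed can overflow 64 bits if the window is long
// and the OS timer is very fine-grained, so keep the window short.
uint64_t estimate_tsc_freq(uint64_t tsc_elapsed, uint64_t os_elapsed, uint64_t os_freq)
{
    if (os_elapsed == 0)
    {
        return 0; // no time measured, so no estimate is possible
    }

    return os_freq * tsc_elapsed / os_elapsed;
}

// Hypothetical helper: convert a raw TSC delta to seconds using the estimate.
double tsc_to_seconds(uint64_t tsc_ticks, uint64_t tsc_freq)
{
    if (tsc_freq == 0)
    {
        return 0.0;
    }

    return (double)tsc_ticks / (double)tsc_freq;
}
```

For example, if 300,000,000 TSC ticks elapse during 100 milliseconds of a microsecond-resolution OS timer, the estimate comes out to 3,000,000,000 ticks per second, and a later delta of 1,500,000,000 TSC ticks would correspond to half a second.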
So today’s video is unfortunately quite hefty! For the sake of future course-takers, I will keep thinking about this and see if I can come up with a way that splits this lesson into (at least) two smaller parts while still retaining enough information in each part to have useful homework. It’s a bit tricky, because introducing OS timers on their own is fairly useless (as you’ll see in next week’s post). Hopefully I will eventually come up with a split that doesn’t end up doing that.
For now, I apologize for the 45+ minute runtime, and hopefully it will not prove too much of a barrier to understanding what RDTSC is and how it came to do what it now does.