This is the tenth video in Part 2 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The listings referenced in the video (listings 96, 97, 98 and 99) are available on GitHub.
We’ve come to the end of our work on time-stamp counter profiling. We’ve made a nice profiler that handles any call graph, and we can get reasonable results with it as long as the number of opened profile blocks per second isn’t too astronomical.
Of course, we’re going to want to gather other types of data as well, not just the time-stamp counter. But if we tried to do that now, we’d be getting a little ahead of ourselves. Since we don’t yet know the microarchitectural details of the platform we’re working on, it’d be hard to know what kind of data we would want to gather!
So, starting next week, we’re going to employ the time-stamp counter to investigate the performance of our program in more detail. After we gain a better understanding of the microarchitectures we’re working with, we’ll return to the profiler to add more data gathering capabilities.
As a final note on our block profiling work, this week I wanted to leave you with another A/B comparison that’s easy for us to do now that we have a profiling infrastructure. One of the questions that people had when I went over RDTSC and QueryPerformanceCounter was, why not just use QueryPerformanceCounter for everything?
Of course, I did later show the ASM for QueryPerformanceCounter, and we saw that on modern x64 platforms it’s just a glorified wrapper around RDTSC. But how much would that affect our profiler in practice? Would there be much difference if we used QueryPerformanceCounter for everything?
Well, in the previous post we used the preprocessor to A/B test RDTSC profiling vs. no profiling. Why not use the same trick to A/B test RDTSC vs. QueryPerformanceCounter?
Our profiler doesn’t really care what timer we use for block profiling. If we want to make the choice of timer configurable, all we have to do is make the name of the timer function a preprocessor switch, just like we made block profiling itself a switch.
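To make that concrete, here is a minimal sketch of what such a switch could look like. The names READ_BLOCK_TIMER, ReadCPUTimer, ReadOSTimer and TimeSpinLoop are assumptions for illustration and are not necessarily what the listings use. To keep the sketch portable, ReadOSTimer is a stand-in built on std::chrono rather than a real call to QueryPerformanceCounter, and ReadCPUTimer falls back to the same clock on non-x86 targets.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Only pull in the RDTSC intrinsic where it exists.
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
  #if defined(_MSC_VER)
    #include <intrin.h>
  #else
    #include <x86intrin.h>
  #endif
  #define HAVE_RDTSC 1
#else
  #define HAVE_RDTSC 0
#endif

typedef uint64_t u64;

// Hypothetical CPU timer: RDTSC on x86/x64, a steady clock elsewhere.
inline u64 ReadCPUTimer(void)
{
#if HAVE_RDTSC
    return __rdtsc();
#else
    return (u64)std::chrono::steady_clock::now().time_since_epoch().count();
#endif
}

// Hypothetical OS timer. On Windows this is where a call to
// QueryPerformanceCounter would go; std::chrono is used here as a stand-in.
inline u64 ReadOSTimer(void)
{
    return (u64)std::chrono::steady_clock::now().time_since_epoch().count();
}

// The switch itself: if the build doesn't specify a timer, use the CPU timer.
#ifndef READ_BLOCK_TIMER
#define READ_BLOCK_TIMER ReadCPUTimer
#endif

// Example of code that only ever refers to the timer through the macro.
inline u64 TimeSpinLoop(u64 Iterations)
{
    u64 Start = READ_BLOCK_TIMER();

    volatile u64 Sink = 0; // volatile so the loop isn't optimized away
    for(u64 Index = 0; Index < Iterations; ++Index)
    {
        Sink = Sink + Index;
    }

    u64 End = READ_BLOCK_TIMER();
    assert(End >= Start); // sanity check: we expect the chosen timer not to run backwards
    return End - Start;
}
```

With something like this in place, the A/B test would just be two builds of the same source: one with no definition, and one that passes a definition on the compiler command line (for example -DREAD_BLOCK_TIMER=ReadOSTimer with GCC or Clang, or /DREAD_BLOCK_TIMER=ReadOSTimer with MSVC). Note that the two timers generally tick in different units, so the raw counts from the two builds are not directly comparable without converting to seconds first.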