This is the seventh video in Part 2 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly. The listings referenced in the video (listings 76, 81, 82, 83, and 84) are available on GitHub.
We are now comfortable with using the time-stamp counter to measure how much time has elapsed in various parts of our program. And, if you did the homework from last time, you've already had a shot at making something that allows you to quickly and easily deploy that kind of timing wherever you want.
This may have been extremely simple for you to do if the language you're using has strong metaprogramming support. Or it may have been extremely difficult if the language is fairly spartan. The language I use for the reference code, C++, sits somewhere between those two extremes.
What I'd like to do in this post is go over the basic design I used, then look at how we might modify the code to solve the problem I demonstrated at the end of the previous post: if we nest one timed block inside another, the profiling results are hard to read, because the outer block's time also includes the time spent in the inner block.
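To make the nesting problem concrete, here is a small sketch. It is an illustration under stated assumptions, not the reference code: it uses a hypothetical fake clock (`FakeClock`) instead of the time-stamp counter so the numbers are deterministic, and hypothetical names `block_stats`, `nested_timer`, and `RunNestedExample`. It shows one common way to attack the problem, which may differ from where this post ends up: each block remembers its parent and subtracts its own elapsed time from the parent's *exclusive* total.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical deterministic clock so the arithmetic is easy to check;
// a real profiler would read the time-stamp counter here instead.
static uint64_t FakeClock = 0;

struct block_stats
{
    uint64_t Inclusive; // time including any nested blocks
    uint64_t Exclusive; // time spent in this block's own code only
};

static block_stats Outer, Inner;
static block_stats *CurrentParent = nullptr; // innermost open block, if any

struct nested_timer
{
    explicit nested_timer(block_stats &S)
        : Stats(S), Parent(CurrentParent), Start(FakeClock)
    {
        CurrentParent = &Stats; // blocks opened inside this one see it as parent
    }

    ~nested_timer()
    {
        uint64_t Elapsed = FakeClock - Start;
        Stats.Inclusive += Elapsed;
        Stats.Exclusive += Elapsed;
        // Remove this block's time from the parent's exclusive total. The
        // parent's counter may briefly wrap below zero, but unsigned arithmetic
        // is modular, so it comes out right once the parent adds its own time.
        if (Parent) Parent->Exclusive -= Elapsed;
        CurrentParent = Parent; // restore the enclosing block
    }

    block_stats &Stats;
    block_stats *Parent;
    uint64_t Start;
};

static void RunNestedExample()
{
    nested_timer OuterTimer(Outer);
    FakeClock += 10; // outer's own work
    {
        nested_timer InnerTimer(Inner);
        FakeClock += 30; // inner work
    }
    FakeClock += 5; // more outer work
}
```

After `RunNestedExample()`, the inclusive times are 45 for the outer block and 30 for the inner block, so they sum to 75 ticks out of 45 actually elapsed (percentages adding up to roughly 167%). The exclusive times are 15 and 30, which sum to exactly 45. This sketch does not handle a block that recurses into itself: its inclusive time would still be counted more than once.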
Easy-to-read results aren’t an absolute necessity. Since this is just an integrated profiler we’re making, and not a commercial product, we can live with some compromises. But obviously we want to get as much as we can out of it while still keeping the code small and simple.
So let's go take a look at what I did for the reference C++ version. My solution would mostly work in C as well, but it would not close regions automatically, since C has no destructors: you'd need a TimeFunctionBegin/TimeFunctionEnd pair instead of just the TimeFunction macro.
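The automatic region closing mentioned above comes from C++ destructors. Below is a minimal sketch of that idea, not the reference implementation: `profile_anchor`, `profile_block`, `GlobalAnchors`, `TimeBlock`, `ReadTimer`, and `PrintAnchors` are hypothetical names, and `std::chrono::steady_clock` stands in for the time-stamp counter so the code runs anywhere. Only the `TimeFunction` name comes from the text. It also assumes a compiler that provides the `__COUNTER__` extension (GCC, Clang, and MSVC do).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Portable stand-in for the time-stamp counter (assumption for this sketch).
static uint64_t ReadTimer()
{
    using namespace std::chrono;
    return (uint64_t)duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Accumulated results for one named block (hypothetical layout).
struct profile_anchor
{
    char const *Label;
    uint64_t Elapsed;
    uint64_t HitCount;
};

static const int MaxAnchors = 64;
static profile_anchor GlobalAnchors[MaxAnchors];

// RAII timer: the constructor records the start time and the destructor adds
// the elapsed time, so the region closes on every exit path from the scope.
struct profile_block
{
    profile_block(char const *Label, int AnchorIndex) : Index(AnchorIndex)
    {
        assert(AnchorIndex > 0 && AnchorIndex < MaxAnchors);
        GlobalAnchors[Index].Label = Label;
        Start = ReadTimer(); // read the clock last to exclude setup cost
    }

    ~profile_block()
    {
        profile_anchor &Anchor = GlobalAnchors[Index];
        Anchor.Elapsed += ReadTimer() - Start;
        ++Anchor.HitCount;
    }

    int Index;
    uint64_t Start;
};

// __COUNTER__ gives each use site its own anchor slot at compile time;
// adding 1 leaves slot 0 unused. __LINE__ keeps the variable names unique.
#define NameConcat2(A, B) A##B
#define NameConcat(A, B) NameConcat2(A, B)
#define TimeBlock(Name) \
    profile_block NameConcat(Block, __LINE__)(Name, __COUNTER__ + 1)
#define TimeFunction TimeBlock(__func__)

static void PrintAnchors()
{
    for (int i = 1; i < MaxAnchors; ++i)
    {
        profile_anchor const &A = GlobalAnchors[i];
        if (A.HitCount)
        {
            std::printf("%s[%llu]: %llu ns\n", A.Label,
                        (unsigned long long)A.HitCount,
                        (unsigned long long)A.Elapsed);
        }
    }
}
```

Writing `TimeFunction;` as the first statement of a function, or `TimeBlock("Parse");` at the top of any braced scope, is enough: the destructor records the time whether the scope ends normally or through an early `return`. A C version would have to call an explicit end function at every exit point instead. Like the design described in the post, this sketch still adds a nested block's time into its parent's total.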