Halloween Spooktacular Day 1: The Challenge
Here are the API requirements you must satisfy to successfully complete the 2024 Halloween Spooktacular Challenge.
As I explained in the announcement, collecting PMCs should have been simple. It’s simple at the hardware level, so it should be simple at the software level.
The Windows API, unfortunately, makes it heinously complex. Your Spooktacular Challenge is to hide that complexity and present a future user with something as simple as you can manage to make it.
Many of you follow our courses here in a variety of languages, so please feel free to define any style of API you want, in any language, so long as it is simple, easy to use, and provides the following three features:
1. PMC selection
CPUs provide a large number of PMCs — often more than a hundred — but only a few of them can be active at any time. Prior to collection, your API should allow the user to select which PMCs they are trying to collect.
Windows already provides an easy way for a user to know which PMCs are available through ETW on their machine. They can use the built-in wpr command in Windows like this:
wpr -pmcsources
to get a list of all the possible PMCs by name (and number).
In my code for the Challenge, I allow the user to put the UTF-16 ETW names into an array and pass it in to my API:
pmc_name_array AMDNameArray =
{
    L"TotalIssues",
    L"BranchInstructions",
    L"BranchMispredictions",
    L"IcacheMisses",
};
pmc_name_array IntelNameArray =
{
    L"TotalIssues",
    L"BranchInstructions",
    L"BranchMispredictions",
    L"CacheMisses", // NOTE(casey): Intel doesn't expose I$ misses through ETW in a default install, unfortunately
};
pmc_source_mapping PMCMapping = MapPMCNames(&AMDNameArray);
if(!IsValid(&PMCMapping))
{
    PMCMapping = MapPMCNames(&IntelNameArray);
}
I provide MapPMCNames, which takes a name array and returns a mapping that can be used later, as well as an IsValid call that can be used to see whether or not the names were successfully mapped. I implemented it this way because I wanted it to be easy for the user to try a few different arrays and use whichever one worked. This avoids forcing them to query the CPU type and Windows version in order to figure out which PMCs might be available: they can just try all the variants they support, and if any work, they can proceed.
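To make the "try each variant" idea concrete, here is a minimal sketch of how a name-to-source mapping could behave. Everything in it is hypothetical: sketch_name_array, sketch_mapping, sketch_source and SketchMapNames are illustrative stand-ins, not the real types behind MapPMCNames, and the table of available sources is passed in by hand rather than queried from Windows. The only point is that a mapping is valid when every requested name is found, so trying a second array is a one-line fallback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

#define SKETCH_MAX_PMCS 4

// Hypothetical name array: unused trailing slots are left as NULL.
typedef struct
{
    const wchar_t *Names[SKETCH_MAX_PMCS];
} sketch_name_array;

// Hypothetical entry in the list of sources the machine reports.
typedef struct
{
    const wchar_t *Name;
    uint32_t Index;
} sketch_source;

// Hypothetical mapping result: valid only if every name was found.
typedef struct
{
    bool Valid;
    uint32_t PMCCount;
    uint32_t SourceIndex[SKETCH_MAX_PMCS];
} sketch_mapping;

sketch_mapping SketchMapNames(const sketch_name_array *Array,
                              const sketch_source *Sources, size_t SourceCount)
{
    sketch_mapping Result = {0};
    Result.Valid = true;

    for(size_t NameIndex = 0; NameIndex < SKETCH_MAX_PMCS; ++NameIndex)
    {
        const wchar_t *Name = Array->Names[NameIndex];
        if(!Name)
        {
            break; // reached the unused slots
        }

        bool Found = false;
        for(size_t SI = 0; SI < SourceCount; ++SI)
        {
            if(wcscmp(Name, Sources[SI].Name) == 0)
            {
                Result.SourceIndex[Result.PMCCount++] = Sources[SI].Index;
                Found = true;
                break;
            }
        }

        if(!Found)
        {
            Result.Valid = false; // one missing name invalidates the whole mapping
        }
    }

    if(Result.PMCCount == 0)
    {
        Result.Valid = false; // an empty selection is not useful either
    }

    return Result;
}
```

With something shaped like this, the caller never needs to know why a mapping failed. They call the mapper on each array they support and keep the first result that reports itself valid.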
Once they’ve mapped their PMCs, they can start (and stop) tracing with my version like this:
pmc_tracer Tracer;
StartTracing(&Tracer, &PMCMapping);
// ... tracer usage goes here ...
StopTracing(&Tracer);
Of course, I’m only showing my code as an example of how your version might work. You don’t have to do it exactly like this. As long as your API allows you to select a set of PMCs to sample, it meets criterion #1, so feel free to design the API the way you want, in the style that makes sense for your language of choice.
2. PMC collection within a bracket
The goal of this challenge is to sample PMCs during normal program operation, without forcing the user to restructure or extract parts of their program into a test harness. Your API must allow the user to easily bracket any piece of code for PMC collection.
In my version, it works like this:
pmc_traced_thread TracedThread;
StartCountingPMCs(&Tracer, &TracedThread);
// ... code to measure goes here ...
StopCountingPMCs(&Tracer, &TracedThread);
The idea here is to mimic rdtsc bracketing as closely as possible: the user only has to insert a “start” and “stop” line into their code to collect PMCs. However, you’ll note that it’s a bit more complicated than rdtsc-based collection. Specifically, it doesn’t directly return the PMC counter results at the “stop” line, and instead relies on a pmc_traced_thread structure to track bracketed regions.
Why?
This inconvenience is unfortunately due to the nature of ETW, and can’t really be avoided. Because ETW is asynchronous, you can’t guarantee when you will receive your event data. So you only have two choices:
1. Return results immediately at the “stop” call, which would be more like rdtsc, but would require halting the user’s thread until ETW decides to provide the data, or
2. Provide a way to query results asynchronously, so the user’s thread can keep running across many bracketed regions without stopping.
Since the last thing you want a profiler to do is suspend threads while it waits for sampling data, I chose option #2, and your API should too!
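For readers who want to picture option #2, here is a rough sketch of the hand-off it implies. It assumes the results for a region are filled in later by some other thread, and it says nothing about how ETW actually delivers them. The names sketch_region, SketchBeginRegion, SketchPublishResult and SketchIsComplete are hypothetical and are not part of my API. The user's thread only ever reads a completion flag, so it never has to stop and wait.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PMCS 4

// Hypothetical per-region record shared between the user's thread
// and whatever thread eventually receives the counter data.
typedef struct
{
    uint32_t PMCCount;
    uint64_t Counters[SKETCH_MAX_PMCS];
    atomic_bool Complete; // set last by the producer, polled by the user
} sketch_region;

// User thread: reset the record before the bracketed code runs.
void SketchBeginRegion(sketch_region *Region)
{
    Region->PMCCount = 0;
    atomic_store_explicit(&Region->Complete, false, memory_order_relaxed);
}

// Producer thread (assumption: results arrive here some time later).
// The counters are written first, then the flag is stored with release
// ordering so a reader that sees the flag also sees the counters.
void SketchPublishResult(sketch_region *Region,
                         const uint64_t *Counters, uint32_t Count)
{
    if(Count > SKETCH_MAX_PMCS)
    {
        Count = SKETCH_MAX_PMCS;
    }

    for(uint32_t CI = 0; CI < Count; ++CI)
    {
        Region->Counters[CI] = Counters[CI];
    }
    Region->PMCCount = Count;

    atomic_store_explicit(&Region->Complete, true, memory_order_release);
}

// User thread: a non-blocking check that pairs with the release store.
bool SketchIsComplete(sketch_region *Region)
{
    return atomic_load_explicit(&Region->Complete, memory_order_acquire);
}
```

The design choice worth noticing is that the "stop" side of the bracket does no waiting at all. The cost of asynchrony is paid only by code that chooses to ask whether a region is finished.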
3. Retrieval of results
Finally, the user must be able to retrieve their profiling results. As discussed in #2, this has to support at least asynchronous retrieval.
In my version, you can ask if a particular trace is complete by calling an IsComplete function with the pmc_traced_thread used to bracket the region:
if(IsComplete(&Tracer, &TracedThread))
{
    // ... process the results here ...
}
When IsComplete returns true for a particular trace, getting the results is straightforward. A GetResult call returns a struct with an array of the counters:
pmc_trace_result Result = GetResult(&Tracer, &TracedThread);
for(u32 CI = 0; CI < Result.PMCCount; ++CI)
{
    printf(" %llu\n", Result.Counters[CI]);
}
Also, for users who don’t care about asynchronous retrieval, I supply a utility function called WaitForResult that blocks until the results are ready, which they can use instead:
pmc_trace_result Result = WaitForResult(&Tracer, &TracedThread);
for(u32 CI = 0; CI < Result.PMCCount; ++CI)
{
    printf(" %llu\n", Result.Counters[CI]);
}
As before, you don’t have to implement exactly these functions. So long as your implementation provides a simple way for the user to retrieve their profile counter values and match them to the particular trace they started and stopped, it meets criterion #3.
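If your API offers a polling check, a blocking helper can be a thin layer on top of it. The sketch below is one possible shape, not how my WaitForResult is written. SketchWaitForFlag is a hypothetical name, and the poll limit is an addition that keeps the example from spinning forever; a helper that truly blocks would loop without a limit, and would most likely sleep or wait on an OS event between polls instead of spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical blocking helper built on a completion flag.
// Returns true once the flag is observed set, or false if MaxPolls
// checks pass without seeing it (the limit is an assumption of this
// sketch, added so the loop always terminates).
bool SketchWaitForFlag(atomic_bool *Complete, uint64_t MaxPolls)
{
    for(uint64_t Poll = 0; Poll < MaxPolls; ++Poll)
    {
        // Acquire ordering pairs with a release store made by
        // whichever thread publishes the results.
        if(atomic_load_explicit(Complete, memory_order_acquire))
        {
            return true;
        }

        // A real implementation would yield or sleep here rather
        // than burn CPU on the thread being profiled.
    }

    return false;
}
```

Layering it this way means the asynchronous path stays the primary one, and the blocking call is just a convenience for users who would rather wait.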
No Cheating!
“For the avoidance of doubt”, as the lawyers say, to pass the Spooktacular Challenge you must implement an API that satisfies all three of the above requirements without using any kernel-mode code other than that which is present in a default Windows install [1].
You’re not allowed to cheat and write your own kernel-mode code, or install something like winring0 or Intel Performance Counter Monitor! Your EXE must run on a vanilla installation of Windows 10 or 11, without installing anything, from the standard “Run As Administrator” command prompt [2].
That said, although you’re not allowed to install kernel drivers, you are allowed to use any documentation or reference code you can find on the internet. You do not have to try to figure this out blind, or just from MSDN. Documentation on ETW PMC collection is vanishingly scarce, so this challenge will be sufficiently difficult without restricting your resources. Feel free to post on Stack Overflow, copy code from GitHub, or anything else you can do to scrape together a working implementation. Confronting the maddening horror of ETW will be plenty scary even with the whole internet behind you!
With that, I bid you the best of luck. As foretold in the announcement, I will post the first hint here tomorrow, and an additional hint every day until Halloween.
Let the great Computer Enhance 2024 International Event Tracing for Windows Halloween Spooktacular Challenge begin!
[1] In case there are any security researchers reading this, I chose that wording very carefully. Yes, I would consider it passing the challenge if you somehow exploited the existing Windows kernel code to sample and return PMCs to you without using ETW. This would be an impressive hacking feat in its own right, and I wouldn’t want to discourage anyone from pulling it off if they think it’s possible!
[2] PMC collection via Event Tracing for Windows will only work when running as administrator. Ostensibly, this prevents hackers from using PMC collection to mount side-channel attacks. In practice, however, hackers always seem to be able to mount side-channel attacks anyway, so I’m not sure who is really being helped here…
I'll be taking the challenge in my own tempo, and expect to finish it way after everyone else. But first of all I'm intrigued by the initialized pmc_name_array type. Looks like something that will require a metaprogram translating it to a type in a generated file:
typedef wchar_t * pmc_name_generated12247;
And then in the translated version of the cpp file, pmc_name_array is replaced with
pmc_name_generated12247 AMDNameArray[CountInTheInitialization]
where CountInTheInitialization is the number of values in the initialization, counted and inserted by the preprocessor.
But that sounds like a very internal API, so I'm wondering if I'm missing something obvious.
This has been really fun! What do I do if I think I solved it?