I'll be taking the challenge at my own pace and expect to finish it well after everyone else. But first of all, I'm intrigued by the initialized `pmc_name_array` type. It looks like something that would require a metaprogram translating it to a type in a generated file:
`typedef wchar_t * pmc_name_generated12247;`
and then, in the translated version of the .cpp file, `pmc_name_array` is replaced with
`pmc_name_generated12247 AMDNameArray[CountInTheInitialization]`
where `CountInTheInitialization` is the number of values in the initializer, counted and inserted by the preprocessor.
But that sounds like a very internal API, so I'm wondering if I'm missing something obvious.
It's just a plain struct with a `wchar_t` array in it, nothing fancy! It doesn't need to be dynamically sized because hardware PMCs are always limited to just a few counters, so you can just make the array 8 long and know you're good.
- Casey
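A minimal sketch of what such a plain fixed-size struct could look like, assuming the array holds wide-string counter names. `pmc_name_array` comes from the thread, but its layout, `PMC_NAME_MAX`, `PMCNameCount`, and `PMCNameAppend` are hypothetical, and `TotalIssues` / `BranchMispredictions` are only placeholder source names. The point it illustrates is that no generated count is needed: slots the initializer leaves out are zero-initialized to `NULL`.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Assumed capacity: hardware exposes only a few programmable counters,
   so 8 name slots should always be enough. */
#define PMC_NAME_MAX 8

/* Hypothetical layout: a plain struct wrapping a fixed array of wide-string names. */
typedef struct pmc_name_array
{
    const wchar_t *Names[PMC_NAME_MAX];
} pmc_name_array;

/* Counts the names actually supplied. Slots the initializer leaves out are
   zero-initialized (NULL), so the count can be recovered at runtime. */
static size_t PMCNameCount(const pmc_name_array *Array)
{
    size_t Count = 0;
    while (Count < PMC_NAME_MAX && Array->Names[Count] != NULL)
    {
        ++Count;
    }
    return Count;
}

/* Appends one more name, asserting that the fixed capacity is not exceeded. */
static void PMCNameAppend(pmc_name_array *Array, const wchar_t *Name)
{
    size_t Count = PMCNameCount(Array);
    assert(Count < PMC_NAME_MAX);
    Array->Names[Count] = Name;
}

/* Example initialization: two names given, the remaining six slots are NULL. */
static pmc_name_array ExampleNames = {{L"TotalIssues", L"BranchMispredictions"}};
```

Since the array size is fixed at compile time, the initializer can list any number of names up to the capacity without a metaprogram or generated typedef.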
This has been really fun! What do I do if I think I solved it?
Whatever you want! If you just want to record proof that you solved it before a certain number of hints, you could upload a Gist or GitHub version without telling anyone; that will still record the date when you made the code :)
- Casey
This is the first part of the course that seems completely impossible to follow on Linux. I'm not even sure I understand the problem. You call `perf_event_open` with the counters you want and then you read them. The 'default' API on Linux seems to do exactly what you suggested.
This is not part of the course, it's just something I wanted to post.
Separately, though, that is correct: Linux already had a good API for PMC collection - I mentioned that in the announcement in fact, if I remember correctly! It is only Windows that has a horrifically twisted API for this.
- Casey
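For comparison, the Linux workflow described in that post (open a counter with `perf_event_open`, then read it) can be sketched roughly as follows, based on the perf_event_open(2) man page. `OpenPerfCounter`, `CountInstructions`, and `SpinLoop` are hypothetical helper names. On machines that don't expose a PMU (many VMs) or that have a restrictive `perf_event_paranoid` setting, the open simply fails and the sketch reports that instead of a count.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for this syscall, so it is invoked directly. */
static int OpenPerfCounter(uint64_t Config)
{
    struct perf_event_attr Attr;
    memset(&Attr, 0, sizeof(Attr));
    Attr.size = sizeof(Attr);
    Attr.type = PERF_TYPE_HARDWARE;
    Attr.config = Config;
    Attr.disabled = 1;       /* start stopped; enabled explicitly around the work */
    Attr.exclude_kernel = 1; /* count user space only */
    Attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: count this thread on whatever CPU it runs on. */
    return (int)syscall(SYS_perf_event_open, &Attr, 0, -1, -1, 0);
}

/* Counts retired instructions for Work(). Returns 0 on success, or -1 if the
   counter is unavailable or the read fails. */
static int CountInstructions(void (*Work)(void), uint64_t *Count)
{
    assert(Work != NULL && Count != NULL);
    int Fd = OpenPerfCounter(PERF_COUNT_HW_INSTRUCTIONS);
    if (Fd == -1)
    {
        return -1;
    }
    ioctl(Fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(Fd, PERF_EVENT_IOC_ENABLE, 0);
    Work();
    ioctl(Fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t Got = read(Fd, Count, sizeof(*Count));
    close(Fd);
    return (Got == (ssize_t)sizeof(*Count)) ? 0 : -1;
}

/* A small workload to measure; volatile keeps the loop from being optimized away. */
static void SpinLoop(void)
{
    volatile uint64_t Sum = 0;
    for (uint64_t I = 0; I < 100000; ++I)
    {
        Sum += I;
    }
}
```

With the default `read_format` of 0, a single `read` returns just the 64-bit count, which is why the open/enable/disable/read sequence is all that's needed here.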
Can we have a little more information on the usage we need to be able to reproduce? For example, if we want to take different PMCs on two regions in the same hot loop, can you do that (meaning the zone gives you the hit count, average, min, and max for each zone with different PMCs)? Can you nest your collection zones (recursing, or just taking different PMC samples inside a bigger region)? Can we take different PMCs on different threads at the same time? What do we need to do to validate the challenge?
Getting PMC data from the Win32 API is hard enough that if you get them at all before the hint on Day 8, I would consider that alone to be passing the challenge :) But in terms of what you could achieve maximally:
1) You cannot collect different PMCs for different regions at the same time, as the overhead for doing that on Win32 would be prohibitive. It only supports one set of PMCs globally, so it would not work threaded, and it would be very expensive to start and stop. So that is not part of the challenge. If you figured out some way to do that efficiently or threaded, you are way more advanced than what I figured out!
2) Nesting is possible.
3) Simultaneous collection from multiple threads is possible.
Again, just getting anything working at all before Day 8's hint would be impressive. Getting things like nesting and multithreading working is even harder, but it is possible.
You are welcome to decide how hard you want to make the challenge :)
- Casey
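To make the nesting answer concrete, here is one possible way to get per-zone hit count, min, max, and average from a single global counter set: each open zone stores the counter values it saw on entry and accumulates the deltas on exit. This is a sketch under stated assumptions, not the course's code. `pmc_snapshot`, `zone_stats`, `zone_block`, `ZoneBegin`, `ZoneEnd`, and `ZoneAverage` are hypothetical names. Actually reading the counters on Windows is the challenge itself, so the snapshots are simply passed in, and the sketch assumes counters never wrap or reset while a zone is open.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the raw values of the one global PMC set. */
#define PMC_COUNT 4

typedef struct pmc_snapshot
{
    uint64_t Counters[PMC_COUNT];
} pmc_snapshot;

/* Per-zone accumulators: hit count plus total/min/max per counter. */
typedef struct zone_stats
{
    uint64_t HitCount;
    uint64_t Total[PMC_COUNT];
    uint64_t Min[PMC_COUNT];
    uint64_t Max[PMC_COUNT];
} zone_stats;

/* An open zone remembers its own starting snapshot, so zones can nest. */
typedef struct zone_block
{
    zone_stats *Stats;
    pmc_snapshot Start;
} zone_block;

static zone_block ZoneBegin(zone_stats *Stats, pmc_snapshot Now)
{
    zone_block Block = {Stats, Now};
    return Block;
}

static void ZoneEnd(zone_block *Block, pmc_snapshot Now)
{
    zone_stats *Stats = Block->Stats;
    for (int I = 0; I < PMC_COUNT; ++I)
    {
        /* Assumption of this sketch: counters only move forward inside a zone. */
        assert(Now.Counters[I] >= Block->Start.Counters[I]);
        uint64_t Delta = Now.Counters[I] - Block->Start.Counters[I];
        Stats->Total[I] += Delta;
        if (Stats->HitCount == 0 || Delta < Stats->Min[I])
        {
            Stats->Min[I] = Delta;
        }
        if (Delta > Stats->Max[I])
        {
            Stats->Max[I] = Delta;
        }
    }
    ++Stats->HitCount;
}

/* Average events per hit for one counter (0 if the zone was never hit). */
static double ZoneAverage(const zone_stats *Stats, int Counter)
{
    return Stats->HitCount ? (double)Stats->Total[Counter] / (double)Stats->HitCount : 0.0;
}
```

Because each `zone_block` keeps its own start snapshot, an outer zone's numbers include everything counted inside its inner zones (inclusive measurement); reporting exclusive numbers would require subtracting child deltas as an extra step.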
Well now, this course is exceeding my expectations. It's fun to think that not only is this course bringing your students up to speed on basic performance skills, but it'll actually put us ahead of everyone else in at least one small way because we'll have this novel performance utility that no one else has :)
Or, reeling in this optimism a bit, we'll be the first to have this tool _on Windows specifically_. And that's only the case because Microsoft really dropped the ball. But still, pretty cool!
It appears that Windows 10 does not support these PMCs: the `wpr` command only returns a lone Timer source.
Oh, it supports them. You just didn't figure out the secret incantation!
- Casey
I am on Windows 10 and it works for me. Must be something more specific than that. Maybe post which CPU you're using?
Edit: I don't know about a secret incantation. The `wpr -pmcsources` command just worked for me. Maybe I got lucky, or maybe it was something we did earlier in the course.
Since this is part of the challenge, I am being intentionally vague :) There will be a hint in the future that should help folks who hit this problem and can't figure out how to resolve it.
- Casey
That is already a useful hint! I had just assumed my hardware or driver was limited to the one Timer event.
I've got an AMD 5700X; it should definitely support this command.