9 Comments

Such an interesting read! Thank you for sharing it with us!

Stupid me, I forgot I have a laptop with a 12700H. I'm not sure what the difference is between mobile and desktop CPUs, but I could test things on Windows 11 and Linux.

Another Casey banger :)! Thank you!

With discoveries like this, my joking claim that Assembly is really a declarative language rings more and more true with every passing day. Thank you for these deep dives!

I definitely initially misread the title as “The Case of the Missing Excrement” and was *really* confused

Since I had to use Event Tracing for Windows, I can assure you that not only was the excrement not missing, it was present in abundance.

- Casey

I have an Alder Lake CPU, and observed this when doing the homework for part 3.

I have also tested this on a Raptor Lake CPU, which follows the exact same pattern.

I'm using Google Benchmark, since it makes it really easy to read the performance counters you want, without recompiling.

I was able to create a few other benchmark programs that probe a bit at how the CPU executes these things. I've observed that the front-end will fuse these instructions with a jump when they occur contiguously, at which point each cycle can execute only a single immediate addition or subtraction. Adding a nop prevents the fusion, and the optimization happens again. So one question is how often the CPUs actually leverage this optimization in the wild.

I also found something in Agner Fog's microarchitecture manual, in the section about Alder Lake, where it says: "Integer addition with a small immediate constant has zero latency in some cases."

I've created a gist with my benchmark program, and the output from running these on Alder Lake and Raptor Lake CPUs, with a few relevant performance counters: https://gist.github.com/danielbendix/a377a976e62b6e8a8ea9c93636f0ff1e

Let me know if there's anything you'd really like tried on these, and I'll see what I can do.

Great article, thanks for sharing this research! And sorry you had to go through the Ultimate Sadness...

On another note, I wonder why they didn't stick with this for newer processors. Maybe it was only something they experimented with in Golden Cove, and it turned out not to be as beneficial?

I am not certain which processors have this, since I was only able to test Golden Cove. It's possible that it does happen in some other Intel processors!

- Casey
