13 Comments

Sitting here with my morning coffee and watching this before work is making me so happy right now!

It's invaluable that you spend the time to create such high-quality videos.

Thanks Casey!

- Jorge

Superb explanation!

Even though we are doing optimization here, these nitty-gritty details of hardware and exploits are so fascinating.

Thanks, Casey.

Comment removed
Mar 27

Thanks, added to my list.

**UPDATE: This comment was on the original video. Casey has since replaced the video, so these comments no longer apply to the current video.**

I believe you were wrong in explaining the bit around 25 minutes 25 seconds into the video. The attacker process can never access the same memory as the victim, and therefore cannot time it. What does sound possible to me is that the attacker times accesses to its own memory to figure out whether that memory got evicted from the cache because the victim process accessed _some other_ memory that maps to the same cache set.

Another thing that sounds a bit sketchy to me, at 46'20": the reason p is not prefetched when loading b is exactly the opposite of what you said. It is because the attacker is NOT force-evicting it from the cache, so the do-not-scan tag is still there, and so it does NOT prefetch p.

I thought that is what I said at 46'20 - maybe I am just mishearing myself, but at least what I was trying to say was exactly what you said :)

As for 25:25, I think I did get something wrong, but it's not that it's different "memory". I mean, obviously it's different physical memory, because the processes are separate. But it's the same memory address, or more specifically, the same virtual addresses (that map to the cache set). But in re-listening to it, I believe I am saying the timing part backwards (meaning that the timing you're looking for is _slower_, not faster, because you're the one getting evicted - if that makes sense).

I will go listen to it more closely, and I can post a re-record of it to fix it (and I will also say "memory addresses" more specifically to avoid any confusion about the physical memory of the processes).

- Casey

Update: I went back to the Prime+Probe paper to make sure, and I definitely had the timing backward. I have made a new video that explains it correctly (to the best of my knowledge).

- Casey

Your dedication and integrity are amazing. 🤯 You rerecorded the whole video. That is extremely respectable! I'm so glad you are taking your time for this high-quality teaching. I'm a happy customer. 😅 The new explanation is also much clearer, and sounds fully reasonable to me. Thank you for taking the time to learn, improve, and teach.

How do you evict specific pages and leave others unevicted? I am referring to the ones holding the A and B arrays: how do we ensure that B stays in L2 so it does not get rescanned, but A is evicted? Or am I misunderstanding something?

That is what an "eviction set" is. You determine (through on-line testing) a series of memory addresses which map to the same cache set, such that when you go through and fetch each address repeatedly, you force the cache to keep only those cache lines in the set, leaving room for nothing else (since the cache is only 12-way in this case, etc.)

- Casey

The explanation is phenomenal, thanks. I've already sent it to a bunch of people.

I have two conflicting thoughts on DMP:

1. It might seem like this is the reason why some heavier websites (which are usually a pointer soup in JS) seem to run very noticeably better on M-series MacBooks. I remember discussing this with coworkers, who all noticed it in Jira and Slack after upgrading.

2. On the other hand, V8 uses pointer compression to pack 64-bit pointers into 32 bits, because JS can't address more than that anyway, and this seems to nullify the DMP optimization. I wonder how DMP interacts with optimizations like this.

The same can be said about any code using indexes (entity ids) instead of pointers, which is pretty common practice. I wonder when the perf gains you get from allocating your objects yourself outweigh the benefits from DMP, and vice versa. It could be that in some cases it's better to first map your array of indexes to an array of pointers, to make sure DMP gets triggered on them, which seems like a pretty counterintuitive thing to do.

Thank you for putting in all the work to make this understandable. Some people are too frickin smart, god damn! As someone who knows very little about security and only really started thinking about CPUs with this course, I found it pretty mind-blowing.

One thing that I wonder is how security researchers decide to release findings like this? Doesn't describing this exploit in detail make it available to a lot more attackers?

Also really hoping Apple doesn't have to release a fix that slows my laptop down...

"White hat" security researchers, for the most part, seem pretty responsible to me (as an outsider, of course). They have a bunch of "disclosure" conventions they seem to follow voluntarily.

For example, this exploit was released last week, but they disclosed it to Apple in December! So they gave the IHV a three-month head start before telling the world about the exploit.

I think the idea behind these kinds of disclosure practices is to balance the need to let the public know about potential exploits vs. giving the IHV/ISV time to fix it if they're competent.

- Casey

Aha, I was wondering if they gave them advance notice. That makes sense, thanks!