Powerful Page Mapping Techniques
Once you know how CPU address translation works, you can use it to do some surprisingly powerful operations that would have seemed difficult or impossible otherwise.
This is the eighth video in Part 3 of the Performance-Aware Programming series. Please see the Table of Contents to quickly navigate through the rest of the course as it is updated weekly.
Although user-space applications are not allowed to directly manipulate CPU page tables, most consumer operating systems provide APIs that allow you to use them in powerful ways nonetheless. In this overview video, I cover three examples of things you can do with CPU address translation hardware using standard Windows page table APIs:
Automatic Circular Buffers (Listing 121). This technique allows you to have circular buffers handled transparently at the CPU level. Rather than use utility functions or a class to implement circular buffers manually, circular buffer mapping makes memory appear to be circular automatically, without any changes to existing code that uses the buffer.
Automatic Change Detection (Listing 122). This technique leverages the “dirty” bit of the x64 CPU page tables to automatically track which memory has been modified, and when, during the run of a program. It can be used to do things like record the run of the program and return to any prior state, or to implement features like “undo”, all without the need to modify any of the code that uses the memory.
Sparse Memory (Listing 123). This technique allows you to allocate extremely large regions of the address space that would not fit in physical RAM (or potentially even the page file) and use any reasonably-sized subset of it as if it were actually there. While some OSes do this automatically (I believe Linux, for example, can do this by default [1]), on Windows it is a bit more cumbersome. To keep it straightforward, in this listing I show the way to map the pages inline, right where they are used, but you can also take a more sophisticated approach and use a fault handler to map the pages when they are first written. This would make it automatic sparse memory, and it would then be completely transparent to the underlying code.
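To make the first technique in the list above concrete, here is a minimal sketch of a circular buffer mapping. It is not Listing 121 and does not use the Windows APIs covered in the video. It is a Linux analogue that assumes memfd_create and mmap with MAP_FIXED are available, and the helper name ring_map is hypothetical. The idea is to map the same physical pages twice, back to back, so that addresses past the end of the buffer alias its beginning.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* assumption: glibc on Linux, needed for memfd_create */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not from the course listings): returns 2*size bytes of
   address space whose second half aliases the first half.
   size must be a non-zero multiple of the page size. Returns NULL on failure. */
static uint8_t *ring_map(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0 || size % (size_t)page != 0) return NULL;

    /* Anonymous in-memory file that provides the physical backing pages. */
    int fd = memfd_create("ring", 0);
    if (fd < 0) return NULL;
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return NULL; }

    /* Reserve 2*size contiguous addresses so both views end up adjacent. */
    uint8_t *base = mmap(NULL, 2 * size, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return NULL; }

    /* Map the same file pages over both halves of the reservation. */
    void *lo = mmap(base, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + size, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd); /* the mappings keep the backing pages alive */

    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, 2 * size);
        return NULL;
    }
    return base;
}
```

With a mapping like this, a plain memcpy that starts near the end of the buffer and runs past it lands its tail at the start of the buffer, so code using the buffer does not need explicit wrap-around logic as long as each access begins inside the first copy.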
In addition, although I do not cover it in the video, I’ve also included an example listing for a technique that was asked about during one of the course Q&As:
32-bit Pointers in 64-bit Programs (Listing 124). Since the address range below four gigabytes is still mappable by the CPU page table scheme, you can technically use raw, absolute 32-bit pointers even in a 64-bit program that still uses 64-bit pointers elsewhere. Although the OS’s lack of guarantees about memory availability in that low address range makes this an inadvisable thing to ship in a consumer application, in a server where you control the machine, it works trivially. Why you would want to use this technique instead of just using 32-bit offsets, I have no idea, but it does work.
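As a rough illustration of the idea (again not Listing 124, and using Linux mmap rather than the Windows APIs), the sketch below asks the kernel for memory at a low address and then round-trips a pointer through a 32-bit integer. The names ptr32, low_alloc, to_ptr32, and from_ptr32 are hypothetical, the hint address 0x10000000 is an arbitrary guess at a free spot, and a 64-bit process is assumed. The kernel is free to ignore the hint, which is the lack of guarantee mentioned above, so the helper checks the result and fails rather than truncating an address.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* assumption: glibc on Linux, for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

typedef uint32_t ptr32; /* hypothetical name for a raw 32-bit absolute pointer */

/* Hypothetical helper: tries to allocate `bytes` entirely below 4 GiB.
   Returns NULL if the kernel placed the mapping elsewhere. */
static void *low_alloc(size_t bytes)
{
    /* Arbitrary low hint. Without MAP_FIXED this is only a suggestion. */
    void *hint = (void *)(uintptr_t)0x10000000u;
    void *p = mmap(hint, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;

    /* Reject the mapping unless every byte is addressable with 32 bits. */
    uintptr_t end = (uintptr_t)p + bytes;
    if (end > (uintptr_t)UINT32_MAX + 1) {
        munmap(p, bytes);
        return NULL;
    }
    return p;
}

/* Narrowing is only valid for pointers that came from low_alloc. */
static ptr32 to_ptr32(void *p) { return (ptr32)(uintptr_t)p; }

/* Widening back to a native pointer is always lossless. */
static void *from_ptr32(ptr32 p) { return (void *)(uintptr_t)p; }
```

A caller would store ptr32 values in its data structures and convert with from_ptr32 at each dereference. Unlike a 32-bit offset, no base address has to be added, which is the only real difference between the two approaches.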
If you like these sorts of programming explorations, that’s exactly what we do here on Computer Enhance. Next week, we’ll be resuming our Performance-Aware Programming series, which is designed to teach programmers how to think about, evaluate, and improve software performance. If that sounds interesting to you, you can check out our subscription options right here:
[1] David Buunk mentioned on X that all you need to do to get this behavior on Linux is add the MAP_NORESERVE flag when calling mmap, and Linux will automatically treat the memory as sparse instead of trying to reserve swap space for the entire range. This is quite a bit simpler than Windows, where you would have to write your own fault handler to get equivalent behavior.
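The MAP_NORESERVE approach described in this note can be sketched as follows. This is a minimal illustration for Linux, assuming a 64-bit process. The helper name sparse_reserve is hypothetical, and whether a very large request succeeds still depends on the system's overcommit settings and any address-space limits.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* assumption: glibc on Linux, for MAP_ANONYMOUS and MAP_NORESERVE */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: reserves `bytes` of zero-initialized address space
   without asking the kernel to set aside swap for all of it up front.
   Returns NULL on failure. */
static uint8_t *sparse_reserve(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return (p == MAP_FAILED) ? NULL : (uint8_t *)p;
}
```

Pages in the returned range read as zero and only receive physical memory when they are first written, so a caller can index anywhere in the range and pay only for the pages it actually touches.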
An update: on newer machines, if I use the new functions, they fail with 32-bit pointers, but the old libraries work fine. Any ideas?
Looking at this code (it has been a long time since I used CPUs), the circular buffer seems to store only 8-bit values. Is this due to the allocation method? How would you allocate 32-bit integer storage?