Once you know how CPU address translation works, you can use it to do some surprisingly powerful operations that would have seemed difficult or impossible otherwise.
An update: on newer machines, the new functions fail when I use them with 32-bit pointers, but the old libraries work fine. Any ideas?
Looking at this code (it's been a long time since I used CPUs), the circular buffer seems to store only 8-bit values. Is this due to the allocation method? How would you allocate 32-bit integer storage?
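On the 32-bit question, here is a minimal sketch, assuming the circular buffer in question is the mirrored (double-mapped) kind and using POSIX `mmap` on Linux rather than the Windows calls. The mapped memory is untyped, so the element width only depends on the pointer type you view it through. The names `mirrored_u32_ring`, `mirrored_u32_ring_init`, and `mirrored_u32_ring_free` are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical type: one backing buffer mapped twice, back to back.
typedef struct {
    uint32_t *data;  // base of the first view; the second view follows it
    size_t count;    // number of uint32_t elements in ONE view
} mirrored_u32_ring;

// Returns 0 on success, -1 on failure.
// `bytes` must be a nonzero multiple of the page size.
static int mirrored_u32_ring_init(mirrored_u32_ring *ring, size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || bytes == 0 || bytes % (size_t)page != 0) return -1;

    // Anonymous backing file: created, then unlinked so it has no name.
    char path[] = "/tmp/ringXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)bytes) != 0) { close(fd); return -1; }

    // Reserve 2*bytes of contiguous address space, then map the same
    // file over both halves of the reservation.
    uint8_t *base = mmap(NULL, 2 * bytes, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { close(fd); return -1; }

    void *lo = mmap(base, bytes, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + bytes, bytes, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);  // the mappings keep the file alive
    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, 2 * bytes);
        return -1;
    }

    // Same memory, viewed as 32-bit integers instead of bytes.
    ring->data = (uint32_t *)base;
    ring->count = bytes / sizeof(uint32_t);
    return 0;
}

static void mirrored_u32_ring_free(mirrored_u32_ring *ring)
{
    munmap(ring->data, 2 * ring->count * sizeof(uint32_t));
    ring->data = NULL;
    ring->count = 0;
}
```

With this in place, `ring.data[i]` and `ring.data[i + ring.count]` refer to the same 32-bit slot, so a run of up to `count` elements that starts anywhere in the first view can be read or written linearly without any manual wrapping.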
I wonder if virtual memory was available for the Xbox 360. I remember writing a simple playback function to show the last 5 seconds of gameplay before the player lost. For this I just created a circular buffer, but managed it myself. This was at the beginning of my career, and I didn't know these kinds of tricks.
Great timing. I just so happen to be deep in kernel development on this exact subject matter.
Do you think it would make sense to provide a facility in which the kernel provides distinct virtual address spaces for more than one application to facilitate memory mapping to a single physical space?
The functionality is pretty straightforward.
Do you mean allowing two processes to share the same physical memory? If so, yes, that is a definite requirement for operating systems these days. Windows, Linux, and macOS all support this feature as a means for fast interprocess communication.
- Casey
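To make the shared-physical-memory case concrete, here is a small sketch for Linux, assuming the simplest possible setup: an anonymous `MAP_SHARED` mapping inherited across `fork`. After the fork, parent and child each have their own virtual address space, but the mapped page is backed by the same physical memory in both. The function name `read_value_written_by_child` is made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// The child writes `value` into a shared page; the parent reads it back.
// Returns the value the parent observed, or -1 on failure.
static int64_t read_value_written_by_child(uint32_t value)
{
    // MAP_SHARED means writes are visible to every process that maps the
    // page, instead of being copy-on-write private.
    uint32_t *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;

    pid_t child = fork();
    if (child < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (child == 0) {
        // Child process: separate address space, same physical page.
        *shared = value;
        _exit(0);
    }

    // Parent: wait for the child to finish, then read what it wrote.
    int status = 0;
    if (waitpid(child, &status, 0) < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    int64_t seen = (int64_t)*shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

Two unrelated processes would need a named object instead (for example a file or a POSIX shared memory object that both sides map), but the page-table idea is the same.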
On Linux mmap has a MAP_32BIT flag to put the mapping into the first 2 gigabytes of the address space. The manual also says about the flag that "it was added to allow thread stacks to be allocated somewhere in the first 2 GB of memory, so as to improve context-switch performance on some early 64-bit processors. Modern x86-64 processors no longer have this performance problem, so use of this flag is not required on those systems."
So this kind of mapping is available, but just using 32-bit offsets is probably the more straightforward choice.
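For reference, a hedged sketch of what using that flag might look like. `MAP_32BIT` is only defined on x86-64 Linux, so the code falls back to returning `NULL` elsewhere. The helper names `map_below_2gb`, `pointer_to_u32`, and `u32_to_pointer` are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

// Ask the kernel for anonymous memory in the first 2 GB of the address
// space. Returns NULL if the flag is unavailable or the mapping fails.
static void *map_below_2gb(size_t bytes)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
#else
    (void)bytes;
    return NULL;  // not x86-64 Linux: no MAP_32BIT
#endif
}

// Store a low pointer in 32 bits. Only valid for addresses below 4 GB.
static uint32_t pointer_to_u32(void *p)
{
    uintptr_t address = (uintptr_t)p;
    assert(address <= UINT32_MAX);
    return (uint32_t)address;
}

// Widen a stored 32-bit address back into a usable pointer.
static void *u32_to_pointer(uint32_t address)
{
    return (void *)(uintptr_t)address;
}
```

Note that the call can still fail if the low range is already occupied, so callers need a fallback path either way.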
It's nice to have these collected in one place.
For the 32-on-64 example (listing 124) I believe you could also use VirtualQueryEx to enumerate the occupied virtual ranges from 0 to 2^32 and then reserve the gap ranges, as an alternative to your trial-and-error VirtualAllocs.
Since RIP-relative addressing is +/- 2 GB, and since .bss is zero-filled without taking up space on disk, you could also consider a "safe" array size like 1 GB and still be in range of all instructions in the executable's code section. I tested that this works with clang, but it does require that you're okay with committing (not just reserving) 1 GB (or whatever size) at executable load time. Compared to just the movzx for the low 32-bit address space allocation, this costs you an extra RIP-relative lea.
My assumption for how you would ship this if you wanted to (which I don't, but...) is that you would first use VirtualQuery to map the range and build your own internal memory allocation map. You would then allocate out of that map, and every time the corresponding VirtualAlloc failed, you would then re-probe the covered region with VirtualQuery to update your map.
I'm pretty sure this is not worth it, because even with that (or the new VirtualAlloc2 API), you still have the intractable problem of the OS not guaranteeing any particular amount of available address space in that range. So shipping a program with 32-bit pointers seems too risky, because no matter how small you make the required memory for that part of the program, it still could just not have that space available below 4 GB, technically.
Maybe that's not a real practical concern, IDK! But it's enough to scare me away from using it, since it's not a particularly valuable technique. You could just use 32-bit indexes into buffers, which is what I usually do most of the time anyway.
- Casey
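As an illustration of the "32-bit indexes into buffers" alternative, here is a minimal sketch of a linked list whose links are `uint32_t` indexes into a node array rather than pointers. All names here (`node`, `node_pool`, `NODE_NONE`, `pool_push_front`, `pool_sum`) are hypothetical and not taken from the article's listings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_NONE UINT32_MAX  // sentinel index meaning "no node"

typedef struct {
    int32_t value;
    uint32_t next;  // index into node_pool.nodes, not a pointer
} node;

typedef struct {
    node *nodes;        // the buffer can live anywhere in the 64-bit space
    uint32_t capacity;  // total number of nodes in the buffer
    uint32_t used;      // number of nodes handed out so far
} node_pool;

// Push a value onto the front of the list whose head index is *head.
// Returns 1 on success, 0 if the pool is full (list left unchanged).
static int pool_push_front(node_pool *pool, uint32_t *head, int32_t value)
{
    if (pool->used >= pool->capacity) return 0;
    uint32_t index = pool->used++;
    pool->nodes[index].value = value;
    pool->nodes[index].next = *head;
    *head = index;
    return 1;
}

// Walk the list by index and sum the values.
static int64_t pool_sum(const node_pool *pool, uint32_t head)
{
    int64_t sum = 0;
    for (uint32_t i = head; i != NODE_NONE; i = pool->nodes[i].next) {
        sum += pool->nodes[i].value;
    }
    return sum;
}
```

Each link costs 4 bytes instead of 8, and nothing depends on where the OS happens to place the buffer, which sidesteps the address-space availability problem entirely.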
Yeah, I feel equally squirmy about shipping it to desktop users. When I brought this up with Paul Khuong he indicated that he had shipped the "map unused space in 0 to 2^32" thing in production (which for him usually means to thousands of servers running for billions of CPU hours) but that's in a server environment where they have a lot more control over the operating conditions.
It seems totally fine for server code. Anything where you control the deployment, really.
- Casey