The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #73 (2025-03-03)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

Questions addressed in this video:

  • [00:03] “I believe that when you were talking about a Linux problem, you mentioned that rendering in 2D at any scale factor is a solved problem. I know of a few approaches that work, but I'm curious to hear how you approach this. Some approaches I've debated between in my head are

    1) Vector rendering every frame

    2) Vector rendering to an atlas at startup or whenever the scale factor changes, then sample the atlas every frame

    3) Use high-resolution image assets and scale them down to the desired size (either every frame or whenever the scale factor changes)

    I'm currently making a chess app. I want to render the board at the largest size that fits in the window. What gets me is that in the worst case, the scale factor of the pieces changes every frame. This happens when the user is actively resizing the window. Since I want the resizing experience to be smooth, it feels bad to put it on a slow path, but it also feels bad to slow down all my rendering due to something that only happens occasionally.”

  • [14:02] “For analyzing the domain of the sin function, I noticed that the different uses of sin have different domains, so that could be a point for optimization and I don't *think* it has been mentioned yet.”

  • [16:47] “Approximately what percentage of your code bases do you think you end up writing SIMD intrinsics for? In a lot of the code bases I work in, it feels to me that the majority of the code is either the "plumbing" of the data from one algorithm to the next or just trying to manage the complexity of outside systems like the OS, and so I don't see much use for SIMD in those places. Would you say this is a problem with the code bases I work in? Or is this normal?”

  • [21:02] “I've experienced many ‘cancel’ buttons and ‘press esc to abort’s in my days that take almost as long, or longer, to cancel than the task at hand, or they appear to do nothing.

    Though, I can see now why this could be the case. Putting an ‘if (canceled) break;’ in every iteration of a loop or an ‘if (canceled) goto abort;’ after every step in a procedure seems rather yucky.

    Suppose you could put the task in a thread and mercilessly terminate it upon cancel. Though, depending on the task, terminating without side effects could be tricky.

    Have you ever pondered what a responsive abort would look like?”

  • [28:17] “The PeekMessage function can potentially stall applications under certain conditions. Could you shed light on how to integrate this function effectively within the game loop to ensure smooth windowing operations, responsive resizing, and the avoidance of issues associated with the modal loop, for example? I am interested in understanding how industry-level applications implement foundational elements such as windowing, input processing, and multithreaded rendering. For example, on VSCode, I have seen that I can type in text while resizing or holding the titlebar, or in Chrome I can resize and move the window but the video is rendered consistently with jitter.”

  • [38:58] “I got some surprising performance from code I recently wrote and would like to get your thoughts. My goal was to implement a bitmap that will be shared by all the threads in a process. The map is initially all 0s. Each thread can check if a bit is set. If not, it performs some action then sets the bit to ensure that other threads don't redo the action. Note that it is OK for two threads to do the action for a given bit (in other words, the bit setting doesn't need to be atomic); we just want to avoid all the threads redoing the same action many times. I saw that the `bts` x86 instruction was a good fit for this and my initial implementation was:

    However, this had *terrible* scaling properties as the number of threads/cores increased.

    I ended up switching to the following which just manually performs what the `bts` instruction with a memory operand did. We load the desired bit into a register, check it and conditionally set it and write it back if it was previously 0. And this second implementation scales much better (pretty much linear). Do you know what might be the problem in the first implementation?”

  • [42:35] “Are there any reasons to prefer MSVC over clang on windows nowadays?”
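
The code from the [38:58] question is not reproduced in this listing. As a rough illustration only, the second approach it describes (load the word, test the bit, and write back only if the bit was 0) might look something like the sketch below. The names `bitmap_words`, `BITMAP_BITS`, and `test_and_set_bit` are hypothetical, and the use of C11 relaxed atomics is an assumption made to keep the sketch free of data-race undefined behavior; the questioner's actual code may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITMAP_BITS 1024u /* hypothetical size, chosen for illustration */

/* Shared by all threads. Static storage, so every bit starts at 0. */
static _Atomic uint64_t bitmap_words[BITMAP_BITS / 64];

/* Returns the previous value of the bit, and sets it if it was 0.
   The load and the store are two separate relaxed operations, so this is
   NOT an atomic read-modify-write: two threads may both see 0, and a
   concurrent update to another bit in the same word can be overwritten.
   The question states that occasionally redoing an action is acceptable. */
static bool test_and_set_bit(size_t index)
{
    assert(index < BITMAP_BITS);

    _Atomic uint64_t *word = &bitmap_words[index / 64];
    uint64_t mask = (uint64_t)1 << (index % 64);

    uint64_t value = atomic_load_explicit(word, memory_order_relaxed);
    if (value & mask) {
        return true; /* already set: the word is only read, never written */
    }

    atomic_store_explicit(word, value | mask, memory_order_relaxed);
    return false;
}
```

A thread would call `test_and_set_bit(i)` and perform the action only when it returns `false`. Note that once a bit is set, later calls for that bit take the read-only path and never store to the shared word.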
