

The full episode is only available to paid subscribers of Computer, Enhance!

Q&A #74 (2025-04-01)

Answers to questions from the last Q&A thread.

In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.

Of all the days this could have happened, it had to be April 1st, didn’t it?

Yesterday, as many of you know, I tested the new studio setup and the new “live stream” capabilities of Substack. The stream was an impromptu funeral for the Sony a6000 camera that had previously recorded all of the lightboard videos you’ve watched here on Computer Enhance. The a6000 had died and was replaced with a Sony a7 iii, and as a result, the lightboard setup had to be recalibrated as well.

As part of the “funeral”, we had some background music playing to set the mood. Unfortunately, I completely forgot to remove this from the OBS settings before recording the Q&A today! So the entire Q&A now sounds much more dramatic than normal.

Ordinarily, this would be a “sorry for the mistake” kind of thing, and that would be that. But because it’s April 1st, I know everyone is going to assume that this is a joke. And furthermore, because I am the kind of person who would make a post saying that something isn’t an April Fools joke, when it actually was, you’re going to think that I’m also not telling the truth about the music.

But I am! I swear, this is just a normal Q&A video, where I legitimately answer all the submitted questions, like I always do. It just happens to have dramatic music in the background because I forgot to remove it. That’s all. There’s no joke. I don’t think I make a single joke in the entire video!

Anyway, with that gigantic caveat, the questions addressed in this video are:

  • [00:03] “The Haversine example program just does some work and then exits, but how about an application that runs in a loop (games for instance)? Would you use a different profiling strategy? Like taking measurements per frame? Or is accumulating data through the entire run time of the program still the best approach? How does one account for varying run time lengths or certain sections (maybe a part of some level has a lot more enemies) being more expensive?”

  • [06:37] “I'm curious about how you architect a (potentially large) codebase in a more general sense.

    Do you think of things in a layered way, where you have a more low level layer that deals with hardware/OS, and then build other layers on top of that? Things like networking, or rendering, or file operations, memory management, your own string system, etc. Do you have something like that which is similar between projects that you've extracted out to be reusable across projects?”

  • [10:30] “I'm the one who asked the question about the domain of the Sin function and wanted to clarify what I meant. Since Sin is called twice in our Haversine reference function (once in regard to the latitude (y), and once in regard to longitude (x)), I ran different domain and range tests on both separately. I noticed that the domains of the uses of Sin were different, one being [-pi,pi], and the other being [-pi/2,pi/2]. The optimization I had in mind was simply to optimize the range reduction step of the process, cutting out some mirroring work. Let me know if you're still a little confused about my question.”

  • [14:40] “I was testing an audio system by playing a simple Sin wave, and noticed that the pitch was randomly shifting down after ~30 seconds. This was because in order to increment the 't' for the Sin, I was adding a very small number to it and so as 't' got larger, there wasn't enough floating point precision to do the computation properly.

    I was wondering if, when trying to maintain precision for floating point calculations, you try to keep the numbers small, or try to keep the operands at similar magnitudes so one isn't truncated to fit into the other, etc. I know you've talked about minimizing the amount of operations, but is the order in which operations are done, (a+b)+c vs. a+(b+c), something we will discuss?”

  • [19:40] “Hi, following up on the bitmap bts question.

    In both cases, the start of the bitstring is in `rdi` and the offset of the bit to be checked is `rcx`.

    When `bts` is used with a memory operand, the first operand is the start of the bitstring and the second is the offset.

    So `bts [rdi], rcx` accesses the bit at address `[rdi + rcx]` which is also what the second implementation does. I get the same results with both implementations (apart from the performance difference).

    Let me know if the question makes more sense now.”

  • [27:45] “What has AMD done wrong with GPUs that has caused the company to lag so tremendously relative to Nvidia? Is AMD’s software that terrible or do you think it’s more so a hardware issue?”

  • [38:55] “Maybe a little off-topic question: What is your take on the whole signed vs unsigned sizes in C debate?

    I heard people argue that besides the more logical negative result instead of overflow on pointer subtraction, there's also a performance benefit because signed arithmetic is the general case. On the other hand ptrdiff_t < size_t leads to UB on results that are too big.”

  • [44:07] “It seems like page faults have practically no impact on performance on Apple Silicon, but I really don't understand why.

    It's basically everything like in the videos, but the conclusions are the opposite. I double checked my instrumentation code, to make sure I am not deceived by wrong calculations, but everything _seems_ correct.

    Has anybody else seen anything like that?”

  • [47:12] “I've heard you mention a few times that we should store indexes instead of pointers for referencing things. I tried it out in a project I was working on but it quickly became bothersome to always do Base + Index every time I wanted to look something up. It doesn't seem like we get a space win, if the indexes are also 64 bit. It also doesn't seem too hard to back out the indexes if they are needed for serialization since I was using Arenas for memory management, so maybe my use case wasn't complex enough to see the full benefit of storing indexes. Could you share some thoughts or examples where indexes shine?”

  • [51:04] “I was wondering if there was any update on the Zen vs. Cuda vs. Tensor core series?”

  • [52:39] “I was looking at the cpuid instruction in the intel x64 manual, and noticed that you can use it to query the TSC frequency directly with EAX value 0x15, without needing to do something like an undocumented NT call. Is there any reason not to use this instead of estimating the frequency using QPC if available?”

  • [54:50] “I'm doing the course in Java and, AFAIK, I can't use intrinsics without using native methods and thus doing a function call (e.g.: a wrapper for RDTSC I found: https://github.com/Deamon5550/Intrinsic).

    With this setup, if I call the RDTSC wrapper function twice in a row, I get a difference in the hundreds of cycles (the frequency is ~3.4 GHz). It's usually around 400 cycles, but I've seen outliers from 270 to 800+.

    How big of a problem is this going forward? Any tips on how to mitigate it? If I do the difference between two calls repeatedly in a loop, the average goes down to 30-40 cycles.”

  • [57:57] “Hi Casey, thank you for the great course! You previously mentioned that having an Undefined Behavior in the C/C++ languages was a bad choice because it leads to many problems and gives almost nothing in return. What do you think about strict aliasing in that regard? Do you find it useful?”
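
As a small illustration of the floating-point question at [14:40], here is a minimal C sketch. It is not taken from the video. It assumes a hypothetical 48 kHz sample rate and a build where `float` arithmetic is evaluated in single precision (the usual case on x86-64). All function names are made up for this example. `accumulate_time` repeats the approach described in the question, `time_from_sample` shows one possible alternative that derives time from an integer sample counter, and `sum_left` / `sum_right` show that the grouping of additions can change the result.

```c
#include <assert.h>
#include <stdio.h>

/* Approach from the question: keep adding a small dt to a float.
   Returns the accumulated time after `steps` additions, starting at `start`. */
static float accumulate_time(float start, float dt, int steps)
{
    assert(steps >= 0);
    float t = start;
    for (int i = 0; i < steps; ++i) {
        t += dt; /* every add rounds to the nearest float near t */
    }
    return t;
}

/* One possible alternative (an assumption, not the author's method):
   compute time from an integer sample index so that rounding error
   does not build up from one sample to the next. */
static float time_from_sample(long sample_index, double sample_rate)
{
    return (float)((double)sample_index / sample_rate);
}

/* The same three operands, grouped two different ways. */
static float sum_left(float a, float b, float c)  { return (a + b) + c; }
static float sum_right(float a, float b, float c) { return a + (b + c); }

/* Prints both comparisons; call this from main() to see the effect. */
static void print_drift_report(void)
{
    float dt = 1.0f / 48000.0f; /* hypothetical 48 kHz sample period */

    /* One second's worth of steps, starting 40 seconds into playback. */
    float drifted = accumulate_time(40.0f, dt, 48000);
    float indexed = time_from_sample(41L * 48000L, 48000.0);
    printf("accumulated: %f  indexed: %f\n", drifted, indexed);

    printf("(a+b)+c = %f  a+(b+c) = %f\n",
           sum_left(1e8f, -1e8f, 1.0f),
           sum_right(1e8f, -1e8f, 1.0f));
}
```

Between 32 and 64, adjacent `float` values are 2^-18 apart, so each `t += dt` is rounded to a whole number of those steps. With the assumed sample rate, the accumulated time therefore advances noticeably less than one second per 48000 samples, which would be heard as a lower pitch. The indexed version lands on 41 seconds.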
