In each Q&A video, I answer questions from the comments on the previous Q&A video, which can be from any part of the course.
Questions addressed in this video:
[00:04] “Many people on the internet suggest that ‘returning early’ from a function reduces complexity and makes code more readable. You seem to prefer the complete opposite in your code, i.e. large and nested if statements. Why is that?”
[06:38] “You mention that JSON is terrible for performance. However, would you still recommend it for non-performance-critical use cases, such as configuration or settings files, where readability and simplicity take precedence?”
[11:49] “I am curious about std::execution from the C++ standard library, e.g. std::execution::par for parallel processing. Is it multi-threading? Is it vectorization (SIMD)? It's definitely faster. Is there a way to get the same behavior without the standard library? Thanks.”
[14:30] “Hi! I have a question about how CPUTimerFreq is used in NewTestWave when using the repetition tester. TLDR: Shouldn't the CPUTimerFreq value in NewTestWave be updated each time using EstimateCPUTimerFreq() for accurate comparisons, instead of always using the same initial CPUTimerFreq value?”
[20:12] “When computing sines, cosines and similar, using tables looks promising in order to reduce the amount of online computation needed.
Also, the library math functions are geared towards computing one value at a time, or sometimes two as with sincos. When doing the computation in batch with independent computations, as is the case with the haversine problem, it looks like it should be possible to use the SIMD instructions better than the normal math library does.
Are there any good libraries for batch computations that have table size as a parameter so one may easily experiment?”
[22:30] “I am a bit behind. I was working through listing 150_read_widths and noticed a couple of things that made my code perform a lot worse than yours, and I put it down to two things which I only half understand. Firstly, you switched at some point from the previous listings to using rax as the comparator and using jb, instead of the sub rcx of the counter and jnle (for example in listing 146).
Using the latter seems to halve the measured throughput, or worse. I was able to reproduce this on your code by changing it as above. If you explained why you made this change, I missed it; I tried to go back and forth through the videos but could not find it.
Secondly, changing VirtualAlloc in your code to malloc also similarly affects performance, but my instinct was that since we are only ever doing one allocation of 1GB, why would it matter whether we use malloc or VirtualAlloc? Is this somehow because of the alignment of the memory returned by VirtualAlloc versus malloc?”
[26:02] “What do you think about NISC (no instruction set, in contrast to RISC or CISC)? From my understanding, it may move a lot of responsibility from processors to compilers, allowing for more performance and cheaper chips, as there would be no decoding, and compilers already know all the dependency chains while having a lot of computational power to do optimizations.”
[33:40] “Does the branch predictor predict well if the branch changes but then remains stable? I.e. for the first 10000 calls a function pointer points to A (or jnz is taken), after which for the next 10000 it points to B (or jnz is not taken)?
We already tested that the branch predictor will successfully predict the first steady state, but I'm not sure if it will always adapt to a change later. In my mind I'm imagining a massive table of all jumps that has some statistic on whether a branch should be taken, but I guess it couldn't be that?”
[37:08] “What do you mean by (to paraphrase) ‘the writing is on the wall for Windows’ and that we are all going to be Linux programmers eventually? I don't mind :), but I have not heard that perspective from you before and am not sure if this came up somewhere else. Is Windows dead/dying?”
[48:18] “Been reading some materials on how to correctly do dynamic dispatch between different instruction sets (like one hand-written function using SSE intrinsics and another one using AVX2, for example), and a lot of people seem to go for SIMD libraries like Google Highway. You said once, in the Molly Rocket Discord, that you were using CPUID for it, and having different builds. Can you explain in a little more detail how you do it?”