When I went back to solve interview question #4, I found that I ended up with a more efficient version of the algorithm than the ones I've seen published.
Just wanted to say that I thought this particular problem overview was incredibly insightful. The digital differential analyzer was new to me, and drastically simplified the algorithm. I loved your presentation and willingness to be clear and thorough. Thank you!
Amazing video! I didn't even realize an hour had passed until I reached the end. I was glued to the screen the entire time, except for a few moments when I paused to think for a bit.
Since I have never seen this particular formulation before, I don't know if any demoscene people would have been using it exactly... but they would likely have been using some kind of related DDA-style technique, like one of the more common formulations (Bresenham circle, midpoint circle algorithm, etc.)
Those are filled circles, though, so it's going to be a slightly different situation. Also it depends how you wanted to draw them - one way is to use the CPU, if that was your goal, and another way is to use the blitter. One could also imagine using both, filling different parts of the circle with each.
Actually, in a demo, I could even imagine using the copper list for such things as well (to fill solid interiors on 8-pixel boundaries), but I haven't programmed an Amiga in 30 years so I would have to go digging through the manuals to relearn the specifics of each of these things :)
It has low X resolution, but the CPU and/or blitter are also in play. So my thinking was, perhaps you fill just the edge pixels to cover the 8-byte boundary (or whatever it is), and then you put a copper list entry in to fill the interior. I mean if you wanted a "maximum speed circle", I could image using all three chips... kinda makes me what to try this myself :)
A bit of unrelated question to the topic. The videos on your yt channel have the comments disabled. Is there a reason for that? I'd assume that a large portion of the comments would be unhelpful if not actively counterproductive, but wouldn't they also invite at least some valuable and helpful discussion? I would be interested to hear your thoughts on that.
I really love problems like this where some up-front math on the blackboard can simplify the algorithm a lot, I have never seen or thought about this way of drawing circles before, definitely interesting. Could you comment on how this algorithm would perform today on modern hardware, should I consider going for an approach like this if filling out a bitmap on the CPU-side (not talking about a shader actually doing this)? I guess the explicit computation using sines and cosines and floating point arithmetic is really fast today and would be easier to implement with SIMD and that the branching version of the algorithm you proposed would suffer due to the branch predictor guessing wrong a substantial amount of times?
Scatter-write algorithms like this don't tend to be good choices on modern CPUs which do not have high scatter throughput, so it is unlikely that you would ever want to use DDA-style algorithms for drawing anything on a CPU today.
I was a bit confused by your explanation for case 2 in your insight, but after thinking about it more I think its because you misspoke slightly. I think the point is not that we turn the right side slightly positive so that its even more larger than the left side, but rather that by negating the left side it goes from very negative to very positive and by not touching the right side it remains slightly negative. This means that the above <? comparison is false so we choose the right point.
I agree, I think case 2 is wrong. But depending on what problem you're solving, it may not matter. In the end, we see that Casey uses integer coordinates for the grid of pixels, in this case, it looks like we're always in case 1 or 3. To have case 2, we would have to use floating-point coordinates, and we would have to start a step with a coordinate that is somewhere on the right side of a pixel.
I have updated the video. For the hopefully-more-clear-and-convincing explanation, here is a link to the correct timestamp: https://youtu.be/JtgQJT08J1g?t=1858.
Actually, I am fairly confident that all the cases work. I have recorded a patch where I don't screw up the explanation, and will replace the video. Hopefully the new explanation will be convincing :)
I presume a lot of old source code were doing this type of computation, whereby looking at the code, you wouldn't be able to grasp what was in the programmer's mind. My question is,
"If efficiency is mandatory, and you have to make the algorithm reduced to
the delta mindset, like here. How would you make it more readable?"
Is it by writing in a comment the mathematical formula? By telling each steps you've made in this particular problem?
Yes, when I do stuff like this that is _very_ complicated (I don't consider DDAs to be complicated, because they're common in graphics), I typically have a giant multi-page comment at the top of the file that explains the entire algorithm and how it was derived. This is as much for myself as for other people, because subtle things are easy to forget, even if you were the person who figured them out in the first place :)
As an example, the bspline solver I wrote at RAD for the character animation system begins with around 200 lines of comments explaining how it works and how the notation in the file maps to the mathematical representation of the operation.
Just for clarification: Is this algorithm pixel accurate? Because I don't see any 0.5 bias / offset in the calculations. For example, a circle with radius 0 (point) would start at x=0.5 and with radius 5 would start at x=5.5. Then to continue with integer calculations, everything is multiplied by 2 or so.
Thanks again, Casey, for showing us this simple algorithm that looked difficult at first. As always, very entertaining and educational to watch your videos. We need more professional programmers like you to teach us *good* and *performance-aware* programming.
Although I have not analyzed it extensively, I believe the algorithm is pixel-accurate. But it treats pixel "centers" as being the whole integer coordinates (as was traditional for fast integer rasterization, I believe). If you would like pixel centers to be at {+0.5, +0.5} instead, you would have to rework it.
Interestingly enough, your version starts to become slower around a 200 pixel radius. I don't really understand why, given the branching and more arithmetic operations in the other one.
Does it have to do with it plotting only 4 Quadrants while using more iterations (Less memory accesses)?
I timed the functions first while they were drawing to a buffer which got displayed via ResizeDIBSection in a loop. (With processing window messages, clearing the buffer)
Now I wrote a minimal version in which both versions just draw 10000 circles to a buffer.
In this version your variant is consistently 30 -50 % faster.
I guess I don't really understand the implications of the windows stuff regarding the execution of these functions.
I am happy to look through any of this stuff if you're curious (we can control for caching and analyze the ASM and see what's going on in there, etc.), but, as I mentioned in the video, modern machines aren't designed to operate this way. They really really REALLY hate things like repeated scatter-writes, which is what DDA-style algorithms do. So in general, perf-profiling this sort of thing on a modern machine is not useful, because you will always totally destroy this kind of algorithm by using a tiled batched rasterizer instead (eg., one that works on, say, 64x64 squares and rasterizes all shapes into that square before moving on to the next square).
In my programming journey I tried to work through things
in a way tied to the evolution of the technology. For me this makes understanding the inner workings of the system much easier, since the older ones are a lot less complex. (With the 8086 part of the series I felt right at home :) ). So after toying around a lot with asm on dos, I kind of need a "smooth" transition to the complexity of modern, highly parallelized systems. In my understanding that is what you will cover mostly in the future of the series, so no worries there.
Wow, this is crazy cool. You did accidentally write a "-4" instead of "4" for F's delta at the very end, haha. How often do you find problems can be simplified by this "DDA" approach?
Since we have to use integer registers, we can never divide our equations unless everything is still an integer afterward. So the 1/2 that you've got there is the thing that stops us.
Wow, this is just the right course, thank you for doing this. I hope you do not think I am playing the wise guy here.
My first thought was does it really matter, couldn't we just drop the one-half? It is really small against the r^2. Taking a small nap and thinking about what values we can have for x and y and that they are actually r*cos(x) and r*sin(y) and I thought since they are squared they are always 1, leading to just replacing the implicit formula:
-(x^2 + y^2) +x + C < 0
-r^2 +x +C < 0
now the r^2 cancel out.
x + 1/2 < 0
or if you don't like the 1/2 2x + 1 < 0.
We do not even need r and y. All depends on x which makes sense since we only calculate the first slice.
I am not sure, this seems extremely simple. So I am trying to code this and try if it actually works. Unfortunately I am really bad at C++. I will try in Rust which I am more comfortable with, though this might also be a good opportunity to get familiar with C++ a little. I try to summarise again, provided I understood this right -2x^2 +2x -2y^2 + C < 0 with C = 2r^2 + 1 is what we try to evaluate to determine if we take the right or the left pixel.
Now we know that x^2 + y^2 = r^2 so we can replace this which leads to
2x + 1 < 0
So it should be sufficient to evaluate 2x + 1 < 0 to make a decision.
But, having said the above, I am new here and don’t want to irritate anyone. I will try to code this and see what happens.
One thing that I cannot grasp right now, is, if the simplification is too much because of the assumptions taken on the absolute value. But coding this should settle it, so give me some time to see if it actually works.
But that is only true if you have never moved from the initial point. X only equals R until you change X. Once you change X, that identity won't hold anymore. So I don't think you can do what you're doing here?
Another way to say this is, obviously the decision depends on both X and Y. If it didn't, it couldn't be a circle. So removing Y from the equation entirely means you've definitely done a substitution which only works for one particular Y value.
This is math, so I am rather week here and I might be totally wrong. But my thought was the circle is x^2 + y^2 = r^2. r is a constant and if I fix x y is determined. Well not fully, since y = sqrt(r^2 - x^2) and there a two solutions, but we got this covered via the genius symmetry you apply. I am currently on my mobile, so I cannot code this right now, and it is rather late here in the UK, but once at home I will try to code it, and see if it works.
> thinking about what values we can have for x and y and that they are actually r*cos(x) and r*sin(y) and I thought since they are squared they are always 1, leading to just replacing the implicit formula
The point of the implicit formula is to *test* if the (x, y) coordinate is on the circle (or is outside or inside of it). Your substitution *assumes* (x, y) is on the circle, so isn't valid here.
For the case where we go up and left, the calculation of the digital differential analyzer should be D(x - 1, y + 1), no? I wrote it out and it gets the same result as D(x - 1, y) which is 4x - 4 so technically the resulting circle is the same.
Oh, right. Should had read the code before asking, thank you for the answer. Also the result is not the same which makes more sense, it was a mistake on my part.
Right, it will be different because it is trying to do *both* DDA steps in a single step, which of course we could do if we wanted to make the algorithm have an "else" clause, but I don't know that there is an efficiency win there. I haven't played with other formulations yet, since I was even trying to find one in the first place, I just accidentally did :)
Difficult to express in words how blown away I am. It was magical to see all those insights and how everything ended up.
Just wanted to say that I thought this particular problem overview was incredibly insightful. The digital differential analyzer was new to me, and drastically simplified the algorithm. I loved your presentation and willingness to be clear and thorough. Thank you!
Here's a version that produces an image that you can image with most image viewer: https://paste.sr.ht/~tomleb/0db9f7bf1b650670299ac3de1997dc2b15eb7000.
Run something like this: make main && ./main > circle.ppm && xdg-open circle.ppm
The PPM format is incredibly simple to produce for this kind of thing. For example, the letter J looks like this (taken from wikipedia):
P1
6 10
0 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
0 0 0 0 1 0
1 0 0 0 1 0
0 1 1 1 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Amazing video! I didn't even realize an hour had passed until I reached the end. I was glued to the screen the entire time, except for a few moments when I paused to think for a bit.
Makes me wonder if the demoscene folks on the amiga did their circles like this! Always wondered how they could do that!
Amazing video as always. That hour went by fast!
Since I have never seen this particular formulation before, I don't know if any demoscene people would have been using it exactly... but they would likely have been using some kind of related DDA-style technique, like one of the more common formulations (Bresenham circle, midpoint circle algorithm, etc.)
- Casey
That was one of the effects i never figured out :)
Example here: https://youtu.be/DQJa2xsQSaw?t=133
It amazed me with that 68000 at 7,14 Mhz. Got to try it now that i've got the algorithm!
Those are filled circles, though, so it's going to be a slightly different situation. Also it depends how you wanted to draw them - one way is to use the CPU, if that was your goal, and another way is to use the blitter. One could also imagine using both, filling different parts of the circle with each.
Actually, in a demo, I could even imagine using the copper list for such things as well (to fill solid interiors on 8-pixel boundaries), but I haven't programmed an Amiga in 30 years so I would have to go digging through the manuals to relearn the specifics of each of these things :)
- Casey
Probably adjust the algorithm to only draw one start and one end point on each row and use the blitters line draw mode! (http://amiga.nvg.org/amiga/reference/Hardware_Manual_guide/node0122.html)
The copper had too low x resolution iirc? As you said, it was 30 years ago :)
It has low X resolution, but the CPU and/or blitter are also in play. So my thinking was, perhaps you fill just the edge pixels to cover the 8-byte boundary (or whatever it is), and then you put a copper list entry in to fill the interior. I mean if you wanted a "maximum speed circle", I could image using all three chips... kinda makes me what to try this myself :)
- Casey
Great video and great write up! Very much enjoying the course. Thanks!
A bit of unrelated question to the topic. The videos on your yt channel have the comments disabled. Is there a reason for that? I'd assume that a large portion of the comments would be unhelpful if not actively counterproductive, but wouldn't they also invite at least some valuable and helpful discussion? I would be interested to hear your thoughts on that.
I really love problems like this where some up-front math on the blackboard can simplify the algorithm a lot, I have never seen or thought about this way of drawing circles before, definitely interesting. Could you comment on how this algorithm would perform today on modern hardware, should I consider going for an approach like this if filling out a bitmap on the CPU-side (not talking about a shader actually doing this)? I guess the explicit computation using sines and cosines and floating point arithmetic is really fast today and would be easier to implement with SIMD and that the branching version of the algorithm you proposed would suffer due to the branch predictor guessing wrong a substantial amount of times?
Scatter-write algorithms like this don't tend to be good choices on modern CPUs which do not have high scatter throughput, so it is unlikely that you would ever want to use DDA-style algorithms for drawing anything on a CPU today.
- Casey
I was a bit confused by your explanation for case 2 in your insight, but after thinking about it more I think its because you misspoke slightly. I think the point is not that we turn the right side slightly positive so that its even more larger than the left side, but rather that by negating the left side it goes from very negative to very positive and by not touching the right side it remains slightly negative. This means that the above <? comparison is false so we choose the right point.
Yes - I will go back and listen to the video but I may well have said that totally wrong.
- Casey
I agree, I think case 2 is wrong. But depending on what problem you're solving, it may not matter. In the end, we see that Casey uses integer coordinates for the grid of pixels, in this case, it looks like we're always in case 1 or 3. To have case 2, we would have to use floating-point coordinates, and we would have to start a step with a coordinate that is somewhere on the right side of a pixel.
I have updated the video. For the hopefully-more-clear-and-convincing explanation, here is a link to the correct timestamp: https://youtu.be/JtgQJT08J1g?t=1858.
- Casey
Actually, I am fairly confident that all the cases work. I have recorded a patch where I don't screw up the explanation, and will replace the video. Hopefully the new explanation will be convincing :)
- Casey
It does sounds more convincing to me!
Thanks a lot, I understand with the new explanation.
That's also what Ivy above explained incidentally, I just didn't get it the first time. The update helped!
I loved it, your explanation was perfect.
Something pops in my mind though:
I presume a lot of old source code were doing this type of computation, whereby looking at the code, you wouldn't be able to grasp what was in the programmer's mind. My question is,
"If efficiency is mandatory, and you have to make the algorithm reduced to
the delta mindset, like here. How would you make it more readable?"
Is it by writing in a comment the mathematical formula? By telling each steps you've made in this particular problem?
---
Axel
Yes, when I do stuff like this that is _very_ complicated (I don't consider DDAs to be complicated, because they're common in graphics), I typically have a giant multi-page comment at the top of the file that explains the entire algorithm and how it was derived. This is as much for myself as for other people, because subtle things are easy to forget, even if you were the person who figured them out in the first place :)
As an example, the bspline solver I wrote at RAD for the character animation system begins with around 200 lines of comments explaining how it works and how the notation in the file maps to the mathematical representation of the operation.
- Casey
Great explanation, thank you!
A sidenote, if you don't do analysis of 3 cases and just squre both parts of equation you end up with the same result.
Can you upload that derivation somewhere so I can look at it?
- Casey
That's how I did it
https://raw.githubusercontent.com/pankkor/pap/master/src/interview1994/draw_circle_derivation.jpeg
We still have to evaluate sign of (1-2x) when we divide by it. Since we are dealing with the first quadrant x > 0, hence 1-2x < 0.
That is cool - I like the idea of using the knowledge that X is 1 or higher to know the effect of the division.
- Casey
Just for clarification: Is this algorithm pixel accurate? Because I don't see any 0.5 bias / offset in the calculations. For example, a circle with radius 0 (point) would start at x=0.5 and with radius 5 would start at x=5.5. Then to continue with integer calculations, everything is multiplied by 2 or so.
Thanks again, Casey, for showing us this simple algorithm that looked difficult at first. As always, very entertaining and educational to watch your videos. We need more professional programmers like you to teach us *good* and *performance-aware* programming.
Although I have not analyzed it extensively, I believe the algorithm is pixel-accurate. But it treats pixel "centers" as being the whole integer coordinates (as was traditional for fast integer rasterization, I believe). If you would like pixel centers to be at {+0.5, +0.5} instead, you would have to rework it.
- Casey
Hi Casey, thank you for the really nice derivation. It was very good to follow.
I have played around a bit with your version an compared its performance to the one I use in a toy rasterizer. (Found here: http://members.chello.at/~easyfilter/bresenham.html)
Interestingly enough, your version starts to become slower around a 200 pixel radius. I don't really understand why, given the branching and more arithmetic operations in the other one.
Does it have to do with it plotting only 4 Quadrants while using more iterations (Less memory accesses)?
Keep up the amazing work!
OK, should have tested better.
I timed the functions first while they were drawing to a buffer which got displayed via ResizeDIBSection in a loop. (With processing window messages, clearing the buffer)
Now I wrote a minimal version in which both versions just draw 10000 circles to a buffer.
In this version your variant is consistently 30 -50 % faster.
I guess I don't really understand the implications of the windows stuff regarding the execution of these functions.
I am happy to look through any of this stuff if you're curious (we can control for caching and analyze the ASM and see what's going on in there, etc.), but, as I mentioned in the video, modern machines aren't designed to operate this way. They really really REALLY hate things like repeated scatter-writes, which is what DDA-style algorithms do. So in general, perf-profiling this sort of thing on a modern machine is not useful, because you will always totally destroy this kind of algorithm by using a tiled batched rasterizer instead (eg., one that works on, say, 64x64 squares and rasterizes all shapes into that square before moving on to the next square).
- Casey
Thanks for your comment.
In my programming journey I tried to work through things
in a way tied to the evolution of the technology. For me this makes understanding the inner workings of the system much easier, since the older ones are a lot less complex. (With the 8086 part of the series I felt right at home :) ). So after toying around a lot with asm on dos, I kind of need a "smooth" transition to the complexity of modern, highly parallelized systems. In my understanding that is what you will cover mostly in the future of the series, so no worries there.
I guess it's time to go back to the future...
That's what we're getting into in Part 3, in fact, so the first part of that is what we're doing for the next few weeks.
- Casey
Wow, this is crazy cool. You did accidentally write a "-4" instead of "4" for F's delta at the very end, haha. How often do you find problems can be simplified by this "DDA" approach?
Not sure what you mean... -4 is correct? It is not 4.
- Casey
Argh, right you are. I got the subtraction order confused.
Great video, thank you. I came here because I wanted to ask this one question:
Before you cleared the board after you moved out the constants into C, why didn't you divide by 2?
That would make it -x^2 +x -y^2 + C < 0 with C now r^2+ 1/2
with all the further simplifications -2y-1 and so on?
Since we have to use integer registers, we can never divide our equations unless everything is still an integer afterward. So the 1/2 that you've got there is the thing that stops us.
- Casey
Wow, this is just the right course, thank you for doing this. I hope you do not think I am playing the wise guy here.
My first thought was does it really matter, couldn't we just drop the one-half? It is really small against the r^2. Taking a small nap and thinking about what values we can have for x and y and that they are actually r*cos(x) and r*sin(y) and I thought since they are squared they are always 1, leading to just replacing the implicit formula:
-(x^2 + y^2) +x + C < 0
-r^2 +x +C < 0
now the r^2 cancel out.
x + 1/2 < 0
or if you don't like the 1/2 2x + 1 < 0.
We do not even need r and y. All depends on x which makes sense since we only calculate the first slice.
What do you think? Did I make a mistake here?
I don't think I'm quite following what you are trying to do here... what is the goal of this simplification?
- Casey
I am not sure, this seems extremely simple. So I am trying to code this and try if it actually works. Unfortunately I am really bad at C++. I will try in Rust which I am more comfortable with, though this might also be a good opportunity to get familiar with C++ a little. I try to summarise again, provided I understood this right -2x^2 +2x -2y^2 + C < 0 with C = 2r^2 + 1 is what we try to evaluate to determine if we take the right or the left pixel.
Now we know that x^2 + y^2 = r^2 so we can replace this which leads to
2x + 1 < 0
So it should be sufficient to evaluate 2x + 1 < 0 to make a decision.
But, having said the above, I am new here and don’t want to irritate anyone. I will try to code this and see what happens.
One thing that I cannot grasp right now, is, if the simplification is too much because of the assumptions taken on the absolute value. But coding this should settle it, so give me some time to see if it actually works.
But that is only true if you have never moved from the initial point. X only equals R until you change X. Once you change X, that identity won't hold anymore. So I don't think you can do what you're doing here?
Another way to say this is, obviously the decision depends on both X and Y. If it didn't, it couldn't be a circle. So removing Y from the equation entirely means you've definitely done a substitution which only works for one particular Y value.
- Casey
This is math, so I am rather week here and I might be totally wrong. But my thought was the circle is x^2 + y^2 = r^2. r is a constant and if I fix x y is determined. Well not fully, since y = sqrt(r^2 - x^2) and there a two solutions, but we got this covered via the genius symmetry you apply. I am currently on my mobile, so I cannot code this right now, and it is rather late here in the UK, but once at home I will try to code it, and see if it works.
> thinking about what values we can have for x and y and that they are actually r*cos(x) and r*sin(y) and I thought since they are squared they are always 1, leading to just replacing the implicit formula
The point of the implicit formula is to *test* if the (x, y) coordinate is on the circle (or is outside or inside of it). Your substitution *assumes* (x, y) is on the circle, so isn't valid here.
Indeed, the simplification leads to a square, even dividing by two already deviates a lot from the actual circle.
Here is a working code example in Rust. Both ideas were total failures, so at least the code should be useful:
```
fn main() {
let mut bitmap = [[0u8; 64]; 64];
// NOTE(casey): Center and radius of the circle
let cx: i32 = 32;
let cy: i32 = 32;
let r = 20;
// NOTE(casey): Loop that draws the circle
{
let r2 = r * 2;
let mut x: i32 = r;
let mut y: i32 = 0;
let mut dy = -2;
let mut dx = r2 * 2 - 4;
let mut d = r2 - 1;
while y <= x {
bitmap[(cy - y) as usize][(cx - x) as usize] = 1;
bitmap[(cy - y) as usize][(cx + x) as usize] = 1;
bitmap[(cy + y) as usize][(cx - x) as usize] = 1;
bitmap[(cy + y) as usize][(cx + x) as usize] = 1;
bitmap[(cy - x) as usize][(cx - y) as usize] = 1;
bitmap[(cy - x) as usize][(cx + y) as usize] = 1;
bitmap[(cy + x) as usize][(cx - y) as usize] = 1;
bitmap[(cy + x) as usize][(cx + y) as usize] = 1;
d += dy;
dy -= 4;
y += 1;
// NOTE(casey): Branchless version
let mask = d >> 31;
d += dx & mask;
dx -= 4 & mask;
x += mask;
}
}
// NOTE(casey): Output the bitmap using roughly square elements
for row in &bitmap {
for &cell in row {
let (l, r) = if cell == 1 { ('[', ']') } else { (' ', ' ') };
print!("{}{}", l, r);
}
println!();
}
}
```
I kept Casey's comments so comparison is easier.
For the case where we go up and left, the calculation of the digital differential analyzer should be D(x - 1, y + 1), no? I wrote it out and it gets the same result as D(x - 1, y) which is 4x - 4 so technically the resulting circle is the same.
Not in this case, because we have already advanced D by the Y step, so we are only accounting for the X step.
- Casey
Oh, right. Should had read the code before asking, thank you for the answer. Also the result is not the same which makes more sense, it was a mistake on my part.
Right, it will be different because it is trying to do *both* DDA steps in a single step, which of course we could do if we wanted to make the algorithm have an "else" clause, but I don't know that there is an efficiency win there. I haven't played with other formulations yet, since I was even trying to find one in the first place, I just accidentally did :)
- Casey