Wow, this is a deep one - fascinating about binning! Wouldn't have thought there was a way to make use of chips with defects.
It is even more extensive than what is covered in the video, too. I only touched on the basics, since it's not really a "cores" topic!
- Casey
Can also recommend this incredible video on how chips are made: https://youtu.be/dX9CGRZwD-w
Thank you for the video!
I have a question. When writing a shader you can use the components of a vector in any order (I just found out this is called swizzling: https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-per-component-math). So you can do v.xyz or v.zxy or v.wwxx, or anything, and I heard it is considered a "free" operation. Does that mean there are wirings for every ordering permutation? That seems like too much: 4 + 4^2 + 4^3 + 4^4 = 340 wirings required just for reads, and we have to multiply that by the allowed writes. And if it's not that, how would it work? Does it slow down a regular read when it doesn't have to do any shifts?
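The count in the question can be checked by brute force. Below is a minimal sketch in Python (not shader code; `swizzles` and `apply_swizzle` are made-up helper names for illustration) that enumerates every read swizzle of a 4-component vector and models each output component as an independent pick from the four inputs:

```python
from itertools import product

COMPONENTS = "xyzw"

def swizzles(max_len=4):
    """Yield every read swizzle of length 1..max_len, e.g. 'x', 'zxy', 'wwxx'."""
    for length in range(1, max_len + 1):
        for combo in product(COMPONENTS, repeat=length):
            yield "".join(combo)

def apply_swizzle(v, pattern):
    """Build the result one component at a time: each output slot
    selects one of the four inputs by name, independently of the others."""
    return tuple(v[COMPONENTS.index(c)] for c in pattern)

all_patterns = list(swizzles())
print(len(all_patterns))                             # 340
print(apply_swizzle((1.0, 2.0, 3.0, 4.0), "wwxx"))   # (4.0, 4.0, 1.0, 1.0)
```

Seen this way, a pattern is just up to four independent "one of four" choices, which is where the 4^n terms come from. This only models what the syntax means; it says nothing about how any particular GPU implements it.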
Had to zoom in and out a couple of times on the Zen die shot, confused about the tiling pattern and core count, and then just trusted it until it got to binning. Tense storytelling!
If you think the AD102 is bad, you should see NVIDIA's previous generations! The patterns are way crazier.
- Casey
Ow this makes me happy :)
Thanks for your continued efforts in making us all performance aware! :D
Unlike with CPU architecture, I had studied GPUs quite extensively before starting this course, though I'd still consider myself a very junior programmer, so it'll be very interesting to fill in any gaps in my knowledge.
Already I see amazing die shots and diagrams in the images as I was scrolling past. I'm super excited to dig into this.
Here we go...