i'm very happy you've started a longer-form, text-based, programming-centric habit of communication, instead of making me parse your twitter threads. thanks, casey!
As one who goes to great lengths to avoid Twitter, I concur!
Found this interesting video that talks about an actual implementation (in hardware, as far as I can tell), and about exactly this problem (timestamp link): https://youtu.be/WzID6kk8RNs?t=567
The presentation is by Roger Espasa; from what I saw on the mailing list, he's the co-chair of the RISC-V Vector work group / committee / thingy. He does mention the extra wiring, connecting the lanes, and a lot of complication needed for an out-of-order core, but it's hard (for me) to gauge the actual complexity from the presentation (like, did they work for 3 years on just this problem, or was it work as usual?).
Had watched this before, but not closely, so I didn't remember that Roger had discussed this exact issue in that talk. Thank you for linking with the timestamp! It sounds like for an OoO core specifically (which you want for perf), the problem is worse than the basic single shadow copy Casey talks about here, as you need a whole bunch of shadow copies in that case... yikes!