▲ | adgjlsfhk1 11 hours ago | |
for some, you can cheat a little (e.g. partial sum reduction), but then you don't get to share IP with the rest of your ALUs. I do really want to see what an optimal 32 wide reduction and circuit looks like. For integers, you pretty clearly can do much better than pairwise reduction. float sounds tricky |