| ▲ | the_harpia_io 7 hours ago | |||||||||||||||||||||||||
honestly the AMD-first bit surprised me - usually ROCm support is an afterthought or just broken outright. curious about BVH traversal specifically. dynamic dispatch patterns across GPU backends can get weird fast. did KernelAbstractions hold up there or were there vendor-specific fallbacks needed for the heavier acceleration structure work? | ||||||||||||||||||||||||||
| ▲ | simondanisch 6 hours ago | parent [-] | |||||||||||||||||||||||||
Well I'm a bit of an AMD "fanboy" and really dislike NVIDIA's vendor lock in. I'm not sure what you mean by dynamic dispatch across GPU backends - nothing should be dynamic there and most easier primitives map quite nicely between vendors (e.g. local memory, work groups etc). To be honest, the BVH/TLAS has been pretty simple in comparison to the wavefront infrastructure. We haven't done anything fancy yet, but the performance is still really good. I'm sure there are still lots of things we can do to improve performance, but right now I've concentrated on getting something usable out. Right now, we're mostly matching pbrt-v4 performance, but I couldn't compare to their NVIDIA only GPU acceleration without an NVIDIA gpu. I can just say that the performance is MUCH better than what I initially aimed for and it feels equally usable as some of the state of the art renderers I've been using. A 1:1 comparison is still missing though, since it's not easy to do a good comparison without comparing apples to oranges (already mapping materials and light types from one render to another is not trivial). | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||