one is a build failure when targeting my GPU (already reported, with a fix ready and pending release), and the other is … slow performance in one of the #Thrust API calls that we use!
Turns out that `sort_by_key`, at least the way we use it, is somewhere between 25% and 30% slower on my #AMD iGPU with the latest #rocThrust (from the 5.6.0 software stack) than it is on the *CPU* with the latest #Thrust and the OpenMP backend!
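For anyone who hasn't used it: `sort_by_key` sorts an array of keys and applies the same reordering to a companion array of values. Below is a minimal serial C++ sketch of those semantics, assuming keys and values of equal length. It is *not* how #Thrust or #rocThrust implement it, it is not how #GPUSPH calls it, and the name `sort_by_key_reference` is made up for illustration. It uses a stable sort, so it matches `thrust::stable_sort_by_key`; plain `sort_by_key` makes no stability guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical serial reference for sort-by-key semantics:
// sort `keys` ascending and reorder `values` with the same permutation.
// Illustrative only; NOT the Thrust/rocThrust implementation.
template <typename K, typename V>
void sort_by_key_reference(std::vector<K>& keys, std::vector<V>& values)
{
    // Precondition: one value per key.
    assert(keys.size() == values.size());

    // Identity permutation 0, 1, ..., n-1.
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});

    // Order indices by their key; stable_sort keeps equal keys in input order.
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Gather keys and values through the permutation.
    std::vector<K> sorted_keys;
    std::vector<V> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : perm) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys.swap(sorted_keys);
    values.swap(sorted_values);
}
```

A serial version like this is only useful as a correctness reference when comparing backends, not as a performance baseline.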
So, one of the reasons why we could implement the #HIP backend easily in #GPUSPH is that #AMD provides #ROCm drop-in replacements for many of the #NVIDIA #CUDA libraries, including #rocThrust, which (as I mentioned in the other thread) is a fork of #Thrust with a #HIP/#ROCm backend.
This is good, as it reduces the porting effort, *but* it also means you have to trust the quality of the provided implementation.
#thrust #rocthrust #CUDA #nvidia #ROCm #amd #GPUSPH #hip