Part of the issue is that the presentation is more of a showcase of what #GPUSPH can do and how, so it's hard to find a classic bullet-point synthesis of the thing. I could go with «… and this is why it's awesome» but I'm not sure the audience has the sense of humor to take that the right way.
I'm sitting here trying to finish the presentation on #GPUSPH to be presented at #SIMAI2023 next week, and while the thing is “done” overall, I can't think of anything to put on the #conclusions (final slide). I'm stymied.
#conclusions #simai2023 #GPUSPH
(That being said, if anyone wants to implement a sort-by-key and segmented reduction that don't depend on Thrust, and contribute it to #GPUSPH, I'm not going to complain.)
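For reference, this is roughly the kind of Thrust usage such a contribution would have to replace. The toy data and names below are purely illustrative, not GPUSPH's actual code; it should build with nvcc against Thrust, or with hipcc against rocThrust:

#include <cstdio>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>

int main() {
    // toy setup: 6 "particles" spread over 3 "cells" (keys), one value per particle
    std::vector<unsigned> hkeys{2, 0, 1, 0, 2, 1};
    std::vector<float>    hvals{1.f, 2.f, 3.f, 4.f, 5.f, 6.f};
    thrust::device_vector<unsigned> keys(hkeys.begin(), hkeys.end());
    thrust::device_vector<float>    vals(hvals.begin(), hvals.end());

    // sort-by-key: reorder the values so that equal keys become contiguous
    thrust::sort_by_key(keys.begin(), keys.end(), vals.begin());

    // segmented reduction: one sum per run of equal keys
    thrust::device_vector<unsigned> outKeys(3);
    thrust::device_vector<float>    outSums(3);
    thrust::reduce_by_key(keys.begin(), keys.end(), vals.begin(),
                          outKeys.begin(), outSums.begin());

    for (int i = 0; i < 3; ++i)
        std::printf("cell %u: %g\n", (unsigned)outKeys[i], (float)outSums[i]);
}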
So, one of the reasons why we could implement the #HIP backend easily in #GPUSPH is that #AMD provides, as part of #ROCm, drop-in replacements for much of the #NVIDIA #CUDA libraries, including #rocThrust, which (as I mentioned in the other thread) is a fork of #Thrust with a #HIP/#ROCm backend.
This is good as it reduces porting effort, *but* it also means you have to trust the quality of the provided implementation.
#thrust #rocthrust #CUDA #nvidia #ROCm #amd #GPUSPH #hip
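In practice, “drop-in” means that a translation unit like the toy one above builds unchanged with either toolchain: rocThrust provides the same thrust/ headers and thrust:: namespace, so the only thing that changes is whether you feed the source to nvcc or to hipcc. Another purely illustrative minimal example:

#include <thrust/device_vector.h>
#include <thrust/sort.h>

int main() {
    thrust::device_vector<int> v(1 << 20, 42);
    // dispatches to the CUDA backend under nvcc/Thrust,
    // to the HIP/ROCm backend under hipcc/rocThrust
    thrust::sort(v.begin(), v.end());
}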
For comparison, with #GPUSPH the discrete GPU is over 50× faster than the CPU, and that's actually on the *low* side of things, since many kernels are memory-bound rather than compute-bound, and no attempt has been made yet to optimize for this hardware.
But, and this is where things get surprising, the performance of the iGPU *drops*, failing to even get 2× over the CPU.
Why would something more *intense* have a lower performance ratio?
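To make the “memory-bound” remark above a bit more concrete, this is the back-of-the-envelope roofline reasoning I tend to apply. All the figures below are made-up placeholders, not measurements of this hardware; the only point is that for a memory-bound kernel the achievable speedup is capped by the bandwidth ratio, not by the compute ratio:

#include <algorithm>
#include <cstdio>

// Roofline bound: attainable throughput = min(peak compute, bandwidth * arithmetic intensity)
static double attainable_tflops(double peak_tflops, double bw_gbs, double flop_per_byte) {
    return std::min(peak_tflops, bw_gbs * flop_per_byte / 1000.0);
}

int main() {
    const double intensity = 0.25; // FLOP/byte: a strongly memory-bound kernel (placeholder)
    // placeholder peak figures, NOT measurements of the hardware discussed above
    const double cpu_tflops = 0.5,  cpu_bw_gbs = 50.0;
    const double gpu_tflops = 10.0, gpu_bw_gbs = 500.0;
    std::printf("speedup bound: %.0fx (the raw compute ratio would suggest %.0fx)\n",
                attainable_tflops(gpu_tflops, gpu_bw_gbs, intensity) /
                attainable_tflops(cpu_tflops, cpu_bw_gbs, intensity),
                gpu_tflops / cpu_tflops);
    return 0;
}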
For reference, I'm testing this hardware primarily with two pieces of software: one is an internal cellular automaton model that we use for the assessment of lava flow invasion hazard, and the other is the #FLOSS #GPUSPH I've already talked about. These two codebases are *very* different, and it's interesting to see how their differences impact the performance ratios I'm observing across the available hardware.
I've been horribly busy these days with lots of trivial but time-consuming bureaucratic stuff, to the point that I've been unable to work on #GPUSPH at all. Worse, I haven't even started working on my presentation for #SPHERIC2023 (the material is ready, since the article for the proceedings has already been submitted, so it's really just a matter of building the presentation).
OK now I need a way to take a video of #GPUSPH in action on the #SteamDeck. I should probably ask @gamingonlinux for recommendations, but my understanding from his YT channel is that he uses an external camera rather than some kind of built-in screen recording capability.
Making progress with #AMD #ROCm on the #SteamDeck for #GPUSPH. One does need to enable the @archlinux community repo to get the packages, but that's relatively painless. However:
Auto-detected GCN arch gfx1033 with flag 0x97ff (AMD Custom GPU 0405)
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr39 = V_MOV_B32_dpp undef $vgpr39(tied-def 0), killed $vgpr52, 322, 15, 15, 0, implicit $exec
I'll have to look into this.
(This minithread brought to you by the need to add more validation test cases for #GPUSPH.)
Linearized models go a long way towards describing the behavior of the wave, but some nonlinear effects can only be captured by a full 3D model, which in our case is #GPUSPH (I'm sure nobody who follows me is surprised by that ;-)).
If anybody is interested in discussing the findings, we've opened a topic on the GPUSPH Discourse forum at
https://gpusph.discourse.group/t/new-published-work-uses-gpusph-to-simulate-waves-overtopping-on-offshore-platforms/183
Why am I so interested in #SYCL for #GPUSPH? (See also my nudge nudge wink wink at @sri <https://fediscience.org/@giuseppebilotta/109942222462252885>)
Because SYCL is today the best bet we have at a unified #GPGPU API, and introducing such a backend would be a big step towards our aim of “universal hardware support”.
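To give an idea of what “unified” means here: in SYCL the same single-source C++ kernel can target NVIDIA, AMD, Intel or plain CPU devices, depending only on the toolchain and runtime it's built against. A toy example follows (nothing to do with GPUSPH's actual kernels, and it assumes the selected device supports shared USM allocations):

#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q; // picks whatever device the runtime prefers (GPU, CPU, ...)
    const size_t n = 1 << 20;
    float *x = sycl::malloc_shared<float>(n, q); // assumes shared USM support

    // the same single-source kernel, whatever the backend underneath
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        const size_t idx = i[0];
        x[idx] = 2.0f * idx;
    }).wait();

    std::printf("device: %s, x[42] = %g\n",
                q.get_device().get_info<sycl::info::device::name>().c_str(), x[42]);
    sycl::free(x, q);
    return 0;
}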
Here's another interesting thing about #GPUSPH on #Android: currently, it uses the #CPU backend, because the only #GPU backends available in #GPUSPH are for #NVIDIA #CUDA and #AMD through #HIP. I've actually looked at adding #SYCL support too, but there are a few structural issues that currently prevent a solution as straightforward as the one used to implement CPU and HIP support.
#sycl #hip #amd #CUDA #nvidia #gpu #cpu #android #GPUSPH
I think I should create an actual account for the #GPUSPH project on the Fediverse.
I'm so ridiculously happy about this #GPUSPH on #Android thing, even if it's of no real use: nobody is going to run a serious #CFD simulation on their cellphone's CPU and get meaningful results in a reasonable time. It's really just for #nerdCred, even if not as much #nerdCred as #GNU #Hurd support
https://fediscience.org/@giuseppebilotta/108922484526085684
#hurd #gnu #nerdcred #cfd #android #GPUSPH
Between changes at the operating system level, changes in the Termux toolchain, and changes in the code itself, I cannot really claim support for #Android in #GPUSPH: the program _does_ compile and run, but the results are completely bogus!
Worse, trying to debug the issue in _any_ way results in the weirdest undebuggable segmentation faults I've ever seen in my life. Now, it's entirely possible that this is just a matter of the Termux tooling being unreliable, but it's still quite frustrating.
One of the first things I tried after implementing support for the CPU backend in #GPUSPH was to port the software to #Android, “cheating” by setting up a build environment inside Termux to build and run it as a command-line application there.
It actually worked, which had me seriously thrilled about the whole thing.
However, the situation has changed now, and not for the best:
@anteru relevant issues concerning the (lack of) information are
https://github.com/RadeonOpenCompute/ROCm/issues/1714
and
https://github.com/RadeonOpenCompute/ROCm/pull/1738
It's important to have this information, and to have it as accurate as possible, also to support developers who wish to advertise #AMD support. For example, we recently introduced #HIP support in #GPUSPH on a private branch, but we're not sure we can announce it in the next public release without hitting significant support issues.
LOL, I was even able to make #GPUSPH buildable and runnable on #GNU #Hurd (I just needed to fence off a couple of unavailable headers and functions). Not sure how useful an #HPC program is on an OS that doesn't even fully support 64-bit processors, though. I confess to having done this purely for #nerd credits.
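For the record, the “fencing off” is nothing more sophisticated than preprocessor guards. The examples below are the classic Hurd-porting suspects (PATH_MAX being undefined, the __gnu_hurd__ predefined macro), shown for illustration rather than being the exact changes GPUSPH needed:

#include <limits.h>
#include <unistd.h>
#include <cstdio>

// GNU/Hurd has no arbitrary limits, so PATH_MAX may simply not be defined:
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

int main() {
    // __gnu_hurd__ (and __GNU__) are predefined on GNU/Hurd, handy for fencing
    // off any remaining OS-specific bits:
#ifdef __gnu_hurd__
    std::puts("running on GNU/Hurd");
#else
    std::puts("running on something else");
#endif
    char buf[PATH_MAX];
    if (getcwd(buf, sizeof buf))
        std::printf("cwd: %s\n", buf);
    return 0;
}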
The reason I decided to do this isn't so much to expand OS support in #GPUSPH (although that's always a nice bonus), but as a #learning opportunity for myself. I may do a write-up on this some time in the future.
(FWIW, so far the #FreeBSD experience “feels” way more friendly than that of other #BSD OSes.)
#bsd #freebsd #learning #GPUSPH