#PoCL 4.0 @openclapi Implementation Released With @IntelGraphics #oneAPI Level Zero Driver
https://www.phoronix.com/news/PoCL-4.0-Released
Original tweet : https://twitter.com/phoronix/status/1671820647540957185
🧵9/9
The source code for the experimental @FluidX3D P2P is available in this branch on #GitHub: https://github.com/ProjectPhysX/FluidX3D/tree/experimental-p2p
The PR for #PoCL with cudaMemcpy is available here: https://github.com/pocl/pocl/pull/1189
Credit and many thanks to Jan Solanti from Tampere University for visiting me at University of Bayreuth and testing this together with me, in his endeavour to implement/optimize #PoCL-Remote.
Thanks to @ShmarvDogg for testing P2P mode on his 2x A770 16GB "bigboi" PC!
🧵8/9
When running #FluidX3D with the #CUDA backend of #PoCL + P2P cudaMemcpy, performance is 40% higher than with #OpenCL PCIe copy over CPU memory. PoCL's P2P backend is >3x faster than Nvidia's own runtime here. This is the perf delta #Nvidia is giving up on.
🧵3/9
#fluidx3d #cuda #pocl #opencl #nvidia
#PoCL 3.1 Released - Improved #SPIRV For CPU & #NVIDIA CUDA Drivers, WIP @VulkanAPI Driver
https://www.phoronix.com/news/PoCL-3.1-Released
Original tweet : https://twitter.com/phoronix/status/1599755854789562368
#PoCL 3.1-RC1 Released With Improved SPIR-V Support For CPU & CUDA Drivers, @VulkanAPI WIP
-- This "portable OpenCL" implementation continues improving.
https://www.phoronix.com/news/PoCL-3.1-RC1-Released
Original tweet : https://twitter.com/phoronix/status/1595366799540994049
#OpenCL device fission / device partition works fine on #CPU with #PoCL (I can get it to use only 15 of my 16 cores/threads if I like), but on #AMD #GPU the call to clCreateSubDevices just returns "invalid value", which I guess means "not supported". I was hoping to leave 1 compute unit free so that my desktop environment wouldn't become completely unusable for the duration of the computations.
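For reference, a minimal sketch of the request described above, assuming the "by counts" partition scheme from OpenCL 1.2. The helper names `partition_by_counts` and `supports_by_counts` are made up for illustration; the constants are the values from the Khronos `CL/cl.h` header. As far as the OpenCL 1.2 spec goes, clCreateSubDevices returns CL_INVALID_VALUE both for malformed properties and for valid ones the device doesn't support, so querying CL_DEVICE_PARTITION_PROPERTIES first is one way to tell the two apart.

```python
# Constants as defined in the Khronos CL/cl.h header (OpenCL 1.2).
CL_DEVICE_PARTITION_BY_COUNTS = 0x1087
CL_DEVICE_PARTITION_BY_COUNTS_LIST_END = 0x0


def partition_by_counts(total_units, reserved_units=1):
    """Build the zero-terminated property list for clCreateSubDevices.

    Requests a single sub-device that owns all compute units except
    `reserved_units` (hypothetical helper, not part of any OpenCL binding).
    """
    if not 0 < reserved_units < total_units:
        raise ValueError("reserved_units must be between 1 and total_units - 1")
    return [
        CL_DEVICE_PARTITION_BY_COUNTS,
        total_units - reserved_units,            # size of the one sub-device
        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END,  # ends the list of counts
        0,                                       # ends the property list itself
    ]


def supports_by_counts(supported_properties):
    """Check the values reported for CL_DEVICE_PARTITION_PROPERTIES.

    A device that cannot be partitioned at all reports a single 0 here.
    """
    return CL_DEVICE_PARTITION_BY_COUNTS in supported_properties
```

With 16 compute units, `partition_by_counts(16)` gives the list for a 15-unit sub-device. If `supports_by_counts` is false for the values the GPU reports, the "invalid value" error is the expected result rather than a bug in the call.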
#howto get #OpenCL 1.2 on #debian #buster with #amdgpu :
- make sure your system is up to date https://github.com/RadeonOpenCompute/ROCm#first-make-sure-your-system-is-up-to-date
- add the rocm apt repository https://github.com/RadeonOpenCompute/ROCm#add-the-rocm-apt-repository
- install rocm-opencl-dev (using upstream kernel drivers) https://github.com/RadeonOpenCompute/ROCm#using-debian-based-rocm-with-upstream-kernel-drivers
- do NOT try to mess with anything dkms, it won't work
- purge mesa-opencl-icd and pocl-opencl-icd, they get in the way and stop the amdgpu icd from loading correctly(*)
The #mesa #clover #GPU implementation doesn't go as high as OpenCL version 1.2, and #pocl is #CPU only, thus usually slower. There was a proprietary #amd CPU-based OpenCL implementation that I found in some random backports repository once, but in my test it was very slow, and it got uninstalled during my tinkering.
Tested on an RX 580 GPU with a Ryzen 2700X CPU; I don't know about other hardware, so maybe check for support online or just try it.
I used Fractorium for testing, it needs OpenCL >= 1.2.
(*) works for me, your mileage may vary
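A rough sketch for checking which ICDs are still registered after the purge step. It assumes the usual Linux ICD loader directory `/etc/OpenCL/vendors` and assumes that the mesa and pocl packages install files named `mesa.icd` and `pocl.icd` there; both function names are made up for illustration.

```python
from pathlib import Path

# Assumption: file name prefixes used by mesa-opencl-icd and pocl-opencl-icd.
CONFLICTING_PREFIXES = ("mesa", "pocl")


def list_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Map each registered .icd file name to the library path it contains."""
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return {}
    return {p.name: p.read_text().strip() for p in sorted(directory.glob("*.icd"))}


def conflicting_icds(icds):
    """Return the ICD file names that look like leftover mesa or pocl entries."""
    return [name for name in icds if name.lower().startswith(CONFLICTING_PREFIXES)]
```

If `conflicting_icds(list_icds())` comes back empty and an amdgpu entry shows up in `list_icds()`, the loader should only see the ROCm implementation.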
#howto #opencl #debian #buster #amdgpu #mesa #clover #gpu #pocl #cpu #amd