Posted by mpreda 10/26/2024
Tell HN: GpuOwl/PRPLL, GPU software used to find the largest known prime number
Feel free to ask questions about technical aspects of the GpuOwl implementation: optimizations, tricks, efficient FFT implementation on GPUs, etc. Or anything else.
[1] GpuOwl: https://github.com/preda/gpuowl
[2] GIMPS: https://www.mersenne.org/
On the other hand, CUDA only works on Nvidia, and that's a major limitation.
GpuOwl makes heavy use of FP64 ("double" floating point), and FP64 is more readily available at consumer prices on AMD GPUs. We (the GIMPS project) use a lot of Radeon VII and Radeon Pro VII GPUs, which offer great FP64 throughput at a low price (I am personally running 8x Radeon Pro VII that I bought new for about $300 apiece).
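To give a feel for why FP64 matters: GpuOwl multiplies numbers with tens of millions of digits by splitting them into "words" and convolving via a floating-point FFT (the real code uses the irrational-base discrete weighted transform, IBDWT). The convolution outputs must round back to exact integers, so the accumulated FFT rounding error has to stay below 0.5, and FP64's 53-bit mantissa is what lets us pack many bits into each word. Below is a toy CPU sketch of the idea (illustrative only, not GpuOwl code; it uses a plain base-10^4 split instead of the weighted transform):

    // Toy sketch: squaring a big integer via a floating-point FFT.
    #include <cmath>
    #include <complex>
    #include <cstdio>
    #include <vector>

    using C = std::complex<double>;

    // Simple in-place radix-2 Cooley-Tukey FFT; n must be a power of two.
    static void fft(std::vector<C>& a, bool inverse) {
      const size_t n = a.size();
      const double PI = std::acos(-1.0);
      for (size_t i = 1, j = 0; i < n; ++i) {   // bit-reversal permutation
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) { j ^= bit; }
        j ^= bit;
        if (i < j) { std::swap(a[i], a[j]); }
      }
      for (size_t len = 2; len <= n; len <<= 1) {
        double ang = 2 * PI / (double)len * (inverse ? 1 : -1);
        C wlen(std::cos(ang), std::sin(ang));
        for (size_t i = 0; i < n; i += len) {
          C w(1);
          for (size_t k = 0; k < len / 2; ++k, w *= wlen) {
            C u = a[i + k], v = a[i + k + len / 2] * w;
            a[i + k] = u + v;
            a[i + k + len / 2] = u - v;
          }
        }
      }
      if (inverse) { for (C& x : a) { x /= (double)n; } }
    }

    int main() {
      // 123456789 split into base-10^4 words, least significant first.
      std::vector<C> a = {6789.0, 2345.0, 1.0};
      a.resize(8);                  // zero-pad: room for the full product
      fft(a, false);
      for (C& x : a) { x *= x; }    // pointwise square == convolution
      fft(a, true);
      // Round each word back to an exact integer and propagate carries.
      // The FFT rounding error must be < 0.5 here or the result is wrong;
      // that error budget is what dictates FP64 over FP32.
      std::vector<long long> w(a.size());
      long long carry = 0;
      for (size_t i = 0; i < a.size(); ++i) {
        long long v = std::llround(a[i].real()) + carry;
        w[i] = v % 10000;
        carry = v / 10000;
      }
      size_t top = w.size() - 1;
      while (top > 0 && w[top] == 0) { --top; }
      std::printf("%lld", w[top]);
      for (size_t i = top; i-- > 0;) { std::printf("%04lld", w[i]); }
      std::printf("\n");            // 123456789^2 = 15241578750190521
    }

With FP32's 24-bit mantissa the same error budget would force far fewer bits per word and a much larger transform, which is why FP64 throughput is the number that matters for this workload.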
So you see, for us AMD GPUs are first-class citizens. Of course I want to support Nvidia GPUs as well, and OpenCL allows that. Luke Durant ran GpuOwl on a lot of Nvidia GPUs in the cloud, and I'm happy it worked well for him there.
The Nvidia A100 GPU that was used to find the new Mersenne prime has specialized dedicated hardware, like tensor cores, which on the A100 work not only with FP16 and FP32 but also with FP64. Are there any benefits to utilizing these capabilities?
And if the GPU provides some sort of FP64 matrix multiplication that we're not currently making use of, that would clearly be a big opportunity.
But somebody needs to implement it, profile it, and test it on some actual hardware.
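To sketch the shape of that opportunity: a large FFT factors into many small DFTs, and a batch of independent n-point DFTs is literally a matrix product Y = W * X, where W is the n x n DFT matrix and each column of X is one input. That is exactly the GEMM shape an FP64 matrix engine could accelerate (on the A100 presumably via cuBLAS, whose FP64 GEMM paths use the tensor cores). A minimal reference sketch of the math, not something GpuOwl currently does:

    // Reference math for the batched-DFT-as-GEMM idea.
    #include <cmath>
    #include <complex>
    #include <cstdio>
    #include <vector>

    using C = std::complex<double>;

    int main() {
      const int n = 8;       // size of each small DFT
      const int batch = 4;   // how many independent DFTs (columns of X)
      const double PI = std::acos(-1.0);

      // W[j][k] = exp(-2*pi*i*j*k/n): the DFT matrix, built once.
      std::vector<C> W(n * n);
      for (int j = 0; j < n; ++j)
        for (int k = 0; k < n; ++k)
          W[j * n + k] = std::polar(1.0, -2 * PI * j * k / n);

      // X: n x batch inputs; here just some arbitrary test data.
      std::vector<C> X(n * batch), Y(n * batch);
      for (int i = 0; i < n * batch; ++i) X[i] = C(i % 5, 0);

      // Y = W * X: a plain complex GEMM. On hardware with FP64 matrix
      // units, this triple loop is the part a GEMM call would replace.
      for (int j = 0; j < n; ++j)
        for (int b = 0; b < batch; ++b) {
          C acc = 0;
          for (int k = 0; k < n; ++k) acc += W[j * n + k] * X[k * batch + b];
          Y[j * batch + b] = acc;
        }

      // Print the DC terms (row 0): the sum of each input column.
      for (int b = 0; b < batch; ++b)
        std::printf("Y[0][%d] = %.1f\n", b, Y[b].real());
    }

Whether this would beat a well-tuned radix kernel depends on memory traffic and on the twiddle/transpose work between passes, which is exactly the implement-profile-test effort mentioned above.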
I would have thought, though, that prospective HPC users have Nvidia A100s and H100s more in mind when buying hardware.
But just to set the record straight, GpuOwl has received exactly $0 in contributions or sponsorship from anyone. It's a labor of love on my side, and it's open source so that curious minds have easy access to the algorithms and techniques implemented. I did receive great help in the form of source-code contributions, most importantly from George Woltman.