So even if CUDA is faster in a speed test, I personally find it's not suitable for day-to-day work, at least on my system. If I switch to OpenCL, I generally get better real-time performance, and far fewer issues and crashes.

This did lead me to another link people here might find interesting: an OpenCL benchmark database that compares the compute performance of all the different hardware, including AMD CPUs and the new Intel cores that support OpenCL. It gives a lot of detailed information when you open the links for each piece of hardware.

This is a link to the previous sorting algorithms test. CUDA vs OpenCL vs SPU, Part IV: Finally I've got a radix sort implementation which is working on AMD OpenCL.

The main takeaway here is that CUDA matches the performance of Eigen at vectors of about length 100,000. Beyond that, CUDA outperforms Eigen by roughly an order of magnitude.

I tried a bunch of different parameters, but the main thing was that 128 threads per block with 4 loads in each thread was clearly the winner in terms of overall performance. 4 loads per thread means that each thread works on 4 elements, with consecutive elements strided by the width of the block. 128 threads per block creates more blocks than larger block sizes do, and more blocks are essential to maximizing the GPU's performance because they help keep all the streaming multiprocessors (SMs) occupied. Doing the work this way allows warps to be swapped out while they wait for data to be loaded from global memory, so that the addition/multiplication instructions can be performed as soon as the data is ready, pipelining everything in a way.
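The "4 loads per thread, strided by the width of the block" layout described above can be sketched on the CPU. This is only an emulation of the index arithmetic, not the kernel that was benchmarked: the function names `thread_indices` and `saxpy_strided` are made up for illustration, and the assumption that each block owns a contiguous chunk of `threads_per_block * loads_per_thread` elements is one common way to arrange it.

```python
def thread_indices(block, thread, threads_per_block=128, loads_per_thread=4):
    """Element indices handled by one (block, thread) pair (hypothetical helper)."""
    # Assumption: each block owns a contiguous chunk of
    # threads_per_block * loads_per_thread elements.
    base = block * threads_per_block * loads_per_thread + thread
    # Consecutive loads by the same thread are one block width apart.
    return [base + load * threads_per_block for load in range(loads_per_thread)]


def saxpy_strided(a, x, y, threads_per_block=128, loads_per_thread=4):
    """CPU emulation of a strided SAXPY launch; returns a new list."""
    n = len(x)
    out = list(y)
    elems_per_block = threads_per_block * loads_per_thread
    num_blocks = (n + elems_per_block - 1) // elems_per_block  # ceiling division
    for block in range(num_blocks):              # plays the role of blockIdx.x
        for thread in range(threads_per_block):  # plays the role of threadIdx.x
            for i in thread_indices(block, thread,
                                    threads_per_block, loads_per_thread):
                if i < n:  # guard for the partially filled last block
                    out[i] = a * x[i] + y[i]
    return out


if __name__ == "__main__":
    print(thread_indices(0, 0))  # [0, 128, 256, 384]
    print(thread_indices(1, 5))  # [517, 645, 773, 901]
```

With this mapping, at any given load step neighbouring threads touch neighbouring elements, which is the access pattern that global memory on a GPU handles well.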
SAXPY Benchmark

CUDA 1 Vector SAXPY with 2 loads/thread 128 threads
CUDA 1 Vector SAXPY with 4 loads/thread 128 threads
CUDA 2 Vector SAXPY with 2 loads/thread 128 threads
CUDA 2 Vector SAXPY with 4 loads/thread 128 threads
CUDA 1 Vector SAXPY with 2 loads/thread 256 threads
CUDA 1 Vector SAXPY with 4 loads/thread 256 threads
CUDA 2 Vector SAXPY with 2 loads/thread 256 threads
CUDA 2 Vector SAXPY with 4 loads/thread 256 threads
CUDA 1 Vector SAXPY with 2 loads/thread 512 threads
CUDA 1 Vector SAXPY with 4 loads/thread 512 threads
CUDA 2 Vector SAXPY with 2 loads/thread 512 threads
CUDA 2 Vector SAXPY with 4 loads/thread 512 threads
CUDA 1 Vector SAXPY with 2 loads/thread 1024 threads
CUDA 1 Vector SAXPY with 4 loads/thread 1024 threads
CUDA 2 Vector SAXPY with 2 loads/thread 1024 threads
CUDA 2 Vector SAXPY with 4 loads/thread 1024 threads

OpenGL is a graphics API used for visualization, gaming, etc., whereas CUDA is a parallel computing platform and programming model that can be used for general-purpose, compute-intensive tasks such as image processing and machine learning.

FAHBench is the official Folding@home benchmark. Like the Folding@home cores being executed by hundreds of thousands of donors across the world to solve hard problems in protein dynamics, FAHBench is built on the molecular dynamics engine OpenMM.

Here are some benchmarking notes on CUDA vs the Eigen library on the two most common operations in my research. The first is the SAXPY operation, which is Y = a * X + Y, where X and Y are vectors and a is a scalar. The other main operation is a dot product, which is sum(X * Y), where X and Y are arrays and * is element-wise array multiplication.
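For reference, the two operations defined above can be written out in a few lines. This is a plain-Python sketch of the definitions only, not the CUDA or Eigen code that was benchmarked; the function names `saxpy` and `dot` are just labels chosen here.

```python
def saxpy(a, x, y):
    """Return a * X + Y element-wise, where a is a scalar and X, Y are vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Return sum(X * Y), with * taken element-wise."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


if __name__ == "__main__":
    print(saxpy(2, [1, 2, 3], [4, 5, 6]))  # [6, 9, 12]
    print(dot([1, 2, 3], [4, 5, 6]))       # 32
```

Both operations do only one multiply and one add per element, so for long vectors the run time is mostly determined by how fast the data can be read from memory.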
First of all, my English is not very good, so please bear with me. This discussion is about OpenCL vs CUDA for CS6 programs, and in general for PS, video editing and 3D rendering. I already know about performance in gaming benchmarks, value, p. I would like to know how a GTX 6xx compares with an AMD 7xxx. I might just add that everyone needs to keep in mind that I am coming from a GeForce GT 430 w/ 1GB of DDR3 VRAM.

The use of CUDA within After Effects is greatly exaggerated. As you mentioned, it's mainly used for AE's ray-tracing renderer, which I've never really used. Luckily Adobe seems to be on the ball with adding better OpenCL support.

The Tahiti GPUs (7950, 7970 & 7990) also have significantly more powerful FP64 performance than any 600/700 series card or any other 7000 series card, which translates into significantly better OpenCL and compute performance. Also keep in mind that a 7950 is significantly faster than a 7870 or a 760. You also get 3GB of VRAM instead of 2GB, meaning the card will last you longer before you come across games that require more than 3GB.

I kind of guessed that, but I'd be looking at roughly AU$40+ more to go with the 7950. I'm not sure that I want to step that far over my budget.

The FFT single-precision test was also noticeably faster with CUDA.