I am new to CUDA programming. How do I write a kernel that computes the average of an array with 32 elements?
C++, Programming Theory, Programming
Last Comment
sarabande
8/22/2022 - Mon
Aaeshah
ASKER
I edited the question to be more direct.
gheist
Please look at the CUDA SDK samples.
Adding 32 elements of a vector would be no challenge even for a 12MHz 8086. You will spend far more time initializing CUDA than the operation itself would take inlined on a normal CPU.
You could try C++ offload; that should at least confirm that CUDA is operational.
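For reference, here is a minimal sketch of what such a kernel might look like. It is not taken from the SDK samples. It assumes a single block of exactly 32 threads, `float` data, and a shared-memory tree reduction; the names `average32` and `average32_on_gpu` are made up for illustration, and error checking of the CUDA calls is left out to keep it short.

```cuda
#include <cuda_runtime.h>

constexpr int N = 32;  // assumption: the array always has exactly 32 elements

// Launch with one block of N threads. Each thread loads one element,
// then the block sums the values pairwise in shared memory.
__global__ void average32(const float* in, float* out)
{
    __shared__ float partial[N];
    const int tid = threadIdx.x;

    partial[tid] = in[tid];
    __syncthreads();

    // Tree reduction: 16 additions, then 8, 4, 2, 1.
    for (int stride = N / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    // Thread 0 now holds the total; divide by N to get the average.
    if (tid == 0) {
        *out = partial[0] / N;
    }
}

// Hypothetical host helper: copies 32 floats to the GPU, runs the kernel,
// and returns the average. Return codes are ignored here for brevity;
// real code should check every CUDA call.
float average32_on_gpu(const float* host_in)
{
    float* d_in = nullptr;
    float* d_out = nullptr;
    float result = 0.0f;

    cudaMalloc(&d_in, N * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host_in, N * sizeof(float), cudaMemcpyHostToDevice);

    average32<<<1, N>>>(d_in, d_out);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `average32_on_gpu` on an array holding 1, 2, ..., 32 should give 16.5. As noted above, for only 32 values the allocation and copies will cost far more than the sum itself, so this is mainly useful as a learning exercise or as a check that the toolchain works.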