I need help finalizing my CUDA-based code
$15-25 USD / hour
I am trying to combine several separate functions into one big function call. There are currently 3 function calls that need to become one. At this point the code works well and gives the correct result, but when I merge the second function, "GPUmatmul_nosqr_portion2_real", into the function "GPUmatmul_element", the code gives the wrong answer.

From my debugging, the function that follows the call to "GPUmatmul_element" appears to have trouble accessing memory: when I debug, I cannot see the values produced by the first function in the Immediate Window. However, when I call cudaDeviceSynchronize(); before the second function call, I can access the values. Interestingly, even though I cannot see the values in debugging mode, the existing code as a whole gives the correct answer without issue. The problem only starts when I try to merge the second function into the first. This suggests that there is probably a synchronization issue. I have tried using __syncthreads(); but that had no impact.
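The symptoms described above are consistent with two general CUDA behaviors, though the actual kernels are not shown, so this is only a guess. First, kernel launches are asynchronous with respect to the host, so the host (and the debugger) may not see results until something like cudaDeviceSynchronize() runs. Second, __syncthreads() only synchronizes threads within one block. Two kernels launched back to back on the same stream are ordered for free, but once they are fused, any data passed between blocks needs a grid-wide barrier. The sketch below uses hypothetical stand-in stages (stage1, stage2, fused_kernel are illustrative names, not the poster's functions) and a cooperative launch, falling back to two ordinary launches if the device does not report cooperative-launch support. Older toolkits may also need -rdc=true for grid.sync().

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical stage 1: each thread writes tmp[i] (grid-stride loop).
__device__ void stage1(const float* in, float* tmp, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        tmp[i] = 2.0f * in[i];
}

// Hypothetical stage 2: reads tmp[n-1-i], usually written by a DIFFERENT block.
__device__ void stage2(const float* tmp, float* out, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        out[i] = tmp[n - 1 - i] + 1.0f;
}

// Unfused version: correctness comes from same-stream launch ordering.
__global__ void stage1_kernel(const float* in, float* tmp, int n) { stage1(in, tmp, n); }
__global__ void stage2_kernel(const float* tmp, float* out, int n) { stage2(tmp, out, n); }

// Fused version: __syncthreads() would not be enough here, because stage 2
// depends on other blocks. A grid-wide barrier is used instead.
__global__ void fused_kernel(const float* in, float* tmp, float* out, int n)
{
    stage1(in, tmp, n);
    cg::this_grid().sync();   // valid only with a cooperative launch
    stage2(tmp, out, n);
}

// Runs the fused (or fallback) path and compares against a host reference.
bool fused_matches_reference()
{
    const int n = 1 << 16, threads = 256;
    int dev = 0, coop = 0, numSms = 0, blocksPerSm = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    cudaDeviceGetAttribute(&numSms, cudaDevAttrMultiProcessorCount, dev);
    // A cooperative grid must fit on the device all at once.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, fused_kernel, threads, 0);
    int blocks = blocksPerSm * numSms;
    if (blocks < 1) blocks = 1;

    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_tmp = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_tmp, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaError_t err;
    if (coop) {
        int n_arg = n;
        void* args[] = { &d_in, &d_tmp, &d_out, &n_arg };
        err = cudaLaunchCooperativeKernel((void*)fused_kernel, dim3(blocks), dim3(threads), args, 0, 0);
    } else {
        // Fallback: two launches on the default stream run in order.
        stage1_kernel<<<blocks, threads>>>(d_in, d_tmp, n);
        stage2_kernel<<<blocks, threads>>>(d_tmp, d_out, n);
        err = cudaGetLastError();
    }
    // This blocking copy also waits for the kernel(s) to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in); cudaFree(d_tmp); cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * h_in[n - 1 - i] + 1.0f) return false;
    return true;
}
```

Calling fused_matches_reference() from main and checking that it returns true is enough to exercise the sketch. If the real stages only ever read values written by the same block, __syncthreads() between them would be sufficient and the cooperative launch is unnecessary; whether that holds depends on the poster's indexing, which is not shown here.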
I would like to request some expert help and advice on how this can be avoided and how I can successfully merge all functions together. I am using Windows 10 and Visual Studio 2017.
I would really appreciate it if someone can please take a look and help me.
Project ID: #33125272
About the project
3 freelancers are bidding on this job at an average of $35/hour.
I have a very good understanding of CUDA and experience with it. I have ported various image processing algorithms to NVIDIA hardware using CUDA.