Having problem assigning a device variable in CUDA

Question

I'm having trouble trying to assign a value to a device variable and then copying this to a host variable.

I start with d_test and h_test = 0.0. I have a simple kernel to set the device variable, d_test, to 1.0. I then copy this to the host variable h_test and print. The problem is that when I print I get h_test = 0.0. What am I doing wrong? Here's the code:

// -*- mode: C -*-
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// device variable and kernel
__device__ float d_test;
__global__ void kernel1(float d_test) { d_test = 1.0; }


int main() {

  // initialise variables
  float h_test = 0.0;
  cudaMemset(&d_test,0,sizeof(float));

  // invoke kernel
  kernel1 <<<1,1>>> (d_test);

  // Copy device variable to host and print
  cudaMemcpy(&h_test,&d_test,sizeof(float),cudaMemcpyDeviceToHost);
  printf("%f\n",h_test);  

}

Using quotes around the symbol name in cudaMemcpyFromSymbol gives 0.0 after the copy, so just remove the quotes to make it work. — erogol, Apr 14 '13 at 11:55

score 6 · Answer 1 · edited Nov 24 '19 at 03:39

There are a couple of issues with your code.

1. As pezcode correctly notes, kernel1's parameter d_test shadows your global variable, so when the kernel assigns to d_test, it changes the value of its parameter instead of the global variable as you intend. kernel1 need not take an argument in this example.
2. Instead of cudaMemcpy, use cudaMemcpyFromSymbol when copying from a global __device__ variable.

Here's the full solution:

// -*- mode: C -*-
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// device variable and kernel
__device__ float d_test;
__global__ void kernel1() { d_test = 1.0; }

int main() {

  // initialise variables
  float h_test = 0.0;
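  // cudaMemset(&d_test, ...) fails here: on the host, &d_test is not a device pointer,
  // so initialise the device variable through its symbol instead
  cudaMemcpyToSymbol(d_test, &h_test, sizeof(float));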

  // invoke kernel
  kernel1 <<<1,1>>> ();

  // Copy device variable to host and print
  cudaMemcpyFromSymbol(&h_test, d_test, sizeof(float), 0, cudaMemcpyDeviceToHost);
  // Before CUDA 5.0 the symbol could also be named by a string:
  //   cudaMemcpyFromSymbol(&h_test, "d_test", sizeof(float), 0, cudaMemcpyDeviceToHost);
  printf("%f\n",h_test);  
}

And the output:

$ nvcc test.cu -run
1.000000

Update (November 2019)

CUDA Toolkit documentation states that "use of a string naming a variable as the symbol parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0."
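
Failures in these runtime calls go unnoticed unless the return codes are inspected, so here is a minimal sketch of the same program with every call checked. The CHECK macro and the run_and_read function are hypothetical names made up for this example, not part of the CUDA runtime, and the sketch assumes a toolkit where the symbol is passed directly (no quotes).

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// CHECK is a hypothetical helper macro: it prints the failing call
// and the runtime's error string, then exits.
#define CHECK(call)                                              \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      fprintf(stderr, "%s failed: %s\n", #call,                  \
              cudaGetErrorString(err_));                         \
      exit(EXIT_FAILURE);                                        \
    }                                                            \
  } while (0)

// device variable and kernel (no parameter, so nothing shadows d_test)
__device__ float d_test;
__global__ void kernel1() { d_test = 1.0f; }

// Hypothetical helper: runs the kernel and returns the value read back.
float run_and_read() {
  float h_test = 0.0f;

  // Initialise the device variable through its symbol.
  CHECK(cudaMemcpyToSymbol(d_test, &h_test, sizeof(float)));

  kernel1<<<1, 1>>>();
  CHECK(cudaGetLastError());       // reports launch errors
  CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

  // Copy the device variable back to the host.
  CHECK(cudaMemcpyFromSymbol(&h_test, d_test, sizeof(float)));
  return h_test;
}

int main() {
  printf("%f\n", run_and_read());
  return 0;
}
```

With checks like these, a wrong symbol argument or an invalid pointer shows up as a printed error instead of a silent 0.0.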

At this moment, `cudaMemset` returns an error, and so does `cudaMemcpyFromSymbol` (the API has changed, see: https://stackoverflow.com/a/9997978/1150712) — zhangxaochen, Jun 21 '18 at 15:20
Not sure exactly why, but it works once you remove the quotes (e.g. cudaMemcpyFromSymbol(&h_test, d_test, sizeof(float), 0, cudaMemcpyDeviceToHost); ) — user2343039, Feb 01 '19 at 09:44

score 4 · Answer 2 · answered Mar 14 '12 at 07:08

cudaMemcpyFromSymbol worked for me after I had struggled for quite a while with cudaMemcpy and kept getting the wrong values back. I had to remove the quotes around "d_test". I also got an invalid argument error from the cudaMemset, so I just used the cudaMalloc that was previously in the code.
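
For comparison, the cudaMalloc route mentioned above might look roughly like the sketch below. It assumes the value lives in memory obtained from cudaMalloc rather than in a __device__ variable; d_ptr and run_with_malloc are hypothetical names chosen for this example, and error checks are left out for brevity.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// The kernel writes through a pointer to device memory.
__global__ void kernel1(float *d_ptr) { *d_ptr = 1.0f; }

// Hypothetical helper: allocates one float on the device, lets the
// kernel set it, and returns the value copied back to the host.
float run_with_malloc() {
  float h_test = 0.0f;
  float *d_ptr = NULL;

  cudaMalloc((void **)&d_ptr, sizeof(float));
  // cudaMemset is valid here because d_ptr is a genuine device pointer.
  cudaMemset(d_ptr, 0, sizeof(float));

  kernel1<<<1, 1>>>(d_ptr);

  // A plain cudaMemcpy works for memory obtained from cudaMalloc.
  cudaMemcpy(&h_test, d_ptr, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_ptr);
  return h_test;
}

int main() {
  printf("%f\n", run_with_malloc());
  return 0;
}
```

The trade-off is that the pointer has to be passed to every kernel that needs it, whereas a __device__ variable is visible to all kernels in the file.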

score 1 · Answer 3 · answered Jul 29 '11 at 10:46

My guess is that kernel1 changes its parameter d_test as it hides the global device variable. Rename one of them or if that works with CUDA, explicitly use global scope by setting ::d_test.

Answered by pezcode.

I've changed the name of d_test to test in the kernel, but that hasn't made any difference. What's the global scope ::d_test thingy? – Eddy Jul 29 '11 at 10:50
For some reason I thought you were using C++ where you can use scope::var to access a variable in a specific scope. Just using :: would mean global scope, thus accessing the global variable hidden away by the function parameter with the same name. Sorry for the confusion :s – pezcode Jul 29 '11 at 15:12
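
For what it's worth, nvcc compiles .cu files as C++, so the ::d_test idea can be tried. Below is a minimal sketch, assuming the kernel keeps a parameter that shadows the global on purpose; run_scoped is a hypothetical name used only for this example, and error checks are omitted.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__device__ float d_test;

// The parameter d_test shadows the global variable of the same name.
// The :: prefix selects the global-scope __device__ variable instead.
__global__ void kernel1(float d_test) { ::d_test = d_test; }

// Hypothetical helper: launches the kernel with 1.0f and reads the
// global device variable back through its symbol.
float run_scoped() {
  float h_test = 0.0f;
  kernel1<<<1, 1>>>(1.0f);
  cudaMemcpyFromSymbol(&h_test, ::d_test, sizeof(float));
  return h_test;
}

int main() {
  printf("%f\n", run_scoped());
  return 0;
}
```

Note that the copy back still goes through cudaMemcpyFromSymbol; fixing the shadowing alone does not help if the host side keeps using cudaMemcpy with &d_test.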

score 0 · Answer 4 · answered Feb 21 '15 at 09:08

This code works fine for me. The key is the second argument, which should be passed without quotes, like this:

cudaMemcpyFromSymbol(&h_test, d_test, sizeof(float), 0, cudaMemcpyDeviceToHost);

Then compile your code with:
nvcc -Xcompiler -arch=sm_30 sample.cu -o sample.exe
