Questions tagged [cuda]
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.
14,512
questions
956
votes
30
answers
2.6m
views
How to get the CUDA version?
Is there any quick command or script to check for the version of CUDA installed?
I found the manual of 4.0 under the installation directory but I'm not sure whether it is of the actual installed ...
567
votes
19
answers
831k
views
Nvidia NVML Driver/library version mismatch [closed]
When I run nvidia-smi, I get the following message:
Failed to initialize NVML: Driver/library version mismatch
An hour ago I received the same message and uninstalled my CUDA library and I was able ...
450
votes
32
answers
872k
views
How to tell if tensorflow is using gpu acceleration from inside python shell?
I have installed tensorflow in my ubuntu 16.04 using the second answer here with ubuntu's builtin apt cuda installation.
Now my question is how can I test if tensorflow is really using gpu? I have a ...
334
votes
7
answers
603k
views
Which TensorFlow and CUDA version combinations are compatible?
I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations ...
319
votes
8
answers
286k
views
Different CUDA versions shown by nvcc and NVIDIA-smi
I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So ...
306
votes
5
answers
162k
views
What is the canonical way to check for errors using the CUDA runtime API?
Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API ...
267
votes
17
answers
382k
views
A top-like utility for monitoring CUDA activity on a GPU [closed]
I'm trying to monitor a process that uses CUDA and MPI. Is there any way I could do this, with something like the "top" command but one that monitors the GPU too?
266
votes
16
answers
746k
views
How to verify CuDNN installation?
I have searched many places but ALL I get is HOW to install it, not how to verify that it is installed. I can verify my NVIDIA driver is installed, and that CUDA is installed, but I don't know how to ...
257
votes
10
answers
411k
views
Using GPU from a docker container?
I'm searching for a way to use the GPU from inside a docker container.
The container will execute arbitrary code, so I don't want to use the privileged mode.
Any tips?
From previous research I ...
182
votes
2
answers
81k
views
How do CUDA blocks/warps/threads map onto CUDA cores?
I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/threads.
I am studying the architecture from a didactic point of view (university project), so ...
178
votes
2
answers
171k
views
Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation) [closed]
How are threads organized to be executed by a GPU?
170
votes
7
answers
507k
views
How do I select which GPU to run a job on?
In a multi-GPU computer, how do I designate which GPU a CUDA job should run on?
As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples then ran several ...
169
votes
5
answers
128k
views
Using Java with Nvidia GPUs (CUDA)
I'm working on a business project that is done in Java, and it needs huge computation power to compute business markets. Simple math, but with a huge amount of data.
We ordered some CUDA GPUs to try it ...
158
votes
22
answers
268k
views
CUDA incompatible with my gcc version
I have trouble compiling some of the examples shipped with the CUDA SDK.
I have installed the developers driver (version 270.41.19) and the CUDA toolkit,
then finally the SDK (both the 4.0.17 version).
...
157
votes
9
answers
107k
views
Difference between global and device functions
Can anyone describe the differences between __global__ and __device__ ?
When should I use __device__, and when should I use __global__?
143
votes
3
answers
152k
views
How do I choose grid and block dimensions for CUDA kernels?
This is a question about how to determine the CUDA grid, block and thread sizes. This is an additional question to the one posted here.
Following this link, the answer from talonmies contains a code ...
137
votes
7
answers
101k
views
GPU Emulator for CUDA programming without the hardware [closed]
Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware?
Info:
I'm looking to speed up a few simulations of mine in CUDA, ...
133
votes
9
answers
309k
views
Is it possible to run CUDA on AMD GPUs?
I'd like to extend my skill set into GPU computing. I am familiar with raytracing and realtime graphics (OpenGL), but the next generation of graphics and high performance computing seems to be in GPU ...
122
votes
5
answers
62k
views
What is a bank conflict? (Doing Cuda/OpenCL programming)
I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject ...
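As a rough illustration of the term, the sketch below models shared memory as a set of banks and counts how many distinct words a single warp requests from the busiest bank. The 32 banks, 4-byte words and 32-thread warp are assumptions that match many NVIDIA GPUs but not every configuration, and `conflict_degree` is a hypothetical helper name, not a CUDA API.

```python
from collections import defaultdict


def conflict_degree(stride_words, n_banks=32, warp_size=32):
    """Return the largest number of distinct words requested from one bank.

    Thread t is assumed to read the 4-byte word at index t * stride_words.
    A result of 1 means no bank conflict in this simplified model.
    """
    words_per_bank = defaultdict(set)
    for t in range(warp_size):
        word = t * stride_words
        # Consecutive words are assumed to map to consecutive banks.
        words_per_bank[word % n_banks].add(word)
    # Threads reading the same word are counted once (modeled as a broadcast).
    return max(len(words) for words in words_per_bank.values())


for stride in (1, 2, 32, 33):
    print(stride, conflict_degree(stride))
```

In this model a stride of 32 words sends every thread of the warp to the same bank, while a stride of 33 words spreads the accesses over all banks again.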
119
votes
13
answers
345k
views
How can I flush GPU memory using CUDA (physical reset is unavailable)
My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied.
I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported.
...
119
votes
11
answers
224k
views
How to get the nvidia driver version from the command line?
For debugging CUDA code and checking compatibilities I need to find out which NVIDIA driver version I have installed for the GPU. I found How to get the cuda version? but that does not help me here.
112
votes
2
answers
83k
views
nvidia-smi Volatile GPU-Utilization explanation?
I know that nvidia-smi -l 1 will give the GPU usage every second (similarly to the following). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is that the number ...
110
votes
4
answers
85k
views
Streaming multiprocessors, Blocks and Threads (CUDA)
What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads?
What gets mapped to what, and what is parallelized and how? And what is more ...
109
votes
6
answers
168k
views
Can I run CUDA on Intel's integrated graphics processor?
I have a very simple Toshiba laptop with an i3 processor. Also, I do not have any expensive graphics card. In the display settings, I see Intel(HD) Graphics as display adapter. I am planning to learn ...
108
votes
10
answers
49k
views
NVIDIA vs AMD: GPGPU performance
I'd like to hear from people with experience of coding for both. Myself, I only have experience with NVIDIA.
NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question ...
107
votes
4
answers
76k
views
In CUDA, what is memory coalescing, and how is it achieved?
What does "coalesced" mean in a CUDA global memory transaction? I couldn't understand it even after going through my CUDA guide. How do I achieve it? In the CUDA programming guide matrix example, accessing the matrix row by ...
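One way to picture coalescing is to count how many aligned memory segments a single warp touches for a given access stride. The sketch below is only a model under stated assumptions: a 32-thread warp, 4-byte elements and a 128-byte segment size are assumed values, and `segments_touched` is a hypothetical helper, not part of CUDA.

```python
def segments_touched(stride_elems, elem_size=4, warp_size=32, segment_bytes=128):
    """Count the distinct aligned segments read by one warp.

    Thread t is assumed to read the element at index t * stride_elems,
    starting from a segment-aligned base address.
    """
    addresses = [t * stride_elems * elem_size for t in range(warp_size)]
    return len({addr // segment_bytes for addr in addresses})


for stride in (1, 2, 32):
    print(stride, segments_touched(stride))
```

Fewer segments per warp means the accesses are closer to fully coalesced in this model; a unit stride touches a single segment, while a stride of 32 elements touches one segment per thread.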
97
votes
4
answers
47k
views
Why is CUDA pinned memory so fast?
I observe substantial speedups in data transfer when I use pinned memory for CUDA data transfers. On Linux, the underlying system call for achieving this is mlock. From the man page of mlock, it ...
96
votes
8
answers
50k
views
Best approach for GPGPU/CUDA/OpenCL in Java?
General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing.
I'd love to use GPGPU for image processing, ...
95
votes
8
answers
149k
views
LNK2038: mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease' in file.obj
I am integrating MATLAB, C and CUDA together in a project. I used MATLAB mex in order to connect a MATLAB mex function written in C with the CUDA runtime library, and a linking error appears about a conflict in ...
92
votes
7
answers
335k
views
How to remove cuda completely from ubuntu?
I have Ubuntu 18.04, and accidentally installed CUDA 9.1 to run tensorflow-gpu, but it seems tensorflow-gpu requires CUDA 10.0, so I want to remove CUDA first by executing:
martin@nlp-server:~$ sudo ...
91
votes
4
answers
110k
views
When to call cudaDeviceSynchronize?
When is calling the cudaDeviceSynchronize function really needed?
As far as I understand from the CUDA documentation, CUDA kernels are asynchronous, so it seems that we should call ...
78
votes
8
answers
64k
views
Passing pointers between C and Java through JNI
At the moment, I'm trying to create a Java application which uses CUDA functionality. The connection between CUDA and Java works fine, but I've got another problem and wanted to ask if my thoughts ...
75
votes
2
answers
53k
views
GPU Programming, CUDA or OpenCL? [closed]
I am a newbie to GPU programming. I have a laptop with NVIDIA GeForce GT 640 card. I am faced with 2 dilemmas, suggestions are most welcome.
If I go for CUDA -- Ubuntu or Windows Clearly CUDA is more ...
70
votes
5
answers
95k
views
CUDA determining threads per block, blocks per grid
I'm new to the CUDA paradigm. My question is in determining the number of threads per block, and blocks per grid. Does a bit of art and trial play into this? What I've found is that many examples have ...
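The arithmetic most examples use for the number of blocks is a ceiling division of the problem size by the block size. The sketch below shows only that host-side calculation; `blocks_per_grid` is a hypothetical helper name, and the block size of 256 is an assumed starting point rather than a recommendation from the question.

```python
def blocks_per_grid(n_elements, threads_per_block):
    """Return the smallest block count that covers n_elements threads."""
    if n_elements < 0 or threads_per_block <= 0:
        raise ValueError("n_elements must be >= 0 and threads_per_block > 0")
    # Integer ceiling division: round up so the last partial block is included.
    return (n_elements + threads_per_block - 1) // threads_per_block


n = 1000
threads = 256  # assumed block size
blocks = blocks_per_grid(n, threads)
print(blocks, blocks * threads)  # block count and total threads launched
```

Because the total number of launched threads can exceed the problem size, a kernel launched this way typically needs a bounds check on its global index.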
70
votes
1
answer
24k
views
How and when should I use pitched pointer with the cuda API?
I have quite a good understanding about how to allocate and copy linear memory with cudaMalloc() and cudaMemcpy(). However, when I want to use the CUDA functions to allocate and copy 2D or 3D matrices,...
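With pitched memory, each row starts at a multiple of the pitch (the row width in bytes including padding), so an element is located by a row offset in bytes plus a column offset in elements. The sketch below models only that address arithmetic. The 512-byte alignment is an assumed value for illustration; in real code the pitch is whatever cudaMallocPitch() returns, and both helper names are hypothetical.

```python
def round_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment


def pitched_offset(row, col, pitch_bytes, elem_size):
    """Byte offset of element (row, col) in a pitched 2D allocation."""
    return row * pitch_bytes + col * elem_size


width_elems, elem_size = 100, 4
# Assumed alignment; a real pitch comes from the allocation call itself.
pitch = round_up(width_elems * elem_size, 512)
print(pitch, pitched_offset(2, 3, pitch, elem_size))
```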
69
votes
6
answers
276k
views
Error Message : Cannot find or open the PDB file
I tried running sample programs provided at NVIDIA's official site. Most of the programs ran smoothly except a few where I get similar error messages. How can I fix that? Here's a sample of error ...
67
votes
5
answers
117k
views
Does __syncthreads() synchronize all threads in the grid?
Does __syncthreads() synchronize all threads in the grid or just the threads in the current warp or block?
Also, when the threads in a particular block encounter (in the kernel) the following line
...
67
votes
5
answers
223k
views
Where did CUDA get installed on Ubuntu 14.04 on my computer?
I'm trying to install CUDA 7.5 on my Ubuntu 14.04. I followed everything in this guide (installation through package): http://developer.download.nvidia.com/compute/cuda/7.5/Prod/docs/sidebar/...
66
votes
5
answers
90k
views
What is the difference between cuda vs tensor cores?
I am completely new to terms related to HPC computing, but I just saw that EC2 released its new type of instance on AWS that's powered by the new Nvidia Tesla V100, which has both kinds of "cores": ...
65
votes
7
answers
180k
views
How to let cmake find CUDA
I am trying to build this project, which has CUDA as a dependency. But the cmake script cannot find the CUDA installation on the system:
cls ~/workspace/gpucluster/cluster/build $ cmake ..
-- The C ...
65
votes
1
answer
58k
views
Pytorch. How does pin_memory work in Dataloader?
I want to understand how the pin_memory parameter in Dataloader works.
According to the documentation:
pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory ...
64
votes
8
answers
105k
views
Error compiling CUDA from Command Prompt
I'm trying to compile a CUDA test program on Windows 7 via Command Prompt.
I'm using this command:
nvcc test.cu
But all I get is this error:
nvcc fatal : Cannot find compiler 'cl.exe' in PATH
What may ...
63
votes
4
answers
110k
views
Cuda gridDim and blockDim
I get what blockDim is, but I have a problem with gridDim. blockDim gives the size of the block, but what is gridDim? On the Internet it says gridDim.x gives the number of blocks in the x coordinate.
...
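A small host-side model can make the relationship concrete for the one-dimensional case: gridDim.x is the number of blocks, blockDim.x is the number of threads in each block, and a thread's global index is commonly computed as blockIdx.x * blockDim.x + threadIdx.x. The function below only simulates that indexing in plain Python; `global_thread_ids` is a hypothetical name, not a CUDA function.

```python
def global_thread_ids(grid_dim_x, block_dim_x):
    """Enumerate the global indices a 1D launch would produce."""
    ids = []
    for block_idx_x in range(grid_dim_x):        # blockIdx.x in [0, gridDim.x)
        for thread_idx_x in range(block_dim_x):  # threadIdx.x in [0, blockDim.x)
            ids.append(block_idx_x * block_dim_x + thread_idx_x)
    return ids


ids = global_thread_ids(grid_dim_x=3, block_dim_x=4)
print(len(ids), ids)  # total threads = gridDim.x * blockDim.x
```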
63
votes
12
answers
32k
views
Does CUDA support recursion?
Does CUDA support recursion?
61
votes
3
answers
45k
views
Structure of Arrays vs Array of Structures
From some comments that I have read in here, it is preferable to have Structure of Arrays (SoA) over Array of Structures (AoS) for parallel implementations like CUDA. If that is true, can anyone ...
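The difference between the two layouts shows up in the byte offsets that consecutive threads touch when each reads the same field. The sketch below assumes a record of three 4-byte fields and hypothetical helper names; it models offsets only and makes no claim about measured performance.

```python
def aos_offsets(n_threads, field_index=0, n_fields=3, elem_size=4):
    """Byte offsets when thread i reads one field of record i (AoS)."""
    record_size = n_fields * elem_size
    return [i * record_size + field_index * elem_size for i in range(n_threads)]


def soa_offsets(n_threads, n_records, field_index=0, elem_size=4):
    """Byte offsets when thread i reads element i of one field array (SoA)."""
    field_base = field_index * n_records * elem_size
    return [field_base + i * elem_size for i in range(n_threads)]


print(aos_offsets(4))               # strided by the record size
print(soa_offsets(4, n_records=4))  # contiguous within one field array
```

In the SoA layout neighbouring threads read neighbouring addresses, which is the access pattern the question associates with parallel implementations like CUDA.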
61
votes
4
answers
71k
views
Coding CUDA with C#?
I've been looking for some information on coding CUDA (the nvidia gpu language) with C#. I have seen a few of the libraries, but it seems that they would add a bit of overhead (because of the p/...
60
votes
6
answers
47k
views
Compression library using Nvidia's CUDA [closed]
Does anyone know of a project which implements standard compression methods (like Zip, GZip, BZip2, LZMA,...) using NVIDIA's CUDA library?
I was wondering if algorithms which can make use of a lot of ...
60
votes
5
answers
68k
views
Fortran vs C++, does Fortran still hold any advantage in numerical analysis these days? [closed]
With the rapid development of C++ compilers, especially the Intel ones, and the ability to directly apply SIMD functions in your C/C++ code, does Fortran still hold any real advantage in the world ...
59
votes
8
answers
184k
views
How to install CUDA in Google Colab GPU's
It seems that Google Colab GPUs don't come with the CUDA Toolkit. How can I install CUDA on Google Colab GPUs? I am getting this error when installing mxnet in Google Colab.
Installing collected ...
58
votes
15
answers
34k
views
CUDA or FPGA for special purpose 3D graphics computations? [closed]
I am developing a product with heavy 3D graphics computations, to a large extent closest point and range searches. Some hardware optimization would be useful. While I know little about this, my boss (...