Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

956 votes
30 answers
2.6m views

How to get the CUDA version?

Is there any quick command or script to check for the version of CUDA installed? I found the manual of 4.0 under the installation directory but I'm not sure whether it is of the actual installed ...
Hailiang Zhang's user avatar
567 votes
19 answers
831k views

Nvidia NVML Driver/library version mismatch [closed]

When I run nvidia-smi, I get the following message: Failed to initialize NVML: Driver/library version mismatch An hour ago I received the same message and uninstalled my CUDA library and I was able ...
etal's user avatar
  • 14.2k
450 votes
32 answers
872k views

How to tell if tensorflow is using gpu acceleration from inside python shell?

I have installed TensorFlow on my Ubuntu 16.04 using the second answer here with Ubuntu's built-in apt CUDA installation. Now my question is: how can I test whether TensorFlow is really using the GPU? I have a ...
Tamim Addari's user avatar
  • 7,761
334 votes
7 answers
603k views

Which TensorFlow and CUDA version combinations are compatible?

I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations ...
whiletrue's user avatar
  • 10.8k
319 votes
8 answers
286k views

Different CUDA versions shown by nvcc and NVIDIA-smi

I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So ...
yuqli's user avatar
  • 4,951
306 votes
5 answers
162k views

What is the canonical way to check for errors using the CUDA runtime API?

Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API ...
talonmies's user avatar
  • 71.8k
267 votes
17 answers
382k views

A top-like utility for monitoring CUDA activity on a GPU [closed]

I'm trying to monitor a process that uses CUDA and MPI. Is there any way I could do this, with something like the "top" command that also monitors the GPU?
natorro's user avatar
  • 3,033
266 votes
16 answers
746k views

How to verify CuDNN installation?

I have searched many places but ALL I get is HOW to install it, not how to verify that it is installed. I can verify my NVIDIA driver is installed, and that CUDA is installed, but I don't know how to ...
alfredox's user avatar
  • 4,302
257 votes
10 answers
411k views

Using GPU from a docker container?

I'm searching for a way to use the GPU from inside a Docker container. The container will execute arbitrary code, so I don't want to use the privileged mode. Any tips? From previous research I ...
Regan's user avatar
  • 8,561
182 votes
2 answers
81k views

How do CUDA blocks/warps/threads map onto CUDA cores?

I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/threads. I am studying the architecture from a didactic point of view (university project), so ...
Daedalus's user avatar
  • 1,821
178 votes
2 answers
171k views

Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation) [closed]

How are threads organized to be executed by a GPU?
cibercitizen1's user avatar
170 votes
7 answers
507k views

How do I select which GPU to run a job on?

In a multi-GPU computer, how do I designate which GPU a CUDA job should run on? As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples then ran several ...
Steven C. Howell's user avatar
169 votes
5 answers
128k views

Using Java with Nvidia GPUs (CUDA)

I'm working on a business project that is done in Java, and it needs huge computation power to compute business markets. Simple math, but with huge amount of data. We ordered some CUDA GPUs to try it ...
Hans's user avatar
  • 1,906
158 votes
22 answers
268k views

CUDA incompatible with my gcc version

I have trouble compiling some of the examples shipped with the CUDA SDK. I have installed the developers driver (version 270.41.19) and the CUDA toolkit, then finally the SDK (both the 4.0.17 version). ...
fbielejec's user avatar
  • 3,620
157 votes
9 answers
107k views

Difference between global and device functions

Can anyone describe the differences between __global__ and __device__? When should I use __device__, and when should I use __global__?
Mehdi Saman Booy's user avatar
143 votes
3 answers
152k views

How do I choose grid and block dimensions for CUDA kernels?

This is a question about how to determine the CUDA grid, block and thread sizes. This is an additional question to the one posted here. Following this link, the answer from talonmies contains a code ...
user1292251's user avatar
  • 1,715
137 votes
7 answers
101k views

GPU Emulator for CUDA programming without the hardware [closed]

Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware? Info: I'm looking to speed up a few simulations of mine in CUDA, ...
Narcolapser's user avatar
  • 6,155
133 votes
9 answers
309k views

Is it possible to run CUDA on AMD GPUs?

I'd like to extend my skill set into GPU computing. I am familiar with raytracing and realtime graphics (OpenGL), but the next generation of graphics and high performance computing seems to be in GPU ...
Lee Jacobs's user avatar
  • 1,717
122 votes
5 answers
62k views

What is a bank conflict? (Doing Cuda/OpenCL programming)

I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject ...
smuggledPancakes's user avatar
119 votes
13 answers
345k views

How can I flush GPU memory using CUDA (physical reset is unavailable)

My CUDA program crashed during execution, before memory was flushed. As a result, device memory remained occupied. I'm running on a GTX 580, for which nvidia-smi --gpu-reset is not supported. ...
timdim's user avatar
  • 1,221
119 votes
11 answers
224k views

How to get the nvidia driver version from the command line?

For debugging CUDA code and checking compatibilities I need to find out what nvidia driver version for the GPU I have installed. I found How to get the cuda version? but that does not help me here.
Framester's user avatar
  • 34.8k
112 votes
2 answers
83k views

nvidia-smi Volatile GPU-Utilization explanation?

I know that nvidia-smi -l 1 will give the GPU usage every one second (similarly to the following). However, I would appreciate an explanation on what Volatile GPU-Util really means. Is that the number ...
user3813674's user avatar
  • 2,623
110 votes
4 answers
85k views

Streaming multiprocessors, Blocks and Threads (CUDA)

What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads? What gets mapped to what and what is parallelized and how? and what is more ...
109 votes
6 answers
168k views

Can I run CUDA on Intel's integrated graphics processor?

I have a very simple Toshiba Laptop with i3 processor. Also, I do not have any expensive graphics card. In the display settings, I see Intel(HD) Graphics as display adapter. I am planning to learn ...
Ankit's user avatar
  • 6,882
108 votes
10 answers
49k views

NVIDIA vs AMD: GPGPU performance

I'd like to hear from people with experience of coding for both. Myself, I only have experience with NVIDIA. NVIDIA CUDA seems to be a lot more popular than the competition. (Just counting question ...
Eugene Smith's user avatar
  • 9,238
107 votes
4 answers
76k views

In CUDA, what is memory coalescing, and how is it achieved?

What does "coalesced" mean in a CUDA global memory transaction? I couldn't understand it even after going through my CUDA guide. How do I achieve it? In the CUDA programming guide matrix example, accessing the matrix row by ...
kar's user avatar
  • 2,695
97 votes
4 answers
47k views

Why is CUDA pinned memory so fast?

I observe substantial speedups in data transfer when I use pinned memory for CUDA data transfers. On linux, the underlying system call for achieving this is mlock. From the man page of mlock, it ...
Gearoid Murphy's user avatar
96 votes
8 answers
50k views

Best approach for GPGPU/CUDA/OpenCL in Java?

General-purpose computing on graphics processing units (GPGPU) is a very attractive concept to harness the power of the GPU for any kind of computing. I'd love to use GPGPU for image processing, ...
Frederik's user avatar
  • 14.4k
95 votes
8 answers
149k views

LNK2038: mismatch detected for 'RuntimeLibrary': value 'MT_StaticRelease' doesn't match value 'MD_DynamicRelease' in file.obj

I am integrating MATLAB, C, and CUDA in a project. I used MATLAB mex to connect a MATLAB mx function written in C with the CUDA runtime library, and a linking error appears about a conflict in ...
Ahmed Hassan's user avatar
92 votes
7 answers
335k views

How to remove cuda completely from ubuntu?

I have ubuntu 18.04, and accidentally installed cuda 9.1 to run Tensorflow-gpu, but it seems tensorflow-gpu requires cuda 10.0, so I want to remove cuda first by executing: martin@nlp-server:~$ sudo ...
marlon's user avatar
  • 7,197
91 votes
4 answers
110k views

When to call cudaDeviceSynchronize?

When is calling the cudaDeviceSynchronize function really needed? As far as I understand from the CUDA documentation, CUDA kernels are asynchronous, so it seems that we should call ...
user1588226's user avatar
78 votes
8 answers
64k views

Passing pointers between C and Java through JNI

At the moment, I'm trying to create a Java application which uses CUDA functionality. The connection between CUDA and Java works fine, but I've got another problem and wanted to ask if my thoughts ...
Volker's user avatar
  • 783
75 votes
2 answers
53k views

GPU Programming, CUDA or OpenCL? [closed]

I am a newbie to GPU programming. I have a laptop with NVIDIA GeForce GT 640 card. I am faced with 2 dilemmas, suggestions are most welcome. If I go for CUDA -- Ubuntu or Windows Clearly CUDA is more ...
Arkapravo's user avatar
  • 4,094
70 votes
5 answers
95k views

CUDA determining threads per block, blocks per grid

I'm new to the CUDA paradigm. My question is in determining the number of threads per block, and blocks per grid. Does a bit of art and trial play into this? What I've found is that many examples have ...
dnbwise's user avatar
  • 1,092
70 votes
1 answer
24k views

How and when should I use pitched pointer with the cuda API?

I have quite a good understanding about how to allocate and copy linear memory with cudaMalloc() and cudaMemcpy(). However, when I want to use the CUDA functions to allocate and copy 2D or 3D matrices,...
Ernest_Galbrun's user avatar
69 votes
6 answers
276k views

Error Message : Cannot find or open the PDB file

I tried running sample programs provided at NVIDIA's official site. Most of the programs ran smoothly except few where I get similar error messages. How can I fix that? Here's a sample of error ...
KNU's user avatar
  • 2,508
67 votes
5 answers
117k views

Does __syncthreads() synchronize all threads in the grid?

Does __syncthreads() synchronize all threads in the grid or just the threads in the current warp or block? Also, when the threads in a particular block encounter (in the kernel) the following line ...
Wuschelbeutel Kartoffelhuhn's user avatar
67 votes
5 answers
223k views

Where did CUDA get installed on Ubuntu 14.04 on my computer?

I'm trying to install CUDA 7.5 on my Ubuntu 14.04. I followed everything in this guide (installation through package): http://developer.download.nvidia.com/compute/cuda/7.5/Prod/docs/sidebar/...
krips89's user avatar
  • 1,733
66 votes
5 answers
90k views

What is the difference between cuda vs tensor cores?

I am completely new to terms related to HPC computing, but I just saw that EC2 released its new type of instance on AWS that's powered by the new Nvidia Tesla V100, which has both kinds of "cores": ...
Simon Ernesto Cardenas Zarate's user avatar
65 votes
7 answers
180k views

How to let cmake find CUDA

I am trying to build this project, which has CUDA as a dependency. But the cmake script cannot find the CUDA installation on the system: cls ~/workspace/gpucluster/cluster/build $ cmake .. -- The C ...
clstaudt's user avatar
  • 22.2k
65 votes
1 answer
58k views

Pytorch. How does pin_memory work in Dataloader?

I want to understand how the pin_memory parameter in Dataloader works. According to the documentation: pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory ...
Ivan Belonogov's user avatar
64 votes
8 answers
105k views

Error compiling CUDA from Command Prompt

I'm trying to compile a CUDA test program on Windows 7 via Command Prompt. I'm using this command: nvcc test.cu But all I get is this error: nvcc fatal : Cannot find compiler 'cl.exe' in PATH What may ...
GennSev's user avatar
  • 1,636
63 votes
4 answers
110k views

Cuda gridDim and blockDim

I get what blockDim is, but I have a problem with gridDim. blockDim gives the size of the block, but what is gridDim? On the Internet it says gridDim.x gives the number of blocks in the x coordinate. ...
ehah's user avatar
  • 685
63 votes
12 answers
32k views

Does CUDA support recursion?

Does CUDA support recursion?
JuanPablo's user avatar
  • 24.4k
61 votes
3 answers
45k views

Structure of Arrays vs Array of Structures

From some comments that I have read in here, it is preferable to have Structure of Arrays (SoA) over Array of Structures (AoS) for parallel implementations like CUDA. If that is true, can anyone ...
BugShotGG's user avatar
  • 5,070
61 votes
4 answers
71k views

Coding CUDA with C#?

I've been looking for some information on coding CUDA (the nvidia gpu language) with C#. I have seen a few of the libraries, but it seems that they would add a bit of overhead (because of the p/...
Jess's user avatar
  • 8,695
60 votes
6 answers
47k views

Compression library using Nvidia's CUDA [closed]

Does anyone know a project which implements standard compression methods (like Zip, GZip, BZip2, LZMA,...) using NVIDIA's CUDA library? I was wondering if algorithms which can make use of a lot of ...
Xn0vv3r's user avatar
  • 17.9k
60 votes
5 answers
68k views

Fortran vs C++, does Fortran still hold any advantage in numerical analysis these days? [closed]

With the rapid development of C++ compilers, especially the Intel ones, and the ability to apply SIMD functions directly in your C/C++ code, does Fortran still hold any real advantage in the world ...
user0002128's user avatar
  • 2,905
59 votes
8 answers
184k views

How to install CUDA in Google Colab GPU's

It seems that Google Colab GPUs don't come with the CUDA Toolkit. How can I install CUDA on Google Colab GPUs? I am getting this error when installing mxnet in Google Colab. Installing collected ...
namerbenz's user avatar
  • 649
58 votes
15 answers
34k views

CUDA or FPGA for special purpose 3D graphics computations? [closed]

I am developing a product with heavy 3D graphics computations, to a large extent closest point and range searches. Some hardware optimization would be useful. While I know little about this, my boss (...
Fredriku73's user avatar
  • 3,180
