Questions tagged [cuda]

CUDA is a parallel computing platform and programming model for Nvidia GPUs (Graphics Processing Units). CUDA provides an interface to Nvidia GPUs through a variety of programming languages, libraries, and APIs.

0
votes
0answers
11 views

CUDA unified memory with structures causes mis-alignment

I'm using CUDA unified memory and have overloaded the new and delete operators so that they use cudaMallocManaged and cudaFree. The weird thing I observed is that when dealing with structures containing ...
1
vote
1answer
31 views

Is it possible for a thread to atomically update 4 different places of the shared memory?

Suppose a thread of a kernel is trying to update 4 different places on the shared memory. Can I cause that operation to fail and be reversed if any other thread has overwritten any of those locations? ...
0
votes
0answers
38 views

Why does a cuBLAS function cause a 4-byte pageable memcpy?

Why does a call to cublasIsamax_v2() create a 4-byte pageable memcpy, while cublasSgemv_v2() doesn't cause a pageable memcpy? Have a look at this image: Here is how I call iamax, ...
-3
votes
0answers
17 views

How is the apriori algorithm implemented in a CUDA data mining library? [on hold]

I have found a CUDA implementation of the apriori algorithm. Here it is. Now I want to find any documentation about the implementation (how it works, etc.), because I don't have time to sit and understand all ...
0
votes
0answers
40 views

Any work-around to use polymorphism and virtual functions in CUDA kernels?

I implemented a simple CPU path tracer in OOP fashion and am now trying to port it to CUDA for a speedup, but hit the problem that CUDA kernels don't take derived classes of virtual base classes. I'd ...
0
votes
0answers
29 views

How to get cmake to enable cuda when compiling yolo (darknet)?

I am currently using the cmake-gui to compile yolo darknet at https://github.com/AlexeyAB/darknet.git. However, it will not enable cuda and I am having a few other odd issues. These include when I run ...
0
votes
1answer
52 views

Does each CUDA stream have its own copy of __constant__ memory?

I have a kernel which uses a little __constant__ memory multiple times and needs to copy different values to __constant__ memory each time. Recently, I needed to make this kernel multi stream ...
0
votes
0answers
19 views

Static thrust::device_vector in DLL function causes cudaErrorCudartUnloading (error 29) during termination [duplicate]

When I declare a static thrust::device_vector in a function wrapped in a DLL, it causes cudaErrorCudartUnloading (error 29) at the end of the program. It seems that the cuda context is destroyed ...
0
votes
1answer
44 views

CUDA Thrust Min_Element result equals 0

I'm very new to CUDA and C++, but have been working away at some problems I have noticed. I want to get the smallest number as well as its index in CUDA. Currently I have __global__ ...
0
votes
0answers
30 views

How to use multiple CUDA versions on a shared computation machine

We need to use cuda-9.0 and cuda-10.0 on one computation machine. This machine will be used by our team members. We don't want to access each computation. I tried to use docker. But anyone can access ...
-1
votes
0answers
28 views

ffmpeg compilation failed with cuda, libnpp not found

image: docker tensorflow/tensorflow:1.10.0-devel-gpu-py3 os: ubuntu 16.04 cuda: 9.0.176 ffmpeg pulled from github and nv-codec-headers downloaded from videolan, both the latest version step 1: cd /...
0
votes
0answers
29 views

CUDA was installed before VS; how to deal with it?

I want to use TensorFlow on Windows 10, but I didn't notice that I need to install Visual Studio before I install CUDA. Now I can't run my HelloWorld.py properly. Does anyone know how to deal with it? ...
0
votes
0answers
18 views

What's CUDA_HOST_COMPILER's value when using cmake find_package in CUDA demo

I tested a little CUDA 10 demo and built the project with cmake in VS Code. The CMakeLists is: project(helloworld) cmake_minimum_required(VERSION 2.8) find_package(CUDA REQUIRED) if(CUDA_FOUND)...
-2
votes
0answers
34 views

Fail to unload nvidia driver [on hold]

I want to unload the old nvidia driver. This is the module info: $ lsmod | grep nvidia nvidia_uvm 786432 0 nvidia_drm 40960 0 nvidia_modeset 1036288 1 nvidia_drm nvidia ...
-1
votes
1answer
34 views

Why is ceil used here and what purpose does it serve? [duplicate]

I was looking at some game of life GPU code and could not understand why ceil is used for dim3 cpyBlockSize(BLOCK_SIZE,1,1); dim3cpysimulationRowssimulationSize((int) ceil (size/(float) ...
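A minimal sketch of what a ceil in a launch configuration usually accomplishes, assuming size is an element count and BLOCK_SIZE is the number of threads per block. The function names blocks_needed and blocks_truncated are hypothetical, introduced only for illustration. Rounding up makes sure a final, partially filled block is still launched; plain integer division would drop it.

```cpp
#include <cassert>

// Number of blocks needed so that blocks * block_size >= size.
// Integer equivalent of (int)ceil(size / (float)block_size).
int blocks_needed(int size, int block_size) {
    assert(size >= 0 && block_size > 0);
    return (size + block_size - 1) / block_size;
}

// What truncating integer division gives instead: the last partial block is lost.
int blocks_truncated(int size, int block_size) {
    assert(size >= 0 && block_size > 0);
    return size / block_size;
}
```

With 1000 elements and 256 threads per block, blocks_needed returns 4 while blocks_truncated returns 3, which would leave the last 232 elements unprocessed. Because the rounded-up grid can contain more threads than elements, a kernel launched this way normally also needs a bounds check on its index.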
0
votes
0answers
36 views

CMake CUDA Language Support on Windows

I've been trying to build a project for some time now that enables the CUDA language on Windows 10 using Visual Studio as the generator (i.e. project(MyProject LANGUAGES CUDA CXX)). I've tried the ...
0
votes
0answers
44 views

How to pass a structure to CUDA? [on hold]

I want to pass this structure to a CUDA device. My structure contains pointers, and each pointer points to another structure. I am using Visual Studio 2017 Community and CUDA 10.0. Here is the structure ...
0
votes
0answers
47 views

In cuda, how to dynamically specify the gpu device id?

I want to implement dynamic binding. I use LD_PRELOAD to hijack the cudaSetDevice function. After selecting the appropriate gpu, I modify the device id to implement dynamic binding, but there are some ...
-1
votes
1answer
35 views

How do I copy the cuda input array into the shared array?

I'm trying to copy a cuda input array into a shared memory array. The first n values copy into the shared array perfectly but after that there are some pretty weird patterns happening. Can anyone find ...
0
votes
1answer
56 views

Maximum number of CUDA blocks?

I want to implement an algorithm in CUDA that takes an input of size N and uses N^2 threads to execute it (this is the way the particular algorithm works). I've been asked to make a program that can ...
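A rough sketch of the arithmetic behind this kind of question, assuming the N^2 threads are laid out as an N x N grid of threads using square blocks. The limits below are the documented grid-size limits for compute capability 3.0 and later (2^31 - 1 blocks in x, 65535 in y); the helper names and the block_edge parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Grid-size limits for compute capability 3.0 and later (assumed here).
constexpr std::int64_t kMaxGridX = 2147483647;  // 2^31 - 1
constexpr std::int64_t kMaxGridY = 65535;

// Blocks needed along one edge to cover n threads with block_edge threads per block edge.
std::int64_t grid_edge(std::int64_t n, std::int64_t block_edge) {
    assert(n > 0 && block_edge > 0);
    return (n + block_edge - 1) / block_edge;
}

// True if an n x n thread layout fits in a single 2D grid launch.
bool fits_in_2d_grid(std::int64_t n, std::int64_t block_edge) {
    std::int64_t edge = grid_edge(n, block_edge);
    return edge <= kMaxGridX && edge <= kMaxGridY;
}
```

With 16 x 16 blocks, the y limit is the binding one: N = 1048560 still fits (65535 blocks per edge), while N = 1048561 does not. Beyond that point, one common option is to have each thread process several elements in a loop instead of launching one thread per element.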
0
votes
1answer
58 views

How to create a matrix on the GPU and print it on the CPU?

This is code to create a matrix on the GPU and print it out on the CPU. Can anyone tell me where I am going wrong? Thank you. # include <stdio.h> __global__ void create(int **d_a){ int i = ...
0
votes
1answer
51 views

How to transfer datatype declaration from .cpp file to .cu file?

I found that CUDA supports the "template" keyword in code, and now I would like to share templated code between nvcc and g++. But it seems that I cannot find a proper way to implement it, so I ...
-1
votes
0answers
37 views

I am new to CUDA and curious why this is happening; is something wrong with the memory allocation? [closed]

Here is some vector addition code. If I use c[blockidx.X] = a[blockidx.x] +b[blockidx.x], the result is a garbage value or zero. How do I initialize c[ ] to 0? This is my first time asking a question, I ...
-1
votes
0answers
20 views

error code=77 when increasing the array size

I am using CUDA to process a vector of (numElements) vertices. The code works well, but when I increase the number of vertices to 61 there is an error code=77(cudaErrorIllegalAddress) "...
-1
votes
0answers
24 views

curand_init prevents __global__ method from compiling in Cuda

When compiling this code, I have noticed that when the thread count is high (above i = 64 and j = 10) the method will not compile. No matter what I try to print the code just will not run here. If I ...
1
vote
1answer
46 views

CUDA coalesced memory access speed depending on word size

I have a CUDA program where one warp needs to access (for example) 96 bytes of global memory. It properly aligns the memory location and lane indices such that the access is coalesced and done in a ...
0
votes
0answers
47 views

concurrent execution of cuda kernels from different contexts

https://docs.nvidia.com/deploy/mps/index.html#topic_4_1 says GPU's with Hyper-Q have a concurrent scheduler to schedule work from work queues belonging to a single CUDA context. but when I ...
0
votes
1answer
41 views

L1 cache in GPU

I see some similar terms while reading about the memory hierarchy of GPUs, and since there were some architectural modifications in past versions, I don't know if they can be used together or have different ...
0
votes
0answers
27 views

Can't find CUDA_INCLUDE_DIRS in latest CMAKE

Since CMake 3.10, the CUDA macro is supported by default (https://cmake.org/cmake/help/latest/module/FindCUDA.html), but I can't find the variable CUDA_INCLUDE_DIRS. cmake_minimum_required(VERSION 3....
0
votes
0answers
25 views

Timing cuDNN operations [closed]

I searched a lot to find a proper way to measure the runtime for cuDNN operations, particularly forward convolutions, but I couldn't find anything. Can I use CUDA events to measure the GPU timers or ...
0
votes
1answer
28 views

local cache hit metric in cuda profiler

When profiling some CUDA applications, I see that the value of the local hit rate (local_hit_rate metric) is 0%. I want to distinguish the following concepts with that value. The application has no ...
0
votes
1answer
36 views

Struggles with RSA Encryption on CUDA

I am trying to accelerate RSA encryption using CUDA. I can't properly perform power-modulo in the kernel function. I am using Cuda compilation tools on AWS, release 9.0, V9.0.176 ...
0
votes
0answers
28 views

1D FFTs on MxNxD cube

I need to do 1D cuFFTs on a cube with MxNxD dimensions. The total count of 1D FFTs is MxD. The memory layout is created as: [x111, x112, ... x11D, x121, x122, ... x12D, x1N1, x1N2, ... x1ND, x211, ...
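A small sketch of the index arithmetic this layout implies, assuming 0-based indices and assuming that "MxD transforms" means each 1D FFT runs along the N axis for a fixed (m, d) pair. The function names are hypothetical. In the layout shown, the third index varies fastest, so consecutive samples of one transform are D elements apart.

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (m, n, d), 0-based, in an M x N x D cube where d varies fastest.
std::size_t cube_offset(std::size_t m, std::size_t n, std::size_t d,
                        std::size_t N, std::size_t D) {
    assert(n < N && d < D);
    return (m * N + n) * D + d;
}

// First element of the 1D transform along n for a fixed (m, d).
std::size_t fft_start(std::size_t m, std::size_t d, std::size_t N, std::size_t D) {
    return cube_offset(m, 0, d, N, D);
}

// Distance in elements between consecutive samples of one transform.
std::size_t fft_stride(std::size_t D) { return D; }
```

For N = 3 and D = 4, the transform for (m, d) = (1, 2) starts at offset 14 and reads offsets 14, 18 and 22. A start offset and stride of this kind are the values a strided, batched FFT plan would need to be given.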
-2
votes
0answers
39 views

I try to reuse CUDA device memory, but hit 0700 “CUDA_ERROR_ILLEGAL_ADDRESS” error

I have been trying to create some cuda program to init context at start time, and then allocate some device memory for later use. Here is pseudo-code: global vars: int gdevID; CUdevice gcuDevice; ...
-3
votes
0answers
33 views

What would cause a Numba Cuda kernel to do nothing but not return an error?

What would cause a Numba Cuda kernel to do nothing but not return an error? Problem: My code runs fine with small threadblocks (16x8 or less) but above a certain number of threads my call to the ...
-2
votes
1answer
57 views

Translation of CUDA inline asm from GAS to Intel

I have some C-CUDA code that contains inline PTX assembly, which compiles OK on Linux with g++ backend. I need to build it under Windows, and clearly MSVC backend does not recognize inline asm ...
-3
votes
0answers
31 views

Can you run CUDA on Linux Fedora 28? [closed]

I want to install CUDA on my Linux Fedora 28 machine, and am following this guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html In Table 1 in that guide, the only Fedora ...
-2
votes
0answers
34 views

CUDA: speed of memory access vs bit shift

I have this code (well, most of it isn't really important): __constant__ uint8_t bitmask[] = {1, 2, 4, 8, 16, 32, 64, 128}; template <typename T> struct null_conversion { null_conversion(T* ...
-2
votes
0answers
54 views

Cuda 2d texture memory without pitched memory allocation

I am trying to access my data through 2D texture memory object. Because I use cufft before that operation and cufft only accepts pitch data in terms of elements rather than bytes, I allocated a linear ...
-1
votes
0answers
53 views

CUDA device-only multidimensional arrays

I am trying to write CUDA C++ code for getting the inverse of a matrix, for which I need to process cofactors of the matrix. I was wondering whether there is a way other than converting the 2D ...
0
votes
0answers
23 views

How to install PyCUDA without root/admin privileges [duplicate]

Good day, I am running CUDA on a server I am SSH'd into. I would like to run pycuda as well for some specific applications, however it appears that every installation process requires some form of ...
-5
votes
0answers
46 views

Calculate eigenvalue inside cuda kernel function [closed]

I want to calculate eigenvalues of a matrix inside the CUDA __global__ kernel function. I have tried to look for some algorithm so that I can write a __device__ function for eigenvalues and call it ...
0
votes
1answer
74 views

Can float3 enjoy CUDA memory coalescing?

From my understanding, only accessing memory by 4 bytes, 8 bytes or 16 bytes per thread can enjoy CUDA global memory coalescing. Following this, the frequently used float3 is a 12-byte type and is ...
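A back-of-the-envelope sketch of the footprint involved, assuming a 32-thread warp whose threads read consecutive elements, and assuming memory is fetched in 128-byte segments. Float3 below is a host-side stand-in for CUDA's float3, and segments_touched is a hypothetical helper. It only counts how many segments a contiguous warp-wide read spans; it does not model how the hardware actually issues the loads.

```cpp
#include <cassert>
#include <cstddef>

// Host-side stand-in for CUDA's float3: three packed floats, no padding.
struct Float3 { float x, y, z; };
static_assert(sizeof(Float3) == 12, "three 4-byte floats");

// Number of segment-sized chunks spanned by `threads` consecutive elements
// of `bytes_per_thread` bytes each, starting at byte address `first_byte`.
std::size_t segments_touched(std::size_t first_byte, std::size_t bytes_per_thread,
                             std::size_t threads, std::size_t segment) {
    assert(bytes_per_thread > 0 && threads > 0 && segment > 0);
    std::size_t last_byte = first_byte + bytes_per_thread * threads - 1;
    return last_byte / segment - first_byte / segment + 1;
}
```

Under these assumptions, a warp reading 32 consecutive float3 values from an aligned address covers 384 contiguous bytes, which is exactly 3 segments with no wasted bytes, compared with 1 segment for 32 plain floats. If the starting address is offset by 64 bytes, the same read spans 4 segments.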
0
votes
1answer
48 views

CUDA data initialization

According to the CUDA tutorials, most of the data is sent to the device at kernel invocation. I wonder if there is any way I can perform an init data -- process(es) -- clean up sort of operation in ...
0
votes
0answers
41 views

CUDA C allocating GPU memory for a struct of structs [duplicate]

I implemented a representation of a screen with some number of pixels that can draw a simple pyramid (by assigning an RGB value to each pixel in the screen). While learning CUDA C, I wanted to re-...
-1
votes
0answers
42 views

Is there a way to call any API with my GPU?

I want to know if there is a way to call any API with CUDA. What I want to do is a stress test of a local app, and I want to try to do it with my GPU, so I can "consume" this local app with more "...
-2
votes
0answers
24 views

I have a 2D block kernel and I want to modify it so as to compute a different row

__global__ void MatAdd (float *A, float *B, float * C, int N) { int j = blockIdx.x * blockDim.x + threadIdx.x //rows int i = blockIdx.y * blockDim.y + threadIdx.y; //columns int index=i*N+j; ...
0
votes
1answer
47 views

How to find installation path in virtual environment?

I created a virtual environment using Anaconda Python. I installed the CUDA toolkit in the created environment. Now I have to give the path of the CUDA installation in a makefile. Default path /usr/local/cuda/include/...
-4
votes
0answers
67 views

nvidia-drm error when installing Cuda drivers on ubuntu 18.04 [closed]

I am running and got the following options: CUDA Installer │ │ - [X] Driver ...
-1
votes
1answer
45 views

Add multiple vectors concurrently in cuda

I want to design a kernel to add pairs of matrix rows concurrently, but I don't know how to accomplish it. For example, I have a data matrix whose size is (512, 1024), and I want to add its row pairs(...