
Questions tagged [opencl]

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.

0
votes
0answers
15 views

OpenCL usable when compiling host application with Address Sanitizer

I'm debugging a crash in my OpenCL application, so I turned on ASan to pin down where the problem originates. But then I discovered that by turning on ASan and recompiling, my application ...
0
votes
0answers
6 views

clEnqueueAcquireD3D11ObjectsKHR blocks for a long time

In my application, I have a processing thread that enqueues an OpenCL kernel that writes to a ID3D11Texture2D object. Everything works fine in terms of correctness. I can successfully acquire the ...
0
votes
1answer
37 views

Does OpenCV use some acceleration technology (i.e. OpenCL) for Gaussian blur?

I implemented my own Gaussian filter in C++ and NEON. Pseudo code: oneDimensionBlur(src, temp1, width, height) //implemented in C++ transposeMatrix(temp1, temp2, width, height) //implemented in NEON ...
-1
votes
0answers
36 views

Tesseract - OpenCL installation

I'm quite unclear as to how OpenCL is installed, and there are no clear instructions anywhere (specifically, I need OpenCL to speed up Tesseract). As I understand it the NVIDIA drivers and CUDA toolkit ...
0
votes
0answers
24 views

OpenCL on MacOS: SIGABRT in release build, EXC_BAD_INSTRUCTION in libdispatch in debug build when using AMD Radeon 555 as CL device

I'm encountering a hard-to-track-down bug on MacOS in an OpenCL-based application. In a release build my code crashes with a SIGABRT at some point; in a debug build I get an EXC_BAD_INSTRUCTION on a ...
0
votes
0answers
35 views

Gerrit diff view syntax highlighting for CUDA and OpenCL

The Gerrit code review site does not have syntax highlighting for CUDA and OpenCL. Traditionally, getting syntax highlighting for e.g. .cu/.cuh/.clh files was possible on the old UI, though not painless, by ...
0
votes
1answer
40 views

How to cast an int to a float in OpenCL?

I am writing a Mandelbrot fractal renderer in Java using OpenCL. In my kernel code I need to cast an int to a float. But when I say printf("%d", sizeX, "\n%d", (float) sizeX, "\n\n"); (sizeX is an int)...
0
votes
0answers
28 views

How do I set my system up to use PyOpenCL?

I'm trying to use a Python library (PyNoise) which depends on PyOpenCL to run. I've installed both packages successfully, but as soon as my Python app tries to import pyopencl, I get the following ...
0
votes
0answers
41 views

OpenCL: Implementing Dining Philosophers problem?

I'm trying to implement a variation of the Dining Philosophers problem using OpenCL. I have posted the code below but it's not giving the expected output. I'm using a semaphore array for the ...
2
votes
1answer
43 views

What is the meaning of having a certain number of OpenCL work-items on a CPU?

I'm trying to understand why I could have more work-items on a CPU than on a GPU in one dimension. PLATFORM 0 DEVICE 0 == CPU == DEVICE_VENDOR: Intel DEVICE NAME: Intel(R) Core(TM) i5-5257U CPU @ 2....
0
votes
1answer
21 views

How to enable the “basic” device in pocl?

I have installed pocl. make check shows all 145 tests passed. The build shows that --******** Enabled features: ...... -- OCL_DRIVERS (Drivers built): basic pthreads ...... But clinfo command ...
0
votes
1answer
33 views

My OpenCL code is slower on GPU than on my CPU

I am starting with OpenCL for some computer vision tasks. I use the Python pyopencl module. My code runs faster on an Intel CPU than on my Nvidia GTX 750Ti. I have example code that multiplies a (...
0
votes
3answers
50 views

Multidimensional kernel launch in OpenCL not working

I am trying to launch an OpenCL kernel in 3 dimensions as follows: size_t globalWorkSize[3] = {32, 3, 3}; size_t localWorkSize[2] = {32, 32}; err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, ...
1
vote
1answer
38 views

OpenCL's clEnqueueReadBufferRect works for int but not double data type

I need to copy some data with a certain stride from the device to the host. I already have a solution using a simple OpenCL kernel, but in certain circumstances I'd like to have the option to not use ...
1
vote
1answer
32 views

OpenCL C++ Bindings: How to implement a callback for enqueueWriteBuffer completion

I'm just getting started with OpenCL 1.2 and the C++ Bindings. I want to enqueue a buffer write asynchronously and get a callback once the operation has completed. Here is a stripped-down version ...
2
votes
1answer
62 views

Problem with ray-quad intersection code (looks like tearing)

Hi, I am implementing Monte Carlo path tracing and I have it working fine, but it looks like there is some issue with the intersection code. Following is the image. If you look at the red in the left corner, there seems ...
0
votes
1answer
58 views

OpenCL - single queue with 2 devices

I'm trying to port a CUDA test to OpenCL. It requires copying a buffer from PCIe device 1 onto device 2 of the same type (same brand, same driver, etc.). In CUDA it is quite simple: Allocate memory on ...
0
votes
1answer
86 views

Why is OpenCL nested loop only working for some elements

I am trying to implement the following loop in an OpenCL kernel. for(i=0;i<N;i++) for(j=0;j<M;j++) weights[i*M+j] += gradients[i] * input[j]; This is my kernel. I am currently hardcoding M to ...
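The loop quoted in the excerpt above is an outer-product update over a row-major `weights` array. A minimal Python sketch of that reference behavior follows, which could be used to check a kernel's output on the host. The function name `update_weights` and the sample values are assumptions for illustration, not the asker's code:

```python
def update_weights(weights, gradients, inputs):
    """Reference version of the quoted loop:
    weights[i*M + j] += gradients[i] * inputs[j]
    """
    n = len(gradients)  # N in the excerpt
    m = len(inputs)     # M in the excerpt
    assert len(weights) == n * m, "weights must hold N*M row-major entries"
    for i in range(n):
        for j in range(m):
            weights[i * m + j] += gradients[i] * inputs[j]
    return weights


# Hypothetical sample data: N = 2, M = 3
w = update_weights([0.0] * 6, [1.0, 2.0], [1.0, 10.0, 100.0])
print(w)  # [1.0, 10.0, 100.0, 2.0, 20.0, 200.0]
```

Each (i, j) pair writes to a distinct element of `weights`, so a kernel that assigns one work-item per pair should not need synchronization for this update; whether that matches the asker's kernel is not visible from the excerpt.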
0
votes
1answer
49 views

How to write/read a single float value (buffer) from an OpenCL device

There are lots of questions about how to read an array from the device, but I only want to read a single float value from the device. Or is it only possible to read an array from the device? I create a buffer for ...
0
votes
0answers
21 views

How to link Your Host Application to the Khronos ICD Loader Library

I am getting this warning Warning: Cannot find any Intel(R) FPGA Board libraries. No Intel(R) FPGA devices will be loaded. Please contact your board vender or see section "Linking Your Host ...
1
vote
1answer
33 views

Using OpenCL with Android JNI produces slow code due to some overhead

I implemented an algorithm on Android using OpenCL and OpenMP. The OpenMP implementation runs about 10 times slower than the OpenCL one. OpenMP: ~250 ms OpenCL: ~25 ms But overall, if I measure the ...
2
votes
1answer
73 views

OpenCL Pipeline failed to allocate buffer with cl_mem_object_allocation_failure

I have an OpenCL pipeline that processes images/video and it can be greedy with memory sometimes. It is crashing on cl::Buffer() allocation like this: cl_int err = CL_SUCCESS; cl::Buffer tmp = cl::...
1
vote
1answer
102 views

How to fix that OpenCL freezes?

I'm trying to detect blinking pixels. I wrote the code in C++ first, but I realized that a CPU is not suitable for it. So I found the OpenCL library. I've never used it before. Besides, I haven't ...
0
votes
2answers
43 views

CL/cl.h not found in SYCL

I have just started working on SYCL and ran ComputeCpp_info on my system, and the following data on 3 devices is shown: ComputeCpp Info (CE 1.1.0) SYCL 1.2.1 revision 3 Device 1 ( GeForce GTX 1050 = NO ...
0
votes
1answer
44 views

How can I install OpenCL on Ubuntu for AMD Ryzen Mobile CPU

I have a notebook with an AMD Ryzen 5 2500U processor (with integrated Radeon Vega 8 Mobile GPU) and use Ubuntu 18.04.2. I would like to run some OpenCL calculations with C++ on the CPU and GPU. My ...
1
vote
1answer
39 views

Value of parametric constant t in Plane and Ray intersection?

I am implementing a ray and rectangle intersection test. For that, I first test whether the ray intersects the plane; if it does, I then check whether the intersection point lies within the bounds of the rectangle. Following is the code: float ...
1
vote
1answer
46 views

Implementing custom atomic_add() which works with floats

I'm trying to follow the B.12 section of https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html for atomic add, which works with floats. Simply copying and pasting the code from there and ...
0
votes
0answers
54 views

atomicAdd doesn't perform float addition

I'm attempting to use atomicAdd() to sum a vector of floats; however, it appears not to perform any addition. I'm using a GTX 970 GPU, which has a compute capability of > 2.x, which is what is required ...
3
votes
1answer
39 views

A specific OpenCL kernel performs differently on mobile and PC

I was trying to run an OpenCL kernel on both an Adreno 630 and my laptop. It turns out that the kernel runs perfectly on mobile but crashes my laptop every single time. I am still trying to figure out ...
0
votes
1answer
27 views

Operate only on a subset of buffer in OpenCL kernel

Newbie to OpenCL here. I'm trying to convert a numerical method I've written to OpenCL for acceleration. I'm using the PyOpenCL package as I've written this once in Python already and as far as I can ...
-1
votes
2answers
35 views

OpenCL vector data type usage

I'm using a GPU driver that is optimized to work with 16-element vector data type. However, I'm not sure how to use it properly. Should I declare it as, for example, cl_float16 on host with a size 16 ...
-3
votes
0answers
29 views

Visual profiler for OpenCL on NVIDIA GPU

I am working with OpenCL on an NVIDIA GPU and I cannot find any profiler for my programs. What I have found already is the nvvp NVIDIA Visual Profiler, but it is not working yet, as the command line ...
0
votes
0answers
39 views

Use CUDA without CUDA enabled GPU - ROCm or OpenCL

I'm doing academic robotics research, so we need to integrate several libraries in the field of vision, sensing, actuators. There's a huge problem when trying to use libraries that solve problems and ...
7
votes
2answers
121 views

Cannot import PyOpenCL in Jupyter Notebook

I'm running inside an Anaconda environment with pyopencl installed: $> conda list | grep pyopencl pyopencl 2018.2.5 py37h9888f84_0 conda-forge And from that same ...
0
votes
1answer
57 views

GPU ARM Mali and OpenCL driver

I have a TinkerBoard powered by an ARM-based Mali™-T764 GPU. I am running Debian linaro v2.0.8 stretch. I am looking for OpenCL support; how can I enable the Mali GPU with OpenCL 1.2 FP? If you ...
0
votes
1answer
39 views

CPU and GPU memory sharing

If the (discrete) GPU has its own video RAM, I have to copy my data from RAM to VRAM to be able to use them. But if the GPU is integrated with the CPU (e.g. AMD Ryzen) and shares the memory, do I ...
0
votes
1answer
50 views

PyOpenCL performance issue on VideoCoreIV VC4CL (Raspberry Pi GPU)

I'm new to OpenCL/PyOpenCL and I'm trying to understand how OpenCL on the Raspberry Pi GPU (VideoCoreIV) compares to NumPy (on CPU) in vector and matrix multiplications on my hardware. I'm using VC4CL as ...
0
votes
1answer
36 views

How should I handle: Error C2039 'assign': is not a member of 'cl::string' in Visual Studio 2017?

I want to build some simple OpenCL code in Visual Studio C++ but there is an error during the build. The error is Error C2039 'assign': is not a member of 'cl::string' The issue is about cl::string. ...
1
vote
1answer
43 views

How to get assembly code for OpenCL kernel on AMD GPU with OSX

I am trying to view the assembly code of an OpenCL kernel that runs on the AMD GPU of my Mac. Based on this SO question, OpenCL online compilation: get assembly from cl::program or cl::kernel, I used ...
0
votes
1answer
73 views

OpenCL Hello World displays garbage output

I am trying a simple Hello World OpenCL program. It compiles without errors but displays garbage: ╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠ The error function detected an error while building, so a ...
0
votes
0answers
21 views

How to calculate FLOPS (floating point operations per second) of a standard neural network

I have just implemented a standard neural network for the MNIST dataset in C++ using OpenCL and ran it on an Nvidia GPU. I have computed the normal training time and error calculation time, but I need to ...
3
votes
2answers
54 views

OpenCL performance issue on GPU

I'm using OpenCL to optimize some code on the Raspberry Pi GPU (VideoCore IV). I'm using the VC4CL implementation, which offers a maximum work-group size of 12. However, with simple kernels like summing two ...
1
vote
0answers
24 views

Strange behavior of PCIe throughput when used in OpenCL

I tested the throughput of PCIe with OpenCL and I am getting strange results. I am using PCIe 3, x16. DATA_SIZE = 2097152 (int) Data size bytes = 8.388608 MB The code is as follows: struct timeval ...
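The byte count in the excerpt above can be checked with a little arithmetic, assuming each `int` is 4 bytes (as OpenCL's `cl_int` is). In this rough Python sketch, the helper name `throughput_gb_per_s` and the 1 ms elapsed time are made-up placeholders for illustration, not measurements from the question:

```python
DATA_SIZE = 2097152   # number of ints, from the excerpt
BYTES_PER_INT = 4     # assumption: 32-bit int (cl_int)

size_bytes = DATA_SIZE * BYTES_PER_INT
size_mb = size_bytes / 1e6        # decimal megabytes, as in the excerpt
size_mib = size_bytes / 2**20     # binary mebibytes


def throughput_gb_per_s(num_bytes, seconds):
    """Transfer rate in decimal GB/s for a copy that took `seconds`."""
    return num_bytes / seconds / 1e9


print(size_bytes, size_mb, size_mib)
# Hypothetical timing, for illustration only: a 1 ms transfer
print(throughput_gb_per_s(size_bytes, 1e-3))
```

Reporting the size in decimal MB (8.388608) versus binary MiB (8) changes any derived throughput figure by about 5%, so it helps to state which unit a benchmark uses.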
1
vote
1answer
42 views

Best way to debug OpenCL Kernel

I have the following OpenCL kernel that I want to debug. I have put some printf calls in it, but those are not useful, as work items are scheduled randomly and the values printed are not always right. How can I make my ...
0
votes
1answer
29 views

Barriers seem to synchronize only within a time window

Currently I am trying to implement the FDTD method to solve the Maxwell equations using OpenCL. The algorithm is pretty simple: calculate the current h-field from the old electric field and calculate the ...
0
votes
2answers
29 views

OpenCL POCL + asan or valgrind

I am trying to debug my OpenCL kernel. I think the error is in wrong memory allocation, so I'm looking for a way to detect it. Long story short, could I just run the OpenCL kernel on the POCL platform and ...
1
vote
1answer
41 views

Non-recursive random number generator

I have searched for pseudo-RNG algorithms but all I can find seem to generate the next number by using the previous result as seed. Is there a way to generate them non-recursively? The scenario where ...
0
votes
0answers
18 views

Trying to install de5a_net_i2 driver using 'aocl install' but failed

I am trying to install the driver for DE5-NET FPGA. I am using Intel FPGA SDK for OpenCL 16.0 on Ubuntu 16.04. aoc --list-boards gives the output de5a_net_e1 However after this step when I try to ...
2
votes
1answer
78 views

Correct way to write and call custom C functions of ArrayFire in Julia

I'm working in Julia and I need to call some custom C functions that use the ArrayFire library. When I use code like: void copy(const af::array &A, af::array &B,size_t length) { // 2....
0
votes
1answer
31 views

How to process strings in an OpenCL kernel from a buffer of N fixed-length strings?

I am required to process N fixed-length strings in parallel on an OpenCL device. Processing a string involves calling a provided function that takes a string as input, represented as a buffer,...