tag:forum.jogamp.org,2006:forum-782173Nabble - jocl2024-03-29T03:24:27ZEverything related to JOCL, the Java Bindings to OpenCL - <a href="http://jocl.jogamp.org" target="_top" rel="nofollow" link="external">jocl.jogamp.org</a>tag:forum.jogamp.org,2006:post-4042433NullPointerException from native method2023-04-11T14:34:39Z2023-04-11T14:34:39Zbananafish
To debug my JOCL-based app I switched to OCLGrind as my device to find possible issues. It typically works fine but I suddenly started getting this exception on invocations of putNDRangeKernel():
<br/><br/><pre>java.lang.NullPointerException
at com.jogamp.opencl.llb.impl.CLAbstractImpl.dispatch_clEnqueueNDRangeKernel0(Native Method)
at com.jogamp.opencl.llb.impl.CLAbstractImpl.clEnqueueNDRangeKernel(CLAbstractImpl.java:1346)
at com.jogamp.opencl.CLCommandQueue.putNDRangeKernel(CLCommandQueue.java:1656)
at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1523)
at com.jogamp.opencl.CLCommandQueue.put1DRangeKernel(CLCommandQueue.java:1493)</pre><br/>I suspect this has something to do with OCLGrind, but the origin of the NullPointerException is puzzling: it is apparently a deliberately thrown Java exception on the native side -- something of which OCLGrind has no concept. Here's one example of the offending code, with asserts to try to catch something amiss:
<br/><br/><pre>assert numWorkItems > 0;
assert queue.getID() > 0;
assert kernel.getID() > 0;
assert queue.getContext().getID() > 0;
assert queue.getContext().getCL() != null;
queue.put1DRangeKernel(kernel, 0, numWorkItems, 0);</pre><br/>My POM deps have:
<br/><pre><dependency>
<groupId>org.jocl</groupId>
<artifactId>jocl</artifactId>
<version>2.0.4</version>
</dependency>
<dependency>
<groupId>org.jogamp.gluegen</groupId>
<artifactId>gluegen-rt-main</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>org.jogamp.jocl</groupId>
<artifactId>jocl-main</artifactId>
<version>2.3.2</version>
</dependency></pre><br/>This exception occurs throughout the app for a persistent but bafflingly random selection of kernels. Switching back to the Intel GPU as the device removes the issue and restores normal function.
<br/><br/>...any idea as to what is throwing this NPE?
tag:forum.jogamp.org,2006:post-4041731Unable to include header files2022-04-22T15:10:24Z2022-04-22T15:10:24Zammendes
Hi all,
<br/><br/>I've been using JOCL with success to GPU-accelerate my Java algorithms.
<br/>It seems I'm unable to include header files (e.g., #include <stdio.h>). So far I've gotten by with just the basics, but at this point I need to do some Fourier transforms and would like to use a library (FFTW).
<br/><br/>I use IntelliJ IDEA as an IDE and a Macbook Pro with Big Sur.
<br/><br/>Any pointers on how to include these libraries?
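<br/><br/>A note on why the #include fails: the OpenCL C compiler lives in the driver and resolves #include against its own include path, not the Java classpath, so host headers such as stdio.h are generally unavailable (and a host CPU library like FFTW cannot be called from inside a kernel at all; a GPU FFT library such as clFFT would be the usual route). Two common workarounds are passing an include path as a build option (e.g. something like program.build("-I /path/to/headers")) or simply concatenating the header text with the kernel source on the host before creating the program. A minimal host-side sketch of the latter; the source strings here are hypothetical:

```java
public class KernelSourceAssembler {

    /** Prepends shared header text to a kernel source string on the host,
     *  so the program builds without any #include directive. */
    public static String assemble(String headerSource, String kernelSource) {
        return headerSource + "\n" + kernelSource;
    }

    public static void main(String[] args) {
        // Hypothetical shared "header": a complex type and a helper function.
        String header = "typedef float2 complexf;\n"
                + "inline complexf cmul(complexf a, complexf b) {\n"
                + "    return (complexf)(a.x*b.x - a.y*b.y, a.x*b.y + a.y*b.x);\n"
                + "}\n";
        String kernel = "kernel void twiddle(global complexf* data) { /* ... */ }\n";
        String program = assemble(header, kernel);
        System.out.println(program.contains("cmul")); // true: the helper is now part of the source
    }
}
```

The assembled string would then be handed to context.createProgram(...) in place of the single-file source.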
tag:forum.jogamp.org,2006:post-4041094Support for image types2021-04-08T15:20:27Z2021-04-08T15:33:49Zekjo014
I don't see a way to create several of the image types that are supported by OpenCL in jocl.
<br/>i.e. image1d_t, image1d_buffer_t, image1d_array_t, image2d_array_t
<br/>c.f. <a href="https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/otherDataTypes.html" target="_top" rel="nofollow" link="external">https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/otherDataTypes.html</a><br/><br/>I also don't see a way to use the low-level binding API to call clCreateImage with the right parameters since it doesn't seem to be exposed on CLImageBinding
<br/><a href="https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateImage.html" target="_top" rel="nofollow" link="external">https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateImage.html</a><br/><br/>It is in the header file, though, so I'm not sure why it's not available.
<br/><a href="https://github.com/JogAmp/jocl/blob/master/make/stub_includes/CL12/cl.h#L676" target="_top" rel="nofollow" link="external">https://github.com/JogAmp/jocl/blob/master/make/stub_includes/CL12/cl.h#L676</a><br/><br/>Is there a way to do this?
tag:forum.jogamp.org,2006:post-4041097GLCLInteroperabilityDemo: segmentation fault on Intel(R) UHD Graphics 6302021-04-09T07:03:15Z2021-04-09T07:03:15ZBibi
Using the demo, if the default platform is <i>NVIDIA CUDA</i>, it works fine.
<br/>If the platform is Intel(R) OpenCL, I get <i>Intel(R) UHD Graphics 630</i>.
<br/><br/>On <a href="https://github.com/JogAmp/jocl-demos/blob/master/src/com/jogamp/opencl/demos/joglinterop/GLCLInteroperabilityDemo.java#L186" target="_top" rel="nofollow" link="external">createFromGLBuffer</a> I get a segmentation fault.
<br/><br/>Is there a known reason for this, so that we can skip those devices?
<br/>A workaround?
<br/><br/>Note: the driver is up-to-date.
<br/><br/>Some characteristics:
<br/> - CL_DEVICE_NAME: Intel(R) UHD Graphics 630
<br/> - CL_DEVICE_VERSION: OpenCL 2.1 NEO
<br/> - CL_DEVICE_OPENCL_C_VERSION: OpenCL C 2.0
<br/> - CL_DRIVER_VERSION: 26.20.100.6911
<br/><br/>Memory size doesn't seem to be the issue.
<br/>- CL_DEVICE_MAX_MEM_ALLOC_SIZE: 4294959104
<br/>- CL_DEVICE_GLOBAL_MEM_SIZE: 13681233920
<br/>- CL_DEVICE_LOCAL_MEM_SIZE: 65536
<br/>- CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 4294959104
<br/>- CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
<br/>- CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 524288
<br/>
tag:forum.jogamp.org,2006:post-4040767Writing range of CLBuffer to GPU2020-08-12T06:40:36Z2020-08-12T06:40:36Zbananafish
Writing a range of an array to an offset of a GPU-side buffer in OpenCL isn't working out. In other words, I want to read from my Java array at a given offset and write it to the CLBuffer on the GPU at another specified offset and length.
<br/><br/>The original C API makes this possible by exposing control of the offset of the buffer object (GPU end) to begin writing, and explicit length as well. Unfortunately, the source for CLCommandQueue.putWriteBuffer() hard-codes the offset to zero, closing that option off.
<br/><br/>I tried using CLBuffer.getBuffer().position(start).limit(end) then invoking putWriteBuffer(...) but ended up with a segfault. I found that CLMemory.getNIOSize() is using Buffer.capacity(), and I believe this is resulting in an overflow.
<br/><br/>There was an older post on this forum suggesting that getNIOSize() use the limit() method instead for determining length and use that for the write length. I disagree with this: If position() is used to determine the host offset when calculating a pointer from the NIO buffer, position() + limit() will exceed capacity() and again result in overflow. I assert that the proper method to use is remaining() * getElementSize() when calculating the number of bytes to transfer, and perhaps position() * getElementSize() for calculating the offset on the GPU. Alternatively, a method could be written which takes an offset (in elements). This makes the offsets on Host and GPU independent.
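<br/><br/>The arithmetic argued for above can be checked without JOCL at all. This plain-NIO sketch (the helper names are mine, not CLMemory's) derives the transfer size from remaining() and the host offset from position():

```java
import java.nio.FloatBuffer;

public class RangeMath {
    static final int ELEMENT_SIZE = 4; // sizeof(float)

    /** Bytes to transfer for the window [position, limit). */
    static long transferBytes(FloatBuffer b) {
        return (long) b.remaining() * ELEMENT_SIZE;
    }

    /** Byte offset of the window start relative to the buffer's origin. */
    static long hostOffsetBytes(FloatBuffer b) {
        return (long) b.position() * ELEMENT_SIZE;
    }

    public static void main(String[] args) {
        FloatBuffer buf = FloatBuffer.allocate(100);
        buf.position(10);   // start of the window
        buf.limit(60);      // end of the window: 50 elements remain
        System.out.println(transferBytes(buf));   // 200
        System.out.println(hostOffsetBytes(buf)); // 40
    }
}
```

Since position() + remaining() equals limit(), which never exceeds capacity(), this arithmetic cannot overflow the buffer, unlike a capacity()-based length.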
<br/><br/>
tag:forum.jogamp.org,2006:post-4040656Is it possible to create a screenshot with JOCL?2020-05-28T09:30:26Z2020-05-28T09:30:26Zsblantipodi
As title.
<br/>I need to capture 30 frames per second at 4K resolution.
<br/><br/>The AWT Robot class is not enough for this purpose; is it possible to use JOCL instead?
<br/>Is there an API I can use to capture the screen?
<br/><br/>Thanks
tag:forum.jogamp.org,2006:post-4039311Test ... ignore2018-12-18T11:15:00Z2018-12-18T11:15:00Znyholku
I posted a question but I do not see it anywhere...so this is to test if I can post
tag:forum.jogamp.org,2006:post-4037984Troubles with running radix sort demo2017-05-13T11:55:01Z2017-05-13T11:55:01ZAndrew
Hi! I just faced a strange problem running radix sort demo. It compiles properly, without any warnings in CL kernels, but execution produces incorrect results. These results look like they've undergone data races or incorrect bit operations -- for example (for maxValue = 10), the program outputs the following:
<br/><br/> = = = workgroup size: 128 = = =
<br/>array size: 0.131072MB; elements: 32K
<br/>snapshot before sorting: 0, 3, 8, 4, 0, 5, 5, 8, 9, 3, 2, 2, 6, 2, 6, 2, 6, 0, 3, 9, ...; 32748 more
<br/>time: 6.600048ms
<br/>time: 5.044378ms
<br/>time: 4.78385ms
<br/>time: 4.060292ms
<br/>time: 3.898054ms
<br/>time: 3.924897ms
<br/>time: 3.750422ms
<br/>time: 5.213722ms
<br/>time: 4.366611ms
<br/>time: 4.253321ms
<br/>snapshot: 0, 0, 2, 4, 7, 0, 0, 13, 0, 0, 2, 4, 7, 0, 0, 13, 0, 0, 0, 0, ...; 32748 more
<br/>validating...
<br/>Exception in thread "main" java.lang.RuntimeException: not sorted 7 !> 0
<br/><br/>The set looks only partially sorted, as when data races occur. Furthermore, I presume that no numbers greater than 9 could appear in the data set during normal execution. I tried -Werror and -cl-opt-disable for the CL builds, but that doesn't help -- the problem neither disappears nor do the builds fail with errors. Could someone please help me with this issue?
<br/>
tag:forum.jogamp.org,2006:post-4037954Troubles with CL/GL interoperability2017-05-04T08:37:59Z2017-05-05T01:25:25ZAndrew
I am trying to write a very basic program using CL/GL interop, but a problem appears -- the JVM crashes with an access violation while creating the CLGLContext. Here is the code:
<br/><br/>=====
<br/> public static void main(String[] args) {
<br/> JFrame window = new JFrame("test");
<br/> GLCanvas c = new GLCanvas();
<br/> c.addGLEventListener(new GLEventListener() {
<br/><br/> @Override
<br/> public void init(GLAutoDrawable drawable) {
<br/> GLContext context = drawable.getContext();
<br/> CLGLContext clglContext = CLGLContext.create(context, CLDevice.Type.GPU); // access violation here
<br/> System.out.println(clglContext);
<br/> }
<br/><br/> @Override
<br/> public void dispose(GLAutoDrawable drawable) {}
<br/> @Override
<br/> public void display(GLAutoDrawable drawable) {}
<br/> @Override
<br/> public void reshape(GLAutoDrawable drawable, int x, int y, int width, int height) {}
<br/> });
<br/> c.setSize(300, 300);
<br/> window.add(c);
<br/> window.setSize(300, 300);
<br/> window.pack();
<br/> window.setVisible(true);
<br/> }
<br/>=====
<br/><br/>Any ideas/suggestions how to fix that?
<br/><br/>UPD: Just realized that there is an official JogAmp GL/CL interop example, but it doesn't help much -- when I run it, the problem remains.
<br/>My platforms/GPU devices:
<br/>CLPlatform [name: Intel(R) OpenCL, vendor: Intel(R) Corporation, profile: FULL_PROFILE, version: OpenCL 1.2 ]
<br/> CLDevice [id: 468205296 name: Intel(R) HD Graphics 4000 type: GPU profile: FULL_PROFILE]
<br/>CLPlatform [name: AMD Accelerated Parallel Processing, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE, version: OpenCL 2.0 AMD-APP (2117.14)]
<br/> CLDevice [id: 468587040 name: Capeverde type: GPU profile: FULL_PROFILE]
<br/>AMD platform is the one where the error arises. Intel one works perfectly.
<br/><br/>UPD2: It seems the cause is that the Intel device is selected as the default device in JOGL, while in JOCL the default device is the AMD device, and interoperability between them obviously isn't going to work. I can easily select which CL device to use, but cannot find a way to do the same in JOGL (except for changing the default device globally in the OS, which I'm not really comfortable with).
<br/>The question now is: how can I select specific GL device to work with in JOGL?
tag:forum.jogamp.org,2006:post-4037674How to queue in a multidevice environment2017-02-25T07:34:19Z2017-02-25T07:34:19ZArnold
I am trying to run the HelloWorld example in a multidevice environment. As I understand it, I set up a context for the devices I want to run the kernels on, create a program from it, from that program I create the kernels and for each kernel a buffer is created. Next I enumerate all devices, create a subbuffer and a command queue and run the lot (see sample code below, questions are put in comments almost at the bottom).
<br/><br/>I have four questions
<br/>1. Is this understanding correct?
<br/>2. If so, how do I create a command queue without running it?
<br/>3. How do I start all command queues at the same moment?
<br/>4. How do I wait for the results?
<br/><br/>I apologize for all these questions, but the documentation about CLCommandQueue is somewhat scanty.
<br/><br/>Thanks in advance for your time and for all your patience so far :-)
<br/><br/>Code:
<br/><br/> // Collect all relevant devices in CLDevice [] devices;
<br/> context = CLContext.create (devices);
<br/> program = context.createProgram (MultiBench.class.getResourceAsStream ("VectorFunctions.cl"));
<br/> program.build ("", devices);
<br/> kernels = program.createCLKernels ();
<br/> for (Map.Entry<String, CLKernel> entry: kernels.entrySet ())
<br/> {
<br/> CLKernel kernel = entry.getValue ();
<br/> // Now it’s getting tricky, creating buffers and subbuffers
<br/> // not sure whether this is correctly done
<br/><br/> int elementCount = 20000000;
<br/> int localWorkSize = min (devices [0].getMaxWorkGroupSize(), 256); // Local work size dimensions
<br/> int globalWorkSize = roundUp (localWorkSize, elementCount); // rounded up to the nearest multiple of the localWorkSize
<br/> int nDevices = devices.length;
<br/> int sliceSize = elementCount / nDevices;
<br/> int extra = elementCount - nDevices * sliceSize;
<br/> CLCommandQueue q [] = new CLCommandQueue [nDevices];
<br/>
<br/> CLSubBuffer<DoubleBuffer> [] CLSubArrayA = new CLSubBuffer [nDevices];
<br/> CLSubBuffer<DoubleBuffer> [] CLSubArrayB = new CLSubBuffer [nDevices];
<br/> CLSubBuffer<DoubleBuffer> [] CLSubArrayC = new CLSubBuffer [nDevices];
<br/><br/> // A, B are input buffers, C is for the result
<br/> CLBuffer<DoubleBuffer> clBufferA = context.createDoubleBuffer(globalWorkSize, READ_ONLY);
<br/> CLBuffer<DoubleBuffer> clBufferB = context.createDoubleBuffer(globalWorkSize, READ_ONLY);
<br/> CLBuffer<DoubleBuffer> clBufferC = context.createDoubleBuffer(globalWorkSize, WRITE_ONLY);
<br/>
<br/> for (int i = 0; i < nDevices; i++)
<br/> {
<br/> int size = sliceSize;
<br/> if (i == nDevices - 1) size += extra;
<br/> CLSubBuffer<DoubleBuffer> sbA = clBufferA.createSubBuffer (i * sliceSize, size, READ_ONLY);
<br/> CLSubArrayA [i] = sbA;
<br/> CLSubBuffer<DoubleBuffer> sbB = clBufferB.createSubBuffer (i * sliceSize, size, READ_ONLY);
<br/> CLSubArrayB [i] = sbB;
<br/> CLSubBuffer<DoubleBuffer> sbC = clBufferC.createSubBuffer (i * sliceSize, size, READ_ONLY);
<br/> CLSubArrayC [i] = sbC;
<br/> } // for
<br/><br/> kernel.putArgs(clBufferA, clBufferB, clBufferC).putArg(elementCount);
<br/>
<br/> for (int i = 0; i < nDevices; i++)
<br/> {
<br/> CLDevice device = devices [i];
<br/> q [i] = device.createCommandQueue ();
<br/> // asynchronous write of data to GPU device,
<br/> // followed by blocking read to get the computed results back.
<br/> q [i]
<br/> .putWriteBuffer (CLSubArrayA [i], false)
<br/> .putWriteBuffer (CLSubArrayB [i], false)
<br/> .put1DRangeKernel (kernel, 0, globalWorkSize,
<br/>localWorkSize)
<br/> .putReadBuffer (CLSubArrayC [i], true);
<br/>// I know the queueing command above is not correct, as you are not supposed
<br/>// to block while more enqueueing commands follow. How to do this correctly?
<br/> } // for
<br/>// How do I start all queues and how do I wait for the results?
<br/><br/> } // for
<br/>
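<br/><br/>The slicing arithmetic in the code above can be sanity-checked in isolation. Here is a plain-Java sketch (the helper names are mine, not JOCL's) that splits elementCount across devices, with the last device absorbing the remainder:

```java
public class SlicePlan {

    /** Number of elements device i should process (last device takes the remainder). */
    static int sliceSize(int elementCount, int nDevices, int i) {
        int base = elementCount / nDevices;
        int extra = elementCount - nDevices * base;
        return (i == nDevices - 1) ? base + extra : base;
    }

    /** Element offset at which device i's sub-buffer starts. */
    static int sliceOffset(int elementCount, int nDevices, int i) {
        return i * (elementCount / nDevices);
    }

    public static void main(String[] args) {
        int n = 20_000_000, devices = 3;
        int covered = 0;
        for (int i = 0; i < devices; i++) {
            covered += sliceSize(n, devices, i);
        }
        System.out.println(covered == n); // true: the slices exactly cover the input
    }
}
```

On the synchronization questions: one common pattern is to enqueue all writes, kernels, and reads with blocking=false, then call finish() on each queue (or collect events in a CLEventList and wait on those). Whether that is the best fit here is untested, so treat it as a direction to explore rather than a confirmed recipe.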
tag:forum.jogamp.org,2006:post-4037629Trouble running platform filter2017-02-13T13:51:21Z2017-02-13T13:51:21ZHenryS
Probably some very basic thing but I have trouble running the simple example on <a href="https://jogamp.org/deployment/jogamp-next/javadoc/jocl/javadoc/" target="_top" rel="nofollow" link="external">https://jogamp.org/deployment/jogamp-next/javadoc/jocl/javadoc/</a> to filter GPU-only platforms. The line
<br/><br/> CLPlatform platform = CLPlatform.getDefault(type(GPU));
<br/><br/>does not compile (GPU does not resolve to a variable). I tried several other variants but I get (other) compiler errors as well. I import the necessary packages (opencl.* and opencl.util.*). Any ideas what I am doing wrong?
tag:forum.jogamp.org,2006:post-4037633Understanding float and double2017-02-14T14:07:35Z2017-02-14T14:07:35ZArnold
I am trying to rebuild the Mandelbrot example. I now have a very simple program that creates a Mandelbrot fractal. I compare it with serial (simple) and parallel (threaded) implementations that I use for benchmark results. The results are really great: my serial implementation runs in 2229 ms, the parallel one in 432 ms, and OpenCL in 9 ms -- an improvement of almost a factor of 50!
<br/><br/>To "dive" deep into a Mandelbrot fractal one needs doubles; otherwise you reach the resolution limit of floats too quickly. I noticed that the OpenCL solution uses floats, while my NVidia GTX 1060 and Intel Core i7-920 both report cl_khr_fp64 = true. The Mandelbrot.cl kernel has a neat way of dealing with floats, making the type dependent on the floating-point setting. I set it explicitly to double but that did not help. I have listed the kernel below. In my Java program I exclusively use double.
<br/><br/>Anyone any idea how to have the kernel using double variables?
<br/><br/>#ifdef DOUBLE_FP
<br/> #ifdef AMD_FP
<br/> #pragma OPENCL EXTENSION cl_amd_fp64 : enable
<br/> #else
<br/> #pragma OPENCL EXTENSION cl_khr_fp64 : enable
<br/> #endif
<br/> typedef double varfloat;
<br/>#else
<br/> typedef float varfloat;
<br/>#endif
<br/><br/>/**
<br/> * For a description of this algorithm please refer to
<br/> * <a href="http://en.wikipedia.org/wiki/Mandelbrot_set" target="_top" rel="nofollow" link="external">http://en.wikipedia.org/wiki/Mandelbrot_set</a><br/> * @author Michael Bien
<br/> */
<br/>kernel void Mandelbrot
<br/> (
<br/> const int width,
<br/> const int height,
<br/> const int maxIterations,
<br/> const double x0,
<br/> const double y0,
<br/> const double stepX,
<br/> const double stepY,
<br/> global int *output
<br/> )
<br/>{
<br/><br/> unsigned int ix = get_global_id (0);
<br/> unsigned int iy = get_global_id (1);
<br/><br/> double r = x0 + ix * stepX;
<br/> double i = y0 + iy * stepY;
<br/><br/> double x = 0;
<br/> double y = 0;
<br/><br/> double magnitudeSquared = 0;
<br/> int iteration = 0;
<br/><br/> while (magnitudeSquared < 4 && iteration < maxIterations)
<br/> {
<br/> varfloat x2 = x*x;
<br/> varfloat y2 = y*y;
<br/> y = 2 * x * y + i;
<br/> x = x2 - y2 + r;
<br/> magnitudeSquared = x2+y2;
<br/> iteration++;
<br/> }
<br/><br/> output [iy * width + ix] = iteration;
<br/>}
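<br/><br/>The float-resolution limit mentioned above can be demonstrated without OpenCL. In this plain-Java sketch, a deep-zoom pixel step vanishes in float arithmetic but survives in double:

```java
public class PrecisionDemo {
    public static void main(String[] args) {
        double center = -0.743643887037151;   // a typical deep-zoom coordinate
        double step = 1e-12;                  // pixel step at high magnification

        float  fNext = (float) center + (float) step;
        double dNext = center + step;

        // In float, the step is far below the representable resolution near
        // 'center' (float ulp there is about 6e-8), so adjacent pixels collapse
        // to the same value; in double (ulp about 1e-16) they stay distinct.
        System.out.println(fNext == (float) center);  // true: the step was lost
        System.out.println(dNext == center);          // false: the step survived
    }
}
```

Note also that the kernel's #ifdef DOUBLE_FP is only active if that symbol is defined at build time, so the double path presumably requires passing something like "-DDOUBLE_FP" (and matching double host buffers) to program.build(); that flag name is taken from the kernel listing above, not from a verified demo setup.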
<br/>
tag:forum.jogamp.org,2006:post-4037603Enigmatic benchmark results2017-01-25T13:36:46Z2017-01-25T13:36:46ZArnold
Hi,
<br/><br/>I started programming in OpenCL. My system is a Core i7-920 with 12 GB RAM and an NVidia GTX 1060 with 6 GB. I started experimenting with the HelloWorld example and have it run 3 kernels: a+b=c, a*b=c and a/b=c. Each vector contains 20 million elements, just to make the benchmark results meaningful.
<br/><br/>I got the following results (in ms, mileage may vary):
<br/><br/>+ 159 159
<br/>* 152 179
<br/>/ 156 153
<br/><br/>Well, there go my dreams of having built a small supercomputer; the graphics card is about as fast as the CPU. But it gets worse: I built a small benchmark subroutine that computes a/b=c 20 million times, and it does so in 78 ms!
<br/><br/>The results change when I throw in trigonometric functions like sin(a)/cos(b)=c (the benchmark then takes 4 times as much computation time). Is this normal? I hope not, because I find it somewhat discouraging. Note that I only time the queue.put... calls. I can post the code if desired, but maybe one of you cracks knows the answer beforehand.
<br/><br/>One of the questions I have is about the number of processors. This is 8 for the i7-920, which is OK. But the GTX 1060 reports just 10, while the specs tell me it has 1280 stream processors. Does anyone know how those numbers relate?
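<br/><br/>On the compute-unit question: CL_DEVICE_MAX_COMPUTE_UNITS on NVIDIA hardware counts streaming multiprocessors (SMs), not individual cores. A GTX 1060 has 10 SMs, and each Pascal SM contains 128 CUDA cores ("stream processors"), which reconciles the two numbers:

```java
public class ComputeUnits {
    public static void main(String[] args) {
        int computeUnits = 10;  // what OpenCL reports for a GTX 1060 (one per SM)
        int coresPerSM  = 128;  // CUDA cores per Pascal SM
        System.out.println(computeUnits * coresPerSM); // 1280 stream processors
    }
}
```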
<br/><br/>Thanks very much for your time.
<br/>
tag:forum.jogamp.org,2006:post-4037592Cannot build jocl-demos from Eclipse2017-01-22T23:37:34Z2017-01-22T23:37:34ZArnold
I am new to JOCL and OpenCL in general. I downloaded the latest stable build and got the HelloWorld from Michael Bien running. Next I cloned the jocl-demos, changed the build path to the jogamp library jars, and ran into a problem when trying to build it in Eclipse:
<br/> Errors occurred during the build.
<br/> Errors running builder 'Integrated External Tool Builder' on project 'jocl-demos-master'.
<br/> The file does not exist for the external tool named jocl-demos builder.
<br/>
<br/>I disabled the external builder and enabled the Java builder and then I got:
<br/>JOGL> Hello JOAL
<br/>Exception in thread "main" java.lang.NoClassDefFoundError: com/jogamp/openal/JoalVersion
<br/> at jogamp.opengl.openal.av.ALDummyUsage.main(ALDummyUsage.java:14)
<br/>Caused by: java.lang.ClassNotFoundException: com.jogamp.openal.JoalVersion
<br/> at java.net.URLClassLoader.findClass(Unknown Source)
<br/> at java.lang.ClassLoader.loadClass(Unknown Source)
<br/> at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
<br/> at java.lang.ClassLoader.loadClass(Unknown Source)
<br/> ... 1 more
<br/><br/>Could anybody help me setting up the demos?
<br/><br/>Thanks very much for your time.
tag:forum.jogamp.org,2006:post-4037537Maven deploy scripts2016-12-19T08:51:10Z2016-12-19T08:51:10Zbacondave
Hi,
<br/><br/>First of all great project!
<br/><br/>Now my question:
<br/><br/>I've been looking for the "jocl-main.pom.sh" file or a way to install/deploy the maven artifacts.
<br/><br/>How would I do this?
<br/><br/>It looks to be working for the gluegen project.
<br/><br/>Best regards
<br/>Dave
tag:forum.jogamp.org,2006:post-4037407CLMemory#getNIOSize should use buffer.limit() instead of buffer.capacity().2016-11-15T04:47:02Z2016-11-15T04:47:02ZEmily Leiviskä
User story:
<br/>We are dealing with video processing. We use native code to load images from streams which requires the image buffers to be aligned on 16 byte boundaries. The images are stored in direct NIO buffers and owned by our java program.
<br/><br/>To obtain aligned NIO buffers we allocate a slightly larger direct ByteBuffer, read out the native address using reflection, set the position so it is aligned, slice the buffer, and set the limit to the desired size. This yields a buffer whose capacity is slightly larger than its limit, whose limit is the desired size, and whose start address has the desired alignment in the native domain.
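<br/><br/>The slice-and-limit technique described above can be sketched in plain NIO (minus the reflection step that finds the native address); the key point is that the resulting view's capacity() exceeds its limit(), which is exactly the shape a capacity()-based size computation trips over:

```java
import java.nio.ByteBuffer;

public class AlignedSlice {

    /** Returns a view of 'raw' whose start is advanced by 'shift' bytes and
     *  whose limit is 'size' bytes. The view's capacity stays larger than its
     *  limit, mirroring the aligned buffers described in the post. */
    static ByteBuffer alignedView(ByteBuffer raw, int shift, int size) {
        raw.position(shift);          // in real code, shift aligns the native address
        ByteBuffer view = raw.slice(); // view starts at the shifted position
        view.limit(size);              // expose only the desired payload
        return view;
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocateDirect(30 + 16); // payload + padding
        ByteBuffer view = alignedView(raw, 6, 30);
        System.out.println(view.limit());     // 30
        System.out.println(view.capacity());  // 40: capacity > limit
    }
}
```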
<br/><br/>We have a ring buffer of recent frames as device buffers and need to transfer the read image data into the next available device buffer for processing, without constantly freeing and allocating the device buffers. So we would like to reuse our CLResources, and we would also like to avoid any memcopies on the way.
<br/><br/>We allocate the device memory based on the image size and pixel mode and associate the aligned NIO buffer with the correct device CLBuffer and attempt to write to the device. This fails with CL_INVALID_VALUE because the capacity of the direct buffer is larger than the size of the device buffer.
<br/><br/>Suggestion:
<br/>I debugged the error and found the following code in CLMemory line 155:
<br/><br/> public int getNIOSize() {
<br/> if(buffer == null) {
<br/> return 0;
<br/> }
<br/> return getElementSize() * buffer.capacity();
<br/> }
<br/><br/>In the above I strongly believe that buffer.capacity() should be buffer.limit(), as limit() more closely represents the user's desired buffer dimension and how much to copy.
<br/><br/>Short, Self-Contained Correct Example:
<br/><br/>import java.nio.ByteBuffer;
<br/><br/>import com.jogamp.opencl.CLBuffer;
<br/>import com.jogamp.opencl.CLCommandQueue;
<br/>import com.jogamp.opencl.CLContext;
<br/>import com.jogamp.opencl.CLDevice;
<br/><br/>public class SSCCE {
<br/><br/> public static void main(String[] args) {
<br/> final CLContext context = CLContext.create();
<br/> final CLDevice device = context.getMaxFlopsDevice();
<br/> final CLCommandQueue queue = device.createCommandQueue();
<br/><br/> final int size = 30;
<br/> final int padding = 16; // Change to 0 and both work
<br/> final ByteBuffer directBuffer = ByteBuffer.allocateDirect(size + padding);
<br/> directBuffer.limit(size);
<br/><br/> // This is what we want to do, but it fails with CL_INVALID_VALUE
<br/> final CLBuffer<?> buffer = context.createBuffer(size).cloneWith(directBuffer);
<br/><br/> // This works but not what we want, it allocates a new device buffer every time.
<br/> // final CLBuffer<ByteBuffer> buffer = context.createBuffer(directBuffer);
<br/><br/> queue.putWriteBuffer(buffer, true);
<br/> }
<br/>}
<br/>
tag:forum.jogamp.org,2006:post-4036560CL_DEVICE_NOT_AVAILABLE when creating CLContext with 64-bit Java on GTX 9702016-03-30T11:54:39Z2016-03-30T11:54:39ZSuspiciousDroid
Heyo everyone!
<br/><br/>I'm having trouble running JOCL with 64-bit versions of Java on an nVidia GPU.
<br/>When I run the program with a 32-bit version (I tested jre1.6.0_45-32bit) everything works flawlessly, but when I switch to a 64-bit version (tested some versions of Java 7 and 8) I get an exception.
<br/><br/>Basically, the program fails at the very first line where JOCL is involved: clContext = CLContext.create(CLDevice.Type.GPU);
<br/>1st line of the stack trace: Exception in thread "main" com.jogamp.opencl.CLException$CLDeviceNotAvailableException: can not create CL context [error: CL_DEVICE_NOT_AVAILABLE]
<br/><br/>When I run the program on the CPU instead of the GPU, everything works perfectly with both 32-bit and 64-bit Java.
<br/><br/>Relevant system specs:
<br/>- Intel i7-2600k
<br/>- nVidia GTX 970
<br/>- Windows 10 64-bit
<br/>- Using JOCL 2.3.2
<br/><br/>At first I thought this was a driver problem, so I updated both the general nVidia driver to v364.72 (which includes its OpenCL drivers) and the Intel OpenCL driver to v15.1. Unfortunately, this didn't fix the problem.
<br/><br/>Then I got my brother who has a very similar computer (Win10-64bit, GTX 970, Java 8 64-bit) to run the program and he ran into the exact same problem, so it's not just my computer either.
<br/><br/>I even went so far as to re-code everything with both the JOCL from jocl.org and with JavaCL.
<br/>The results were similar, but not identical: Both worked with 32-bit Java and failed with 64-bit Java, but the error message was different. Both frameworks returned CL_OUT_OF_RESOURCES instead of CL_DEVICE_NOT_AVAILABLE.
<br/>I assume this is because these frameworks both call clCreateContext directly, while JOCL calls clCreateContextFromType.
<br/><br/>At this point, I don't know what is causing this issue and what I should troubleshoot next.
<br/>I'm considering nuking all my nVidia drivers because there might be old left-over junk, and then reinstalling the drivers. I'm not sure this would help, though, as my brother's computer, which was set up just recently, encounters the same problem.
<br/><br/>Any idea what might be causing this issue or what I should be trying next?
tag:forum.jogamp.org,2006:post-4036753Multiple OpenCL contexts2016-05-28T00:22:42Z2016-05-28T00:22:42Zshaman
Hi,
<br/><br/>I have a problem regarding the following pull request to jME3 (<a href="https://github.com/jMonkeyEngine/jmonkeyengine/pull/494" target="_top" rel="nofollow" link="external">https://github.com/jMonkeyEngine/jmonkeyengine/pull/494</a>).
<br/>I want to create multiple OpenCL contexts at the same time.
<br/>I have experienced, however, that when a second context is created, the previous one is automatically released. This invalidates all associated programs, buffers and so on, leading to an INVALID_CONTEXT, INVALID_MEM_OBJECT, ... being thrown.
<br/><br/>What causes this problem? Is it intended (I do not hope so)? How can I work around it?
<br/>
tag:forum.jogamp.org,2006:post-4036615How to retain a CLBuffer possibly calling clRetainMemObject2016-04-18T08:41:06Z2016-04-18T08:41:06ZAndrew Bailey
Hi,
<br/><br/>I want to execute 2 kernels, the second one processing a CLBuffer generated by the first; however, it appears that with the high-level JOCL API the buffers are being released automatically.
<br/><br/>There is a method CLBuffer.release(); however, no retain() appears.
<br/><br/>I have looked for a way of calling the low level cl_int clRetainMemObject (cl_mem memobj)
<br/>using a class such as org.jocl.utils.Mems however there are only methods to create and release org.jocl.cl_mem.
<br/><br/>The only way I have managed to get the program to work so far is by copying the buffer to host memory, creating a new buffer, and copying the data back; I would like to avoid this.
<br/><br/>Is there another way of implementing the desired functionality?
<br/><br/>Is there any reason why retain() is not implemented in CLBuffer?
<br/><br/>Thanks in advance
tag:forum.jogamp.org,2006:post-4036591Access cl_kernel_preferred_work_group_size_multiple2016-04-11T13:24:35Z2016-04-11T13:24:35ZArToX
Hello everyone,
<br/><br/>I was wondering how to access the cl_kernel_preferred_work_group_size_multiple value through jocl.
<br/>That thread: <a href="http://forum.jogamp.org/How-to-get-quot-CL-KERNEL-PREFERRED-WORK-GROUP-SIZE-MULTIPLE-quot-td4033302.html" target="_top" rel="nofollow" link="external">http://forum.jogamp.org/How-to-get-quot-CL-KERNEL-PREFERRED-WORK-GROUP-SIZE-MULTIPLE-quot-td4033302.html</a> provides a good walkthrough of the source, except that the getWorkGroupInfo() method is private and therefore not available for this purpose.
<br/>And of course there's no such get_preferred_work_group_multiple() method.
<br/><br/>Any reason to this? Any idea to bypass/fix this limitation?
<br/><br/>Thanks and best regards.
tag:forum.jogamp.org,2006:post-4036419GLCLInteroperabilityDemo and float42016-03-03T11:10:30Z2016-03-03T11:10:30Zpaolofuse
Hi all,
<br/>GLCLInteroperabilityDemo runs correctly for me, but I want to understand why it uses a float4 vertex and not a float3 vertex.
<br/>Why I cannot use this kernel:
<br/><br/>kernel void sineWave(global float3 * vertex, int size, float time) {
<br/><br/> unsigned int x = get_global_id(0);
<br/> unsigned int y = get_global_id(1);
<br/><br/> // calculate uv coordinates
<br/> float u = x / (float) size;
<br/> float v = y / (float) size;
<br/><br/> u = u*2.0f - 1.0f;
<br/> v = v*2.0f - 1.0f;
<br/><br/> // calculate simple sine wave pattern
<br/> float freq = 4.0f;
<br/> float w = sin(u*freq + time) * cos(v*freq + time) * 0.5f;
<br/><br/> // write output vertex
<br/> vertex[y*size + x] = (float3)(u*10.0f, w*10.0f, v*10.0f);
<br/>}
<br/><br/>and in display method:
<br/><br/>gl.glBindBuffer(GL2.GL_ARRAY_BUFFER, glObjects[VERTICES]);
<br/>gl.glVertexPointer(3, GL2.GL_FLOAT, 0, 0);
<br/>gl.glColor3f(1, 1, 1);
<br/>gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
<br/>gl.glDrawArrays(GL2.GL_POINTS, 0, MESH_SIZE * MESH_SIZE);
<br/>gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
<br/>gl.glBindBuffer(GL2.GL_ARRAY_BUFFER, 0);
<br/><br/>This doesn't work. Is there an explanation?
<br/>Thanks
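<br/><br/>A likely explanation, for context: in OpenCL C, float3 has the size and alignment of float4 (16 bytes), so the kernel writes vertices with a 16-byte stride, while glVertexPointer(3, GL_FLOAT, 0, 0) reads tightly packed 12-byte vertices. The resulting drift can be seen with plain arithmetic (Java sketch):

```java
public class StrideMismatch {
    public static void main(String[] args) {
        int sizeofFloat = 4;
        int kernelStride = 4 * sizeofFloat;  // float3 stores like float4: 16 bytes
        int glStride     = 3 * sizeofFloat;  // glVertexPointer(3, GL_FLOAT, 0, 0): 12 bytes

        // After n vertices the GL reader lags the kernel writer by this many bytes.
        int n = 100;
        System.out.println((kernelStride - glStride) * n); // 400 bytes of drift
    }
}
```

Possible fixes to try: keep float4 in the kernel and pass 4 to glVertexPointer, set the GL stride to 16 bytes, or write packed 12-byte triples from the kernel with vstore3; which of these suits the demo best is untested here.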
tag:forum.jogamp.org,2006:post-4036002Allocating host memory2016-01-13T04:38:38Z2016-01-13T04:38:38ZEmily Leiviskä
I'm trying to allocate pinned memory (floats) on the host as per page 10 of <a href="https://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/NVIDIA_OpenCL_BestPracticesGuide.pdf" target="_top" rel="nofollow" link="external">https://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/NVIDIA_OpenCL_BestPracticesGuide.pdf</a>.
<br/><br/>So I want to get a direct FloatBuffer that is backed by memory allocated from clCreateBuffer with CL_MEM_ALLOC_HOST_PTR.
<br/><br/>I scoured through the source of CLContext and found that all createXBuffer calls eventually boil down to:
<br/><br/>(1)
<br/> public final CLBuffer<?> createBuffer(final int size, final int flags) {
<br/> final CLBuffer<?> buffer = CLBuffer.create(this, size, flags);
<br/> memoryObjects.add(buffer);
<br/> return buffer;
<br/> }
<br/><br/>or
<br/><br/>(2)
<br/> public final <B extends Buffer> CLBuffer<B> createBuffer(final B directBuffer, final int flags) {
<br/> final CLBuffer<B> buffer = CLBuffer.create(this, directBuffer, flags);
<br/> memoryObjects.add(buffer);
<br/> return buffer;
<br/> }
<br/><br/>My understanding is that (1) will get me a CLBuffer which has a null Buffer, so this is not what I want. And (2) will not work with a null argument for directBuffer as it calls CLBuffer.create:
<br/><br/>(3)
<br/> static <B extends Buffer> CLBuffer<B> create(final CLContext context, final B directBuffer, final int flags) {
<br/><br/> if(!directBuffer.isDirect()) <---------------------------------------------------- NullPointerException here
<br/> throw new IllegalArgumentException("buffer is not direct");
<br/><br/> B host_ptr = null;
<br/> if(isHostPointerFlag(flags)) {
<br/> host_ptr = directBuffer;
<br/> }
<br/><br/> final CLBufferBinding binding = context.getPlatform().getBufferBinding();
<br/> final int[] result = new int[1];
<br/> final int size = Buffers.sizeOfBufferElem(directBuffer) * directBuffer.capacity();
<br/> final long id = binding.clCreateBuffer(context.ID, flags, size, host_ptr, result, 0);
<br/> CLException.checkForError(result[0], "can not create cl buffer");
<br/><br/> return new CLBuffer<B>(context, directBuffer, size, id, flags);
<br/> }
<br/><br/>As a side note, I don't think that (2) or (3) should accept Mem.ALLOCATE_BUFFER, as the buffer passed in is already allocated... or am I missing something?
<br/><br/>How would I go about getting a NIO buffer into a memory block allocated by clCreateBuffer with CL_MEM_ALLOC_HOST_PTR?
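<br/><br/>One hedged reading of the JOCL API (method names worth double-checking against your version): create the buffer with Mem.ALLOCATE_BUFFER and no host buffer, then map it; the ByteBuffer returned by the map is the pinned host memory, so no pre-existing NIO buffer needs to be passed in at all:

```java
// sketch: CL_MEM_ALLOC_HOST_PTR allocation plus a blocking map (names assumed)
CLBuffer<?> pinned = context.createBuffer(n * 4, Mem.READ_WRITE, Mem.ALLOCATE_BUFFER);
ByteBuffer raw = queue.putMapBuffer(pinned, Map.READ_WRITE, true);
FloatBuffer floats = raw.order(ByteOrder.nativeOrder()).asFloatBuffer();
// ... fill 'floats' and use it as the host side of fast transfers ...
queue.putUnmapMemory(pinned, raw);
```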
tag:forum.jogamp.org,2006:post-4035890Process database table (or complex data)?2015-12-09T06:37:17Z2015-12-09T06:37:17Zbojan
Hi!
<br/><br/>I am a newbie to JOCL. I was given a task to do some data processing on the GPU, so JOCL sounded like a good solution. What I need to do is load a table from a database (actually a few of them), do some processing, and return a result (usually one filtered table).
<br/>In Java I load a database table into a List<List<Object>> (a "List<Object>" is one row from the database; it contains different data types: numbers, dates, strings, ...).
<br/><br/>So my question is how to pass these input data to a JOCL kernel. So far I haven't found a way to pass it as-is, so I am loading it column by column, filling a buffer with each column and handing it to the kernel. That works OK for numbers, but I can't do the same thing with strings or dates (at least not yet). And to be honest, I don't like this approach: if I have 3 tables and each of them has 5 columns, that means 15 arguments in my kernel - not so nice. Any suggestions on how to do this?
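<br/><br/>Until strings and dates are solved, the argument-count problem for numeric columns can be reduced by packing a whole table into one buffer with a known stride, so a kernel takes one buffer plus a column count instead of one buffer per column. A minimal sketch in plain Java (illustrative names; for JOCL the buffer would need to be direct, e.g. via com.jogamp.common.nio.Buffers):

```java
import java.nio.FloatBuffer;
import java.util.Arrays;
import java.util.List;

// pack a row-major numeric table into a single float buffer with stride = cols
List<List<Number>> table = Arrays.asList(
        Arrays.asList(1, 2.5, 3),
        Arrays.asList(4, 5.5, 6));
int cols = table.get(0).size();
FloatBuffer packed = FloatBuffer.allocate(table.size() * cols);
for (List<Number> row : table)
    for (Number cell : row)
        packed.put(cell.floatValue());
packed.rewind();
// cell (row, col) lives at index row * cols + col
```

Dates can ride along as numbers (e.g. epoch seconds); strings generally have to be encoded to fixed-width bytes or filtered on the host.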
tag:forum.jogamp.org,2006:post-4035817Running JOCL through Matlab2015-11-18T23:26:47Z2015-11-18T23:26:47ZEric Barnhill
I have integrated JOCL into my medical imaging library -- it works really well. The library is designed to be called from Matlab, but I get OpenCL errors when I call it from Matlab -- "can not enumerate platforms", which I gather means OpenCL is not found at all. All the needed .jars are in the Matlab classpath. I realise Matlab doesn't support OpenCL, but I was surprised that this would interfere with Java. Do you have any idea why the Matlab environment would disrupt calling JOCL, and any suggestions for fixing it?
tag:forum.jogamp.org,2006:post-4033186OpenCL 1.2 and 2.0 support plans?2014-09-20T18:30:25Z2014-09-20T18:30:25Zchippies
Hi all,
<br/><br/>Looking through the documentation and source code, it seems like OpenCL 1.2 is not supported yet. From the milestones and Bugzilla it doesn't seem like OpenCL 2.0 support is planned. What is the timeline for 1.2 and 2.0 support in JOCL?
tag:forum.jogamp.org,2006:post-4035399HelloJOCL requesting "jogamp/natives/" folder2015-09-28T02:53:31Z2015-09-28T02:53:31ZEric Barnhill
HelloJOCL compiles but (prior to fix below) gives the runtime error:
<br/><br/> Can't load library: /home/[...]/jogamp/natives/linux-amd64//libgluegen-rt.so
<br/><br/>Not surprising, since I've followed the JogAmp build instructions and this has only given me the folders jogamp/gluegen, jogamp/jocl, and jogamp/jogl.
<br/><br/>So I created the folder jogamp/natives/linux-amd64/ and copied in libgluegen-rt.so from /gluegen/build/obj/libgluegen-rt.so, and it runs fine.
<br/><br/>But I am worried I have set myself up for problems down the road. Was this the right thing to do?
<br/><br/>
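<br/><br/>The manual fix described above, expressed as commands (paths taken from the post):

```shell
# recreate the natives layout the loader expects
mkdir -p jogamp/natives/linux-amd64
cp gluegen/build/obj/libgluegen-rt.so jogamp/natives/linux-amd64/
```

As an alternative that avoids copying, pointing the JVM at the build output with -Djava.library.path, or using the packaged *-natives-linux-amd64.jar files, may also work; treat both as suggestions rather than the one supported layout.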
tag:forum.jogamp.org,2006:post-4035340Bug 1075: Add support for OpenCL 1.2 and 2.02015-09-21T15:44:27Z2015-09-21T15:44:27ZSven Gothel
It seems Wade has made excellent progress
<br/>in enhancing JOCL:
<br/><br/><<a href="https://jogamp.org/bugzilla/show_bug.cgi?id=1075#c3" target="_top" rel="nofollow" link="external">https://jogamp.org/bugzilla/show_bug.cgi?id=1075#c3</a>>
<br/><<a href="https://jogamp.org/bugzilla/show_bug.cgi?id=1075#c4" target="_top" rel="nofollow" link="external">https://jogamp.org/bugzilla/show_bug.cgi?id=1075#c4</a>>
<br/><br/>One may like to join the discussion via bugzilla
<br/> <<a href="https://jogamp.org/bugzilla/show_bug.cgi?id=1075#7" target="_top" rel="nofollow" link="external">https://jogamp.org/bugzilla/show_bug.cgi?id=1075#7</a>>
<br/><br/> [1] Will the implementing classes CLImpl11, CLImpl12, CLImpl20
<br/>     be exposed to the user directly,
<br/>     or via public interfaces, i.e. CL11, CL12, CL20?
<br/>     I prefer the latter, since it will be analogous to [2].
<br/><br/> [2] Common CL profiles, as with JOGL's GL2ES2,
<br/>     implemented by CLImpl11, CLImpl12, CLImpl20.
<br/>     For example:
<br/>     - CL11_20
<br/>     - CL20_21
<br/>     This will be analogous to [1].
<br/><br/> [3] How to solve/maintain the higher level API
<br/> using the common CL profiles? -> [1] [2]
<br/><br/>Big KUDOS to Wade!
<br/><br/>Cheers, Sven
<br/><br/><!--start-attachments--><div class="small"><br/><img src="https://forum.jogamp.org/images/icon_attachment.gif" > <strong>signature.asc</strong> (828 bytes) <a href="https://forum.jogamp.org/attachment/4035340/0/signature.asc" target="_top" rel="nofollow" link="external">Download Attachment</a></div><!--end-attachments-->
tag:forum.jogamp.org,2006:post-4034847How to re-use global memory between kernel invocations2015-07-08T14:19:12Z2015-07-08T14:19:12Zdevmonkey
Hi,
<br/><br/>My usecase is neural network related.
<br/><br/>I need to copy a large amount of sample data to global memory; this data does not change between training cycles (which are kernel invocations), so it can sit on the card all day/week. However, the weights of the net are updated after every invocation (updated on the host) and therefore need to be copied back to the card on every kernel invocation.
<br/><br/>Can anyone suggest the correct approach to this or should I not be copying data at all but rather mapping memory from the card back to the host and writing to it?
<br/><br/>Thanks, Joe
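<br/><br/>A sketch of the pattern described above, with illustrative names: the sample buffer is created and written once and simply stays resident; only the small weights buffer crosses the bus each cycle:

```java
// samples: uploaded once, resident on the card for the whole session
CLBuffer<FloatBuffer> samples = context.createFloatBuffer(nSamples, Mem.READ_ONLY);
CLBuffer<FloatBuffer> weights = context.createFloatBuffer(nWeights, Mem.READ_WRITE);
queue.putWriteBuffer(samples, false);            // one-time copy to the card
for (int cycle = 0; cycle < cycles; cycle++) {
    queue.putWriteBuffer(weights, false)         // re-upload updated weights
         .put1DRangeKernel(trainKernel, 0, globalSize, 0)
         .putReadBuffer(weights, true);          // blocking read of the results
    // ... update weights.getBuffer() on the host ...
}
```

Mapping (putMapBuffer) instead of explicit copies can pay off if the weights are small and updated every cycle; measuring both on the target card is the honest answer.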
tag:forum.jogamp.org,2006:post-4034813Dual GPU 7990 card not using both GPUs concurrently2015-07-03T08:29:54Z2015-07-03T08:29:54Zdevmonkey
Hi,
<br/><br/>I'm very new to OpenCL and JOCL.
<br/><br/>I'm using a dual-GPU HD7990 on Win7 64, Catalyst 14.12, and having some problems getting reliable execution across both GPUs concurrently. If I run the JOCLMultiDeviceSample I can see it enqueue work to both GPUs, however the work fully executes on one GPU before the work on the second GPU commences.
<br/><br/><img src="https://forum.jogamp.org/file/n4034813/jocl.png" border="0"/><br/><br/>Note that since the sample runs over all devices, I can also see the CPU work occurring concurrently with the GPU work.
<br/><br/>My own launcher and kernels behave the same way: if I schedule the same work to both GPUs and use clWaitForEvents(...) to wait for both jobs to complete, they execute serially, taking 2x the time of one job. When I run a thread per device (with a separate context) AND force the work on the first core to be scheduled before that on the second core, then both jobs execute concurrently and complete in the same time as a single job on one GPU.
<br/><br/>Any pointers as to whether this is an issue with the dual card / driver or a synchronisation problem in the JOCL wrapper?
<br/><br/>Thanks, Joe
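<br/><br/>The workaround described above (a thread and context per device) can be sketched as follows, with names assumed; whether the serialization itself is a Catalyst scheduling quirk or a JOCL-level synchronisation issue is left open:

```java
// one thread + one context + one queue per GPU, driven independently
ExecutorService pool = Executors.newFixedThreadPool(devices.length);
for (final CLDevice device : devices) {
    pool.submit(new Runnable() {
        public void run() {
            CLContext ctx = CLContext.create(device);   // separate context per GPU
            try {
                CLCommandQueue q = ctx.getDevices()[0].createCommandQueue();
                // build the program, set kernel args, enqueue this GPU's share...
                q.finish();
            } finally {
                ctx.release();
            }
        }
    });
}
pool.shutdown();
```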
tag:forum.jogamp.org,2006:post-3729033Julia3d out of memory problem2012-02-09T03:21:51Z2012-02-09T03:21:51ZMartin
Hi,
<br/>Trying to run Julia3d throws the following exception:
<br/><br/>BUILD_SUCCESS
<br/>CLDevice [id: 161313944 name: GeForce 310M type: GPU profile: FULL_PROFILE] build log:
<br/> <empty><br/>Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Direct buffer memory
<br/> at java.nio.Bits.reserveMemory(Unknown Source)
<br/> at java.nio.DirectByteBuffer.<init>(Unknown Source)
<br/> at java.nio.ByteBuffer.allocateDirect(Unknown Source)
<br/> at com.jogamp.common.nio.Buffers.newDirectByteBuffer(Buffers.java:69)
<br/> at com.jogamp.common.nio.Buffers.newDirectFloatBuffer(Buffers.java:111)
<br/> at com.jogamp.opencl.CLContext.createFloatBuffer(CLContext.java:318)
<br/> at com.jogamp.opencl.demos.julia3d.Julia3d.update(Julia3d.java:92)
<br/> at com.jogamp.opencl.demos.julia3d.Renderer.reshape(Renderer.java:188)
<br/><br/>I tried -Xmx1400m and can't go higher on my machine, otherwise the JVM won't start.
<br/><br/>Should this memory error be solved with a larger heap (and if so, what size), or am I on the wrong track using -Xmx to solve this issue?
<br/><br/>Best regards,
<br/>Martin
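<br/><br/>One hedged observation: "Direct buffer memory" is allocated outside the Java heap, so -Xmx does not raise that limit directly; on HotSpot the relevant cap is -XX:MaxDirectMemorySize, which by default sits near the heap size. A launch along these lines may behave differently (classpath and size chosen purely for illustration; the demo's actual jar/main class may differ):

```shell
# illustration only
java -XX:MaxDirectMemorySize=1g -cp jocl-demos.jar com.jogamp.opencl.demos.julia3d.Julia3d
```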
tag:forum.jogamp.org,2006:post-4031306Multi-GPU processing inconsistent.2014-01-23T14:48:16Z2014-01-23T14:48:16ZThe.Scotsman
The algorithm used compares a list of objects A, B, C... to each other,
<br/>i.e. A-B, A-C, B-C, etc.
<br/><br/>The jogamp code uses:
<br/><br/>CLCommandQueuePool.invokeAll(List<CLTask>)
<br/><br/>...where each CLTask.execute() has (in simplified format):
<br/><br/>CLContext = CLSimpleQueueContext.getCLContext()
<br/>CLCommandQueue = CLSimpleQueueContext.getQueue()
<br/>CLBuffer = CLContext.createFloatBuffer(n, CLMemory.Mem.READ_WRITE)
<br/>CLBuffer = CLContext.createFloatBuffer(m, CLMemory.Mem.READ_ONLY)
<br/><br/>CLKernel = CLSimpleQueueContext.getKernel("myKernel")
<br/>CLCommandQueue.putWriteBuffer(CLBuffer, false);
<br/>CLCommandQueue.put2DRangeKernel(CLKernel, ...)
<br/>CLCommandQueue.putReadBuffer(n, true)
<br/><br/>I've run this on several different single GPU configurations (both AMD and NVidia), and the results are always correct.
<br/>However, when it's been run on multi-GPU configurations (both AMD and NVidia), the results are randomly incorrect.
<br/>(about 95% correct, 5% incorrect).
<br/><br/>There is nothing within the CLTask above that can "bleed through" to other CLTasks except possibly the CL calls.
<br/><br/>Obviously, at a given time, both an A-B and an A-C comparison could be performed at the same time on different devices.
<br/>So the question is: Am I doing this correctly?
<br/>Many thanks.
<br/>
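<br/><br/>One guess, not a confirmed diagnosis: a CLKernel carries its argument state, so if a single kernel instance (or a shared buffer) is reachable from more than one concurrently executing CLTask, arguments can be overwritten mid-flight across devices. Giving each task its own kernel instance removes that shared state (names assumed):

```java
// per-task kernel instance instead of one shared across queues
CLKernel kernel = program.createCLKernel("myKernel");
kernel.putArgs(resultBuffer, inputBuffer).putArg(n);
queue.put2DRangeKernel(kernel, 0, 0, globalX, globalY, 0, 0);
```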
tag:forum.jogamp.org,2006:post-4030974Hardcoded float not working?2013-12-19T10:53:46Z2013-12-19T10:53:46ZMiha
Hello,
<br/><br/>The compiler somehow treats all hardcoded float literals as integers.
<br/><br/>Starting from netbeans jocl template (<a href="http://plugins.netbeans.org/plugin/39980/netbeans-opencl-pack" target="_top" rel="nofollow" link="external">http://plugins.netbeans.org/plugin/39980/netbeans-opencl-pack</a>), example:
<br/><br/>***
<br/>kernel void fill(global float* a, const int size, const int value) {
<br/> int index = get_global_id(0);
<br/> if (index >= size) {
<br/> return;
<br/> }
<br/>
<br/> a[index] = 0.5f;
<br/>}
<br/>***
<br/><br/>From java:
<br/>System.out.println(value + "\t");
<br/><br/>I get all zeros. If I put a[index] = 1/(float)2, it works.
<br/><br/>It also works normally if I do not use JOCL but compile with C++ and g++ directly.
<br/><br/>Any suggestions? It is driving me crazy.
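<br/><br/>Two checks that might narrow this down, offered as guesses: 0.5f and 1/(float)2 are the same value in both Java and OpenCL C, so if one works and the other does not, suspicion falls on the source string being altered somewhere between the .cl file and clCreateProgramWithSource (e.g. by resource filtering or charset handling). Dumping the exact source the driver receives would confirm; a self-contained illustration:

```java
// the two spellings are identical as floats; also verify the literal survives
// a round trip through whatever loads the kernel source (illustrative string)
String source =
        "kernel void fill(global float* a, const int size, const int value) {\n"
      + "    int index = get_global_id(0);\n"
      + "    if (index >= size) return;\n"
      + "    a[index] = 0.5f;\n"
      + "}\n";
boolean literalIntact = source.contains("0.5f");
float direct = 0.5f;
float viaDivision = 1 / (float) 2;   // the workaround from the post
```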
tag:forum.jogamp.org,2006:post-4034077crash in Java_com_jogamp_opencl_llb_impl_CLImpl_clBuildProgram02015-02-27T10:22:27Z2015-02-27T10:22:27ZBram
Hello,
<br/><br/>I am facing a JOCL crash issue with clBuildProgram.
<br/>Important info: I am using <b>RHEL6 64b</b> and the latest NVIDIA driver for the K20 (<b>340.65</b>).
<br/><br/>When I build a kernel using JOCL, the JVM crashes.
<br/>I have recompiled JOCL to investigate where this comes from, and I'm a bit stuck now, so I turn to you guys.
<br/>It seems that the issue comes from the clBuildProgram function call in the JNI file.
<br/><br/>Four interesting points:
<br/><br/>1/ Building the same kernel with a plain old C program does not crash (same parameters passed to the driver).
<br/>2/ Building the same kernel with a plain old C program using dlopen and dlsym, just like JOCL does, does not crash either.
<br/>3/ The crash occurs only on 340.65; I have been able to downgrade to an older version (304.xx) and JOCL works just fine.
<br/>4/ 340.65 works fine on Windows.
<br/><br/>Here is the stack before the crash:
<br/><br/>#172 0x00007f10fdfca5c7 in NvCliCompileProgram () from /usr/lib64/libnvidia-compiler.so.340.65
<br/>#173 0x00007f11271c20ac in ?? () from /usr/lib64/libnvidia-opencl.so.1
<br/>#174 0x00007f11271b64c5 in ?? () from /usr/lib64/libnvidia-opencl.so.1
<br/>#175 0x00007f112c3bb8c9 in Java_com_jogamp_opencl_llb_impl_CLImpl_clBuildProgram0 ()
<br/><br/>Any ideas on why this could happen? Maybe it's a bug on the driver side, but if that's the case, I would like to be able to send them a reproduction case without JOCL in the middle.
<br/><br/>Thanks for your help, any ideas are welcome.
<br/><br/>Bram
<br/>
tag:forum.jogamp.org,2006:post-2922911Passing array of arrays to OpenCL via JOCL?2011-05-10T05:47:21Z2011-05-10T05:47:21ZGiovanni Idili
Are you ready for yet another OpenCL/JOCL beginner question?
<br/><br/>Here we go: I need to pass an array of arrays down to a given kernel. I've been frantically looking for examples but can't seem to find any.
<br/><br/>Can I do something like the following?
<br/><br/>CLBuffer<CLBuffer<FloatBuffer>> clBufferOfBuffers = new etc.
<br/><br/>Or am I better off just passing separate buffers, one for each of the items I need to pass for a given index?
<br/><br/>Any help/advice appreciated.
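<br/><br/>OpenCL has no notion of a buffer of buffers, so CLBuffer<CLBuffer<FloatBuffer>> won't work; the usual alternative is to flatten the rows into one buffer and pass a small offsets array alongside it. A plain-Java sketch of the packing (illustrative data; a direct buffer would be needed for JOCL):

```java
import java.nio.FloatBuffer;

// flatten ragged rows into one buffer plus per-row start offsets
float[][] rows = { {1f, 2f}, {3f, 4f, 5f}, {6f} };
int[] offsets = new int[rows.length];
int total = 0;
for (int i = 0; i < rows.length; i++) {
    offsets[i] = total;
    total += rows[i].length;
}
FloatBuffer flat = FloatBuffer.allocate(total);
for (float[] row : rows) flat.put(row);
flat.rewind();
// row i occupies [offsets[i], offsets[i] + rows[i].length)
```

The kernel then takes two arguments (global float* data, global int* offsets) instead of one per row.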
tag:forum.jogamp.org,2006:post-4033567Maximum size of CLBuffer<FloatBuffer>2014-11-17T18:11:52Z2014-11-17T18:11:52ZMartin
Hi all,
<br/><br/>here is my problem. When I run:
<br/><br/>context.createFloatBuffer(bufLength, Mem.WRITE_ONLY);
<br/><br/>I need to make sure that "bufLength" is in the range [0, 2^29-1], otherwise the buffer creation will throw an error like this:
<br/><br/>java.lang.IllegalArgumentException: Negative capacity: -2147483648
<br/> at java.nio.Buffer.<init>(Buffer.java:191)
<br/> at java.nio.ByteBuffer.<init>(ByteBuffer.java:276)
<br/> at java.nio.ByteBuffer.<init>(ByteBuffer.java:284)
<br/> at java.nio.MappedByteBuffer.<init>(MappedByteBuffer.java:89)
<br/> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:119)
<br/> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
<br/> at com.jogamp.common.nio.Buffers.newDirectByteBuffer(Buffers.java:73)
<br/> at com.jogamp.common.nio.Buffers.newDirectFloatBuffer(Buffers.java:115)
<br/> at com.jogamp.opencl.CLContext.createFloatBuffer(CLContext.java:316)
<br/><br/>Apparently the float buffer is backed by a byte buffer, i.e. the length for the byte buffer is the float length multiplied by 4.
<br/>I guess this results in an overflow of the 32-bit int capacity, because 2^29 * 4 = 2^31, which is already considered a negative number!
<br/>This effectively limits CLMemory usage to < 2GB for a single float buffer. If floats could be indexed directly, the range would be extended to < 8GB. We are working in the field of medical image processing, where we deal with large volumes of up to 1024^3 = 2^30 float values, in case one wonders why we need such large linear data arrays.
<br/><br/>Is there a known solution to this problem? Or can you give me a hint on how to implement a workaround for such large arrays?
<br/><br/>Thanks
<br/>Martin
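<br/><br/>The arithmetic behind the failure, in plain Java: the element count is multiplied by 4 (bytes per float) in 32-bit int arithmetic before the direct ByteBuffer is allocated, so 2^29 floats wraps to Integer.MIN_VALUE, matching the "Negative capacity" message above:

```java
// reproduce the overflow from the stack trace above
int floatCount = 1 << 29;                     // 2^29 floats, the first failing size
int byteCount = floatCount * 4;               // wraps: 2^31 does not fit in an int
long safeByteCount = (long) floatCount * 4L;  // widening first gives the true size
```

Since NIO buffers themselves are int-indexed, sizes of 2^31 bytes or more can't be fixed inside JOCL alone; splitting the volume across several sub-2GB CLBuffers (e.g. one per slab of slices) is a possible workaround, offered here as a suggestion rather than a known JOCL feature.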