jogamp › jocl

Understanding float and double

Arnold

Understanding float and double

I am trying to rebuild the Mandelbrot example. I now have a very simple program that creates a Mandelbrot fractal, and I compare it against a simple serial implementation and a parallel (threaded) one that I use for benchmarking. The results are really great: my serial implementation runs in 2229 ms, the parallel one in 432 ms and the OpenCL one in 9 ms. That is an improvement of almost a factor of 50 over the threaded version!

To "dive" deep into a Mandelbrot fractal one needs doubles; otherwise you hit the resolution limit of float far too quickly. I noticed that the OpenCL solution used floats, even though my NVIDIA GTX 1060 and my Intel Core i7 920 both report cl_khr_fp64 = true. The Mandelbrot.cl kernel has a neat way of handling this: it makes the floating-point type depend on a preprocessor setting. I explicitly declared the variables as double, but that did not help. I have listed the kernel below. In my Java program I use double exclusively.
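
To make the resolution argument concrete, here is a small stand-alone Java sketch (not part of the original program). The view origin X0 and the pixel step STEP_X are hypothetical deep-zoom values. Near 0.74 a float can only distinguish values about 6e-8 apart, so a 1e-9 pixel step vanishes when the coordinate is computed in float, while double still resolves it.

```java
/**
 * Stand-alone illustration (not from the original program) of float running out
 * of resolution at deep zoom. X0 and STEP_X are hypothetical values.
 */
class FloatResolutionDemo {

    // Hypothetical left edge of the view and hypothetical deep-zoom pixel step.
    static final double X0 = -0.743643887037151;
    static final double STEP_X = 1e-9;

    // Pixel coordinate computed in float, as the kernel does when varfloat is float.
    static float pixelXFloat(int ix) {
        return (float) X0 + ix * (float) STEP_X;
    }

    // The same coordinate computed in double.
    static double pixelXDouble(int ix) {
        return X0 + ix * STEP_X;
    }

    public static void main(String[] args) {
        // Gap between neighbouring representable numbers near X0.
        System.out.println("float  ulp near X0: " + Math.ulp((float) X0));
        System.out.println("double ulp near X0: " + Math.ulp(X0));
        // In float, adjacent pixels collapse onto one coordinate; in double they do not.
        System.out.println("float  pixels 0 and 1 identical: " + (pixelXFloat(0) == pixelXFloat(1)));
        System.out.println("double pixels 0 and 1 identical: " + (pixelXDouble(0) == pixelXDouble(1)));
    }
}
```

Run with `java FloatResolutionDemo.java` (Java 11+ source launcher): the float comparison prints true and the double comparison prints false.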

Does anyone have an idea how to make the kernel use double variables?

#ifdef DOUBLE_FP
#ifdef AMD_FP
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
typedef double varfloat;
#else
typedef float varfloat;
#endif

/**
* For a description of this algorithm please refer to
* http://en.wikipedia.org/wiki/Mandelbrot_set
* @author Michael Bien
*/
kernel void Mandelbrot
(
const int width,
const int height,
const int maxIterations,
const double x0,
const double y0,
const double stepX,
const double stepY,
global int *output
)
{

unsigned int ix = get_global_id (0);
unsigned int iy = get_global_id (1);

double r = x0 + ix * stepX;
double i = y0 + iy * stepY;

double x = 0;
double y = 0;

double magnitudeSquared = 0;
int iteration = 0;

while (magnitudeSquared < 4 && iteration < maxIterations)
{
varfloat x2 = x*x;
varfloat y2 = y*y;
y = 2 * x * y + i;
x = x2 - y2 + r;
magnitudeSquared = x2+y2;
iteration++;
}

output [iy * width + ix] = iteration;
}
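
For comparison with the serial baseline mentioned in the post, here is a hedged sketch of a plain-Java port of the kernel's escape-time loop, using double throughout. It is not the poster's actual serial implementation; the class and method names are hypothetical. Like the kernel, it computes magnitudeSquared from x and y before they are updated, so an escaping point is detected one iteration later than a test on the new values would catch it. A port like this can also be used to check the kernel's output pixel by pixel.

```java
/**
 * Serial reference port (illustrative, names hypothetical) of the kernel's
 * escape-time loop, using double for every intermediate value.
 */
class MandelbrotReference {

    // Mirrors the kernel's while-loop for the point c = r + i*I.
    static int iterations(double r, double i, int maxIterations) {
        double x = 0, y = 0;
        double magnitudeSquared = 0;
        int iteration = 0;
        while (magnitudeSquared < 4 && iteration < maxIterations) {
            double x2 = x * x;
            double y2 = y * y;
            y = 2 * x * y + i;
            x = x2 - y2 + r;
            // As in the kernel, the magnitude uses the values from before this update.
            magnitudeSquared = x2 + y2;
            iteration++;
        }
        return iteration;
    }

    // Fills output[iy * width + ix], matching the kernel's global-id indexing.
    static int[] render(int width, int height, int maxIterations,
                        double x0, double y0, double stepX, double stepY) {
        int[] output = new int[width * height];
        for (int iy = 0; iy < height; iy++) {
            for (int ix = 0; ix < width; ix++) {
                double r = x0 + ix * stepX;
                double i = y0 + iy * stepY;
                output[iy * width + ix] = iterations(r, i, maxIterations);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println("c = 0: " + iterations(0.0, 0.0, 100));  // never escapes
        System.out.println("c = 2: " + iterations(2.0, 0.0, 100));  // escapes quickly
    }
}
```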

Wade Walker

Re: Understanding float and double

Administrator

Hmm, not sure why that doesn't work. Are you really passing doubles into the kernel when you invoke it? If you were still passing floats as arguments, it would give the same results even with doubles used inside the kernel.
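
The host-side pitfall can be illustrated in plain Java (a sketch, not JOCL code). If x0 or stepX passes through a float anywhere in the Java program, widening it back to double when the kernel argument is set does not restore the lost bits. The helper widenedViaFloat and the sample values are hypothetical.

```java
/**
 * Sketch of the host-side pitfall: a value that passes through float keeps the
 * float rounding even after being widened back to double.
 * The helper name and the sample coordinate are hypothetical.
 */
class ArgumentPrecisionDemo {

    // Simulates x0 being stored in a float somewhere and widened again
    // when the kernel argument is set.
    static double widenedViaFloat(double value) {
        float narrowed = (float) value;  // rounds to the nearest float
        return narrowed;                 // implicit float -> double widening; lost bits stay lost
    }

    public static void main(String[] args) {
        double x0 = -0.743643887037151;  // hypothetical view origin
        double stepX = 1e-9;             // hypothetical deep-zoom pixel step
        double widened = widenedViaFloat(x0);
        double error = Math.abs(widened - x0);
        System.out.println("widened == original: " + (widened == x0));
        System.out.println("error " + error + " vs. pixel step " + stepX);
    }
}
```

With these values the error is about 7e-9, several pixel steps, so the whole image would be offset even though the kernel itself computes in double.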

Arnold

Re: Understanding float and double

Hi Wadewalker,

Thanks for your remark. It pointed me to an error I had overlooked: I had forgotten to replace varfloat with double inside the Mandelbrot loop. Once I did that, everything worked. But it also implied that varfloat had been resolving to float rather than double, so I wanted to see which branch of the #ifdefs was taken. I changed the header of the kernel as follows:

#ifdef DOUBLE_FP
#ifdef AMD_FP
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#error cl_amd_fp64 detected
#else
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#error cl_khr_fp64 detected
#endif
typedef double varfloat;
#else
typedef float varfloat;
#error no DOUBLE_FP detected
#endif

The kernel build fails with "no DOUBLE_FP detected" (see the last lines below). That is strange, because when I explicitly declare variables as double, doubles are being used. Below is the compilation output of the program together with a list of all properties of the device the kernel was built for. In this example it is the GTX 1060, but I get the same result when I choose the Intel Core i7.

***CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE]
--- Properties of device GeForce GTX 1060 6GB
CL_DEVICE_NAME = GeForce GTX 1060 6GB
CL_DEVICE_TYPE = GPU
CL_DEVICE_EXTENSIONS = [cl_khr_global_int32_base_atomics, cl_khr_fp64, cl_nv_compiler_options, cl_khr_byte_addressable_store, cl_nv_copy_opts, cl_khr_global_int32_extended_atomics, cl_khr_icd, cl_nv_pragma_unroll, cl_nv_d3d10_sharing, cl_nv_device_attribute_query, cl_khr_local_int32_extended_atomics, cl_nv_d3d11_sharing, cl_khr_gl_sharing, cl_khr_d3d10_sharing, cl_nv_d3d9_sharing, cl_khr_local_int32_base_atomics]
CL_DEVICE_AVAILABLE = true
CL_DEVICE_VERSION = OpenCL 1.2 CUDA
CL_DRIVER_VERSION = 372.90
CL_DEVICE_MAX_WORK_GROUP_SIZE = 1024
CL_DEVICE_ENDIAN_LITTLE = true
CL_DEVICE_VENDOR_ID = 4318
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.2
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_GLOBAL_MEM_SIZE = 6442450944
CL_DEVICE_LOCAL_MEM_SIZE = 49152
CL_DEVICE_HOST_UNIFIED_MEMORY = false
CL_DEVICE_MAX_SAMPLERS = 32
CL_DEVICE_HALF_FP_CONFIG = []
CL_DEVICE_LOCAL_MEM_TYPE = LOCAL
cl_khr_icd = true
CL_DEVICE_MAX_COMPUTE_UNITS = 10
CL_DEVICE_MAX_CLOCK_FREQUENCY = 1708
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF = 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE = 1
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_ITEM_SIZES = [1024, 1024, 64]
CL_DEVICE_MAX_PARAMETER_SIZE = 4352
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 1610612736
CL_DEVICE_MEM_BASE_ADDR_ALIGN = 4096
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 65536
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 128
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 163840
CL_DEVICE_MAX_CONSTANT_ARGS = 9
CL_DEVICE_IMAGE_SUPPORT = true
CL_DEVICE_MAX_READ_IMAGE_ARGS = 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS = 16
CL_DEVICE_IMAGE2D_MAX_WIDTH = 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT = 32768
CL_DEVICE_IMAGE3D_MAX_WIDTH = 16384
CL_DEVICE_IMAGE3D_MAX_HEIGHT = 16384
CL_DEVICE_IMAGE3D_MAX_DEPTH = 16384
CL_DEVICE_PROFILING_TIMER_RESOLUTION = 1000
CL_DEVICE_EXECUTION_CAPABILITIES = [EXEC_KERNEL]
CL_DEVICE_SINGLE_FP_CONFIG = [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]
CL_DEVICE_DOUBLE_FP_CONFIG = [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE = READ_WRITE
CL_DEVICE_QUEUE_PROPERTIES = [OUT_OF_ORDER_MODE, PROFILING_MODE]
CL_DEVICE_COMPILER_AVAILABLE = true
CL_DEVICE_ERROR_CORRECTION_SUPPORT = false
cl_khr_fp16 = false
cl_khr_fp64 = true
cl_khr_gl_sharing | cl_APPLE_gl_sharing = true
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VENDOR = NVIDIA Corporation
created CLContext [id: 586574304, platform: NVIDIA CUDA, profile: FULL_PROFILE, devices: 1]
using CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE]
Exception in thread "JavaFX Application Thread" com.jogamp.opencl.CLException$CLBuildProgramFailureException:
CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE] build log:
<kernel>:12:6: error: no DOUBLE_FP detected
#error no DOUBLE_FP detected
^ [error: CL_BUILD_PROGRAM_FAILURE]

Wade Walker

Re: Understanding float and double

Administrator

Are you setting "-D DOUBLE_FP" when you build the kernel? If not, that would explain the behavior you describe.
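
The define can also be chosen automatically on the host. This Java sketch (class and method names are hypothetical) builds the option string from the device's extension list, mirroring the #ifdef logic in the kernel header. The result is meant to be passed as the options argument of the program build: clBuildProgram accepts "-D name" defines, and in JOCL this is assumed to be an overload such as CLProgram.build(String options), so check the javadoc of your JOCL version.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch: derive the -D defines for Mandelbrot.cl from a device's extension list,
 * so DOUBLE_FP (and AMD_FP) are only defined when the device supports doubles.
 * Class and method names are hypothetical.
 */
class BuildOptions {

    // Splits a raw CL_DEVICE_EXTENSIONS string (space-separated) into a set.
    static Set<String> parseExtensions(String raw) {
        Set<String> result = new HashSet<>();
        for (String ext : raw.trim().split("\\s+")) {
            if (!ext.isEmpty()) {
                result.add(ext);
            }
        }
        return result;
    }

    // Mirrors the kernel header: the standard cl_khr_fp64 pragma is preferred;
    // AMD_FP switches the kernel to the cl_amd_fp64 pragma instead.
    static String precisionDefines(Set<String> extensions) {
        if (extensions.contains("cl_khr_fp64")) {
            return "-D DOUBLE_FP";
        }
        if (extensions.contains("cl_amd_fp64")) {
            return "-D DOUBLE_FP -D AMD_FP";
        }
        return ""; // no fp64 support: varfloat falls back to float
    }

    public static void main(String[] args) {
        // Excerpt of the GTX 1060 extension list from the log above.
        Set<String> gtx1060 = parseExtensions("cl_khr_global_int32_base_atomics cl_khr_fp64 cl_khr_icd");
        System.out.println("build options: \"" + precisionDefines(gtx1060) + "\"");
    }
}
```

For the GTX 1060 excerpt this prints `build options: "-D DOUBLE_FP"`.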

Arnold

Re: Understanding float and double

Sorry for my late answer; I was busy building a new (OpenCL-enabled) computer. I had missed that part of the kernel build. I added it and now it works. Sorry for bothering you with something so trivial, and thanks for your patience!

Wade Walker

Re: Understanding float and double

Administrator

Hey, I'm just glad it works :)