I am trying to rebuild the Mandelbrot example. I now have a very simple program that creates a Mandelbrot fractal. I compare it with a simple (serial) implementation and a parallel (threaded) implementation that I use for benchmark results. The results are really great: my serial implementation runs in 2229 ms, the parallel one in 432 ms, and OpenCL in 9 ms. An improvement of almost a factor of 50!
To "dive" deep into a Mandelbrot fractal one needs doubles; otherwise you hit the resolution limit of floats too quickly. I noticed that the OpenCL solution used floats, while my NVIDIA GTX 1060 and my Intel Core i7 920 both report cl_khr_fp64 = true. The Mandelbrot.cl kernel has a neat way of dealing with the floating-point type, making it dependent on a preprocessor setting. I set it explicitly to double, but that did not help. I have listed the kernel below. In my Java program I use doubles exclusively. Does anyone have an idea how to make the kernel use double variables?

#ifdef DOUBLE_FP
    #ifdef AMD_FP
        #pragma OPENCL EXTENSION cl_amd_fp64 : enable
    #else
        #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    #endif
    typedef double varfloat;
#else
    typedef float varfloat;
#endif

/**
 * For a description of this algorithm please refer to
 * http://en.wikipedia.org/wiki/Mandelbrot_set
 * @author Michael Bien
 */
kernel void Mandelbrot(
    const int width,
    const int height,
    const int maxIterations,
    const double x0,
    const double y0,
    const double stepX,
    const double stepY,
    global int *output)
{
    unsigned int ix = get_global_id(0);
    unsigned int iy = get_global_id(1);

    double r = x0 + ix * stepX;
    double i = y0 + iy * stepY;

    double x = 0;
    double y = 0;
    double magnitudeSquared = 0;
    int iteration = 0;

    while (magnitudeSquared < 4 && iteration < maxIterations) {
        varfloat x2 = x * x;
        varfloat y2 = y * y;
        y = 2 * x * y + i;
        x = x2 - y2 + r;
        magnitudeSquared = x2 + y2;
        iteration++;
    }

    output[iy * width + ix] = iteration;
}
|
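[Editorial aside, not part of the original post: the float-resolution point above is easy to verify on the host side. At a deep zoom the per-pixel step falls below the spacing of representable float values near the centre coordinate, so adjacent pixels collapse onto the same coordinate. A minimal stand-alone Java check, with the centre x0 = -0.75 and the step 1e-10 chosen purely for illustration:]

```java
// Illustration: near x0 = -0.75 the spacing of representable floats is
// about 6e-8 (2^-24), so a per-pixel step of 1e-10 is invisible in float
// arithmetic, while double (spacing ~1e-16 there) still resolves it.
public class FloatResolution {
    public static void main(String[] args) {
        double x0 = -0.75, step = 1e-10;

        // Coordinates of two adjacent pixels, computed in float...
        float fa = (float) (x0 + 100 * step);
        float fb = (float) (x0 + 101 * step);
        // ...and in double.
        double da = x0 + 100 * step;
        double db = x0 + 101 * step;

        System.out.println("float  distinguishes pixels: " + (fa != fb)); // false
        System.out.println("double distinguishes pixels: " + (da != db)); // true
    }
}
```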
Administrator
|
Hmm, not sure why that doesn't work. Are you really passing doubles into the kernel when you invoke it? If you were still passing floats as arguments, it would give the same results even with doubles used inside the kernel.
|
Hi Wadewalker,
Thanks for your remark. It pointed me to an error I had overlooked: I had forgotten to replace varfloat with double inside the Mandelbrot loop. When I did that, all went well. But that implies varfloat was being resolved to float instead of double. So I wanted to see which branch of the #ifdefs was taken, and adjusted the header of the kernel as follows:

#ifdef DOUBLE_FP
    #ifdef AMD_FP
        #pragma OPENCL EXTENSION cl_amd_fp64 : enable
        #error cl_amd_fp64 detected
    #else
        #pragma OPENCL EXTENSION cl_khr_fp64 : enable
        #error cl_khr_fp64 detected
    #endif
    typedef double varfloat;
#else
    typedef float varfloat;
    #error no DOUBLE_FP detected
#endif

The compilation of the kernel now fails with "no DOUBLE_FP detected" (see the last lines below). That is strange, because the device supports doubles. I show you the compilation output of the program together with a list of all properties of the device for which the kernel was built. In this example it is the GTX 1060, but I get the same results when I choose the Intel Core i7.
***CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE]
--- Properties of device GeForce GTX 1060 6GB
CL_DEVICE_NAME = GeForce GTX 1060 6GB
CL_DEVICE_TYPE = GPU
CL_DEVICE_EXTENSIONS = [cl_khr_global_int32_base_atomics, cl_khr_fp64, cl_nv_compiler_options, cl_khr_byte_addressable_store, cl_nv_copy_opts, cl_khr_global_int32_extended_atomics, cl_khr_icd, cl_nv_pragma_unroll, cl_nv_d3d10_sharing, cl_nv_device_attribute_query, cl_khr_local_int32_extended_atomics, cl_nv_d3d11_sharing, cl_khr_gl_sharing, cl_khr_d3d10_sharing, cl_nv_d3d9_sharing, cl_khr_local_int32_base_atomics]
CL_DEVICE_AVAILABLE = true
CL_DEVICE_VERSION = OpenCL 1.2 CUDA
CL_DRIVER_VERSION = 372.90
CL_DEVICE_MAX_WORK_GROUP_SIZE = 1024
CL_DEVICE_ENDIAN_LITTLE = true
CL_DEVICE_VENDOR_ID = 4318
CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.2
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_GLOBAL_MEM_SIZE = 6442450944
CL_DEVICE_LOCAL_MEM_SIZE = 49152
CL_DEVICE_HOST_UNIFIED_MEMORY = false
CL_DEVICE_MAX_SAMPLERS = 32
CL_DEVICE_HALF_FP_CONFIG = []
CL_DEVICE_LOCAL_MEM_TYPE = LOCAL
cl_khr_icd = true
CL_DEVICE_MAX_COMPUTE_UNITS = 10
CL_DEVICE_MAX_CLOCK_FREQUENCY = 1708
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF = 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE = 1
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_ITEM_SIZES = [1024, 1024, 64]
CL_DEVICE_MAX_PARAMETER_SIZE = 4352
CL_DEVICE_MAX_MEM_ALLOC_SIZE = 1610612736
CL_DEVICE_MEM_BASE_ADDR_ALIGN = 4096
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 65536
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 128
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 163840
CL_DEVICE_MAX_CONSTANT_ARGS = 9
CL_DEVICE_IMAGE_SUPPORT = true
CL_DEVICE_MAX_READ_IMAGE_ARGS = 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS = 16
CL_DEVICE_IMAGE2D_MAX_WIDTH = 16384
CL_DEVICE_IMAGE2D_MAX_HEIGHT = 32768
CL_DEVICE_IMAGE3D_MAX_WIDTH = 16384
CL_DEVICE_IMAGE3D_MAX_HEIGHT = 16384
CL_DEVICE_IMAGE3D_MAX_DEPTH = 16384
CL_DEVICE_PROFILING_TIMER_RESOLUTION = 1000
CL_DEVICE_EXECUTION_CAPABILITIES = [EXEC_KERNEL]
CL_DEVICE_SINGLE_FP_CONFIG = [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]
CL_DEVICE_DOUBLE_FP_CONFIG = [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE = READ_WRITE
CL_DEVICE_QUEUE_PROPERTIES = [OUT_OF_ORDER_MODE, PROFILING_MODE]
CL_DEVICE_COMPILER_AVAILABLE = true
CL_DEVICE_ERROR_CORRECTION_SUPPORT = false
cl_khr_fp16 = false
cl_khr_fp64 = true
cl_khr_gl_sharing | cl_APPLE_gl_sharing = true
CL_DEVICE_PROFILE = FULL_PROFILE
CL_DEVICE_VENDOR = NVIDIA Corporation

created CLContext [id: 586574304, platform: NVIDIA CUDA, profile: FULL_PROFILE, devices: 1]
using CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE]

Exception in thread "JavaFX Application Thread" com.jogamp.opencl.CLException$CLBuildProgramFailureException: CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE]
build log:
<kernel>:12:6: error: no DOUBLE_FP detected
#error no DOUBLE_FP detected
     ^
[error: CL_BUILD_PROGRAM_FAILURE]
|
Administrator
|
Are you setting "-D DOUBLE_FP" when you build the kernel? If not, it would show the behavior you describe.
|
Sorry for my late answer, I was just building a new (OpenCL-enabled) computer. I had missed that part of kernel building. I added it and it works. Sorry for having bothered you with something that trivial, and thanks for your patience!
|
Administrator
|
Hey, I'm just glad it works :)
|