Re: Understanding float and double

Posted by Arnold on Feb 15, 2017; 6:40pm
URL: https://forum.jogamp.org/Understanding-float-and-double-tp4037633p4037636.html

Hi Wadewalker,

Thanks for your remark. It pointed me to an error I had overlooked: I had forgotten to replace varfloat with double inside the Mandelbrot loop. Once I did that, everything worked. But this implies that varfloat had been resolving to float instead of double. So I wanted to see which branch of the #ifdefs is actually taken, and changed the header of the kernel as follows:

#ifdef DOUBLE_FP
    #ifdef AMD_FP
        #pragma OPENCL EXTENSION cl_amd_fp64 : enable
        #error cl_amd_fp64 detected
    #else
        #pragma OPENCL EXTENSION cl_khr_fp64 : enable
        #error cl_khr_fp64 detected
    #endif
    typedef double varfloat;
#else
    typedef float varfloat;
    #error no DOUBLE_FP detected
#endif

The kernel build fails with "no DOUBLE_FP detected" (see the last lines below). That is strange, because when I declare doubles explicitly, doubles are used. Below is the build output of the program, together with a list of all properties of the device for which the kernel was built. In this example it is the GTX 1060, but I get the same result when I choose the Intel Core i7.

***CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE]
   --- Properties of device GeForce GTX 1060 6GB
  CL_DEVICE_NAME = GeForce GTX 1060 6GB
  CL_DEVICE_TYPE = GPU
  CL_DEVICE_EXTENSIONS = [cl_khr_global_int32_base_atomics, cl_khr_fp64, cl_nv_compiler_options, cl_khr_byte_addressable_store, cl_nv_copy_opts, cl_khr_global_int32_extended_atomics, cl_khr_icd, cl_nv_pragma_unroll, cl_nv_d3d10_sharing, cl_nv_device_attribute_query, cl_khr_local_int32_extended_atomics, cl_nv_d3d11_sharing, cl_khr_gl_sharing, cl_khr_d3d10_sharing, cl_nv_d3d9_sharing, cl_khr_local_int32_base_atomics]
  CL_DEVICE_AVAILABLE = true
  CL_DEVICE_VERSION = OpenCL 1.2 CUDA
  CL_DRIVER_VERSION = 372.90
  CL_DEVICE_MAX_WORK_GROUP_SIZE = 1024
  CL_DEVICE_ENDIAN_LITTLE = true
  CL_DEVICE_VENDOR_ID = 4318
  CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.2
  CL_DEVICE_ADDRESS_BITS = 64
  CL_DEVICE_GLOBAL_MEM_SIZE = 6442450944
  CL_DEVICE_LOCAL_MEM_SIZE = 49152
  CL_DEVICE_HOST_UNIFIED_MEMORY = false
  CL_DEVICE_MAX_SAMPLERS = 32
  CL_DEVICE_HALF_FP_CONFIG = []
  CL_DEVICE_LOCAL_MEM_TYPE = LOCAL
  cl_khr_icd = true
  CL_DEVICE_MAX_COMPUTE_UNITS = 10
  CL_DEVICE_MAX_CLOCK_FREQUENCY = 1708
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 1
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 1
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 1
  CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR = 1
  CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT = 1
  CL_DEVICE_NATIVE_VECTOR_WIDTH_INT = 1
  CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG = 1
  CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF = 0
  CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT = 1
  CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE = 1
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES = [1024, 1024, 64]
  CL_DEVICE_MAX_PARAMETER_SIZE = 4352
  CL_DEVICE_MAX_MEM_ALLOC_SIZE = 1610612736
  CL_DEVICE_MEM_BASE_ADDR_ALIGN = 4096
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 65536
  CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 128
  CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 163840
  CL_DEVICE_MAX_CONSTANT_ARGS = 9
  CL_DEVICE_IMAGE_SUPPORT = true
  CL_DEVICE_MAX_READ_IMAGE_ARGS = 256
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS = 16
  CL_DEVICE_IMAGE2D_MAX_WIDTH = 16384
  CL_DEVICE_IMAGE2D_MAX_HEIGHT = 32768
  CL_DEVICE_IMAGE3D_MAX_WIDTH = 16384
  CL_DEVICE_IMAGE3D_MAX_HEIGHT = 16384
  CL_DEVICE_IMAGE3D_MAX_DEPTH = 16384
  CL_DEVICE_PROFILING_TIMER_RESOLUTION = 1000
  CL_DEVICE_EXECUTION_CAPABILITIES = [EXEC_KERNEL]
  CL_DEVICE_SINGLE_FP_CONFIG = [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]
  CL_DEVICE_DOUBLE_FP_CONFIG = [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]
  CL_DEVICE_GLOBAL_MEM_CACHE_TYPE = READ_WRITE
  CL_DEVICE_QUEUE_PROPERTIES = [OUT_OF_ORDER_MODE, PROFILING_MODE]
  CL_DEVICE_COMPILER_AVAILABLE = true
  CL_DEVICE_ERROR_CORRECTION_SUPPORT = false
  cl_khr_fp16 = false
  cl_khr_fp64 = true
  cl_khr_gl_sharing | cl_APPLE_gl_sharing = true
  CL_DEVICE_PROFILE = FULL_PROFILE
  CL_DEVICE_VENDOR = NVIDIA Corporation
created CLContext [id: 586574304, platform: NVIDIA CUDA, profile: FULL_PROFILE, devices: 1]
using CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE]
Exception in thread "JavaFX Application Thread" com.jogamp.opencl.CLException$CLBuildProgramFailureException:
CLDevice [id: 520350624 name: GeForce GTX 1060 6GB type: GPU profile: FULL_PROFILE] build log:
<kernel>:12:6: error: no DOUBLE_FP detected
    #error no DOUBLE_FP detected
     ^ [error: CL_BUILD_PROGRAM_FAILURE]
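
A possible reading of this log: DOUBLE_FP and AMD_FP are not macros that an OpenCL compiler predefines, so the #else branch is taken unless the host passes them as build options (for example "-D DOUBLE_FP"). The post does not show how the program is built, so the following is only a minimal sketch of how the host side could derive those options from the extension list printed above. The class name KernelBuildOptions and the method forExtensions are hypothetical and not part of JOCL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a JOCL class): maps a device's extension names
// to the -D macros that the kernel header's #ifdef chain tests for.
class KernelBuildOptions {

    static String forExtensions(Set<String> extensions) {
        List<String> options = new ArrayList<>();
        if (extensions.contains("cl_khr_fp64")) {
            // Standard Khronos double-precision extension: header takes the inner #else branch.
            options.add("-D DOUBLE_FP");
        } else if (extensions.contains("cl_amd_fp64")) {
            // AMD-specific extension: header additionally needs AMD_FP.
            options.add("-D DOUBLE_FP");
            options.add("-D AMD_FP");
        }
        // An empty string means no double support: the header falls back to float.
        return String.join(" ", options);
    }

    public static void main(String[] args) {
        // A few of the extensions reported for the GTX 1060 in the listing above.
        Set<String> gtx1060 = new LinkedHashSet<>(Arrays.asList(
                "cl_khr_global_int32_base_atomics", "cl_khr_fp64", "cl_khr_icd"));
        System.out.println(forExtensions(gtx1060)); // prints "-D DOUBLE_FP"
    }
}
```

The resulting string would then be handed to whatever build call the host uses (in JOCL, presumably the options argument of CLProgram.build). Alternatively, the OpenCL C specification has the compiler define a macro with the extension's own name when the device supports it, so the header could test `#ifdef cl_khr_fp64` directly and need no host-side option at all.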