Administrator
Ah, after reading more closely I think I get it: you've got sample input data that you're using to train a neural net (copied to the device once and left there), and you've got the neuron weights (modified on the host; it's not clear whether they're also modified on the card).
Performance still depends on whether you're on a shared-memory device, but usually you'd create the sample-data buffers with CL_MEM_READ_ONLY|CL_MEM_USE_HOST_PTR and the weights buffers with CL_MEM_USE_HOST_PTR (assuming the host allocated both the sample data and the weights buffers up front). This minimizes copying, although on a non-shared-memory device some copying out to the device is still unavoidable. Whether you can also use CL_MEM_READ_ONLY on the weights buffers depends on your algorithm: it's only valid if your kernels never write to the weights.
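A minimal sketch of that buffer setup, assuming a valid cl_context named `ctx` already exists and that `samples`/`weights` are hypothetical host arrays the application keeps alive for the lifetime of the buffers (a requirement of CL_MEM_USE_HOST_PTR):

```c
#include <CL/cl.h>
#include <stddef.h>

/* Illustrative sketch, not a drop-in implementation: names and sizes are
 * made up for the example. Returns CL_SUCCESS or the first error hit. */
cl_int create_buffers(cl_context ctx,
                      float *samples, size_t n_samples,
                      float *weights, size_t n_weights,
                      cl_mem *samples_buf, cl_mem *weights_buf)
{
    cl_int err;

    /* Sample data: read-only to kernels, backed by the existing host array,
     * so a shared-memory implementation can skip the copy entirely. */
    *samples_buf = clCreateBuffer(ctx,
                                  CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                  n_samples * sizeof(float), samples, &err);
    if (err != CL_SUCCESS)
        return err;

    /* Weights: left read-write (the default), in case kernels update them.
     * Add CL_MEM_READ_ONLY only if kernels never write to the weights. */
    *weights_buf = clCreateBuffer(ctx,
                                  CL_MEM_USE_HOST_PTR,
                                  n_weights * sizeof(float), weights, &err);
    if (err != CL_SUCCESS) {
        clReleaseMemObject(*samples_buf);
        return err;
    }
    return CL_SUCCESS;
}
```

One related note: with CL_MEM_USE_HOST_PTR, prefer clEnqueueMapBuffer/clEnqueueUnmapMemObject over clEnqueueReadBuffer when the host needs to see updated weights, since mapping lets a shared-memory implementation hand back the original host pointer without a copy.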