I am using PyOpenCL to process images in Python and to send a 3D numpy array (height x width x 4) to the kernel. I am having trouble indexing the 3D array inside the kernel code. For now I am only able to copy the whole input array to the output. The current code looks like this, where img is the image with img.shape = (320, 512, 4):
__kernel void part1(__global float* img, __global float* results)
{
    unsigned int x = get_global_id(0);
    unsigned int y = get_global_id(1);
    unsigned int z = get_global_id(2);
    int index = x + 320*y + 320*512*z;
    results[index] = img[index];
}
However, I do not quite understand how this works. For example, how do I index the Python equivalent of img[1, 2, 3] inside this kernel? Further, which index into results should I use to store an item so that it ends up at results[1, 2, 3] in the numpy array when I get the results back in Python?
To run this I am using this Python code:
import pyopencl as cl
import numpy as np


class OpenCL:
    def __init__(self):
        self.ctx = cl.create_some_context()
        self.queue = cl.CommandQueue(self.ctx)

    def loadProgram(self, filename):
        # Read the kernel source and compile it for the selected context.
        with open(filename, 'r') as f:
            fstr = f.read()
        self.program = cl.Program(self.ctx, fstr).build()

    def opencl_energy(self, img):
        mf = cl.mem_flags
        self.img = img.astype(np.float32)
        self.img_buf = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.img)
        self.dest_buf = cl.Buffer(self.ctx, mf.WRITE_ONLY, self.img.nbytes)
        # One work-item per array element: the global size is img.shape.
        self.program.part1(self.queue, self.img.shape, None, self.img_buf, self.dest_buf)
        c = np.empty_like(self.img)
        # enqueue_copy replaces the older enqueue_read_buffer and blocks by default.
        cl.enqueue_copy(self.queue, c, self.dest_buf)
        return c


example = OpenCL()
example.loadProgram("get_energy.cl")
image = np.random.rand(320, 512, 4)
image = image.astype(np.float32)
results = example.opencl_energy(image)
print("All items are equal:", (results==image).all())
3 Answers
#1
1
Update: The OpenCL docs state (in 3.5) that
"Memory objects are categorized into two types: buffer objects, and image objects. A buffer
object stores a one-dimensional collection of elements whereas an image object is used to store a
two- or three- dimensional texture, frame-buffer or image."
So a buffer is always linear, or linearized, as you can see from my sample below.
import pyopencl as cl
import numpy as np

h_a = np.arange(27).reshape((3,3,3)).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
d_a = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_a)

prg = cl.Program(ctx, """
__kernel void p(__global const float *d_a) {
    printf("Array element is %f ",d_a[10]);
}
""").build()
prg.p(queue, (1,), None, d_a)
This gives me "Array element is 10" as output, so the buffer really is the linearized array. The [x,y,z] indexing known from numpy is therefore not available on a buffer inside the kernel. Using a 2D or 3D image instead of a buffer should work, though.
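Because the buffer holds the array in numpy's default row-major (C) order, the flat position of img[i, j, k] can be worked out on the host before touching any kernel code. The following is a minimal numpy-only sketch, assuming a C-contiguous array of shape (height, width, channels); the helper name flat_index is made up for illustration.

```python
import numpy as np


def flat_index(i, j, k, shape):
    """Offset of element [i, j, k] in a C-contiguous array of the given shape."""
    height, width, channels = shape
    return (i * width + j) * channels + k


shape = (320, 512, 4)
img = np.arange(np.prod(shape), dtype=np.float32).reshape(shape)

idx = flat_index(1, 2, 3, shape)
# The flattened view is what a kernel sees through a __global float* argument.
assert img.ravel()[idx] == img[1, 2, 3]
# numpy's own helper gives the same offset.
assert idx == np.ravel_multi_index((1, 2, 3), shape)
print(idx)
```

If the global size is img.shape, the same expression should carry over to the kernel with get_global_id(0), get_global_id(1) and get_global_id(2) playing the roles of i, j and k.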
#2
0
Although this is not the optimal solution, I linearized the array in Python and sent it as 1D. In the kernel code I calculated x, y and z from the linear index. After returning to Python I reshaped the result back to the original shape.
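The round trip described here can be tried out on the host alone. This is only a sketch of the index arithmetic, not the answerer's actual code: it assumes a C-contiguous (height, width, channels) array, and the helper name unflatten is hypothetical.

```python
import numpy as np

shape = (320, 512, 4)
height, width, channels = shape
img = np.random.rand(*shape).astype(np.float32)

# 1D view of the data, in the order it would be sent to the device.
flat = img.ravel()


def unflatten(index, width, channels):
    """Recover (i, j, k) from a C-order linear index."""
    rest, k = divmod(index, channels)
    i, j = divmod(rest, width)
    return i, j, k


i, j, k = unflatten(2059, width, channels)
assert (i, j, k) == (1, 2, 3)
assert (i, j, k) == np.unravel_index(2059, shape)

# Reshaping the 1D result restores the original layout.
restored = flat.reshape(shape)
assert restored[i, j, k] == flat[2059]
assert np.array_equal(restored, img)
```

In kernel code the two divmod steps would become integer division and the % operator applied to get_global_id(0).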
#3
-1
I encountered the same problem. https://lists.tiker.net/pipermail/pyopencl/2009-October/000134.html has a simple example of how to use 3D arrays with PyOpenCL that worked for me. I quote the code here for future reference:
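import pyopencl as cl
import numpy
import numpy.linalg as la

sizeX=4
sizeY=2
sizeZ=5
a = numpy.random.rand(sizeX,sizeY,sizeZ).astype(numpy.float32)

ctx = cl.Context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The array sizes are substituted into the kernel source before it is built.
# Note: this formula treats x as the fastest-varying index, whereas numpy's
# default C order stores a[x, y, z] at (x * sizeY + y) * sizeZ + z.
prg = cl.Program(ctx, """
__kernel void sum(__global const float *a, __global float *b)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    int z = get_global_id(2);
    int idx = z * %d * %d + y * %d + x;
    b[idx] = a[idx] * x + 3 * y + 5 * z;
}
""" % (sizeY, sizeX, sizeX) ).build()

# Current PyOpenCL expects the local work size (None here) after the global size.
prg.sum(queue, a.shape, None, a_buf, dest_buf)
# enqueue_copy replaces the older enqueue_read_buffer and blocks by default.
cl.enqueue_copy(queue, a, dest_buf)
print(a)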
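The quoted code bakes the array sizes into the kernel source by string formatting. As a hedged alternative, the multipliers for each axis can be read off the numpy array itself through its strides, which avoids guessing which axis varies fastest. The sketch below is numpy-only; the names elem_strides and offset are illustrative and not part of PyOpenCL.

```python
import numpy as np

a = np.zeros((4, 2, 5), dtype=np.float32)

# Strides are given in bytes; dividing by the item size gives element counts.
elem_strides = tuple(s // a.itemsize for s in a.strides)
print(elem_strides)


def offset(index, elem_strides):
    """Linear offset of a multi-dimensional index, given per-axis element strides."""
    return sum(i * s for i, s in zip(index, elem_strides))


# The last element of the array sits at the last linear position.
assert offset((3, 1, 4), elem_strides) == a.size - 1
assert offset((3, 1, 4), elem_strides) == np.ravel_multi_index((3, 1, 4), a.shape)
```

These three numbers could be substituted into the kernel string in place of the hand-written size products, so that the kernel computes x * stride0 + y * stride1 + z * stride2.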