Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs

Date: 2023-01-01 02:42:54

I'm working on a macOS project that uses Swift and Metal for image processing on the GPU. Last week, I received my new 15-inch MacBook Pro (late 2016) and noticed something strange with my code: kernels that were supposed to write to a texture did not seem to do so...

After a lot of digging, I found that the problem is related to which GPU is used by Metal (AMD Radeon Pro 455 or Intel(R) HD Graphics 530) to do the computation.

Calling MTLCopyAllDevices() returns an array of devices representing the Radeon and the Intel GPUs (while MTLCreateSystemDefaultDevice() returns the default device, which is the Radeon). Either way, the code works as expected on the Intel GPU but not on the Radeon GPU.
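
As a minimal sketch of the difference between the two calls, the following lists every GPU and selects one by its isLowPower property rather than by its position in the array. It assumes a current macOS SDK, where device.name is a non-optional String. Treating isLowPower as a marker for the integrated GPU is an assumption that fits a dual-GPU laptop like this one but is not guaranteed on other hardware.

```swift
import Metal

// All GPUs visible to Metal (MTLCopyAllDevices is macOS only).
let devices = MTLCopyAllDevices()
for device in devices {
    // isLowPower is true for the integrated GPU on dual-GPU Macs.
    print("\(device.name): isLowPower = \(device.isLowPower)")
}

// Select a device by property instead of by array index.
let integrated = devices.first { $0.isLowPower }
let discrete = devices.first { !$0.isLowPower }
print("integrated: \(integrated?.name ?? "none")")
print("discrete: \(discrete?.name ?? "none")")

// The system default device; on the machine described above this is the Radeon.
let systemDefault = MTLCreateSystemDefaultDevice()
print("default: \(systemDefault?.name ?? "none")")
```

Selecting by property avoids depending on the order of the returned array, which the code in this post relies on when it uses devices[0] and devices[1].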

Let me show you an example.

To start, here is a simple kernel that takes an input texture and copies its colour to an output texture:

    kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                            texture2d<uint, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
    {
        uint4 out = inTexture.read(gid);
        outTexture.write(out, gid);
    }

In order to use this kernel, I use this piece of code:

    let devices = MTLCopyAllDevices()
    for device in devices {
        print(device.name!) // [0] -> "AMD Radeon Pro 455", [1] -> "Intel(R) HD Graphics 530"
    }

    let device = devices[0] 
    let library = device.newDefaultLibrary()
    let commandQueue = device.makeCommandQueue()

    let passthroughKernelFunction = library!.makeFunction(name: "passthrough")

    let cps = try! device.makeComputePipelineState(function: passthroughKernelFunction!)

    let commandBuffer = commandQueue.makeCommandBuffer()
    let commandEncoder = commandBuffer.makeComputeCommandEncoder()

    commandEncoder.setComputePipelineState(cps)

    // Texture setup
    let width = 16
    let height = 16
    let byteCount = height*width*4
    let bytesPerRow = width*4
    let region = MTLRegionMake2D(0, 0, width, height)
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false)

    // inTexture
    var inData = [UInt8](repeating: 255, count: Int(byteCount))
    let inTexture = device.makeTexture(descriptor: textureDescriptor)
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: &inData, bytesPerRow: bytesPerRow)

    // outTexture
    var outData = [UInt8](repeating: 128, count: Int(byteCount))
    let outTexture = device.makeTexture(descriptor: textureDescriptor)
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: &outData, bytesPerRow: bytesPerRow)

    commandEncoder.setTexture(inTexture, at: 0)
    commandEncoder.setTexture(outTexture, at: 1)
    commandEncoder.dispatchThreadgroups(MTLSize(width: 1,height: 1,depth: 1), threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1))

    commandEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Get the data back from the GPU
    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region , mipmapLevel: 0)

    // Validation
    // outData should be exactly the same as inData 
    for (i,outElement) in outData.enumerated() {
        if outElement != inData[i] {
            print("Dest: \(outElement) != Src: \(inData[i]) at \(i)")
        }
    }

When running this code with let device = devices[0] (Radeon GPU), outTexture is never written to (my supposition) and as a result outData stays unchanged. On the other hand, when running this code with let device = devices[1] (Intel GPU), everything works as expected and outData is updated with the values in inData.

1 Answer

#1


8  

I think that whenever the GPU writes to a MTLStorageModeManaged resource such as a texture and you then want to read that resource from the CPU (e.g. using getBytes()), you need to synchronize it using a blit encoder. Try putting the following after commandEncoder.endEncoding() and above the commandBuffer.commit() line:

    let blitEncoder = commandBuffer.makeBlitCommandEncoder()
    blitEncoder.synchronize(resource: outTexture) // copy the GPU's version back to CPU-visible memory
    blitEncoder.endEncoding()

You may get away without this on an integrated GPU because the GPU is using system memory for the resource and there's nothing to synchronize.
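
To show where the synchronization sits in the whole sequence, here is a self-contained sketch that runs the passthrough kernel on every device and reads the result back. It is written against a current macOS SDK (Swift 5, where the make... calls return optionals and setTexture takes index:), not the Swift 3 API used in the question. The SetupError type, the runPassthrough function, the run-time compilation of the kernel source, the explicit shaderWrite usage flag, and the storageMode check are my own additions for illustration.

```swift
import Metal

// Hypothetical error type for this sketch.
struct SetupError: Error {}

// Kernel source compiled at run time so the sketch needs no .metal file.
let source = """
#include <metal_stdlib>
using namespace metal;

kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                        texture2d<uint, access::write> outTexture [[texture(1)]],
                        uint2 gid [[thread_position_in_grid]])
{
    outTexture.write(inTexture.read(gid), gid);
}
"""

/// Runs the passthrough kernel on `device` and returns the bytes read back
/// from the output texture.
func runPassthrough(on device: MTLDevice) throws -> [UInt8] {
    let library = try device.makeLibrary(source: source, options: nil)
    guard let function = library.makeFunction(name: "passthrough"),
          let queue = device.makeCommandQueue(),
          let commandBuffer = queue.makeCommandBuffer() else { throw SetupError() }
    let pipeline = try device.makeComputePipelineState(function: function)

    let width = 16, height = 16
    let bytesPerRow = width * 4
    let region = MTLRegionMake2D(0, 0, width, height)
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false)
    // The default usage is shaderRead only; a kernel that writes needs shaderWrite.
    descriptor.usage = [.shaderRead, .shaderWrite]

    guard let inTexture = device.makeTexture(descriptor: descriptor),
          let outTexture = device.makeTexture(descriptor: descriptor) else { throw SetupError() }
    let inData = [UInt8](repeating: 255, count: bytesPerRow * height)
    var outData = [UInt8](repeating: 128, count: bytesPerRow * height)
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: inData, bytesPerRow: bytesPerRow)
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: outData, bytesPerRow: bytesPerRow)

    guard let compute = commandBuffer.makeComputeCommandEncoder() else { throw SetupError() }
    compute.setComputePipelineState(pipeline)
    compute.setTexture(inTexture, index: 0)
    compute.setTexture(outTexture, index: 1)
    // One 16x16 threadgroup covers the whole texture.
    compute.dispatchThreadgroups(MTLSize(width: 1, height: 1, depth: 1),
                                 threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1))
    compute.endEncoding()

    // Managed resources keep separate CPU and GPU copies; ask Metal to
    // copy the GPU's result back before the CPU reads it.
    if outTexture.storageMode == .managed {
        guard let blit = commandBuffer.makeBlitCommandEncoder() else { throw SetupError() }
        blit.synchronize(resource: outTexture)
        blit.endEncoding()
    }

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
    return outData
}

for device in MTLCopyAllDevices() {
    let result = try runPassthrough(on: device)
    let copied = result.allSatisfy { $0 == 255 }
    print("\(device.name): \(copied ? "output matches input" : "output unchanged")")
}
```

The blit encoder is created only after the compute encoder has ended, because a command buffer can have just one active encoder at a time. Checking storageMode first keeps the same function usable for textures that are not managed, where there is no separate copy to synchronize.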
