What are the key differences between the Repa 2 and 3 APIs?

Time: 2022-03-26 21:19:27

To be more specific, I have the following innocuous-looking little Repa 3 program:

{-# LANGUAGE QuasiQuotes #-}

import Prelude hiding (map, zipWith)
import System.Environment (getArgs)
import Data.Word (Word8)
import Data.Array.Repa
import Data.Array.Repa.IO.DevIL
import Data.Array.Repa.Stencil
import Data.Array.Repa.Stencil.Dim2

main = do
  [s] <- getArgs
  img <- runIL $ readImage s

  let out = output x where RGB x = img
  runIL . writeImage "out.bmp" . Grey =<< computeP out

output img = map cast . blur . blur $ blur grey
  where
    grey              = traverse img to2D luminance
    cast n            = floor n :: Word8
    to2D (Z:.i:.j:._) = Z:.i:.j

---------------------------------------------------------------

luminance f (Z:.i:.j)   = 0.21*r + 0.71*g + 0.07*b :: Float
  where
    (r,g,b) = rgb (fromIntegral . f) i j

blur = map (/ 9) . convolve kernel
  where
    kernel = [stencil2| 1 1 1
                        1 1 1
                        1 1 1 |]

convolve = mapStencil2 BoundClamp

rgb f i j = (r,g,b)
  where
    r = f $ Z:.i:.j:.0
    g = f $ Z:.i:.j:.1
    b = f $ Z:.i:.j:.2

It takes this long to process a 640x420 image on my 2 GHz Core 2 Duo laptop:

real    2m32.572s
user    4m57.324s
sys     0m1.870s

I know something must be quite wrong, because I have gotten much better performance on much more complex algorithms using Repa 2. Under that API, the biggest improvement came from adding a call to 'force' before every array transform (which I understand to mean every call to map, convolve, traverse, etc.). I cannot quite work out the analogous thing to do in Repa 3. In fact, I thought the new manifestation type parameters were supposed to remove any ambiguity about when an array needs to be forced. And how does the new monadic interface fit into this scheme? I have read the nice tutorial by Don S, but there are some key gaps between the Repa 2 and 3 APIs that are little discussed online, AFAIK.

More simply, is there a minimally invasive way to fix the above program's efficiency?

2 answers

#1


10  

The new representation type parameters don't automagically force when needed (doing that well is probably a hard problem), so you still have to force manually. In Repa 3 this is done with the computeP function:

computeP
  :: (Monad m, Repr r2 e, Fill r1 r2 sh e)
  => Array r1 sh e -> m (Array r2 sh e)

Personally, I don't see why it is monadic, since you can just as well run it in the Identity monad:

import Control.Monad.Identity (runIdentity)

force
  :: (Repr r2 e, Fill r1 r2 sh e)
  => Array r1 sh e -> Array r2 sh e
force = runIdentity . computeP

So your output function can now be rewritten with forcing at the appropriate points:

output img = map cast . f . blur . f . blur . f . blur . f $ grey
  where ...

where f is an abbreviation that uses a helper function u to aid type inference:

u :: Array U sh e -> Array U sh e
u = id
f = u . force

With these changes the speedup is quite dramatic. That is to be expected: without intermediate forcing, each output pixel ends up evaluating far more than is necessary, because the intermediate values aren't shared.
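
To make the "not shared" point concrete, here is a minimal sketch that does not use Repa at all. It models a delayed array as a plain indexing function and counts how many element evaluations a three-pass blur needs. The names Arr, delayed, force, and evaluations are made up for this illustration, and a 1-D, 3-point blur stands in for the 3x3 stencil, so treat it as a toy cost model rather than a description of what Repa does internally.

```haskell
import Data.Array (listArray, (!))
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Toy 1-D stand-in for a Repa array: a length plus an indexing action.
data Arr = Arr Int (Int -> IO Double)

-- Wrap an element function so that every evaluation bumps the counter.
delayed :: IORef Int -> Int -> (Int -> IO Double) -> Arr
delayed counter n f = Arr n (\i -> modifyIORef' counter (+ 1) >> f i)

-- 3-point box blur with clamped borders; the result stays delayed.
blur :: IORef Int -> Arr -> Arr
blur counter (Arr n ix) = delayed counter n $ \i -> do
  let clamp = max 0 . min (n - 1)
  xs <- mapM (ix . clamp) [i - 1, i, i + 1]
  return (sum xs / 3)

-- Evaluate every element once and serve all later reads from memory.
force :: Arr -> IO Arr
force (Arr n ix) = do
  xs <- mapM ix [0 .. n - 1]
  let mem = listArray (0, n - 1) xs
  return (Arr n (\i -> return (mem ! i)))

-- Count the element evaluations needed to produce the whole result.
evaluations :: (IORef Int -> Arr -> IO Arr) -> Int -> IO Int
evaluations pipeline n = do
  counter <- newIORef 0
  Arr _ ix <- pipeline counter (delayed counter n (return . fromIntegral))
  mapM_ ix [0 .. n - 1]
  readIORef counter

main :: IO ()
main = do
  let fused  c a = return (blur c (blur c (blur c a)))
      forced c a = force a >>= force . blur c >>= force . blur c >>= force . blur c
  evaluations fused 100  >>= print   -- no intermediate forcing
  evaluations forced 100 >>= print   -- forced after every stage
```

For 100 elements this prints 4000 and then 400: without forcing, each output element re-evaluates 1 + 3 + 9 + 27 = 40 delayed elements, while with forcing every stage evaluates each element exactly once. With a 3x3 kernel the same counting gives 1 + 9 + 81 + 729 = 820 per output pixel in this model, which is the kind of blow-up the timings reflect.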

Your original code:

real    0m25.339s
user    1m35.354s
sys     0m1.760s

With forcing:

real    0m0.130s
user    0m0.320s
sys     0m0.028s

Tested with a 600x400 PNG; the output files were identical.

#2


7  

computeP is the new force.

In Repa 3 you need to use computeP everywhere you would have used force in Repa 2.

The Laplace example from repa-examples is similar to what you're doing. You should also use cmap instead of plain map in your blur function. There will be a paper explaining why on my homepage early next week.
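
As a rough sketch of what "computeP everywhere" can look like with the monadic interface, the blur from the question could be written so that each pass returns a manifest unboxed array. The names blurP and blur3P are made up here, the sketch assumes a Repa 3 release in which computeP only needs a Monad constraint, and it uses plain map instead of the structure-preserving map recommended above, because that function's name differs between Repa 3 releases.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Data.Array.Repa as R
import Data.Array.Repa.Stencil
import Data.Array.Repa.Stencil.Dim2

-- One 3x3 box-blur pass, computed in parallel into a manifest unboxed array.
blurP :: Monad m => Array U DIM2 Float -> m (Array U DIM2 Float)
blurP arr = computeP
          $ R.map (/ 9)
          $ mapStencil2 BoundClamp kernel arr
  where
    kernel = [stencil2| 1 1 1
                        1 1 1
                        1 1 1 |]

-- Three passes; each bind hands the next pass an already computed array.
blur3P :: Monad m => Array U DIM2 Float -> m (Array U DIM2 Float)
blur3P arr = blurP arr >>= blurP >>= blurP

main :: IO ()
main = do
  -- Small hypothetical test image, just to exercise the pipeline.
  let img = fromListUnboxed (Z :. 4 :. 4) [1 .. 16] :: Array U DIM2 Float
  out <- blur3P img
  print (R.toList out)
```

Sequencing the passes in a monad makes the order of the parallel computations explicit, which is one plausible reading of why computeP is monadic in the first place.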
