To be more specific, I have the following innocuous-looking little Repa 3 program:
```haskell
{-# LANGUAGE QuasiQuotes #-}

import Prelude hiding (map, zipWith)
import System.Environment (getArgs)
import Data.Word (Word8)
import Data.Array.Repa
import Data.Array.Repa.IO.DevIL
import Data.Array.Repa.Stencil
import Data.Array.Repa.Stencil.Dim2

main = do
  [s] <- getArgs
  img <- runIL $ readImage s
  let out = output x where RGB x = img
  runIL . writeImage "out.bmp" . Grey =<< computeP out

output img = map cast . blur . blur $ blur grey
  where
    grey              = traverse img to2D luminance
    cast n            = floor n :: Word8
    to2D (Z:.i:.j:._) = Z:.i:.j

---------------------------------------------------------------

luminance f (Z:.i:.j) = 0.21*r + 0.71*g + 0.07*b :: Float
  where
    (r,g,b) = rgb (fromIntegral . f) i j

blur = map (/ 9) . convolve kernel
  where
    kernel = [stencil2| 1 1 1
                        1 1 1
                        1 1 1 |]

convolve = mapStencil2 BoundClamp

rgb f i j = (r,g,b)
  where
    r = f $ Z:.i:.j:.0
    g = f $ Z:.i:.j:.1
    b = f $ Z:.i:.j:.2
```
It takes this long to process a 640x420 image on my 2 GHz Core 2 Duo laptop:
real 2m32.572s
user 4m57.324s
sys 0m1.870s
I know something must be quite wrong, because I have gotten much better performance on much more complex algorithms using Repa 2. Under that API, the big improvement I found came from adding a call to `force` before every array transform (which I understand to mean every call to `map`, `convolve`, `traverse`, etc.). I cannot quite make out the analogous thing to do in Repa 3; in fact, I thought the new manifestation type parameters were supposed to ensure there is no ambiguity about when an array needs to be forced. And how does the new monadic interface fit into this scheme? I have read the nice tutorial by Don S, but there are some key gaps between the Repa 2 and 3 APIs that, AFAIK, are little discussed online.
Put more simply: is there a minimally invasive way to fix the above program's performance?
2 answers
#1
The new representation type parameters don't automagically force when needed (it's probably a hard problem to do that well); you still have to force manually. In Repa 3 this is done with the `computeP` function:
```haskell
computeP
  :: (Monad m, Repr r2 e, Fill r1 r2 sh e)
  => Array r1 sh e -> m (Array r2 sh e)
```
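A minimal sketch of what this looks like in practice: two delayed stages with an explicit `computeP` between them, run directly in `IO`. It assumes the Repa 3 exports `fromListUnboxed`, `map`, `computeP` and `toList`; the class constraints on these may differ between Repa 3.0 (the `Repr`/`Fill` signature above) and later 3.x releases, so treat it as untested against your exact version. `twoStages` is a hypothetical name, and `asTypeOf` is used here to pin the intermediate result to the unboxed `U` representation.

```haskell
import Data.Array.Repa (Array, D, DIM2, U, Z (..), (:.) (..))
import qualified Data.Array.Repa as R

-- Hypothetical two-stage pipeline. Each R.map only builds a delayed (D)
-- array; computeP evaluates it into a manifest array, so the second
-- stage reads stored values instead of recomputing the first stage.
twoStages :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)
twoStages a0 = do
  let d1 = R.map (+ 1) a0 :: Array D DIM2 Double
  a1 <- R.computeP d1
  -- asTypeOf pins a1 to the same type as a0 (representation U);
  -- without it the intermediate representation would be ambiguous.
  R.computeP (R.map (* 2) (a1 `asTypeOf` a0))

main :: IO ()
main = do
  let src = R.fromListUnboxed (Z :. 2 :. 3) [1 .. 6] :: Array U DIM2 Double
  out <- twoStages src
  print (R.toList out)  -- [4.0,6.0,8.0,10.0,12.0,14.0]
```

Because `twoStages` is polymorphic in the monad, the same code can be run in `IO` (as in `main`) or in any other monad.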
I personally don't really understand why it's monadic, because you can just as well use the `Identity` monad:
```haskell
import Control.Monad.Identity (runIdentity)

-- A pure wrapper around computeP: run it in the Identity monad.
force
  :: (Repr r2 e, Fill r1 r2 sh e)
  => Array r1 sh e -> Array r2 sh e
force = runIdentity . computeP
```
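The same trick works for any function that is polymorphic in its monad but performs no monad-specific effects. A minimal, self-contained sketch using only `base`; `sumSquaresM` and `sumSquares` are hypothetical stand-ins for `computeP` and `force`, and `runIdentity` is imported from `Data.Functor.Identity` (part of `base` since 4.8) rather than from the mtl module used above.

```haskell
import Data.Functor.Identity (runIdentity)

-- Hypothetical stand-in for computeP: polymorphic in the monad m,
-- but it only uses `return`, so every monad yields the same value.
sumSquaresM :: Monad m => [Int] -> m Int
sumSquaresM xs = return $! sum (map (^ (2 :: Int)) xs)

-- Choosing m = Identity and unwrapping gives a pure function,
-- mirroring `force = runIdentity . computeP`.
sumSquares :: [Int] -> Int
sumSquares = runIdentity . sumSquaresM

main :: IO ()
main = do
  r <- sumSquaresM [1, 2, 3]    -- the monadic version used directly in IO
  print r                       -- 14
  print (sumSquares [1, 2, 3])  -- 14
```

One difference to keep in mind: in `IO`, the `$!` forces the sum when the action runs, whereas with `Identity` (a newtype) it only happens once the result is demanded.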
So now your `output` function can be rewritten with appropriate forcing:
```haskell
output img = map cast . f . blur . f . blur . f . blur . f $ grey
  where ...
```
where `f` is an abbreviation that uses a helper function `u` to aid type inference:
```haskell
-- Identity function that fixes the result representation to U (unboxed).
u :: Array U sh e -> Array U sh e
u = id

f = u . force
```
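`u` does nothing at runtime: it is an identity function with a narrower type, used only to settle a result type that inference cannot determine on its own. The same idea in a base-only sketch, using `read` (whose result type is likewise chosen by context); `asInt` and `roundTrip` are hypothetical names for illustration.

```haskell
-- An identity function restricted to Int, analogous to
-- `u :: Array U sh e -> Array U sh e`.
asInt :: Int -> Int
asInt = id

-- `show (read s)` on its own is ambiguous: read could produce any
-- type with a Read instance. Wrapping the result in asInt selects Int.
roundTrip :: String -> String
roundTrip s = show (asInt (read s))

main :: IO ()
main = putStrLn (roundTrip " 42 ")  -- 42
```

An inline annotation such as `(read s :: Int)` does the same job; a named helper is simply shorter when it is applied at several points in a pipeline.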
With these changes the speedup is quite dramatic, which is to be expected: without intermediate forcing, each output pixel ends up evaluating much more than necessary, because the intermediate values aren't shared.
Your original code:
real 0m25.339s
user 1m35.354s
sys 0m1.760s
With forcing:
real 0m0.130s
user 0m0.320s
sys 0m0.028s
Tested with a 600x400 PNG; the output files were identical.
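To make the sharing argument concrete, here is a toy model in plain Haskell (`base` plus the `array` package). It is not Repa: `Delayed`, `manifest`, `blur3` and `force` are hypothetical stand-ins, a delayed array is modelled as an index function, and each lookup also reports how many reads of stored data it triggered. Three chained 3-point blurs cost 3 * 3 * 3 = 27 reads per output element without forcing, versus 3 per element per stage with forcing. For a 3x3 stencil like the one in the question, the same arithmetic gives 9 * 9 * 9 = 729 versus 9 per element per stage.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical toy model of a delayed 1-D array (not Repa's real
-- representation): a length plus an index function. Each lookup also
-- returns how many reads of stored (manifest) data it triggered.
data Delayed = Delayed Int (Int -> (Int, Double))

-- Store a list; every lookup into it costs exactly one read.
manifest :: [Double] -> Delayed
manifest xs = Delayed n (\i -> (1, arr ! i))
  where
    n   = length xs
    arr = listArray (0, n - 1) xs :: Array Int Double

-- A 3-point box blur with clamped borders, built lazily on top of f.
blur3 :: Delayed -> Delayed
blur3 (Delayed n f) = Delayed n g
  where
    clamp i = max 0 (min (n - 1) i)
    g i =
      let (c1, a) = f (clamp (i - 1))
          (c2, b) = f i
          (c3, c) = f (clamp (i + 1))
      in (c1 + c2 + c3, (a + b + c) / 3)

-- Evaluate every element once; return the total reads and the values.
evaluate :: Delayed -> (Int, [Double])
evaluate (Delayed n f) = (sum (map fst rs), map snd rs)
  where
    rs = map f [0 .. n - 1]

-- "Force": evaluate everything into a fresh stored array, so later
-- stages read stored values instead of recomputing them.
force :: Delayed -> (Int, Delayed)
force d = (cost, manifest vals)
  where
    (cost, vals) = evaluate d

-- Three blurs with no intermediate forcing.
unforced :: Delayed -> (Int, [Double])
unforced src = evaluate (blur3 (blur3 (blur3 src)))

-- Three blurs, forcing after each of the first two.
forced :: Delayed -> (Int, [Double])
forced src = (c1 + c2 + c3, vals)
  where
    (c1, d1)   = force (blur3 src)
    (c2, d2)   = force (blur3 d1)
    (c3, vals) = evaluate (blur3 d2)

main :: IO ()
main = do
  let src = manifest [fromIntegral i | i <- [1 .. 10 :: Int]]
      (costU, valsU) = unforced src
      (costF, valsF) = forced src
  putStrLn ("reads without forcing: " ++ show costU)  -- 270
  putStrLn ("reads with forcing:    " ++ show costF)  -- 90
  print (valsU == valsF)                              -- True
```

Both versions compute exactly the same values; only the amount of work differs.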
#2
`computeP` is the new `force`.
In Repa 3 you need to use `computeP` everywhere you would have used `force` in Repa 2.
The Laplace example from repa-examples is similar to what you're doing. You should also use `cmap` instead of plain `map` in your `blur` function. There will be a paper explaining why on my homepage early next week.