Whenever I consider learning a new language -- haskell in this case -- I try to hack together a primitive grep clone to see how good the language implementation and/or its libraries are at text processing, because that's a major use case for me.
每当我考虑学习一种新语言 - 在这种情况下是haskell - 我会尝试将一个原始的grep克隆组合在一起,看看语言实现和/或它的库在文本处理方面有多好,因为这对我来说是一个主要的用例。
Inspired by code on the haskell wiki, I came up with the following naive attempt:
受到haskell wiki代码的启发,我想出了以下天真的尝试:
{-# LANGUAGE FlexibleContexts, ExistentialQuantification #-}
import Text.Regex.PCRE
import System.Environment
io :: ([String] -> [String]) -> IO ()
io f = interact (unlines . f . lines)
regexBool :: forall r l .
(RegexMaker Regex CompOption ExecOption r,
RegexLike Regex l) =>
r -> l -> Bool
regexBool r l = l =~ r :: Bool
grep :: forall r l .
(RegexMaker Regex CompOption ExecOption r, RegexLike Regex l) =>
r -> [l] -> [l]
grep r = filter (regexBool r)
main :: IO ()
main = do
argv <- getArgs
io $ grep $ argv !! 0
This appears to be doing what I want it to, but unfortunately, it's really slow -- about 10 times slower than a python script doing the same thing. I assume it's not the regex library that's at fault here, because it's calling into PCRE which should be plenty fast (switching to Text.Regex.Posix
slows things down quite a bit further). So it must be the String
implementation, which is instructive from a theoretical point of view but inefficient according to what I've read.
这似乎正在做我想要的,但不幸的是,它真的很慢 - 大约比执行相同的python脚本慢10倍。我认为这不是regex库在这里有问题,因为它调用PCRE应该足够快(切换到Text.Regex.Posix会使事情进一步减慢)。所以它必须是String实现,从理论的角度来看是有启发性的,但根据我所读的内容效率低下。
Is there an alternative to String
s in haskell that's both efficient and convenient (i.e. there's little or no friction when switching to using that instead of String
s) and that fully and correctly handles UTF-8-encoded Unicode, as well as other encodings without too much hassle if possible? Something that everybody uses when doing text processing in haskell but that I just don't know about because I'm a complete beginner?
haskell中的字符串是否有替代方法既高效又方便(即切换到使用它而不是字符串时几乎没有摩擦)并且完全正确地处理UTF-8编码的Unicode,以及其他编码而没有太多麻烦,如果可能的话?在haskell中进行文本处理时每个人都使用的东西,但我只是不知道因为我是一个完全的初学者?
1 个解决方案
#1
1
It's possible that the slow speed is caused by using the standard library's list type. I've often run into performance problems with it in the past.
使用标准库的列表类型可能导致速度慢。我过去经常遇到性能问题。
It would be a good idea to profile your executable, to see where it spends its time: Tools for analyzing performance of a Haskell program. Profiling Haskell programs is really easy (compile with a switch and execute your program with an added argument, and the report is written to a text file in the current working directory).
分析您的可执行文件,查看它花费时间的位置是一个好主意:用于分析Haskell程序性能的工具。分析Haskell程序非常简单(使用开关编译并使用添加的参数执行程序,并将报告写入当前工作目录中的文本文件)。
As a side note, I use exactly the same approach as you when learning a new language: create something that works. My experience doing this with Haskell is that I can easily gain an order of magnitude or two in performance by profiling and making relatively simple changes (usually a couple of lines).
作为旁注,在学习新语言时,我使用的方法与您完全相同:创建有效的方法。我使用Haskell的经验是,通过分析和进行相对简单的更改(通常是几行),我可以轻松地获得一个或两个数量级的性能。
#1
1
It's possible that the slow speed is caused by using the standard library's list type. I've often run into performance problems with it in the past.
使用标准库的列表类型可能导致速度慢。我过去经常遇到性能问题。
It would be a good idea to profile your executable, to see where it spends its time: Tools for analyzing performance of a Haskell program. Profiling Haskell programs is really easy (compile with a switch and execute your program with an added argument, and the report is written to a text file in the current working directory).
分析您的可执行文件,查看它花费时间的位置是一个好主意:用于分析Haskell程序性能的工具。分析Haskell程序非常简单(使用开关编译并使用添加的参数执行程序,并将报告写入当前工作目录中的文本文件)。
As a side note, I use exactly the same approach as you when learning a new language: create something that works. My experience doing this with Haskell is that I can easily gain an order of magnitude or two in performance by profiling and making relatively simple changes (usually a couple of lines).
作为旁注,在学习新语言时,我使用的方法与您完全相同:创建有效的方法。我使用Haskell的经验是,通过分析和进行相对简单的更改(通常是几行),我可以轻松地获得一个或两个数量级的性能。