为什么第一个Haskell函数FAIL处理无限列表，而第二个片段SUCCEEDS带有无限列表？

I have two Haskell functions, both of which seem very similar to me. But the first one FAILS against infinite lists, and the second one SUCCEEDS against infinite lists. I have been trying for hours to nail down exactly why that is, but to no avail.

我有两个Haskell函数,这两个函数看起来与我非常相似。但是第一个FAILS反对无限列表,第二个是SUCCEEDS反对无限列表。我一直在努力确定原因,但无济于事。

Both snippets are a re-implementation of the "words" function in Prelude. Both work fine against finite lists.

两个片段都是Prelude中“单词”功能的重新实现。两者都可以对抗有限列表。

Here's the version that does NOT handle infinite lists:

这是不处理无限列表的版本:

myWords_FailsOnInfiniteList :: String -> [String]
myWords_FailsOnInfiniteList string = foldr step [] (dropWhile charIsSpace string)
   where 
      step space ([]:xs)      | charIsSpace space = []:xs    
      step space (x:xs)       | charIsSpace space = []:x:xs
      step space []           | charIsSpace space = []
      step char (x:xs)                            = (char : x) : xs
      step char []                                = [[char]]

Here's the version that DOES handle infinite lists:

这是处理无限列表的版本:

myWords_anotherReader :: String -> [String]
myWords_anotherReader xs = foldr step [""] xs
   where 
      step x result | not . charIsSpace $ x = [x:(head result)]++tail result
                    | otherwise             = []:result

Note: "charIsSpace" is merely a renaming of Char.isSpace.

注意:“charIsSpace”仅仅是Char.isSpace的重命名。

The following interpreter session illustrates that the first one fails against an infinite list while the second one succeeds.

以下解释器会话说明第一个失败列表无效列表,而第二个失败列表成功。

*Main> take 5 (myWords_FailsOnInfiniteList  (cycle "why "))
*** Exception: stack overflow

*Main> take 5 (myWords_anotherReader (cycle "why "))
["why","why","why","why","why"]

EDIT: Thanks to the responses below, I believe I understand now. Here are my conclusions and the revised code:

编辑:感谢下面的回复,我相信我现在明白了。以下是我的结论和修订后的代码:

Conclusions:

The biggest culprit in my first attempt were the 2 equations that started with "step space []" and "step char []". Matching the second parameter of the step function against [] is a no-no, because it forces the whole 2nd arg to be evaluated (but with a caveat to be explained below).

我第一次尝试的最大罪魁祸首是以“step space []”和“step char []”开头的2个方程式。将步骤函数的第二个参数与[]匹配是一个禁忌,因为它强制对整个第二个arg进行求值(但需要注意以下解释)。

At one point, I had thought (++) might evaluate its right-hand argument later than cons would, somehow. So, I thought I might fix the problem by changing " = (char:x):xs" to "= [char : x] ++ xs". But that was incorrect.

有一次,我曾想过(++)可能会稍后评估它的右手论证,而不是以某种方式。所以,我想我可以通过将“=(char:x):xs”更改为“= [char:x] ++ xs”来解决问题。但这是不正确的。

At one point, I thought that pattern matching the second arg against (x:xs) would cause the function to fail against infinite lists. I was almost right about this, but not quite! Evaluating the second arg against (x:xs), as I do in a pattern match above, WILL cause some recursion. It will "turn the crank" until it hits a ":" (aka, "cons"). If that never happened, then my function would not succeed against an infinite list. However, in this particular case, everything is OK because my function will eventually encounter a space, at which point a "cons" will occur. And the evaluation triggered by matching against (x:xs) will stop right there, avoiding the infinite recursion. At that point, the "x" will be matched, but the xs will remain a thunk, so there's no problem. (Thanks to Ganesh for really helping me grasp that).

有一次,我认为将第二个arg与(x:xs)匹配的模式会导致函数对无限列表失败。我几乎是对的,但并不完全!正如我在上面的模式匹配中那样,对(x:xs)计算第二个arg会导致一些递归。它将“转动曲柄”直到它击中“:”(又名“cons”)。如果这种情况从未发生过,那么我的函数就无法在无限列表中成功。但是,在这种特殊情况下,一切都很好,因为我的函数最终会遇到一个空格,此时会出现“缺点”。通过匹配(x:xs)触发的评估将在那里停止,避免无限递归。此时,“x”将匹配,但xs仍然是一个thunk,所以没有问题。 (感谢Ganesh帮我理解这一点)。

In general, you can mention the second arg all you want, as long as you don't force evaluation of it. If you've matched against x:xs, then you can mention xs all you want, as long as you don't force evaluation of it.

一般来说,你可以提到你想要的第二个arg,只要你不强制评估它。如果你已经匹配x:xs,那么你可以提到所有你需要的xs,只要你不强制评估它。

So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.

所以,这是修改后的代码。我通常试图避免头部和尾部,仅仅因为它们是部分功能,而且因为我需要练习编写相应的模式匹配。

myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step space acc | charIsSpace space = "":acc
      step char (x:xs)                   = (char:x):xs
      step _ []                          = error "this should be impossible"

This correctly works against infinite lists. Note there's no head, tail or (++) operator in sight.

这正确地适用于无限列表。请注意,看不到头部,尾部或(++)操作员。

Now, for an important caveat: When I first wrote the corrected code, I did not have the 3rd equation, which matches against "step _ []". As a result, I received the warning about non-exhaustive pattern matches. Obviously, it is a good idea to avoid that warning.

现在,一个重要的警告:当我第一次写出更正的代码时,我没有第三个等式,它与“step _ []”相匹配。结果,我收到了关于非详尽模式匹配的警告。显然,避免这种警告是个好主意。

But I thought I was going to have a problem. I already mentioned above that it is not OK to pattern match the second arg against []. But I would have to do so in order to get rid of the warning.

但我以为我会遇到问题。我已经在上面提到过,将第二个arg与[]进行模式匹配是不行的。但我必须这样做才能摆脱警告。

However, when I added the "step _ []" equation, everything was fine! There was still no problem with infinite lists!. Why?

但是,当我添加“step _ []”等式时,一切都很好!无限列表仍然没有问题!为什么?

Because the 3rd equation in the corrected code IS NEVER REACHED!

因为修正后的代码中的第3个等式永远不会达到!

In fact, consider the following BROKEN version. It is EXACTLY the SAME as the correct code, except that I have moved the pattern for empty list up above the other patterns:

实际上,请考虑以下BROKEN版本。它完全是SAME作为正确的代码,除了我已经将空列表的模式移到了其他模式之上:

myWords_brokenAgain :: String -> [String]
myWords_brokenAgain string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step _ []                              = error "this should be impossible"
      step space acc | charIsSpace space     = "":acc
      step char (x:xs)                       = (char:x):xs

We're back to stack overflow, because the first thing that happens when step is called is that the interpreter checks to see if equation number one is a match. To do so, it must see if the second arg is []. To do that, it must evaluate the second arg.

我们回到堆栈溢出,因为调用step时发生的第一件事是解释器检查方程式1是否匹配。为此,必须查看第二个arg是否为[]。要做到这一点,它必须评估第二个arg。

Moving the equation down BELOW the other equations ensures that the 3rd equation is never attempted, because either the first or the second pattern always matches. The 3rd equation is merely there to dispense with the non-exhaustive pattern warning.

将等式向下移动低于其他等式确保从不尝试第三个等式,因为第一个或第二个模式总是匹配。第三个等式仅仅是为了省去非详尽模式警告。

This has been a great learning experience. Thanks to everyone for your help.

这是一次很棒的学习经历。感谢大家的帮助。

4 个解决方案

#1

Others have pointed out the problem, which is that step always evaluates its second argument before producing any output at all, yet its second argument will ultimately depend on the result of another invocation of step when the foldr is applied to an infinite list.

其他人已经指出了这个问题,即步骤总是在产生任何输出之前总是评估它的第二个参数,但它的第二个参数最终将取决于当foldr应用于无限列表时另一个步骤调用的结果。

It doesn't have to be written this way, but your second version is kind of ugly because it relies on the initial argument to step having a particular format and it's quite hard to see that the head/tail will never go wrong. (I'm not even 100% certain that they won't!)

它不必以这种方式编写,但是你的第二个版本有点难看,因为它依赖于具有特定格式的步骤的初始参数,并且很难看出头/尾永远不会出错。 (我甚至不是100%肯定他们不会!)

What you should do is restructure the first version so it produces output without depending on the input list in at least some situations. In particular we can see that when the character is not a space, there's always at least one element in the output list. So delay the pattern-matching on the second argument until after producing that first element. The case where the character is a space will still be dependent on the list, but that's fine because the only way that case can infinitely recurse is if you pass in an infinite list of spaces, in which case not producing any output and going into a loop is the expected behaviour for words (what else could it do?)

您应该做的是重构第一个版本,以便在至少某些情况下产生输出而不依赖于输入列表。特别是我们可以看到,当字符不是空格时,输出列表中始终至少有一个元素。因此,延迟第二个参数上的模式匹配,直到产生第一个元素。字符是空格的情况仍然依赖于列表,但这很好,因为案例可以无限递归的唯一方法是传入无限的空格列表,在这种情况下不产生任何输出并进入循环是单词的预期行为(它还能做什么?)

#2

Try expanding the expression by hand:

尝试手动扩展表达式:

 take 5 (myWords_FailsOnInfiniteList  (cycle "why "))
 take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
 take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
 take 5 (foldr step [] ("why " ++ cycle "why "))
 take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
 take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))

What's the next expansion? You should see that in order to pattern match for step, you need to know whether it's the empty list or not. In order to find that out, you have to evaluate it, at least a little bit. But that second term happens to be a foldr reduction by the very function you're pattern matching for. In other words, the step function cannot look at its arguments without calling itself, and so you have an infinite recursion.

下一次扩张是什么?你应该看到,为了模式匹配步骤,你需要知道它是否是空列表。为了找到它,你必须至少评估它。但是,第二个术语恰好是通过模式匹配的函数来减少倍数。换句话说,step函数无法在不调用自身的情况下查看其参数,因此您可以进行无限递归。

Contrast that with an expansion of your second function:

与你的第二个功能的扩展形成对比:

myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why ")
let result = foldr step [""] ("hy " ++ cycle "why ") in
    ['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ") in
    ['w':(head result)] ++ tail result

You can probably see that this expansion will continue until a space is reached. Once a space is reached, "head result" will obtain a value, and you will have produced the first element of the answer.

您可能会看到此扩展将持续到达到空间。达到空格后,“头部结果”将获得一个值,您将产生答案的第一个元素。

I suspect that this second function will overflow for infinite strings that don't contain any spaces. Can you see why?

我怀疑第二个函数将溢出无限字符串,不包含任何空格。你能明白为什么吗?

#3

The second version does not actually evaluate result until after it has started producing part of its own answer. The first version evaluates result immediately by pattern matching on it.

第二个版本在开始生成部分自己的答案后才会实际评估结果。第一个版本通过模式匹配立即评估结果。

The key with these infinite lists is that you have to produce something before you start demanding list elements so that the output can always "stay ahead" of the input.

这些无限列表的关键是你必须在开始要求列表元素之前产生一些东西,这样输出总是能够“保持领先”输入。

(I feel like this explanation is not very clear, but it's the best I can do.)

(我觉得这个解释不是很清楚,但这是我能做的最好的。)

#4

The library function foldr has this implementation (or similar):

库函数foldr具有此实现(或类似):

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k

The result of myWords_FailsOnInfiniteList depends on the result of foldr which depends on the result of step which depends on the result of the inner foldr which depends on ... and so on an infinite list, myWords_FailsOnInfiniteList will use up an infinite amount of space and time before producing its first word.

myWords_FailsOnInfiniteList的结果取决于foldr的结果,它依赖于step的结果,它取决于内部foldr的结果,它取决于...等等无限列表,myWords_FailsOnInfiniteList将耗尽无限量的空间和时间在产生第一个字之前。

The step function in myWords_anotherReader does not require the result of the inner foldr until after it has produced the first letter of the first word. Unfortunately, as Apocalisp says, it uses O(length of first word) space before it produces the next word, because as the first word is being produced, the tail thunk keeps growing tail ([...] ++ tail ([...] ++ tail (...))).

myWords_anotherReader中的step函数在生成第一个单词的第一个字母之前不需要内部foldr的结果。不幸的是,正如Apocalisp所说,它在产生下一个单词之前使用O(第一个单词的长度)空间,因为当第一个单词被生成时,尾部thunk不断增长尾巴([...] ++ tail([。 ..] ++ tail(...)))。

In contrast, compare to

相比之下,与之相比

myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
    myWords' [] = []
    myWords' string =
        let (part1, part2) = break isSpace string
        in part1 : myWords part2

using library functions which may be defined as

使用可定义为的库函数

break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p

span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs

Notice that producing the intermediate results is never held up by future computation, and only O(1) space is needed as each element of the result is made available for consumption.

请注意,生成中间结果永远不会被未来计算所阻碍,并且只需要O(1)空间,因为结果的每个元素都可供使用。

Addendum

So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.

所以,这是修改后的代码。我通常试图避免头部和尾部,仅仅因为它们是部分功能,而且因为我需要练习编写相应的模式匹配。
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step space acc | charIsSpace space = "":acc
      step char (x:xs)                   = (char:x):xs
      step _ []                          = error "this should be impossible"

(Aside: You may not care, but the words "" == [] from the library, but your myWords "" = [""]. Similar issue with trailing spaces.)

(旁白:你可能不在乎,但是来自图书馆的单词“”== [],但你的myWords“”= [“”]。跟尾空格的类似问题。)

Looks much-improved over myWords_anotherReader, and is pretty good for a foldr-based solution.

看起来比myWords_anotherReader有了很大改进,对于基于foldr的解决方案来说非常好。

\n -> tail $ myWords $ replicate n 'a' ++ " b"

It's not possible to do better than O(n) time, but both myWords_anotherReader and myWords take O(n) space here. This may be inevitable given the use of foldr.

它不可能比O(n)时间更好,但myWords_anotherReader和myWords都在这里占用O(n)空间。考虑到使用foldr,这可能是不可避免的。

Worse,

\n -> head $ head $ myWords $ replicate n 'a' ++ " b"

myWords_anotherReader was O(1) but the new myWords is O(n), because pattern matching (x:xs) requires the further result.

myWords_anotherReader是O(1),但新的myWords是O(n),因为模式匹配(x:xs)需要进一步的结果。

You can work around this with

你可以解决这个问题

myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
   where 
      step space acc | isSpace space = "":acc
      step char ~(x:xs)              = (char:x):xs

The ~ introduces an "irrefutable pattern". Irrefutable patterns never fail and do not force immediate evaluation.

〜引入了“无可辩驳的模式”。无可辩驳的模式永远不会失败,也不会强制立即进行评估。

#1

#2

Try expanding the expression by hand:

尝试手动扩展表达式:

 take 5 (myWords_FailsOnInfiniteList  (cycle "why "))
 take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
 take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
 take 5 (foldr step [] ("why " ++ cycle "why "))
 take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
 take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))

Contrast that with an expansion of your second function:

与你的第二个功能的扩展形成对比:

myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why ")
let result = foldr step [""] ("hy " ++ cycle "why ") in
    ['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ") in
    ['w':(head result)] ++ tail result

您可能会看到此扩展将持续到达到空间。达到空格后,“头部结果”将获得一个值,您将产生答案的第一个元素。

I suspect that this second function will overflow for infinite strings that don't contain any spaces. Can you see why?

我怀疑第二个函数将溢出无限字符串,不包含任何空格。你能明白为什么吗?

#3

The second version does not actually evaluate result until after it has started producing part of its own answer. The first version evaluates result immediately by pattern matching on it.

第二个版本在开始生成部分自己的答案后才会实际评估结果。第一个版本通过模式匹配立即评估结果。

The key with these infinite lists is that you have to produce something before you start demanding list elements so that the output can always "stay ahead" of the input.

这些无限列表的关键是你必须在开始要求列表元素之前产生一些东西,这样输出总是能够“保持领先”输入。

(I feel like this explanation is not very clear, but it's the best I can do.)

(我觉得这个解释不是很清楚,但这是我能做的最好的。)

#4

The library function foldr has this implementation (or similar):

库函数foldr具有此实现(或类似):

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k

In contrast, compare to

相比之下,与之相比

myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
    myWords' [] = []
    myWords' string =
        let (part1, part2) = break isSpace string
        in part1 : myWords part2

using library functions which may be defined as

使用可定义为的库函数

break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p

span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs

Notice that producing the intermediate results is never held up by future computation, and only O(1) space is needed as each element of the result is made available for consumption.

请注意,生成中间结果永远不会被未来计算所阻碍,并且只需要O(1)空间,因为结果的每个元素都可供使用。

Addendum

So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.

所以,这是修改后的代码。我通常试图避免头部和尾部,仅仅因为它们是部分功能,而且因为我需要练习编写相应的模式匹配。
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step space acc | charIsSpace space = "":acc
      step char (x:xs)                   = (char:x):xs
      step _ []                          = error "this should be impossible"

(Aside: You may not care, but the words "" == [] from the library, but your myWords "" = [""]. Similar issue with trailing spaces.)

(旁白:你可能不在乎,但是来自图书馆的单词“”== [],但你的myWords“”= [“”]。跟尾空格的类似问题。)

Looks much-improved over myWords_anotherReader, and is pretty good for a foldr-based solution.

看起来比myWords_anotherReader有了很大改进,对于基于foldr的解决方案来说非常好。

\n -> tail $ myWords $ replicate n 'a' ++ " b"

It's not possible to do better than O(n) time, but both myWords_anotherReader and myWords take O(n) space here. This may be inevitable given the use of foldr.

它不可能比O(n)时间更好,但myWords_anotherReader和myWords都在这里占用O(n)空间。考虑到使用foldr,这可能是不可避免的。

Worse,

\n -> head $ head $ myWords $ replicate n 'a' ++ " b"

myWords_anotherReader was O(1) but the new myWords is O(n), because pattern matching (x:xs) requires the further result.

myWords_anotherReader是O(1),但新的myWords是O(n),因为模式匹配(x:xs)需要进一步的结果。

You can work around this with

你可以解决这个问题

myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
   where 
      step space acc | isSpace space = "":acc
      step char ~(x:xs)              = (char:x):xs

The ~ introduces an "irrefutable pattern". Irrefutable patterns never fail and do not force immediate evaluation.

〜引入了“无可辩驳的模式”。无可辩驳的模式永远不会失败,也不会强制立即进行评估。

秒客网

为什么第一个Haskell函数FAIL处理无限列表，而第二个片段SUCCEEDS带有无限列表？

4 个解决方案

#1

#2

#3

#4

Addendum

#1

#2

#3

#4

Addendum

相关文章