Guidelines for applying DRY in Haskell function definitions

Date: 2021-01-25 17:02:50

I have a question about whether a specific way of applying the DRY principle is considered good practice in Haskell. I'm going to present an example, and then ask whether the approach I'm taking is considered good Haskell style. In a nutshell, the question is this: when you have a long formula, and then you find yourself needing to repeat some small subsets of that formula elsewhere, do you always put that repeated subset of the formula into a variable so you can stay DRY? Why or why not?

The Example: Imagine we're taking a string of digits, and converting that string into its corresponding Int value. (BTW, this is an exercise from "Real World Haskell").


Here's a solution that works except that it ignores edge cases:


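import Data.Char (digitToInt)

asInt_fold :: String -> Int
asInt_fold string = fst (foldr helper (0,0) string)
  where
    -- The accumulator is (sum so far, place of the next digit).
    helper char (sum,place) = (newValue, newPlace)
      where
        newValue = (10 ^ place) * (digitToInt char) + sum
        newPlace = place + 1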

It uses foldr, and the accumulator is a tuple of the sum so far and the next place, that is, the power of ten for the next digit.
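To make the accumulator concrete, here is a small sketch that records every intermediate tuple. It reuses the step function from the solution above; the names `total` and `foldTrace` are mine, not from the original code. `scanr` behaves like `foldr` but keeps each intermediate result.

```haskell
import Data.Char (digitToInt)

-- Same step as the helper above; `total` stands in for `sum`.
helper :: Char -> (Int, Int) -> (Int, Int)
helper char (total, place) = (10 ^ place * digitToInt char + total, place + 1)

-- scanr keeps every accumulator foldr would pass along:
-- the final result comes first, the seed (0,0) comes last.
foldTrace :: String -> [(Int, Int)]
foldTrace = scanr helper (0, 0)

main :: IO ()
main = print (foldTrace "123")  -- prints [(123,3),(23,2),(3,1),(0,0)]
```

Reading the list from right to left shows the rightmost digit being consumed first, with the place counter growing by one at each step.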

So far so good. Now, when I went to implement the edge case checks, I found that I needed little portions of the "newValue" formula in different places to check for errors. For example, on my machine, there would be an Int overflow if the input was larger than (2^31 - 1), so the max value I could handle is 2,147,483,647. Therefore, I put in 2 checks:


  1. If the place value is 9 (the billions place) and the digit value is > 2, there's an error.

  2. If sum + (10 ^ place) * (digitToInt char) > maxInt, there's an error.

Those 2 checks caused me to repeat part of the formula, so I introduced the following new variables:


  • digitValue = digitToInt char

  • newPlaceComponent = (10^place) * digitValue

The reason I introduced those variables is merely an automatic application of the DRY principle: I found myself repeating those portions of the formula, so I defined them once and only once.


However, I wonder if this is considered good Haskell style. There are obvious advantages, but I see disadvantages as well. It definitely makes the code longer, whereas much of the Haskell code I've seen is pretty terse.


So, do you consider this good Haskell style, and do you follow this practice, or not? Why / why not?


And for what it's worth, here's my final solution that deals with a number of edge cases and therefore has quite a large where block. You can see how large the block became due to my application of the DRY principle.


Thanks.

import Data.Char (digitToInt)
import Data.List (isInfixOf)

asInt_fold :: String -> Int
asInt_fold "" = error "You can't be giving me an empty string now"
asInt_fold "-" = error "I need a little more than just a dash"
asInt_fold string | isInfixOf "." string = error "I can't handle decimal points"
asInt_fold ('-':xs) = -1 * (asInt_fold xs)
asInt_fold string = fst (foldr helper (0,0) string)
  where
    helper char (sum,place)
      | place == 9 && digitValue > 2     = throwMaxIntError
      | maxInt - sum < newPlaceComponent = throwMaxIntError
      | otherwise                        = (newValue, newPlace)
      where
        digitValue = (digitToInt char)
        placeMultiplier = (10 ^ place)
        newPlaceComponent = placeMultiplier * digitValue
        newValue = newPlaceComponent + sum
        newPlace = place + 1
        maxInt = 2147483647
        throwMaxIntError =
          error "The value is larger than max, which is 2147483647"

3 Answers

#1


As noted by bdonlan, your algorithm could be cleaner; it's especially useful that the language itself detects overflow. As for your code itself and the style, I think the main tradeoff is that each new name imposes a small cognitive burden on the reader. When to name an intermediate result becomes a judgment call.

I personally would not have chosen to name placeMultiplier, as I think the intent of 10 ^ place is much clearer. And I would look for maxInt in the Prelude (it is available there as maxBound :: Int), as you run the risk of being terribly wrong if run on 64-bit hardware. Otherwise, the only thing I find objectionable in your code is the redundant parentheses. So what you have is an acceptable style.

(My credentials: At this point I have written on the order of 10,000 to 20,000 lines of Haskell code, and I have read perhaps two or three times that. I also have ten times that much experience with the ML family of languages, which require the programmer to make similar decisions.)


#2


DRY is just as good a principle in Haskell as it is anywhere else :) A lot of the reason behind the terseness you speak of in Haskell is that many idioms are lifted out into libraries, and that the examples you look at have often been considered very carefully to make them terse :)

For example, here's an alternate way to implement your string-to-Int algorithm:

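import Data.Char (digitToInt)
import Data.List (foldl')

asInt_fold ('-':n) = negate (asInt_fold n)
asInt_fold "" = error "Need some actual digits!"
asInt_fold str = foldl' step 0 str
    where
        step _ x
            | x < '0' || x > '9'
            = error "Bad character somewhere!"
        step sum dig =
            -- A negative result means the Int wrapped around. This catches
            -- common overflows, but not one that wraps back to a value >= 0.
            case sum * 10 + digitToInt dig of
                n | n < 0 -> error "Overflow!"
                n -> n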
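To see why the left fold needs only a single piece of state, here is a small sketch that leaves out the error handling and records the running total after each digit. The name `total` is mine; `scanl` is the variant of `foldl` that keeps intermediate results.

```haskell
import Data.Char (digitToInt)

-- The left-to-right step: shift the running total one decimal place
-- and add the new digit. No separate place counter is needed.
step :: Int -> Char -> Int
step total dig = total * 10 + digitToInt dig

main :: IO ()
main = print (scanl step 0 "123")  -- prints [0,1,12,123]
```

Because the digits arrive most significant first, multiplying the total by 10 does the work that the explicit place counter did in the right fold.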

A few things to note:


  1. We detect overflow when it happens, rather than by setting somewhat arbitrary limits on which digits we allow. This significantly simplifies the overflow detection logic, and makes it work on any integer type from Int8 to Integer (for types other than Int, the digit needs a fromIntegral), as long as overflow results in wraparound, doesn't occur, or results in an assertion from the addition operator itself.

  2. By using a different fold, we don't need two separate states.

  3. No repeating ourselves, even without going out of our way to lift things out: it falls naturally out of restating what we're trying to say.
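One caveat about the first note: testing for a negative result only catches overflows that happen to wrap to a negative number. As a sketch of a stricter alternative, the bound can be tested before multiplying, so nothing ever wraps. The name `parseDigits` is hypothetical, and returning Maybe instead of calling error is my own design choice, not something taken from the answer. It handles non-negative input only and is fixed to Int.

```haskell
import Control.Monad (foldM)
import Data.Char (digitToInt, isDigit)

-- Hypothetical helper: parse decimal digits into an Int, returning
-- Nothing on an empty string, a non-digit character, or overflow.
parseDigits :: String -> Maybe Int
parseDigits "" = Nothing
parseDigits str = foldM step 0 str
  where
    step total c
      | not (isDigit c) = Nothing
      -- total * 10 + d > maxBound, rearranged so that no
      -- intermediate value can exceed maxBound.
      | total > (maxBound - d) `div` 10 = Nothing
      | otherwise = Just (total * 10 + d)
      where
        d = digitToInt c

main :: IO ()
main = do
  print (parseDigits "123")  -- prints Just 123
  print (parseDigits "12a")  -- prints Nothing
  print (parseDigits (show (toInteger (maxBound :: Int) + 1)))  -- prints Nothing
```

The single step function still carries one piece of state, so the structure of the fold is unchanged; only the overflow test differs.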

Now, it's not always possible to just reword the algorithm and make the duplication go away, but it's always useful to take a step back and reconsider how you've been thinking about the problem :)


#3


I think the way you've done it makes sense.


You should certainly always break repeated computations out into separately defined values if avoiding repeated computation is important, but in this case that doesn't look necessary. Nevertheless, the broken-out values have easy-to-understand names, so they make your code easier to follow. I don't think the fact that your code is a bit longer as a result is a bad thing.
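As a small illustration of the sharing point, here is a sketch with a hypothetical function `classify` (not from the question). Its where-bound value is used by two guards, and `Debug.Trace.trace` is attached so you can watch when it is evaluated. With GHC, I would expect the message to appear on stderr once per call, because both guards refer to the same binding.

```haskell
import Debug.Trace (trace)

-- Hypothetical example: `sq` is named once and used by two guards.
classify :: Int -> String
classify n
  | sq > 100  = "big"
  | sq > 10   = "medium"
  | otherwise = "small"
  where
    -- trace reports when this value is actually evaluated.
    sq = trace "computing sq" (n * n)

main :: IO ()
main = putStrLn (classify 5)  -- prints medium
```

Writing `n * n` inline in each guard would give the same answer, so here the name mostly buys readability, which matches the point above.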

BTW, instead of hardcoding the maximum Int, you can use (maxBound :: Int), which avoids the risk of you making a mistake, or of another implementation with a different maximum Int breaking your code.
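A minimal sketch of that suggestion follows. The helper name `wouldOverflow` is hypothetical; it mirrors the question's `maxInt - sum < newPlaceComponent` check and assumes both arguments are non-negative.

```haskell
import Data.Int (Int32)

-- Hypothetical helper: would adding a non-negative component to a
-- non-negative running total exceed the largest Int on this platform?
wouldOverflow :: Int -> Int -> Bool
wouldOverflow total component = maxBound - total < component

main :: IO ()
main = do
  print (maxBound :: Int32)         -- prints 2147483647, the value hardcoded in the question
  print (maxBound :: Int)           -- depends on the implementation
  print (wouldOverflow 5 10)        -- prints False
  print (wouldOverflow maxBound 1)  -- prints True
```

Taking the bound from the type means the same check stays correct whether Int is 32 or 64 bits wide.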
