Do you reassign == and != to isTRUE(all.equal())?

Time: 2022-01-29 11:29:32

A previous post prompted me to ask this question. It would seem like a best practice to reassign == to isTRUE(all.equal()) (and != to !isTRUE(all.equal())). I'm wondering whether others do this in practice. I just realized that I use == and != for numeric equality throughout my codebase. My first reaction was that I need to do a full scrub and convert to all.equal. But in fact, every time I use == and != I want to test equality (regardless of the data type). In fact, I'm not sure what these operations would test for other than equality. I'm sure I'm missing some concept here. Can someone enlighten me? The only argument I see against this approach is that in some cases two non-identical numbers will appear to be identical because of the tolerance of all.equal. But we're told that two numbers that are in fact identical might not pass identical() because of how they are stored in memory. So really, what's the point of not defaulting to all.equal?

1 solution

#1


7  

As @joran alluded to, you'll run into floating point issues with == and != in pretty much any other language too. One important aspect of these operators in R is that they are vectorized.
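
To make the vectorization point concrete, here is a small sketch. The vectors `a` and `b` are made up for illustration and are not from the original post. `==` compares element by element, while `isTRUE(all.equal())` collapses the whole comparison into a single TRUE or FALSE, so it is not a drop-in replacement.

```r
# Hypothetical example vectors, chosen only for illustration
a <- c(0.1 + 0.2, 1, 2)
b <- c(0.3, 1, 5)

elementwise <- a == b                  # one logical value per element
collapsed <- isTRUE(all.equal(a, b))   # a single logical value for the whole vector

print(elementwise)
print(collapsed)
```

Any code that relies on `==` for subsetting or filtering (for example `x[x == 1]`) would change behavior if `==` were silently replaced by the collapsed form.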

It would be much better to define a new function almostEqual, fuzzyEqual or similar. It is unfortunate that there is no such base function. all.equal isn't very efficient since it handles all kinds of objects and returns a string describing the difference when mostly you just want TRUE or FALSE.
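
As a quick illustration of the return value described above (the input numbers are arbitrary), `all.equal` gives TRUE when the values are within tolerance and a character description otherwise, which is why it has to be wrapped in `isTRUE()`:

```r
same <- all.equal(1, 1 + 1e-10)   # difference is well inside the default tolerance
differ <- all.equal(1, 1.1)       # difference is well outside the default tolerance

print(same)             # a plain TRUE
print(class(differ))    # a character description of the difference, not FALSE
print(isTRUE(differ))   # isTRUE() turns that description into FALSE
```

Using the unwrapped result directly in `if ()` fails when the values differ, because a character string is not interpretable as a logical condition.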

Here's an example of such a function. It's vectorized like ==.

almostEqual <- function(x, y, tolerance=1e-8) {
  diff <- abs(x - y)
  mag <- pmax( abs(x), abs(y) )
  # relative comparison when the magnitude exceeds tolerance, absolute comparison near zero
  ifelse( mag > tolerance, diff/mag <= tolerance, diff <= tolerance)
}

almostEqual(1, c(1+1e-8, 1+2e-8)) # [1]  TRUE FALSE
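
The sketch below shows how the two branches of the `ifelse` behave near zero, and one possible way to get operator-like syntax without overriding `==`. The infix name `%~%` is an assumption made for this example and is not part of the original answer. The function is repeated so the block runs on its own.

```r
# almostEqual as given in the answer, repeated so this block is self-contained
almostEqual <- function(x, y, tolerance = 1e-8) {
  diff <- abs(x - y)
  mag <- pmax(abs(x), abs(y))
  ifelse(mag > tolerance, diff / mag <= tolerance, diff <= tolerance)
}

# Hypothetical infix wrapper; the name %~% is an assumption, not from the post
`%~%` <- function(x, y) almostEqual(x, y)

# First pair falls in the absolute branch, second pair in the relative branch
near_zero <- almostEqual(0, c(1e-9, 1e-7))

# Element-wise fuzzy comparison with operator syntax
sums <- c(0.1 + 0.2, 0.5) %~% c(0.3, 0.6)

print(near_zero)
print(sums)
```

A separate operator keeps `==` exact for the cases where that is wanted (integers, characters, factors) and makes the fuzzy comparison visible at the call site.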

...it is around 2x faster than all.equal for scalar values, and much faster with vectors.

x <- 1
y <- 1+1e-8
system.time(for(i in 1:1e4) almostEqual(x, y)) # 0.44 seconds
system.time(for(i in 1:1e4) all.equal(x, y))   # 0.93 seconds
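
For the vector case, a rough way to compare the two approaches is sketched below. The data are randomly generated for illustration, the element-wise `all.equal` variant uses `mapply` as one possible way to get a result per element, and actual timings will depend on the machine.

```r
# almostEqual as given in the answer, repeated so this block is self-contained
almostEqual <- function(x, y, tolerance = 1e-8) {
  diff <- abs(x - y)
  mag <- pmax(abs(x), abs(y))
  ifelse(mag > tolerance, diff / mag <= tolerance, diff <= tolerance)
}

# Hypothetical test data: values in [1, 2] with a tiny perturbation
set.seed(1)
x <- runif(1e4, min = 1, max = 2)
y <- x + 1e-12

# One vectorized call versus one all.equal call per element
vec_result <- almostEqual(x, y)
loop_result <- mapply(function(a, b) isTRUE(all.equal(a, b)), x, y)

print(system.time(almostEqual(x, y)))
print(system.time(mapply(function(a, b) isTRUE(all.equal(a, b)), x, y)))
```

Both approaches should agree on this data; the difference is that the vectorized function does its work in a handful of vector operations rather than one function call per element.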
