Error in R: missing value where TRUE/FALSE needed

I have a column of numbers (from 1 to tt) and would like to use a loop to count how often each of these numbers occurs in R.

count = matrix(ncol=1, nrow=tt)  # create an empty matrix
for (j in 1:tt) {
  count[j] = 0  # initialise each count to 0
}

for (j in 1:tt) {
  for (i in 1:N) {  # for each observation (1 to N)
    if (column[i] == j) {
      count[j] = count[j] + 1
    }
  }
}

Unfortunately I keep getting this error.

Error in if (column[i] == j) { : 
missing value where TRUE/FALSE needed

So I tried:

for (i in 1:N)  # from obs 1 to obs N
  if (column[i] == 1) print("Test")

I basically got the same error.

I tried to do a bit of research on this kind of error, and a lot of what I found talks about "debugging", which I'm not familiar with.

Hopefully someone can tell me what's happening here. Thanks!

2 Solutions

#1

As you progress in learning R, one feature you should be aware of is vectorisation. Many operations that would have to be done in a loop in, say, C can be done all at once in R. This is particularly true when you have a vector/matrix/array and a scalar and want to perform an operation between them.

Say you want to add 2 to the vector myvector. The C/C++ way to do it in R would be to use a loop:

for ( i in 1:length(myvector) )
    myvector[i] = myvector[i] + 2

Since R has vectorisation, you can do the addition without a loop at all, that is, add a scalar to a vector:

myvector = myvector + 2

Vectorisation means the loop is done internally, which is much more efficient than writing the loop in R itself. (If you've used MATLAB or Python with NumPy, it's much the same in this sense.)

I know you're new to R, so this may be a bit confusing, but keep in mind that loops can often be eliminated in R.

With that in mind, let's look at your code:

The initialisation of count to 0 can be done at creation, so the first loop is unnecessary.

count = matrix(0,ncol=1,nrow=tt)

Secondly, because of vectorisation, you can compare a vector to a scalar. So for your inner loop over i, instead of looping through column and testing if (column[i] == j), you can do idx = (column == j). This returns a logical vector that is TRUE where column[i] == j and FALSE otherwise.

To find how many elements of column are equal to j, we just count how many TRUEs there are in idx. That is, we do sum(idx).
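
As a small illustration of these two steps, the sketch below uses a made-up vector (the values are hypothetical, not the asker's data) to show the logical vector and its sum:

```r
# Hypothetical example data; stands in for the asker's `column`
column <- c(1, 3, 2, 3, 3, 1)
j <- 3

idx <- (column == j)   # logical vector, TRUE where column equals j
print(idx)             # FALSE TRUE FALSE TRUE TRUE FALSE

n_matches <- sum(idx)  # TRUE counts as 1, FALSE as 0
print(n_matches)       # 3
```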

So your double-loop can be rewritten like so:

for ( j in 1:tt ) {
    idx = (column == j)
    count[j] = sum(idx) # no need to add
}

Now it's even possible to remove the outer loop in j by using the function sapply:

sapply( 1:tt, function(j) sum(column==j) )

The above line of code means: "for each j in 1:tt, call function(j)", and it returns a vector whose j-th element is the result of that call.

So in summary, you can reduce your entire code to:

count = sapply( 1:tt, function(j) sum(column==j) )
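
To check that the one-liner really matches the original double loop, here is a minimal sketch that runs both on the same made-up data. The values of column and tt are hypothetical, and count_loop and count_vec are names introduced only for this comparison:

```r
# Hypothetical example data; `column`, `tt` and `N` mirror the names in the question
column <- c(2, 1, 4, 2, 2, 4)
tt <- 4
N <- length(column)

# Original double loop, with the counts initialised to 0 at creation
count_loop <- matrix(0, ncol = 1, nrow = tt)
for (j in 1:tt) {
  for (i in 1:N) {
    if (column[i] == j) {
      count_loop[j] <- count_loop[j] + 1
    }
  }
}

# Vectorised one-liner
count_vec <- sapply(1:tt, function(j) sum(column == j))

print(as.vector(count_loop))  # 1 3 0 2
print(count_vec)              # 1 3 0 2
```

Note that sapply returns a plain vector rather than a one-column matrix; wrap it in matrix(count_vec, ncol = 1) if the matrix shape matters.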

(Although this doesn't explain your error, which I suspect is to do with the construction or class of your column).
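
One common way to get this message, offered as a possibility rather than a diagnosis of the asker's data, is that the condition passed to if evaluates to NA. That happens when column[i] is NA, or when i runs past the end of column (for instance, if N is larger than length(column)). The sketch below reproduces the situation with made-up data and shows a few checks that can help locate it:

```r
# Hypothetical data containing a missing value
column <- c(1, NA, 2, 1)

cond <- column[2] == 1   # comparing NA with anything gives NA
print(cond)              # NA
print(column[10] == 1)   # indexing past the end also gives NA

# Passing NA to if() raises an error; capture its message instead of stopping
msg <- tryCatch(
  {
    if (cond) print("Test")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # in an English locale: "missing value where TRUE/FALSE needed"

# Checks that locate the problem, and a count that skips NAs
print(anyNA(column))                   # TRUE
print(which(is.na(column)))            # 2
print(sum(column == 1, na.rm = TRUE))  # 2
```

The vectorised sum(column == j) would return NA on such data as well, so na.rm = TRUE (or cleaning the column first) is worth considering either way.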

#2

I suggest not using for loops, and instead using the count function from the plyr package. This function does exactly what you want in one line of code.
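
A minimal sketch of what that might look like, using made-up data. The plyr call is guarded so it only runs when the package is installed, and a base R alternative (tabulate, which is a swapped-in technique and not part of this answer) is shown alongside it for comparison:

```r
# Hypothetical example data; `column` and `tt` mirror the names in the question
column <- c(2, 1, 4, 2, 2, 4)
tt <- 4

# Base R alternative: tabulate() counts the integers 1..nbins, zeros included
counts <- tabulate(column, nbins = tt)
print(counts)  # 1 3 0 2

# plyr::count(), only run if plyr is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  print(plyr::count(column))  # data frame with columns x and freq
}
```

One difference to be aware of: plyr::count only lists values that actually occur, so a number with zero occurrences (3 in this example) does not get a row, whereas tabulate reports it as 0.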
