Here's the smallest piece of code that shows how I get different results from class() when it is called directly on a column versus through apply.
The data.frame looks like this:
> df
A B C
1 rlm 4.047317e-03 0.0040111713
2 rlm -6.474359e-02 -0.0657461598
3 rlm 1.464302e-01 0.1451224214
4 rlm 3.508878e-01 0.3477540761
5 lm 2.701757e-01 0.2769367280
6 lm 2.580785e-03 0.0025815525
7 rlm 1.638077e-05 0.0000160895
> str(df)
'data.frame': 7 obs. of 3 variables:
$ A: chr "rlm" "rlm" "rlm" "rlm" ...
$ B: num 0.00405 -0.06474 0.14643 0.35089 0.27018 ...
$ C: num 0.00401 -0.06575 0.14512 0.34775 0.27694 ...
> class(df$A)
[1] "character"
> class(df$B)
[1] "numeric"
> apply(df, 2, class)
A B C
"character" "character" "character"
So when called directly, the class of B is 'numeric', but when called through apply, it is reported as 'character'.
Am I missing anything here?
2 Solutions
#1
1
apply() coerces a data.frame to a matrix (via as.matrix) before applying the function. Since every element of a matrix must have the same type, you end up with a character matrix (a number can always be converted to a string, but an arbitrary string cannot be converted to a number). The reason for this is probably that apply can also work by row, which would be messy with data.frames since your function would need to operate on a list.
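The coercion described above can be reproduced without apply at all. This is a minimal sketch, assuming a made-up two-row frame `d` shaped like the one in the question (the values and the helper names `num_cols`, `type_all`, `type_num`, `row_sums` are illustrative, not from the original post):

```r
# Hypothetical two-row frame shaped like the one in the question
d <- data.frame(A = c("rlm", "lm"),
                B = c(0.004, 0.27),
                C = c(0.0040, 0.277),
                stringsAsFactors = FALSE)

# What apply() works on: a single matrix with one storage type
m <- as.matrix(d)
type_all <- typeof(m)                        # "character"

# Dropping the character column first keeps the matrix numeric
num_cols <- vapply(d, is.numeric, logical(1))
type_num <- typeof(as.matrix(d[num_cols]))   # "double"

# By-row use then receives plain numeric vectors
row_sums <- apply(d[num_cols], 1, sum)
```

If apply is really needed, restricting it to the numeric columns as above is one way to avoid the character matrix.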
For what you want, check out the lapply and sapply functions: a data.frame is basically a list in which each element is one of the columns.
> x <- data.frame(a = "Entry", b = 5, stringsAsFactors = TRUE)  # TRUE was the default before R 4.0.0
> sapply(x, class)
a b
"factor" "numeric"
#2
0
I get the same result. I think it might be the same behavior you see in this example:
number_m <- matrix(1:6)
mode(number_m) # "numeric"
number_m[2,1] <- "b"
mode(number_m) # "character"
number_m
Assigning a character value to one element of a matrix or vector changes the data type of all of its elements.
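The same rule can be checked with typeof(), which distinguishes integer from double where mode() reports both as "numeric". A small sketch (the variable names are illustrative):

```r
m <- matrix(1:6)
before <- typeof(m)        # "integer"
m[2, 1] <- "b"
after <- typeof(m)         # "character"
first <- m[1, 1]           # "1", now a string rather than a number

# Vectors behave the same way: c() settles on the most general type
v <- c(1L, 2.5, "b")       # all three elements become strings

# Converting back only recovers the entries that still look like numbers
back <- suppressWarnings(as.numeric(m))   # the "b" entry becomes NA
```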
I get the correct result using a loop:
df <- read.table(header=TRUE, text="
A B C
1 rlm 4.047317e-03 0.0040111713
2 rlm -6.474359e-02 -0.0657461598
3 rlm 1.464302e-01 0.1451224214
4 rlm 3.508878e-01 0.3477540761
5 lm 2.701757e-01 0.2769367280
6 lm 2.580785e-03 0.0025815525
7 rlm 1.638077e-05 0.0000160895")
sapply(1:3, function(i) class(df[,i]))
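The index loop above can also be written over the columns themselves. The sketch below uses vapply, which fixes the result type up front; the frame `d2` and its `when` column are hypothetical and only there to show a column with more than one class:

```r
# Hypothetical frame with a date-time column, which has two classes
d2 <- data.frame(A = "rlm",
                 B = 0.004,
                 when = as.POSIXct("2020-01-01", tz = "UTC"),
                 stringsAsFactors = FALSE)

# sapply cannot simplify here because class(d2$when) has length 2
loose <- sapply(d2, class)

# vapply with the first class only always yields a character vector
first_class <- vapply(d2, function(col) class(col)[1], character(1))
```

For frames like the one in the question, where every column has a single class, sapply(df, class) and this vapply call should give the same answer.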