This is puzzling me. When you run summary() on a vector of integers, the results don't seem to be accurate: the numbers appear to be rounded off. I tried this on three different machines with different operating systems and the results are the same.
For a vector:
>a <- 0:628846
>str(a)
int [1:628847] 0 1 2 3 4 5 6 7 8 9 ...
>summary(a)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 157200 314400 314400 471600 628800
>max(a)
[1] 628846
For a data.frame:
> b <- data.frame(b = 0:628846)
> str(b)
'data.frame': 628847 obs. of 1 variable:
$ b: int 0 1 2 3 4 5 6 7 8 9 ...
> summary(b)
b
Min. : 0
1st Qu.:157212
Median :314423
Mean :314423
3rd Qu.:471635
Max. :628846
> summary(b$b)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 157200 314400 314400 471600 628800
Why are these results different?
1 Answer
The object a is of class integer; b is of class data.frame. A data frame is a list with certain properties and with class data.frame (http://cran.r-project.org/doc/manuals/R-intro.html#Data-frames). Many functions, including summary, handle objects of different classes differently (for example, you can use summary on an object of class lm and it gives you something completely different). If you want to apply the function summary to every component of b, you could use lapply:
> a <- 0:628846
> b <- data.frame(b = 0:628846)
> class(a)
[1] "integer"
> class(b)
[1] "data.frame"
> names(b)
[1] "b"
> length(b)
[1] 1
> summary(b[[1]]) # b[[1]] gives the first component of the list b
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 157200 314400 314400 471600 628800
> class(b$b)
[1] "integer"
> summary(b$b)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 157200 314400 314400 471600 628800
> lapply(b,summary)
$b
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 157200 314400 314400 471600 628800
>
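The rounded values in the vector summaries above are exactly what rounding to four significant digits produces. As a hedged sketch: the default method for summary() accepts a digits argument, and in the R versions this transcript appears to come from I believe it defaulted to max(3, getOption("digits") - 3), which is 4 under the default options (newer R releases seem to apply this only when printing). The snippet below reproduces the rounded numbers with signif() and then asks for more digits explicitly. The variable names exact, rounded and full are illustrative only.

```r
a <- 0:628846

# Unrounded minimum, quartiles, median, mean and maximum, computed directly
exact <- c(min(a), quantile(a, 0.25, names = FALSE), median(a), mean(a),
           quantile(a, 0.75, names = FALSE), max(a))

# Rounding to 4 significant digits gives the values shown by summary(a) above:
# 0 157200 314400 314400 471600 628800
rounded <- signif(exact, 4)

# Asking summary() for more significant digits keeps the exact maximum
full <- summary(a, digits = 7)
as.numeric(full)[6]
```

If the rounding is indeed a digits default, then the data are intact and only the displayed summary is affected, which is consistent with max(a) returning 628846 in the question.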
> # example of summary on a linear model
> x <- rnorm(100)
> y <- x + rnorm(100)
> my.lm <- lm(y~x)
> class(my.lm)
[1] "lm"
> summary(my.lm)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.6847 -0.5460 0.1175 0.6610 2.2976
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.04122 0.09736 0.423 0.673
x 1.14790 0.09514 12.066 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9735 on 98 degrees of freedom
Multiple R-squared: 0.5977, Adjusted R-squared: 0.5936
F-statistic: 145.6 on 1 and 98 DF, p-value: < 2.2e-16
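The class-based behaviour described in this answer can be sketched with a toy S3 generic. The names describe, describe.default and describe.data.frame are hypothetical, made up for illustration; they are not part of R. The point is only that a single call, describe(x), runs different code depending on class(x), which is the same mechanism by which summary(a), summary(b) and summary(my.lm) give different kinds of output.

```r
# A hypothetical S3 generic: UseMethod() picks a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists (e.g. integer vectors)
describe.default <- function(x, ...) {
  paste("vector of length", length(x))
}

# Method chosen automatically for objects of class "data.frame"
describe.data.frame <- function(x, ...) {
  paste("data frame with", ncol(x), "column(s)")
}

a <- 0:628846
b <- data.frame(b = a)

describe(a)          # dispatches to describe.default
describe(b)          # dispatches to describe.data.frame
lapply(b, describe)  # each column is an integer vector, so describe.default runs

# The real summary() is generic in the same way; these look up two of its methods
is.function(getS3method("summary", "data.frame"))
is.function(getS3method("summary", "lm"))
```

Calling lapply(b, describe) mirrors lapply(b, summary) above: the list is unpacked first, so each column is handled by the vector method rather than the data frame method.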