This puzzles me. When I run summary() on a vector of integers, the results are not exact: the numbers appear to be rounded. I tried this on three machines running different operating systems and got the same results.


For a vector:


> a <- 0:628846
> str(a)
 int [1:628847] 0 1 2 3 4 5 6 7 8 9 ...
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0  157200  314400  314400  471600  628800 
> max(a)
[1] 628846

For a data.frame:


> b <- data.frame(b = 0:628846)
> str(b)
'data.frame':   628847 obs. of  1 variable:
 $ b: int  0 1 2 3 4 5 6 7 8 9 ...
> summary(b)
 Min.   :     0  
 1st Qu.:157212  
 Median :314423  
 Mean   :314423  
 3rd Qu.:471635  
 Max.   :628846  
> summary(b$b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0  157200  314400  314400  471600  628800 

Why are these results different?


1 Solution



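The object a has class integer, whereas b has class data.frame. A data frame is a list with certain properties and with class data.frame (http://cran.r-project.org/doc/manuals/R-intro.html#Data-frames). summary() is a generic function, and like many generics it handles objects of different classes differently: summary(a) dispatches to summary.default(), while summary(b) dispatches to summary.data.frame(). (Calling summary on an object of class lm, as at the end of the example below, gives something completely different again.) The rounding comes from summary.default(), which rounds to digits = max(3, getOption("digits") - 3) significant digits, i.e. 4 with the default options, so the first quartile 157211.5 becomes 157200. summary.data.frame() calls summary() on each column with a larger digits value, so the full numbers survive. (Since R 3.4.0, summary.default() keeps the exact values and only rounds when printing.) If you want to apply summary to every component of b, you can use lapply: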


> a <- 0:628846
> b <- data.frame(b = 0:628846)
> class(a)
[1] "integer"
> class(b)
[1] "data.frame"
> names(b)
[1] "b"
> length(b)
[1] 1
> summary(b[[1]]) # b[[1]] gives the first component of the list b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0  157200  314400  314400  471600  628800 
> class(b$b)
[1] "integer"
> summary(b$b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0  157200  314400  314400  471600  628800 
> lapply(b, summary)
$b
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0  157200  314400  314400  471600  628800 

> # example of summary on a linear model
> x <- rnorm(100)
> y <- x + rnorm(100)
> my.lm <- lm(y~x)
> class(my.lm)
[1] "lm"
> summary(my.lm)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6847 -0.5460  0.1175  0.6610  2.2976 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.04122    0.09736   0.423    0.673    
x            1.14790    0.09514  12.066   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9735 on 98 degrees of freedom
Multiple R-squared: 0.5977, Adjusted R-squared: 0.5936 
F-statistic: 145.6 on 1 and 98 DF,  p-value: < 2.2e-16
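A minimal sketch of the same comparison as a script, assuming only base R (summary() with its digits argument, quantile() and mean()). It shows which method each object dispatches to and how passing digits to summary.default() avoids the rounding. The exact printed layout depends on your R version (since R 3.4.0 the stored values are no longer rounded, only the printed ones), so no printed output is claimed here beyond the class names.

```r
# Compare summary() on an integer vector and on a one-column data frame.
a <- 0:628846
b <- data.frame(b = a)

# summary() is an S3 generic: the method is chosen from the object's class.
print(class(a))   # "integer"    -> summary.default()
print(class(b))   # "data.frame" -> summary.data.frame()

# Ask summary.default() for 7 significant digits instead of the default
# max(3, getOption("digits") - 3), which is 4 with the default options.
sa <- summary(a, digits = 7)
print(sa)

# The exact statistics computed directly, in the same order as summary():
# Min, 1st Qu., Median, Mean, 3rd Qu., Max.
q <- quantile(a)  # 0%, 25%, 50%, 75%, 100% (type 7 by default)
exact <- c(q[1:3], Mean = mean(a), q[4:5])
print(exact)

# summary.data.frame() returns a character "table", one column per variable.
sb <- summary(b)
print(sb)

# lapply() calls summary() on each column; extra arguments such as digits
# are passed through to summary.default().
print(lapply(b, summary, digits = 7))
```

Raising digits is only one option; if you just need the numbers themselves, quantile() and mean() give them without any summary-specific formatting.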



