How do I find the highest value of a column in a data frame in R?

Time: 2021-03-09 13:17:59

I have the following data frame, which I called ozone:

   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9

I would like to extract the highest value from each of Ozone, Solar.R, Wind...

Also, if possible, how would I sort Solar.R, or any other column of this data frame, in descending order?

I tried

max(ozone, na.rm=T)

which gives me the highest value in the dataset.

I have also tried

max(subset(ozone,Ozone))

but got the error "'subset' must be logical".

I can set an object to hold the subset of each column with the following commands:

ozone <- subset(ozone, Ozone >0)
max(ozone,na.rm=T) 

but it gives the same value of 334, which is the max value of the whole data frame, not of the column.

Any help would be great, thanks.

10 solutions

#1


38  

Similar to colMeans, colSums, etc., you could write a column maximum function, colMax, and a column sort function, colSort.

colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)

I use ... in the second function in hopes of sparking your intrigue.

Get your data:

dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9")

Use the colMax function on the sample data:

colMax(dat)
#  Ozone Solar.R    Wind    Temp   Month     Day 
#   41.0   313.0    20.1    74.0     5.0     9.0

To do the sorting on a single column,

sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118  99  19

and over all columns use our colSort function,

colSort(dat, decreasing = TRUE) ## compare with '...' above
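
As a small illustration of what the ... in colSort does (a sketch only; the data frame is rebuilt by hand from the question's sample, and na.last is a standard argument of sort(), not something the answer above mentions): any extra arguments are passed on to sort() for every column. Because sort() drops NA values by default, the sorted columns end up with different lengths and sapply() returns a list. Forwarding na.last = TRUE keeps the NA values, so every column keeps 9 entries and the result simplifies to a matrix.

```r
# Sample data rebuilt from the question (Month is recycled to all 9 rows)
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1),
  Temp    = c(67, 72, 74, 62, 56, 66, 65, 59, 61),
  Month   = 5,
  Day     = 1:9
)

colSort <- function(data, ...) sapply(data, sort, ...)

# sort() drops NAs by default, so column lengths differ and sapply() returns a list
by_default <- colSort(dat, decreasing = TRUE)
is.list(by_default)   # TRUE
lengths(by_default)   # Ozone has 8 values, Solar.R has 7, the rest have 9

# na.last = TRUE travels through '...' to sort(): NAs are kept at the end,
# all columns keep 9 values, and sapply() simplifies the result to a matrix
with_na <- colSort(dat, decreasing = TRUE, na.last = TRUE)
dim(with_na)          # 9 6
```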

#2


27  

To get the max of any single column, you want something like:

max(ozone$Ozone, na.rm = TRUE)

To get the max of all columns, you want:

apply(ozone, 2, function(x) max(x, na.rm = TRUE))

And to sort:

ozone[order(ozone$Solar.R),]

Or to sort in the other direction:

ozone[rev(order(ozone$Solar.R)),]
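
One detail worth checking with the two sorting calls above, sketched here on a hand-built copy of the question's sample (the name dat is just for illustration): order() places NA values last by default, so reversing that ordering with rev() moves the NA rows to the top. Passing decreasing = TRUE to order() instead sorts in descending order while leaving the NA rows at the bottom.

```r
# Sample data rebuilt from the question (only the columns needed here)
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Day     = 1:9
)

# rev(order(x)) reverses an ordering that had NAs last, so the NA rows come first
reversed <- dat[rev(order(dat$Solar.R)), ]
head(reversed$Solar.R, 3)

# decreasing = TRUE sorts descending; the default na.last = TRUE keeps NA rows at the bottom
sorted <- dat[order(dat$Solar.R, decreasing = TRUE), ]
sorted$Solar.R
```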

#3


6  

To find the max value for each column, you could try using the apply() function:

> apply(ozone, MARGIN = 2, function(x) max(x, na.rm=TRUE))
  Ozone Solar.R    Wind    Temp   Month     Day 
   41.0   313.0    20.1    74.0     5.0     9.0 
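
A caveat that may matter on other data, shown as a sketch: apply() first converts the data frame to a matrix, so if any column is non-numeric, every value becomes a character string and max() compares text. The Site column below is hypothetical and not part of the question's data; it is added only to trigger that coercion. Restricting the calculation to numeric columns with vapply() is one way around it.

```r
# Part of the question's sample plus a hypothetical character column (Site)
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1),
  Site    = "A"
)

# apply() coerces the whole data frame to a character matrix here,
# so the "maxima" are strings compared as text, not numbers
as_text <- apply(dat, 2, function(x) max(x, na.rm = TRUE))
is.character(as_text)   # TRUE

# Keep only numeric columns, then take each column's max as a number
num_cols <- dat[vapply(dat, is.numeric, logical(1))]
col_max  <- vapply(num_cols, max, numeric(1), na.rm = TRUE)
col_max
```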

#4


5  

Here's a dplyr solution:

library(dplyr)

# find max for each column
summarise_each(ozone, funs(max(., na.rm=TRUE)))

# sort by Solar.R, descending
arrange(ozone, desc(Solar.R))

UPDATE: summarise_each() has been deprecated in favour of a more featureful family of functions: mutate_all(), mutate_at(), mutate_if(), summarise_all(), summarise_at(), summarise_if()

Here is how you could do it:

# find max for each column
ozone %>%
         summarise_if(is.numeric, funs(max(., na.rm=TRUE)))%>%
         arrange(Ozone)

or

ozone %>%
         summarise_at(vars(1:6), funs(max(., na.rm=TRUE)))%>%
         arrange(Ozone)
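
For readers on a newer dplyr, a possible equivalent is sketched below. It assumes dplyr 1.0.0 or later, where across() and where() are available and funs() is deprecated; this is an alternative spelling, not part of the original answer. The data frame dat is a hand-built copy of the question's sample.

```r
library(dplyr)

# Sample data rebuilt from the question
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1),
  Temp    = c(67, 72, 74, 62, 56, 66, 65, 59, 61),
  Month   = 5,
  Day     = 1:9
)

# Max of every numeric column, ignoring NAs (one-row result)
col_max <- dat %>%
  summarise(across(where(is.numeric), ~ max(.x, na.rm = TRUE)))

# Rows sorted by Solar.R, descending; arrange() puts NA rows last
sorted <- dat %>% arrange(desc(Solar.R))
```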

#5


2  

Another way would be to use ?pmax

do.call('pmax', c(as.data.frame(t(ozone)),na.rm=TRUE))
#[1]  41.0 313.0  20.1  74.0   5.0   9.0
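
To see why the transpose is needed, here is a short sketch on a hand-built copy of the question's sample (the names rows, via_pmax and via_sapply are for illustration only). pmax() compares its arguments element by element, so each original row has to be passed as a separate vector; position i of the result is then the largest value found in column i.

```r
# pmax() returns the element-wise maximum of its arguments
pmax(c(1, 5, 3), c(4, 2, 6))   # 4 5 6

# Sample data rebuilt from the question
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1),
  Temp    = c(67, 72, 74, 62, 56, 66, 65, 59, 61),
  Month   = 5,
  Day     = 1:9
)

# After transposing, each original row becomes one vector of length 6
rows <- as.data.frame(t(dat))

# Element-wise max across the 9 row vectors = max of each original column
via_pmax   <- do.call(pmax, c(rows, na.rm = TRUE))
via_sapply <- sapply(dat, max, na.rm = TRUE)
all.equal(unname(via_pmax), unname(via_sapply))
```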

#6


1  

Assuming that your data is in a data.frame called maxinozone, you can do this:

max(maxinozone[, 1], na.rm = TRUE)  # first column; [1, ] would select the first row instead

#7


0  

max(ozone$Ozone, na.rm = TRUE) should do the trick. Remember to include na.rm = TRUE, or else R will return NA whenever the column contains missing values.
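
A minimal sketch of the na.rm behaviour described above, using made-up vectors (x and all_missing are illustrative names):

```r
x <- c(41, 36, NA, 28)

max(x)                 # NA: a single missing value makes the result NA
max(x, na.rm = TRUE)   # 41: missing values are dropped first

# If every value is missing, nothing is left after dropping NAs;
# max() then returns -Inf and emits a warning (suppressed here)
all_missing <- c(NA_real_, NA_real_)
suppressWarnings(max(all_missing, na.rm = TRUE))   # -Inf
```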

#8


0  

max(may$Ozone, na.rm = TRUE)

Without $Ozone, max() searches the whole data frame rather than one column; this is covered in the swirl package.
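
The difference can be seen on a hand-built copy of the question's sample (a sketch; dat is an illustrative name, and the 313 below comes from these nine rows only, not from the asker's full data set):

```r
# Sample data rebuilt from the question
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1)
)

# Passing the whole data frame gives the largest value anywhere in it
max(dat, na.rm = TRUE)             # 313 (it comes from Solar.R)

# Selecting one column first gives that column's maximum
max(dat$Ozone, na.rm = TRUE)       # 41
max(dat[["Wind"]], na.rm = TRUE)   # 20.1
```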

I'm studying this course on Coursera too ~

#9


0  

Try this solution:

Oz <- subset(data, Month == 5, select = Ozone)  # select the Ozone values for May (i.e. Month == 5)
summary(Oz)                                     # summarises the one-column table (Ozone), including min, max, ...

#10


0  

There is a package, matrixStats, that provides functions for column and row summaries (see the package vignette), but you have to convert your data.frame into a matrix first.

Then you run: colMaxs(as.matrix(ozone), na.rm = TRUE)  (na.rm = TRUE is needed here because the data contains NA values)
