OK, check out this data frame...
  customer_name order_dates order_values
1          John  2010-11-01           15
2           Bob  2008-03-25           12
3          Alex  2009-11-15            5
4          John  2012-08-06           15
5          John  2015-05-07           20
Let's say I want to add a variable that ranks each customer's orders by order value, highest first, using the most recent order date as the tie breaker. Ultimately, the data should look like this:
  customer_name order_dates order_values ranked_order_values_by_max_value_date
1          John  2010-11-01           15                                     3
2           Bob  2008-03-25           12                                     1
3          Alex  2009-11-15            5                                     1
4          John  2012-08-06           15                                     2
5          John  2015-05-07           20                                     1
Everyone's single order gets rank 1, and when a customer has several orders they are ranked by value, with the most recent order date winning ties. In this example, John's 8/6/2012 order gets rank #2 because it was placed after the 11/1/2010 order of the same value. The 5/7/2015 order is #1 because it was the biggest. Even if that order had been placed 20 years ago, it should still be ranked #1, because it was John's highest order value.
Does anyone know how I can do this in R, i.e. how to rank within groups defined by specified variables in a data frame?
Thanks for your help!
4 Answers
#1
6
You can do this pretty cleanly with dplyr
library(dplyr)

df %>%
  group_by(customer_name) %>%
  mutate(my_ranks = order(order(order_values, order_dates, decreasing=TRUE)))

Source: local data frame [5 x 4]
Groups: customer_name

  customer_name order_dates order_values my_ranks
1          John  2010-11-01           15        3
2           Bob  2008-03-25           12        1
3          Alex  2009-11-15            5        1
4          John  2012-08-06           15        2
5          John  2015-05-07           20        1
#2
9
The top-rated answer (by cdeterman) is actually incorrect. The order function returns the positions of the 1st, 2nd, 3rd, etc. ranked values, not the ranks of the values in their current order.
Let's take a simple example where we want to rank values within each customer name, starting with the largest. I have included a manual ranking so we can check the results.
> df
   customer_name order_values manual_rank
1           John            2           5
2           John            5           2
3           John            9           1
4           John            1           6
5           John            4           3
6           John            3           4
7           Lucy            4           4
8           Lucy            9           1
9           Lucy            6           3
10          Lucy            2           6
11          Lucy            8           2
12          Lucy            3           5
If I run the code suggested by cdeterman, I get the following incorrect ranks:
> df %>%
+   group_by(customer_name) %>%
+   mutate(my_ranks = order(order_values, decreasing=TRUE))
Source: local data frame [12 x 4]
Groups: customer_name [2]

   customer_name order_values manual_rank my_ranks
          <fctr>        <dbl>       <dbl>    <int>
1           John            2           5        3
2           John            5           2        2
3           John            9           1        5
4           John            1           6        6
5           John            4           3        1
6           John            3           4        4
7           Lucy            4           4        2
8           Lucy            9           1        5
9           Lucy            6           3        3
10          Lucy            2           6        1
11          Lucy            8           2        6
12          Lucy            3           5        4
order is meant for re-ordering data frames into decreasing or increasing order: it returns the row positions in sorted order. What we actually want is to apply order twice; the second call turns those positions into the ranks we want.
> df %>%
+   group_by(customer_name) %>%
+   mutate(good_ranks = order(order(order_values, decreasing=TRUE)))
Source: local data frame [12 x 4]
Groups: customer_name [2]

   customer_name order_values manual_rank good_ranks
          <fctr>        <dbl>       <dbl>      <int>
1           John            2           5          5
2           John            5           2          2
3           John            9           1          1
4           John            1           6          6
5           John            4           3          3
6           John            3           4          4
7           Lucy            4           4          4
8           Lucy            9           1          1
9           Lucy            6           3          3
10          Lucy            2           6          6
11          Lucy            8           2          2
12          Lucy            3           5          5
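The difference between positions and ranks can also be checked on a single vector. The following is a minimal sketch, assuming John's six order values from the example above are held in a plain vector named x (a name chosen here for illustration); it uses only base R.

```r
# John's order values from the example above (x is an illustrative name)
x <- c(2, 5, 9, 1, 4, 3)

# order() answers "which element comes 1st, 2nd, ...?" (positions)
pos <- order(x, decreasing = TRUE)
print(pos)   # 3 2 5 6 1 4

# order() applied to those positions inverts the permutation,
# giving the rank of each element at its original position
rnk <- order(pos)
print(rnk)   # 5 2 1 6 3 4

# With no ties, rank() on the negated values gives the same ranks
print(rank(-x))
```

The first result matches the incorrect my_ranks column for John, and the second matches manual_rank.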
#3
3
In base R you can do this with the slightly unwieldy:
# The outer order() turns the sorted positions into ranks (see answer #2)
transform(df, rank = ave(1:nrow(df), customer_name,
  FUN = function(x) order(order(order_values[x], order_dates[x], decreasing=TRUE))))
  customer_name order_dates order_values rank
1          John  2010-11-01           15    3
2           Bob  2008-03-25           12    1
3          Alex  2009-11-15            5    1
4          John  2012-08-06           15    2
5          John  2015-05-07           20    1
where order is given both the primary and tie-breaker values for each group.
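For anyone who wants to run the base R approach end to end, here is a self-contained sketch on the question's data. The Date conversion and the helper name rank_within are assumptions made for this example, not part of the original answers, and order is applied twice so that the result holds ranks rather than positions.

```r
# Example data from the question; converting to Date is an assumption
df <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# rank_within() is a hypothetical helper, not a base R function
rank_within <- function(d) {
  ave(seq_len(nrow(d)), d$customer_name, FUN = function(i) {
    # inner order(): positions sorted by value, then date, both descending
    # outer order(): converts those positions into ranks
    order(order(d$order_values[i], d$order_dates[i], decreasing = TRUE))
  })
}

df$rank <- rank_within(df)
print(df)
```

Passing row indices to ave, rather than a single column, is what lets the function see both the value and the date for each group.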
#4
1
This can be achieved with ave and rank (here x is the data frame). ave passes each customer's group to rank, and negating the dates reverses the ranking so that the latest order gets rank 1. This ranks by date only, which matches the requested output here because John's order values never decrease over time:
with(x, ave(as.numeric(order_dates), customer_name, FUN=function(x) rank(-x)))
## [1] 3 1 1 2 1
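When getting descending ranks out of rank, it matters how the reversal is done. The sketch below uses a made-up vector, chosen so that the two approaches disagree, to compare reversing the output of rank with ranking the negated values; only the latter gives descending ranks in general.

```r
# Made-up values, chosen so the two approaches give different results
x <- c(2, 1, 3)

print(rank(x))        # ascending ranks: 2 1 3
print(rev(rank(x)))   # ascending ranks in reversed positions: 3 1 2
print(rank(-x))       # descending ranks: 2 3 1
```

The two agree on the question's data only because John's dates are already in ascending order.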