Hopefully you guys can help me out. I've been looking all over the web, and I can't find an answer. Here's my data frame:
name city        state stars main_category
A    Pittsburgh  PA    5.0   Soul Food
B    Houston     TX    3.0   Professional Services
C    Lafayette   IN    3.0   NA
D    Los Angeles CA    4.0   Local Services
E    Los Angeles CA    3.0   Local Services
F    Lafayette   IN    3.5   *n
G    Pittsburgh  PA    5.0   Doctors
H    Pittsburgh  PA    4.0   Soul Food
I    Houston     TX    4.0   Professional Services
What I would like is to group by city (sorted alphabetically) and state, and then rank within each group by the number of stars. Here's what I was hoping for:
name city        state stars main_category         rank
I    Houston     TX    4.0   Professional Services 1
B    Houston     TX    3.0   Professional Services 2
F    Lafayette   IN    3.5   *n                    1
D    Los Angeles CA    4.0   Local Services        1
E    Los Angeles CA    3.0   Local Services        2
G    Pittsburgh  PA    5.0   Doctors               1
A    Pittsburgh  PA    5.0   Soul Food             1
H    Pittsburgh  PA    4.0   Soul Food             2
Here's my line of code.
l <- ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))
This does not remove the Lafayette row that has an NA, and I don't know what to put instead. I also tried na.omit, but when I did, the rank column did not show up.
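To make the example reproducible, here is one way to build the data frame above in R. It assumes the missing category for C is a real NA (not the string "NA") and that the text columns are plain character vectors; the answers refer to the same data as dat.

```r
# Data frame from the question; row C has a missing main_category
d <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  city = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
           "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 3, 4, 3, 3.5, 5, 4, 4),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "*n", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

str(d)
nrow(na.omit(d))  # 8: only row C contains a missing value
```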
3 answers
#1
Here's a base R solution. I'm not sure whether you're set on using plyr, but this seems to work. I think the last row should be ranked 3, since two rows are tied at rank 1.
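no <- na.omit(dat)                                      # drop rows containing any NA (row C)
new <- no[do.call(order, with(no, list(city, state, -stars))), ]  # city, state, most stars first
within(new, {
  # rank -stars within each city; split() returns the groups in sorted
  # order, which matches the row order produced by order() above
  rank <- Reduce(c, Map(rank, split(-stars, city), ties.method = "min"))
})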
#   name        city state stars         main_category rank
# 9    I     Houston    TX   4.0 Professional Services    1
# 2    B     Houston    TX   3.0 Professional Services    2
# 6    F   Lafayette    IN   3.5                    *n    1
# 4    D Los Angeles    CA   4.0        Local Services    1
# 5    E Los Angeles    CA   3.0        Local Services    2
# 1    A  Pittsburgh    PA   5.0             Soul Food    1
# 7    G  Pittsburgh    PA   5.0               Doctors    1
# 8    H  Pittsburgh    PA   4.0             Soul Food    3
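A possible variant of the same base R idea, sketched here rather than taken from the answer: ave() applies rank() within each city and writes the results back at the original positions, so it does not depend on split() returning the groups in the same order as the sorted rows. The data frame is rebuilt so the block runs on its own; the object names simply mirror the answer.

```r
# Same data as in the question; row C has a missing main_category
dat <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  city = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
           "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 3, 4, 3, 3.5, 5, 4, 4),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "*n", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

no <- na.omit(dat)                                 # drops row C
new <- no[order(no$city, no$state, -no$stars), ]   # city, state, most stars first

# ave() splits -stars by city, ranks each piece with ties sharing the
# lowest rank, and puts the ranks back in the original row order of `new`
new$rank <- ave(-new$stars, new$city,
                FUN = function(x) rank(x, ties.method = "min"))
new
```

Because ave() reassembles the result by position, the same call also works if the rows are not sorted first; sorting only affects how the output reads.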
#2
Using dplyr
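library(dplyr)
filter(dat, complete.cases(dat)) %>%      # keep only rows without any NA
  group_by(city) %>%                      # rank within each city
  arrange(city, state, desc(stars)) %>%   # city, state, most stars first
  mutate(rank = min_rank(desc(stars)))    # ties share the lowest rank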
#   name        city state stars         main_category rank
# 1    I     Houston    TX   4.0 Professional Services    1
# 2    B     Houston    TX   3.0 Professional Services    2
# 3    F   Lafayette    IN   3.5                    *n    1
# 4    D Los Angeles    CA   4.0        Local Services    1
# 5    E Los Angeles    CA   3.0        Local Services    2
# 6    A  Pittsburgh    PA   5.0             Soul Food    1
# 7    G  Pittsburgh    PA   5.0               Doctors    1
# 8    H  Pittsburgh    PA   4.0             Soul Food    3
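If the goal is exactly the ranks in the question's desired output (H ranked 2 rather than 3), one option, sketched under the assumption that this is the intent, is to keep main_category in the grouping as the original ddply() call did and drop only the rows whose main_category is missing. This assumes dplyr is installed; the data frame is rebuilt so the block is self-contained.

```r
library(dplyr)

# Same data as in the question; row C has a missing main_category
d <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  city = c("Pittsburgh", "Houston", "Lafayette", "Los Angeles", "Los Angeles",
           "Lafayette", "Pittsburgh", "Pittsburgh", "Houston"),
  state = c("PA", "TX", "IN", "CA", "CA", "IN", "PA", "PA", "TX"),
  stars = c(5, 3, 3, 4, 3, 3.5, 5, 4, 4),
  main_category = c("Soul Food", "Professional Services", NA, "Local Services",
                    "Local Services", "*n", "Doctors", "Soul Food",
                    "Professional Services"),
  stringsAsFactors = FALSE
)

ranked <- d %>%
  filter(!is.na(main_category)) %>%          # drop only the row with a missing category
  group_by(city, state, main_category) %>%   # same grouping as the ddply() call
  mutate(rank = min_rank(desc(stars))) %>%   # 1 = most stars; ties share the lowest rank
  ungroup() %>%
  arrange(city, state, desc(stars))

ranked
```

Grouping by city alone instead (as in the answer above) ranks Doctors and Soul Food together, which is why H comes out 3 there.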
#3
With ddply, NA handling belongs inside .fun; in your case that means inside rank(), whose NA argument is na.last (rank() has no na.rm).
Your approach to NAs was:
ddply(d, c("city", "state", "main_category"), na.rm=T, transform, rank=rank(-stars, ties.method="max"))
Passing the argument inside .fun should fix it. At least it works for me:
library(plyr)
ddply(d, c("city", "state", "main_category"), transform,
      rank = rank(-stars, na.last = TRUE, ties.method = "max"))
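Since rank() has no na.rm argument, its na.last argument is what controls missing values. A small base R sketch of the three settings, on a made-up vector (the values are illustrative, not from the question). Note that in the question's data the missing value is in main_category, not stars, so dropping row C still needs a filter such as complete.cases() or na.omit() before ranking.

```r
x <- c(3, NA, 5)                     # illustrative vector with one missing value

r_last <- rank(x, na.last = TRUE)    # NA is ranked last: 1 3 2
r_keep <- rank(x, na.last = "keep")  # NA stays NA in place: 1 NA 2
r_drop <- rank(x, na.last = NA)      # NA is removed, so the result is shorter: 1 2

print(r_last)
print(r_keep)
print(r_drop)
```

Because na.last = NA returns a shorter vector, its result would not line up with the rows if assigned as a new column inside transform(); removing the incomplete rows first avoids that.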