I'm trying to calculate the best goal differentials in the group stage of the 2014 World Cup.
football <- read.csv(
  file = "http://pastebin.com/raw.php?i=iTXdPvGf",
  header = TRUE,
  strip.white = TRUE
)
# The first 48 rows are the 48 group-stage matches
football <- head(football, n = 48L)
# Rows whose absolute goal difference equals the maximum
football[which(max(abs(football$home_score - football$away_score)) ==
               abs(football$home_score - football$away_score)), ]
This returns:
home home_continent home_score away away_continent away_score result
4 Cameroon Africa 0 Croatia Europe 4 l
7 Spain Europe 1 Netherlands Europe 5 l
37 Germany Europe 4 Portugal Europe 0 w
So those are the games with the highest goal differential, but now I need to make a new data frame that has a team name and abs(football$home_score - football$away_score).
3 answers
#1
# Absolute goal difference for each game
football$score_diff <- abs(football$home_score - football$away_score)
# Home team if it scored more, NA for a draw, otherwise the away team
football$winner <- ifelse(football$home_score > football$away_score, as.character(football$home),
                          ifelse(football$result == "d", NA, as.character(football$away)))
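The two columns above do not yet give the data frame the question asks for (a team name plus its goal differential). A minimal sketch of that last step: keep the rows at the maximum, then select and rename the two columns. The small data frame is an assumption standing in for the pastebin file; it reuses the question's column names (home, home_score, away, away_score, result) and a few 2014 results.

```r
# Toy stand-in for the pastebin data (assumption: same column names as the question)
football <- data.frame(
  home       = c("Brazil", "Cameroon", "Spain", "Germany", "Brazil"),
  home_score = c(3, 0, 1, 4, 0),
  away       = c("Croatia", "Croatia", "Netherlands", "Portugal", "Mexico"),
  away_score = c(1, 4, 5, 0, 0),
  result     = c("w", "l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

# Same two columns as above
football$score_diff <- abs(football$home_score - football$away_score)
football$winner <- ifelse(football$home_score > football$away_score, as.character(football$home),
                          ifelse(football$result == "d", NA, as.character(football$away)))

# Keep only the games with the largest differential, then the two requested columns
best <- football[football$score_diff == max(football$score_diff), c("winner", "score_diff")]
names(best) <- c("team", "score_diff")
rownames(best) <- NULL
print(best)
```

Here `best` holds Croatia, Netherlands and Germany, each with a differential of 4; the 3-1 and 0-0 games are dropped by the max() filter.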
#2
You could save some typing this way. First compute the score differences and the winners. When result is w, the home team is the winner (and when it is l, the away team is), so you do not have to look at the scores at all. Once you have added the score difference and the winner, subset the data to the rows whose score difference equals max().
mydf <- read.csv(file = "http://pastebin.com/raw.php?i=iTXdPvGf",
                 header = TRUE, strip.white = TRUE)
mydf <- head(mydf, n = 48L)
library(dplyr)
# Add the score difference and the winner (taken from result), then keep the largest differences
mutate(mydf, scorediff = abs(home_score - away_score),
       winner = ifelse(result == "w", as.character(home),
                       ifelse(result == "l", as.character(away), "draw"))) %>%
  filter(scorediff == max(scorediff))
# home home_continent home_score away away_continent away_score result scorediff winner
#1 Cameroon Africa 0 Croatia Europe 4 l 4 Croatia
#2 Spain Europe 1 Netherlands Europe 5 l 4 Netherlands
#3 Germany Europe 4 Portugal Europe 0 w 4 Germany
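The same idea (read the winner off the result column, then filter on max()) also works in base R without dplyr. A minimal sketch, assuming a small hypothetical data frame with the question's column names rather than the pastebin file; draws are labelled "draw" as in the dplyr version.

```r
# Hypothetical toy data with the question's column names (not the pastebin file)
mydf <- data.frame(
  home       = c("Cameroon", "Spain", "Germany", "Brazil"),
  home_score = c(0, 1, 4, 0),
  away       = c("Croatia", "Netherlands", "Portugal", "Mexico"),
  away_score = c(4, 5, 0, 0),
  result     = c("l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

# transform() evaluates its arguments inside mydf, much like mutate()
mydf <- transform(mydf,
  scorediff = abs(home_score - away_score),
  # result already says who won, so the scores are not compared here
  winner = ifelse(result == "w", home, ifelse(result == "l", away, "draw"))
)

# Base-R counterpart of filter(scorediff == max(scorediff))
top <- subset(mydf, scorediff == max(scorediff))
print(top)
```

On R versions before 4.0, transform() may turn the new winner column into a factor, so wrap it in as.character() if you need plain strings.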
#3
Here is another option that creates the "winner" column without ifelse. It is based on row/column indexing. The numeric column index is created by matching the result column against its possible values (match(football$result, ...)), and the row index is just 1:nrow(football). Take the 'home' and 'away' columns of the "football" dataset and cbind them with an additional 'draw' column of NAs, so that games whose result is 'd' get NA as the winner.
football$score_diff <- abs(football$home_score - football$away_score)
# Pick one (row, column) cell per game: column 1 = home, 2 = away, 3 = NA for a draw
football$winner <- cbind(football[c('home', 'away')], draw = NA)[
  cbind(1:nrow(football), match(football$result, c('w', 'l', 'd')))]
football[with(football, score_diff == max(score_diff)), ]
# home home_continent home_score away away_continent away_score result
#60 Brazil South America 1 Germany Europe 7 l
# score_diff winner
#60 6 Germany
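The row/column-index trick is easiest to see on a tiny example. A sketch with a hypothetical three-game data frame: each row of the index matrix is one (row, column) pair, and indexing a data frame with a two-column matrix first converts it to a matrix, so the result is a character vector with NA for the draw.

```r
# Hypothetical three-game example using the question's column names
games <- data.frame(
  home   = c("Germany", "Cameroon", "Brazil"),
  away   = c("Portugal", "Croatia", "Mexico"),
  result = c("w", "l", "d"),
  stringsAsFactors = FALSE
)

# Candidate winners: column 1 = home, column 2 = away, column 3 = NA for a draw
choices <- cbind(games[c("home", "away")], draw = NA)

# Column index per game: "w" -> 1, "l" -> 2, "d" -> 3
col_idx <- match(games$result, c("w", "l", "d"))

# One (row, column) pair per game
idx <- cbind(seq_len(nrow(games)), col_idx)
print(idx)

games$winner <- choices[idx]
print(games)
```

seq_len(nrow(games)) is used instead of 1:nrow(games) only because it also behaves correctly for a zero-row data frame.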
If the dataset is very big, you could speed up the match by using chmatch from library(data.table):
library(data.table)
chmatch(as.character(football$result), c('w', 'l', 'd'))
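A sketch of swapping chmatch in for match, guarded so it still runs when data.table is not installed. data.table::chmatch() accepts only character vectors, which is why the result column goes through as.character() first. The result vector below is hypothetical and stored as a factor, as read.csv() did by default before R 4.0.

```r
# Hypothetical result column, stored as a factor
result <- factor(c("w", "l", "d", "w", "l"))
codes  <- c("w", "l", "d")

# Base R: match() converts the factor to its labels before matching
idx_base <- match(result, codes)

# data.table::chmatch() is a faster match() for character vectors only
if (requireNamespace("data.table", quietly = TRUE)) {
  idx_fast <- data.table::chmatch(as.character(result), codes)
} else {
  idx_fast <- idx_base  # fall back to base R when data.table is unavailable
}

# Both approaches give the same integer column indexes
stopifnot(identical(idx_base, idx_fast))
print(idx_fast)
```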
NOTE: I used the full dataset from the link, not just the first 48 rows.