Creating a new data frame from an existing data frame

Time: 2021-10-18 22:55:55

I'm trying to find the largest goal differentials in the group stage of the 2014 World Cup.

football <- read.csv(
    file="http://pastebin.com/raw.php?i=iTXdPvGf", 
    header = TRUE, 
    strip.white = TRUE
)
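# The first 48 rows are the group-stage matches.
football <- head(football, n = 48L)
# Compute the absolute goal difference once, then keep the rows that reach the maximum.
goal_diff <- abs(football$home_score - football$away_score)
football[goal_diff == max(goal_diff), ]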

Results in

       home home_continent home_score        away away_continent away_score result
4  Cameroon         Africa          0     Croatia         Europe          4      l
7     Spain         Europe          1 Netherlands         Europe          5      l
37  Germany  

So those are the games with the highest goal differential, but now I need to make a new data frame that has a team name and abs(football$home_score - football$away_score).

3 solutions

#1

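# Absolute goal difference for every match.
football$score_diff <- abs(football$home_score - football$away_score)
# Winner: the home team if it outscored the away team, NA for a draw, otherwise the away team.
football$winner <- ifelse(football$home_score > football$away_score, as.character(football$home),
                     ifelse(football$result == "d", NA, as.character(football$away)))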
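
To get the smaller data frame the question asks for (a team name plus the goal difference), one option is to keep only the rows with the largest difference and only the two columns of interest. This is a minimal sketch, assuming a hand-typed three-row stand-in for the pastebin data with the same column names; the name best is made up for illustration.

```r
# Hand-typed stand-in for the pastebin data (assumption: same column names).
football <- data.frame(
  home       = c("Cameroon", "Spain", "Brazil"),
  away       = c("Croatia", "Netherlands", "Mexico"),
  home_score = c(0, 1, 0),
  away_score = c(4, 5, 0),
  result     = c("l", "l", "d"),
  stringsAsFactors = FALSE
)

# Same two derived columns as in the answer above.
football$score_diff <- abs(football$home_score - football$away_score)
football$winner <- ifelse(football$home_score > football$away_score, football$home,
                     ifelse(football$result == "d", NA, football$away))

# Keep the rows with the largest difference and only the columns of interest.
best <- football[football$score_diff == max(football$score_diff),
                 c("winner", "score_diff")]
print(best)
```

Selecting columns by name in the same subsetting call avoids building an intermediate copy of the full table.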

#2

You could save some typing this way. First compute the score difference and the winner. When result is "w", the home team is the winner, and when it is "l", the away team is, so you do not have to compare the scores at all. Once both columns are added, keep the rows whose score difference equals max().

mydf <- read.csv(file="http://pastebin.com/raw.php?i=iTXdPvGf", 
                 header = TRUE, strip.white = TRUE)
mydf <- head(mydf,n = 48L)

library(dplyr)

mutate(mydf, scorediff = abs(home_score - away_score),
             winner = ifelse(result == "w", as.character(home),
                         ifelse(result == "l", as.character(away), "draw"))) %>%
filter(scorediff == max(scorediff))

#      home home_continent home_score        away away_continent away_score result scorediff      winner
#1 Cameroon         Africa          0     Croatia         Europe          4      l         4     Croatia
#2    Spain         Europe          1 Netherlands         Europe          5      l         4 Netherlands
#3  Germany         Europe          4    Portugal         Europe          0      w         4     Germany
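
For readers without dplyr, roughly the same steps can be sketched in base R, with transform() standing in for mutate() and subset() for filter(). This is a swapped-in technique, not the answer's own code; the four-row data frame is a hand-typed stand-in for the pastebin data, and the name top is made up for illustration.

```r
# Hand-typed stand-in for the pastebin data (assumption: same column names).
mydf <- data.frame(
  home       = c("Cameroon", "Spain", "Germany", "Brazil"),
  away       = c("Croatia", "Netherlands", "Portugal", "Mexico"),
  home_score = c(0, 1, 4, 0),
  away_score = c(4, 5, 0, 0),
  result     = c("l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

# transform() plays the role of mutate(): add both columns in one call.
# The winner is read off the result column alone, without comparing scores.
mydf <- transform(mydf,
  scorediff = abs(home_score - away_score),
  winner    = ifelse(result == "w", home,
                ifelse(result == "l", away, "draw"))
)

# subset() plays the role of filter(): keep the rows with the largest difference.
top <- subset(mydf, scorediff == max(scorediff))
print(top[c("winner", "scorediff")])
```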

#3

Here is another option that creates the "winner" column without ifelse, using row/column indexing instead. The numeric column index comes from matching the result column against its possible values (match(football$result, c('w', 'l', 'd'))), and the row index is just 1:nrow(football). Take the 'home' and 'away' columns of the "football" dataset and cbind them with an extra column 'draw' filled with NA, so that rows whose "result" is 'd' get NA as the winner.

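football$score_diff <- abs(football$home_score - football$away_score)
# Columns 1-3 are home, away, draw (all NA); match() picks column 1 for 'w', 2 for 'l', 3 for 'd'.
football$winner <- cbind(football[c('home', 'away')], draw = NA)[
    cbind(seq_len(nrow(football)), match(football$result, c('w', 'l', 'd')))]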

football[with(football, score_diff == max(score_diff)), ]
#      home home_continent home_score    away away_continent away_score result
# 60 Brazil  South America          1 Germany         Europe          7      l
#    score_diff  winner
# 60          6 Germany
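
The indexing trick may be easier to see on a toy example. Indexing a matrix with a two-column matrix of (row, column) pairs returns exactly one cell per row. This is a sketch on hand-typed values rather than the pastebin data; teams, result, col_idx and winner are names chosen here for illustration.

```r
# Toy character matrix: one row per match, columns in the order w, l, d.
teams <- cbind(home = c("Germany", "Spain", "Brazil"),
               away = c("Portugal", "Netherlands", "Mexico"),
               draw = NA)
result <- c("w", "l", "d")

# Column to read in each row: 1 for "w", 2 for "l", 3 for "d".
col_idx <- match(result, c("w", "l", "d"))

# A two-column (row, column) matrix selects exactly one cell per row.
winner <- teams[cbind(seq_along(result), col_idx)]
print(winner)
```

Here the third match is a draw, so its lookup lands in the all-NA 'draw' column and the winner is NA.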

If the dataset is very large, you could speed up the matching with chmatch from the data.table package, a faster version of match for character vectors:

library(data.table)
chmatch(as.character(football$result), c('w', 'l', 'd'))

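NOTE: I used the full dataset from the link rather than only the first 48 rows, which is why the output above shows row 60 with a difference of 6.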
