Flag the first row of each group in an R data frame

Time: 2021-01-08 14:57:26

I have a data frame which looks like this:

id  score
1   15
1   18
1   16
2   10
2   9
3   8
3   47
3   21

I'd like to identify a way to flag the first occurrence of id -- similar to first. and last. in SAS. I've tried the !duplicated function, but I need to actually append the "flag" column to my data frame since I'm running it through a loop later on. I'd like to get something like this:

id  score   first_ind
1   15      1
1   18      0
1   16      0
2   10      1
2   9       0
3   8       1
3   47      0
3   21      0

4 solutions

#1


15  

> df$first_ind <- as.numeric(!duplicated(df$id))
> df
  id score first_ind
1  1    15         1
2  1    18         0
3  1    16         0
4  2    10         1
5  2     9         0
6  3     8         1
7  3    47         0
8  3    21         0
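
The question also mentions SAS's last.; the same idea can probably be mirrored with the fromLast argument of duplicated(). A minimal sketch, assuming the data frame from the question; the column name last_ind is made up here for illustration:

```r
# Rebuild the example data from the question
df <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3),
                 score = c(15, 18, 16, 10, 9, 8, 47, 21))

# First row of each id: not a duplicate when scanning top-down
df$first_ind <- as.numeric(!duplicated(df$id))

# Last row of each id: not a duplicate when scanning bottom-up
# (last_ind is a hypothetical column name, not from the question)
df$last_ind <- as.numeric(!duplicated(df$id, fromLast = TRUE))
df
```

Because the flags are stored as ordinary columns, they remain available inside a later loop.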

#2


6  

You can find the edges using diff.

x <- read.table(text = "id  score
1   15
1   18
1   16
2   10
2   9
3   8
3   47
3   21", header = TRUE)

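# diff() is nonzero wherever id changes; comparing with 0 keeps the flag at 0/1
# even when consecutive ids differ by more than 1 (assumes rows are grouped by id)
x$first_id <- as.numeric(c(1, diff(x$id)) != 0)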
x

  id score first_id
1  1    15        1
2  1    18        0
3  1    16        0
4  2    10        1
5  2     9        0
6  3     8        1
7  3    47        0
8  3    21        0
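
One caveat worth checking before relying on diff: it only compares neighbouring rows, so it marks the start of each run of equal ids rather than the first occurrence overall, and the raw difference equals 1 only when ids increase by exactly 1. A small sketch with made-up data (the vector ids and the variable names are hypothetical):

```r
# Hypothetical ids: not consecutive (1 -> 5), and id 1 reappears at the end
ids <- c(1, 1, 5, 5, 1)

# Raw differences are not a clean 0/1 flag here
raw_diff <- c(1, diff(ids))                  # 1 0 4 0 -4

# Comparing with 0 marks every change of id, i.e. the start of each run
run_start <- as.numeric(raw_diff != 0)       # 1 0 1 0 1

# !duplicated() marks only the first time each id appears
first_seen <- as.numeric(!duplicated(ids))   # 1 0 1 0 0
```

If the data are sorted by id, run starts and first occurrences coincide, so either flag should give the same result.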

#3


3  

Using plyr:

library("plyr")
ddply(x,"id",transform,first=as.numeric(seq(length(score))==1))

or if you prefer dplyr:

library("dplyr")
x %>% group_by(id) %>%
    mutate(first=c(1,rep(0,n()-1)))

(although if you're operating completely in the plyr/dplyr framework you probably wouldn't need this flag variable anyway ...)
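
To illustrate that remark, here is a sketch assuming dplyr is installed; the names flagged and first_rows are illustrative only. The first pipeline builds the flag with row_number(), the second skips the flag and keeps the first row per id directly:

```r
library(dplyr)

# Example data from the question
x <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3),
                score = c(15, 18, 16, 10, 9, 8, 47, 21))

# Flag the first row within each id group
flagged <- x %>%
  group_by(id) %>%
  mutate(first = as.integer(row_number() == 1)) %>%
  ungroup()

# Without a flag column: keep only the first row of each id
first_rows <- x %>%
  group_by(id) %>%
  slice(1) %>%
  ungroup()
```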

#4


2  

Another base R option:

df$first_ind <- ave(df$id, df$id, FUN = seq_along) == 1
df
#  id score first_ind
#1  1    15      TRUE
#2  1    18     FALSE
#3  1    16     FALSE
#4  2    10      TRUE
#5  2     9     FALSE
#6  3     8      TRUE
#7  3    47     FALSE
#8  3    21     FALSE

This also works in case of unsorted ids. If you want 1/0 instead of T/F you can easily wrap it in as.integer(.).
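
A quick way to check both points, using a small unsorted data frame invented for this sketch (df2 is hypothetical, not from the question):

```r
# Hypothetical unsorted data: ids 2 and 1 each appear twice, out of order
df2 <- data.frame(id = c(2, 1, 2, 3, 1),
                  score = c(10, 15, 9, 8, 18))

# Number the rows within each id, flag position 1, and store it as 1/0
df2$first_ind <- as.integer(ave(df2$id, df2$id, FUN = seq_along) == 1)
df2$first_ind
```

Rows 1, 2 and 4 are the first appearances of ids 2, 1 and 3, so only those rows should be flagged.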
