I have a data frame which looks like this:
id score
1 15
1 18
1 16
2 10
2 9
3 8
3 47
3 21
I'd like to find a way to flag the first occurrence of each id, similar to the FIRST. and LAST. variables in SAS. I've tried !duplicated(), but I need to actually append the flag column to my data frame, since I'm running it through a loop later on. I'd like to get something like this:
id score first_ind
1 15 1
1 18 0
1 16 0
2 10 1
2 9 0
3 8 1
3 47 0
3 21 0
4 solutions
#1
15
> df$first_ind <- as.numeric(!duplicated(df$id))
> df
id score first_ind
1 1 15 1
2 1 18 0
3 1 16 0
4 2 10 1
5 2 9 0
6 3 8 1
7 3 47 0
8 3 21 0
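Since the question also mentions SAS's LAST., here is a minimal sketch of the matching last-occurrence flag using the same function. duplicated() accepts fromLast = TRUE, which scans from the bottom up, so the last row of each id is the one not marked as a duplicate. The column name last_ind is hypothetical, and for SAS-like BY-group behaviour the rows are assumed to be grouped by id.

```r
# Sample data from the question
df <- data.frame(id    = c(1, 1, 1, 2, 2, 3, 3, 3),
                 score = c(15, 18, 16, 10, 9, 8, 47, 21))

# Like FIRST.id: 1 for the first row of each id, 0 otherwise
df$first_ind <- as.integer(!duplicated(df$id))

# Like LAST.id: scanning from the end, 1 for the last row of each id
df$last_ind <- as.integer(!duplicated(df$id, fromLast = TRUE))

df$last_ind
# [1] 0 0 1 0 1 0 0 1
```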
#2
6
You can find the edges (the rows where id changes) using diff(). This assumes the rows are sorted, or at least grouped, by id.
x <- read.table(text = "id score
1 15
1 18
1 16
2 10
2 9
3 8
3 47
3 21", header = TRUE)
# 1 for the first row, then 1 wherever id differs from the previous row
x$first_id <- c(1, as.numeric(diff(x$id) != 0))
x
id score first_id
1 1 15 1
2 1 18 0
3 1 16 0
4 2 10 1
5 2 9 0
6 3 8 1
7 3 47 0
8 3 21 0
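A caveat with the raw c(1, diff(id)) form: diff() returns the size of each jump, not a 0/1 flag, so it only yields 1s and 0s when consecutive ids differ by exactly 1. The sketch below uses hypothetical ids 1, 1, 5, 5, 7 to show the difference, and compares the jump to zero to get a proper indicator. It still assumes each id occupies one contiguous block of rows.

```r
# Hypothetical, non-consecutive ids (already grouped)
ids <- c(1, 1, 5, 5, 7)

# Raw differences: the jump size leaks into the "flag"
raw_flag <- c(1, diff(ids))
raw_flag
# [1] 1 0 4 0 2

# Comparing each jump to zero gives a 0/1 indicator
first_id <- c(1, as.numeric(diff(ids) != 0))
first_id
# [1] 1 0 1 0 1
```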
#3
3
Using plyr:
library("plyr")
ddply(x,"id",transform,first=as.numeric(seq(length(score))==1))
or if you prefer dplyr:
library("dplyr")
# n() is the number of rows in the current id group
x %>% group_by(id) %>%
  mutate(first = c(1, rep(0, n() - 1)))
(although if you're operating completely in the plyr/dplyr framework, you probably wouldn't need this flag variable anyway ...)
#4
2
Another base R option:
df$first_ind <- ave(df$id, df$id, FUN = seq_along) == 1
df
# id score first_ind
#1 1 15 TRUE
#2 1 18 FALSE
#3 1 16 FALSE
#4 2 10 TRUE
#5 2 9 FALSE
#6 3 8 TRUE
#7 3 47 FALSE
#8 3 21 FALSE
This also works when the ids are unsorted. If you want 1/0 instead of TRUE/FALSE, you can easily wrap it in as.integer().
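A small sketch of the unsorted case with the as.integer() wrapper applied, using hypothetical data in which ids 1 and 2 alternate. ave() numbers the rows within each id in their original order, so only the first appearance of each id gets 1. For contrast, the last line applies the diff()-based check to the same ids; because the id changes on every row, it flags every row.

```r
# Hypothetical unsorted data: ids 1 and 2 alternate
df <- data.frame(id    = c(2, 1, 2, 1, 3),
                 score = c(10, 15, 9, 18, 8))

# Row number within each id, then flag row 1 of each id as 1/0
df$first_ind <- as.integer(ave(df$id, df$id, FUN = seq_along) == 1)
df$first_ind
# [1] 1 1 0 0 1

# diff()-based edge detection assumes grouped rows, so here it flags everything
diff_flag <- c(1, as.integer(diff(df$id) != 0))
diff_flag
# [1] 1 1 1 1 1
```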