dplyr: how to find the first non-missing string by group?

Consider the following simple example:

library(dplyr)

group <- c('A','A','A','B','B','B','B')
names <- c(NA,'fred',NA,'josh','josh',NA,NA)
data <- tibble(group, names)  # tibble() replaces the deprecated data_frame()

> data
# A tibble: 7 × 2
  group names
  <chr> <chr>
1     A  <NA>
2     A  fred
3     A  <NA>
4     B  josh
5     B  josh
6     B  <NA>
7     B  <NA>

Here, I would like to get, for each group, the first non-missing name in names. How can I do that? The attempts below, using first and coalesce, both fail:

data %>% group_by(group) %>% mutate(first_non_missing = first(names),
                                    first_non_missing_alt = coalesce(names)) %>% ungroup()

# A tibble: 7 × 4
  group names first_non_missing first_non_missing_alt
  <chr> <chr>             <chr>                 <chr>
1     A  <NA>              <NA>                  <NA>
2     A  fred              <NA>                  fred
3     A  <NA>              <NA>                  <NA>
4     B  josh              josh                  josh
5     B  josh              josh                  josh
6     B  <NA>              josh                  <NA>
7     B  <NA>              josh                  <NA>

Indeed, for group A, first_non_missing should be fred for all three observations.

Many thanks!

1 Solution

#1

summarise returns one row per group. Here, which(!is.na(names)) gives the positions of the non-missing values within each group, and [1] selects the first of them:

data %>%
  group_by(group) %>%
  summarise(first_non_missing = names[which(!is.na(names))[1]])

gives

  group first_non_missing
  <chr>             <chr>
1     A              fred
2     B              josh

If you still want all of the rows, replace summarise with mutate.
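
As a minimal sketch of that mutate variant, assuming the same example data as in the question (the object name result is only illustrative): the first non-missing name is repeated on every row of its group. If a group contained only NA values, which() would find no positions, the index would be NA, and the new column would be NA for that group.

```r
library(dplyr)

# Same example data as in the question
data <- tibble(
  group = c("A", "A", "A", "B", "B", "B", "B"),
  names = c(NA, "fred", NA, "josh", "josh", NA, NA)
)

result <- data %>%
  group_by(group) %>%
  # which(!is.na(names)) lists the non-missing positions within the group;
  # [1] picks the first one, and the single value is recycled to all rows
  mutate(first_non_missing = names[which(!is.na(names))[1]]) %>%
  ungroup()

print(result)
```

The original attempts did not work because first(names) returns the first element of each group even when it is NA, and coalesce(names) with a single argument has nothing to fall back on, so it returns the column unchanged.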
