Splitting a single-column data frame in R

Date: 2021-10-12 09:10:56

I have a large data set that I want to split into individual units. Right now the boundaries between units are marked by NA. How do I split the data at those points? Sample data:

df=matrix(c(1,2,3,4,NA,6,7,8,NA,10,11,12),ncol=1,byrow=TRUE)

gives us

      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]   NA
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]   NA
[10,]   10
[11,]   11
[12,]   12

I would like these three pieces stored in separate variables, such that

a
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
b
      [,1]
 [1,]    6
 [2,]    7
 [3,]    8
c
      [,1]
 [1,]   10
 [2,]   11
 [3,]   12

Does this make sense? Thanks.

2 solutions

#1


I wasn't sure if by "data set" you meant a true matrix or a data.frame. Here's a data.frame example; a matrix would be similar.

df <- data.frame(a=c(1,2,3,4,NA,6,7,8,NA,10,11,12))
gg <- ifelse(is.na(df$a),NA, cumsum(is.na(df$a)))
split(df, gg)

We use gg as a new grouping variable: cumsum(is.na(df$a)) counts how many NAs have been seen so far, so every row between two NAs gets the same group number. The NA rows themselves are given a group value of NA, and split() drops rows whose grouping value is NA. Passing this grouping variable to split() then does what we want:

$`0`
  a
1 1
2 2
3 3
4 4

$`1`
  a
6 6
7 7
8 8

$`2`
    a
10 10
11 11
12 12
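
For a true matrix, a minimal sketch along the same lines might look like this. The names m, grp and parts are made up for illustration, and drop = FALSE is there on the assumption that each piece should stay a one-column matrix, as in the question:

```r
# Sample matrix from the question (m, grp and parts are illustrative names)
m <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12), ncol = 1)

# Group id = number of NAs seen so far; NA rows get an NA id so split() drops them
grp <- ifelse(is.na(m[, 1]), NA, cumsum(is.na(m[, 1])))

# Split the row indices by group, then subset the matrix with each set of indices
parts <- lapply(split(seq_len(nrow(m)), grp),
                function(i) m[i, , drop = FALSE])
```

parts is then a list of three one-column matrices named "0", "1" and "2", so parts[["0"]] holds the first block.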

#2


One-line solution using split and cumsum after removing missing values:

 split(df[!is.na(df)],cumsum(is.na(df))[!is.na(df)])
$`0`
[1] 1 2 3 4

$`1`
[1] 6 7 8

$`2`
[1] 10 11 12
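
The question asks for separate variables a, b and c rather than a list. A rough sketch of one way to get there, assuming the pieces should simply be named with consecutive letters (x, keep and parts are illustrative names; keeping the list is usually easier to work with than loose variables):

```r
# Same data as a plain vector (x, keep and parts are illustrative names)
x <- c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12)

keep  <- !is.na(x)                               # positions of real values
parts <- split(x[keep], cumsum(is.na(x))[keep])  # same idea as the one-liner

# Assumption: name the pieces a, b, c, ... in order of appearance
names(parts) <- letters[seq_along(parts)]

# Copy each list element into the global environment as its own variable
invisible(list2env(parts, envir = globalenv()))
```

After this, a, b and c exist as ordinary numeric vectors. Note that a variable called c sits alongside the base function c(); calls like c(1, 2) still work because R skips non-function objects when looking up a function.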
