R: Losing column names when adding rows to an empty data frame

Time: 2022-07-08 19:35:42

I am just starting with R and encountered a strange behaviour: when inserting the first row in an empty data frame, the original column names get lost.

Example:

a<-data.frame(one = numeric(0), two = numeric(0))
a
#[1] one two
#<0 rows> (or 0-length row.names)
names(a)
#[1] "one" "two"
a<-rbind(a, c(5,6))
a
#  X5 X6
#1  5  6
names(a)
#[1] "X5" "X6"

As you can see, the column names one and two were replaced by X5 and X6.

Could somebody please tell me why this happens, and whether there is a right way to do this without losing the column names?

A shotgun solution would be to save the names in an auxiliary vector and then add them back when finished working on the data frame.
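
For reference, that shotgun workaround could look like the following minimal sketch (it reuses the a from the example above; saved_names is just an illustrative name):

```r
# Start from the same empty data frame as in the example
a <- data.frame(one = numeric(0), two = numeric(0))

# Save the column names in an auxiliary vector before binding
saved_names <- names(a)

# rbind() loses the names here because a has zero rows
a <- rbind(a, c(5, 6))

# Put the original names back once the data frame has been filled
names(a) <- saved_names
print(a)
```

Once a has at least one row, later rbind() calls keep its names, so the restore should only be needed after the first insert.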

Thanks

Context:

I created a function that gathers some data and adds it as a new row to a data frame received as a parameter. I create the data frame, then iterate through my data sources, passing the data.frame to each function call to be filled with its results.
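
One detail of this design worth spelling out: R functions work on a copy of the data frame they receive, so a function that appends a row has to return the modified data frame, and the caller has to reassign it. A minimal sketch; add_result() and the data sources are hypothetical stand-ins for the real data-gathering code:

```r
# Hypothetical stand-in for a data-gathering function: it appends one row
# and returns the modified data frame (the caller's object is not changed)
add_result <- function(df, source_value) {
  values <- c(source_value, source_value * 2)  # placeholder "gathered" data
  df[nrow(df) + 1, ] <- values                 # append, keeping column names
  df
}

results <- data.frame(one = numeric(0), two = numeric(0))

invisible(add_result(results, 1))  # return value discarded...
print(nrow(results))               # ...so this still prints 0

for (s in c(1, 2, 3)) {              # hypothetical data sources
  results <- add_result(results, s)  # reassign to keep the new row
}
print(results)
```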

8 solutions

#1


30  

The rbind help page specifies that:

For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’) are ignored unless the result would have zero rows (columns), for S compatibility. (Zero-extent matrices do not occur in S3 and are not ignored in R.)

So, in fact, a is ignored in your rbind call. Not entirely, though: since a is a data frame, rbind dispatches to rbind.data.frame, which then behaves as if only the new row had been passed:

rbind.data.frame(c(5,6))
#  X5 X6
#1  5  6
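
A small sketch that isolates the cause: the same rbind() call keeps the names as soon as the data frame already has a row, which is consistent with the zero-row a being dropped before binding. The one-row data frame b exists only for comparison:

```r
a <- data.frame(one = numeric(0), two = numeric(0))  # zero rows
b <- data.frame(one = 1, two = 2)                    # one row

# Not "one" "two": the empty a is dropped, so the names are generated
# from the vector alone (X5, X6 in the question)
names(rbind(a, c(5, 6)))

# "one" "two": the vector is matched to b's columns by position
names(rbind(b, c(5, 6)))
```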

One way to insert the row could be:

a[nrow(a)+1,] <- c(5,6)
a
#  one two
#1   5   6

But there may be a better way to do it depending on your code.
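
For a loop like the one described in the question, the same indexing idiom can be repeated on every iteration. A sketch with made-up values (i and i^2), assuming all columns are numeric so that combining the values with c() does not coerce anything:

```r
a <- data.frame(one = numeric(0), two = numeric(0))

for (i in 1:3) {
  # Assigning to row nrow(a) + 1 appends a row and keeps the column names
  a[nrow(a) + 1, ] <- c(i, i^2)
}
print(a)
```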

#2


9  

I was almost giving up on this issue.

1) Create the data frame with stringsAsFactors set to FALSE, or you run straight into the next issue (invalid factor levels). Since R 4.0.0, FALSE is the default.

2) Don't use rbind (no idea why on earth it messes up the column names). Simply do it this way:

df[nrow(df)+1,] <- list("d","gsgsgd",4)  # list() rather than c(), which would turn 4 into "4"

df <- data.frame(a = character(0), b = character(0), c = numeric(0))  # factor columns before R 4.0.0

df[nrow(df)+1,] <- c("d","gsgsgd",4)

# Warning messages:
#1: In `[<-.factor`(`*tmp*`, iseq, value = "d") :
#  invalid factor level, NAs generated
#2: In `[<-.factor`(`*tmp*`, iseq, value = "gsgsgd") :
#  invalid factor level, NAs generated

df <- data.frame(a = character(0), b = character(0), c = numeric(0), stringsAsFactors = FALSE)

df[nrow(df)+1,] <- list("d","gsgsgd",4)  # list() keeps column c numeric

df
#  a      b c
#1 d gsgsgd 4
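
Mixing strings and numbers in c() has a side effect worth checking: c() turns the whole vector into character, so the numeric column c silently becomes a character column. A sketch comparing c() with list(), which keeps each value's own type (make_empty() is just a helper for this comparison):

```r
# Helper for this comparison: a fresh empty data frame with character/numeric columns
make_empty <- function() {
  data.frame(a = character(0), b = character(0), c = numeric(0),
             stringsAsFactors = FALSE)
}

df1 <- make_empty()
df1[nrow(df1) + 1, ] <- c("d", "gsgsgd", 4)     # c() coerces 4 to "4"
print(class(df1$c))                              # "character"

df2 <- make_empty()
df2[nrow(df2) + 1, ] <- list("d", "gsgsgd", 4)  # list() keeps 4 numeric
print(class(df2$c))                              # "numeric"
```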

#3


8  

A workaround would be:

a <- rbind(a, data.frame(one = 5, two = 6))

?rbind explains that columns are matched by name:

It then takes the classes of the columns from the first data frame, and matches columns by name (rather than by position)
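
A sketch of what the quoted sentence means in practice: once the data frame has a row, later rows are matched by name, so their columns may even be given in a different order. Note that the first bound row still sets the column order, because the empty a itself is dropped:

```r
a <- data.frame(one = numeric(0), two = numeric(0))

# First row: a has zero rows, so this data frame's columns define the result
a <- rbind(a, data.frame(one = 5, two = 6))

# Later rows are matched to the existing columns by name, not by position
a <- rbind(a, data.frame(two = 8, one = 7))
print(a)
```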

#4


7  

FWIW, an alternative design might have your functions building vectors for the two columns, instead of rbinding to a data frame:

ones <- c()
twos <- c()

Modify the vectors in your functions:

ones <- append(ones, 5)
twos <- append(twos, 6)

Repeat as needed, then create your data.frame in one go:

a <- data.frame(one=ones, two=twos)
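
Because R functions work on copies of their arguments, functions that "modify the vectors" have to hand the new values back to the caller. A minimal sketch of this design, where gather() is a hypothetical stand-in that returns the values for one row and the loop collects them:

```r
# Hypothetical data-gathering function: returns the values for one row
gather <- function(source) {
  c(one = source, two = source * 10)
}

ones <- c()
twos <- c()
for (s in 1:3) {                      # hypothetical data sources
  row <- gather(s)
  ones <- append(ones, row[["one"]])
  twos <- append(twos, row[["two"]])
}

# Build the data frame once, at the end
a <- data.frame(one = ones, two = twos)
print(a)
```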

#5


1  

You can do this:

Give the initial data frame one row (filled with NAs):

df <- data.frame(matrix(nrow = 1, ncol = length(newrow)))

Add your new row and drop the NA row again with na.omit:

newdf <- na.omit(rbind(newrow, df))

But make sure your newrow has no NAs, or it will be removed too.
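
An end-to-end sketch of this approach, assuming newrow is a plain numeric vector. The placeholder data frame gets generic names (X1, X2, ...), and those are the names the result ends up with, so the desired names (one and two, taken from the question) still have to be assigned at the end:

```r
newrow <- c(5, 6)

# One-row placeholder data frame full of NAs (columns are named X1, X2)
df <- data.frame(matrix(nrow = 1, ncol = length(newrow)))

# Bind the new row, then drop the all-NA placeholder row
newdf <- na.omit(rbind(newrow, df))

# The names come from the placeholder, so set the real ones explicitly
names(newdf) <- c("one", "two")
print(newdf)
```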

Cheers Agus

#6


0  

I use the following solution to add a row to an empty data frame:

d_dataset <- 
  data.frame(
    variable = character(),
    before = numeric(),
    after = numeric(),
    stringsAsFactors = FALSE)

d_dataset <- 
  rbind(
    d_dataset,
      data.frame(
        variable = "test",
        before = 9,
        after = 12,
        stringsAsFactors = FALSE))  

print(d_dataset)

  variable before after
1     test      9    12
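
When many rows are added in a loop, binding one row at a time copies the whole data frame on every call. A commonly used alternative (swapped in here, not part of the answer above) is to build one small data frame per row in a list and bind them all at once; the values below are made up:

```r
# One single-row data frame per observation (hypothetical values)
rows <- lapply(1:3, function(k) {
  data.frame(variable = paste0("test", k), before = 9 * k, after = 12 * k,
             stringsAsFactors = FALSE)
})

# Bind everything in a single call; the names come from the row data frames
d_dataset <- do.call(rbind, rows)
print(d_dataset)
```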

HTH.

Kind regards

Georg

#7


0  

One way to make this work generically, with the least amount of retyping of the column names, is the following. This method doesn't require the NA or 0 placeholder-row hacks.

rs <- data.frame(i=numeric(), square=numeric(), cube=numeric())
for (i in 1:4) {
    calc <- c(i, i^2, i^3)
    # append calc to rs
    names(calc) <- names(rs)
    rs <- rbind(rs, as.list(calc))
}

rs will have the correct names:

> rs
    i square cube
1   1      1    1
2   2      4    8
3   3      9   27
4   4     16   64
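
The same idea can be wrapped in a small helper so the names are copied automatically on every append. append_row() is a hypothetical name; setNames() comes from base R's stats package, which is attached by default:

```r
# Hypothetical helper: name `values` after df's columns, then rbind as a list
append_row <- function(df, values) {
  rbind(df, as.list(setNames(values, names(df))))
}

rs <- data.frame(i = numeric(), square = numeric(), cube = numeric())
for (i in 1:4) {
  rs <- append_row(rs, c(i, i^2, i^3))
}
print(rs)
```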

Another way to do this more cleanly is to use data.table:

> df <- data.frame(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are messed up
  X1 X2
1  1  2

> library(data.table)
> df <- data.table(a=numeric(0), b=numeric(0))
> rbind(df, list(1,2)) # column names are preserved
   a b
1: 1 2

Notice that a data.table is also a data.frame.

> class(df)
[1] "data.table" "data.frame"

#8


-1  

Instead of constructing the data.frame with numeric(0) I use as.numeric(0).

a<-data.frame(one=as.numeric(0), two=as.numeric(0))

This creates an extra initial row

a
#    one two
#1   0   0

Bind the additional rows

a<-rbind(a,c(5,6))
a
#    one two
#1   0   0
#2   5   6

Then use negative indexing to remove the first (bogus) row

a<-a[-1,]
a

#    one two
#2   5   6

Note: this messes up the row names (the index on the far left). I haven't figured out how to prevent that (anyone else?), but most of the time it probably doesn't matter.
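
On the leftover row names after a[-1, ]: assigning NULL to rownames() resets them to the default 1, 2, ... sequence. A sketch repeating the steps above:

```r
a <- data.frame(one = as.numeric(0), two = as.numeric(0))  # bogus first row
a <- rbind(a, c(5, 6))
a <- a[-1, ]            # drop the bogus row; its row name "2" remains

rownames(a) <- NULL     # reset row names to the default sequence
print(a)
```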
