How to delete a row from a data.frame without losing the attributes

Time: 2020-12-27 20:11:06

For starters: I have been searching for hours on this problem by now, so if the answer turns out to be trivial, please forgive me...

What I want to do is delete a row (no. 101) from a data.frame. It contains test data and should not appear in my analyses. My problem is: Whenever I subset from the data.frame, the attributes (esp. comments) are lost.

str(x)
# x has comments for each variable
x <- x[1:100,]
str(x)
# now x has lost all comments

It is well documented that subsetting will drop all attributes - so far, it's perfectly clear. The manual (e.g. http://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.data.frame.html) even suggests a way to preserve the attributes:

## keeping special attributes: use a class with a
## "as.data.frame" and "[" method:


as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

d <- data.frame(i= 0:7, f= gl(2,4),
                u= structure(11:18, unit = "kg", class="avector"))
str(d[2:4, -1]) # 'u' keeps its "unit"

I am not yet far enough into R to understand exactly what happens here. However, simply running these lines (except the last three) does not change the behavior of my subsetting. Using the command subset() with an appropriate vector (100 times TRUE plus 1 FALSE) gives me the same result. And simply storing the attributes in a variable and restoring them after the subset does not work either.

# Does not work...
tmp <- attributes(x)
x <- x[1:100,]
attributes(x) <- tmp

Of course, I could write all comments to a vector (var=>comment), subset and write them back using a loop - but that does not seem a well-founded solution. And I am quite sure I will encounter datasets with other relevant attributes in future analyses.
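
The workaround described above can be sketched as follows. One likely reason the attributes(x) snippet does not help is that attributes() on a data.frame returns only the frame-level attributes (names, class, row.names), whereas comments set with comment(x$var) live on the individual columns. The data below is a made-up stand-in for the real dataset, and the column names are assumptions for illustration only.

```r
# Toy stand-in for the asker's data: 101 rows, row 101 being the test case.
x <- data.frame(id = 1:101, score = seq(0, 1, length.out = 101))
comment(x$id) <- "participant id"
comment(x$score) <- "normalised score"

# The data.frame's own attributes do not include the column comments.
names(attributes(x))

# Save the per-column comments, drop the row, then write the comments back.
saved <- lapply(x, comment)
x <- x[-101, , drop = FALSE]
for (nm in names(saved)) {
  comment(x[[nm]]) <- saved[[nm]]
}

comment(x$score)
```

This only carries the "comment" attribute across; any other column attribute would need the same treatment, which is why a class-based approach scales better.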

So this is where my efforts in *, Google, and brain power got stuck. I would very much appreciate it if anyone could help me out with a hint. Thanks!

4 solutions

#1


10  

If I understand you correctly, you have some data in a data.frame, and the columns of the data.frame have comments associated with them. Perhaps something like the following?

set.seed(1)

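# stringsAsFactors=TRUE: since R 4.0.0 strings are no longer converted to factors by default
mydf<-data.frame(aa=rpois(100,4),bb=sample(LETTERS[1:5],
  100,replace=TRUE),stringsAsFactors=TRUE)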

comment(mydf$aa)<-"Don't drop me!"
comment(mydf$bb)<-"Me either!"

So this would give you something like

> str(mydf)
'data.frame':   100 obs. of  2 variables:
 $ aa: atomic  3 3 4 7 2 7 7 5 5 1 ...
  ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2 2 5 4 2 1 3 5 3 ...
  ..- attr(*, "comment")= chr "Me either!"

And when you subset this, the comments are dropped:

> str(mydf[1:2,]) # comment dropped.
'data.frame':   2 obs. of  2 variables:
 $ aa: num  3 3
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2

To preserve the comments, define the function `[.avector` as you did above (from the documentation), then add the appropriate class attribute to each of the columns in your data.frame (EDIT: to keep the factor levels of bb, add "factor" to the class of bb):

mydf$aa<-structure(mydf$aa, class="avector")
mydf$bb<-structure(mydf$bb, class=c("avector","factor"))

So that the comments are preserved:

> str(mydf[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ aa:Class 'avector'  atomic [1:2] 3 3
  .. ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2
  ..- attr(*, "comment")= chr "Me either!"
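
Applied to the original question (dropping row 101), the same idea can be checked with comment() directly instead of str(). This is a minimal sketch on made-up data; the data.frame d and its column names are assumptions, not the asker's real dataset.

```r
# Methods from the manual: "[" re-applies the attributes after subsetting;
# the as.data.frame method is needed when such columns pass through data.frame().
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x, i, ...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Hypothetical data: 101 rows, the last one being the unwanted test case.
d <- data.frame(id = 1:101, score = seq(0, 1, length.out = 101))
comment(d$score) <- "normalised score"

# Defining the method is not enough: the column must carry the class too.
class(d$score) <- c("avector", class(d$score))

# A negative index removes row 101; the comment on score should survive.
d <- d[-101, ]
nrow(d)
comment(d$score)
```

Without the class(d$score) line, `[.avector` is never dispatched, which would explain why running the manual's definitions alone changes nothing.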

EDIT:

If there are many columns in your data.frame that have attributes you want to preserve, you could use lapply (EDITED to include original column class):

mydf2 <- data.frame( lapply( mydf, function(x) {
  structure( x, class = c("avector", class(x) ) )
} ) )

However, this drops comments associated with the data.frame itself (such as comment(mydf)<-"I'm a data.frame"), so if you have any, assign them to the new data.frame:

comment(mydf2)<-comment(mydf)

And then you have

> str(mydf2[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ aa:Classes 'avector', 'numeric'  atomic [1:2] 3 3
  .. ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2
  ..- attr(*, "comment")= chr "Me either!"
 - attr(*, "comment")= chr "I'm a data.frame"

#2


4  

For those looking for the "all-in" solution based on BenBarnes' explanation, here it is.

(Give your upvote to BenBarnes' post if this works for you.)

# Define the avector-subselection method (from the manual)
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Assign each column in the data.frame the (additional) class avector
# (df stands for your own data.frame here)
# Note that this will "lose" the data.frame's attributes, therefore write to a copy
df2 <- data.frame(
  lapply(df, function(x) {
    structure( x, class = c("avector", class(x) ) )
  } )
)

# Finally copy the attribute for the original data.frame if necessary
mostattributes(df2) <- attributes(df)

# Now subselects work without losing attributes :)
df2 <- df2[1:100,]
str(df2)
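
The steps above could also be wrapped in a small helper. The sketch below is one possible variant, not part of the original answer: the function name add_avector is hypothetical, and it assigns the columns back with out[] <- lapply(...) instead of rebuilding the data.frame, which should leave the data.frame's own attributes in place so that no separate copy step is needed.

```r
# Methods from the manual, repeated so the sketch is self-contained.
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x, i, ...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Hypothetical helper: prepend "avector" to the class of every column.
add_avector <- function(d) {
  out <- d
  out[] <- lapply(d, function(col) {
    if (!inherits(col, "avector")) {
      class(col) <- c("avector", class(col))
    }
    col
  })
  out
}

# Made-up example data with column comments and a frame-level comment.
demo <- data.frame(a = 1:5, b = c(2.5, 3.5, 4.5, 5.5, 6.5))
comment(demo$a) <- "counts"
comment(demo$b) <- "weights"
comment(demo) <- "demo data"

demo2 <- add_avector(demo)
kept <- demo2[-5, ]   # drop the last row
comment(kept$a)
comment(kept$b)
```

The inherits() check makes the helper safe to call twice on the same data.frame, since the class is only added once.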

The good thing: once the class has been attached to all of the data.frame's columns, subsetting never drops the attributes again.

Okay, sometimes I am stunned how complicated it is to do the most simple operations in R. But I surely would not have learned about the "classes" feature if I had just marked and deleted the case in SPSS ;)

#3


1  

This is solved by the sticky package. (Full disclosure: I am the package author.) Apply sticky() to your vectors and the attributes are preserved through subset operations. For example:

> df <- data.frame( 
+   sticky   = sticky( structure(1:5, comment="sticky attribute") ),
+   nonstick = structure( letters[1:5], comment="non-sticky attribute" )
+ )
> 
> comment(df[1:3, "nonstick"])
NULL
> comment(df[1:3, "sticky"])
[1] "sticky attribute"

This works for any attribute and not only comment.

See the sticky package for details.

#4


0  

I spent hours trying to figure out how to retain attribute data (specifically variable labels) when subsetting a dataframe (removing columns). The answer was so simple, I couldn't believe it. Just use the function spss.get from the Hmisc package, and then no matter how you subset, the variable labels are retained.
