Does anyone know how to remove an entire column from a data.frame in R? For example, if I am given this data.frame:
> head(data)
chr genome region
1 chr1 hg19_refGene CDS
2 chr1 hg19_refGene exon
3 chr1 hg19_refGene CDS
4 chr1 hg19_refGene exon
5 chr1 hg19_refGene CDS
6 chr1 hg19_refGene exon
and I want to remove the 2nd column.
5 Solutions
#1
You can set it to NULL.
> Data$genome <- NULL
> head(Data)
chr region
1 chr1 CDS
2 chr1 exon
3 chr1 CDS
4 chr1 exon
5 chr1 CDS
6 chr1 exon
As pointed out in the comments, here are some other possibilities:
Data[2] <- NULL # Wojciech Sobala
Data[[2]] <- NULL # same as above
Data <- Data[,-2] # Ian Fellows
Data <- Data[-2] # same as above
You can remove multiple columns via:
Data[1:2] <- list(NULL) # Marek
Data[1:2] <- NULL # does not work!
Be careful with matrix-style subsetting, though, as you can end up with a vector:
Data <- Data[,-(2:3)] # vector (only one column is left, so the result is dropped to a vector)
Data <- Data[,-(2:3),drop=FALSE] # still a data.frame
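A self-contained sketch for checking the approaches above side by side. The helper `make_data()` and the result names `d1` to `d5` and `v` are hypothetical, and the values are copied from the `head(data)` output in the question rather than the full data set. Each approach starts from a fresh copy so they don't interfere with one another.

```r
# Illustrative reconstruction of the question's data
# (values copied from the printed head(data) output).
make_data <- function() {
  data.frame(
    chr    = rep("chr1", 6),
    genome = rep("hg19_refGene", 6),
    region = rep(c("CDS", "exon"), 3),
    stringsAsFactors = FALSE
  )
}

# Assign NULL to the column by name: d1 keeps chr and region.
d1 <- make_data()
d1$genome <- NULL

# Assign NULL by position: d2 keeps chr and region.
d2 <- make_data()
d2[2] <- NULL

# Negative indexing returns a copy without column 2.
d3 <- make_data()[-2]

# Several columns at once via list(NULL): only region is left.
d4 <- make_data()
d4[1:2] <- list(NULL)

# Matrix-style subsetting that leaves one column: a vector by default,
# a one-column data.frame when drop = FALSE is given.
v  <- make_data()[, -(2:3)]
d5 <- make_data()[, -(2:3), drop = FALSE]
```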
#2
To remove one or more columns by name, when the column names are known (as opposed to being determined at run-time), I like the subset() syntax. E.g. for the data frame
Data <- data.frame(a=1:3, d=2:4, c=3:5, b=4:6)
to remove just the a column you could do
Data <- subset( Data, select = -a )
and to remove the b and d columns you could do
Data <- subset( Data, select = -c(d, b ) )
You can remove all columns from d through b (inclusive) with:
Data <- subset( Data, select = -c( d : b ) )
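Each call above overwrites Data, so running them one after another would make the later calls refer to columns that have already been removed. A minimal sketch that keeps each result under its own hypothetical name (`no_a`, `no_db`, `no_range`), so the three `select` forms can be compared on the same input:

```r
# The example data frame from this answer.
Data <- data.frame(a = 1:3, d = 2:4, c = 3:5, b = 4:6)

# Drop a single column by (unquoted) name: leaves d, c, b.
no_a <- subset(Data, select = -a)

# Drop several named columns: leaves a, c.
no_db <- subset(Data, select = -c(d, b))

# Drop every column from d through b, inclusive: leaves only a.
# subset() uses drop = FALSE by default, so this is still a data.frame.
no_range <- subset(Data, select = -c(d:b))
```

Inside `select`, subset() maps each column name to its position, which is why `-a`, `-c(d, b)` and the range `d:b` all work as unquoted names.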
As I said above, this syntax works only when the column names are known. It won't work when, say, the column names are determined programmatically (e.g. stored in a variable). I'll reproduce this warning from the ?subset documentation:
Warning:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like '[', and in particular the non-standard evaluation of argument 'subset' can have unanticipated consequences.
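A minimal sketch of the standard `[` subsetting that the warning recommends, assuming the names to drop arrive in a character vector (`to_drop` and `kept` are hypothetical names). Writing `subset(Data, select = -to_drop)` would fail here, because it tries to negate a character vector.

```r
Data <- data.frame(a = 1:3, d = 2:4, c = 3:5, b = 4:6)

# Column names held in a variable, e.g. computed at run time.
to_drop <- c("d", "b")

# Keep every column whose name is not in to_drop; drop = FALSE keeps a
# data.frame even if only one column would be left.
kept <- Data[, setdiff(names(Data), to_drop), drop = FALSE]
```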
#3
The posted answers are very good when working with data.frames. However, these tasks can be quite inefficient from a memory perspective: with large data, removing a column can take an unusually long time and/or fail with out-of-memory errors. The data.table package helps address this problem with the := operator, which removes columns by reference instead of copying the whole table:
> library(data.table)
> dt <- data.table(a = 1, b = 1, c = 1)
> dt[,a:=NULL]
b c
[1,] 1 1
I should put together a bigger example to show the differences; I'll update this answer with one at some point.
#4
(For completeness) If you want to remove columns by name, you can do this:
cols.dont.want <- "genome"
cols.dont.want <- c("genome", "region") # if you want to remove multiple columns
data <- data[, ! names(data) %in% cols.dont.want, drop = FALSE]
Including drop = FALSE ensures that the result will still be a data.frame even if only one column remains.
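A small sketch of why `drop = FALSE` matters here, using an illustrative six-row version of the question's data (values copied from its `head(data)` output; `keep`, `with_drop` and `without_drop` are hypothetical names). Removing genome and region leaves a single column, which is exactly the case where the default `drop = TRUE` turns the result into a vector.

```r
data <- data.frame(
  chr    = rep("chr1", 6),
  genome = rep("hg19_refGene", 6),
  region = rep(c("CDS", "exon"), 3)
)

cols.dont.want <- c("genome", "region")

# Logical column index: TRUE for every column to keep.
# %in% binds tighter than !, so this is !(names(data) %in% cols.dont.want).
keep <- ! names(data) %in% cols.dont.want

with_drop    <- data[, keep]                # only chr is left: dropped to a vector
without_drop <- data[, keep, drop = FALSE]  # still a one-column data.frame
```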
#5
With this you can remove the column and store the result in another variable:
df = subset(data, select = -c(genome) )
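Because subset() returns a new data frame rather than modifying its argument, the original keeps the column. A small check, using a made-up one-row `data` frame with the question's column names:

```r
# Made-up one-row data frame with the question's column names.
data <- data.frame(chr = "chr1", genome = "hg19_refGene", region = "CDS")

# subset() returns a new data frame without genome...
df <- subset(data, select = -c(genome))

# ...while data itself still has all three columns.
print(names(df))
print(names(data))
```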