How do you delete a column by name in data.table?

Time: 2022-11-18 21:08:13

To get rid of a column named "foo" in a data.frame, I can do:

df <- df[-grep('foo', colnames(df))]

However, once df is converted to a data.table object, there is no way to just remove a column.

Example:

df <- data.frame(id = 1:100, foo = rnorm(100))
df2 <- df[-grep('foo', colnames(df))] # works
df3 <- data.table(df)
df3[-grep('foo', colnames(df3))] 

But once it is converted to a data.table object, this no longer works.

8 solutions

#1


212  

Any of the following will remove column foo from the data.table df3:

# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]

df3[, c("foo","bar"):=NULL]  # remove two columns

myVar = "foo"
df3[, (myVar):=NULL]   # lookup myVar contents

# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]

# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]

data.table also supports the following syntax:

## Method 3 (could then assign the result back to df3)
df3[, !"foo", with=FALSE]  

though if you were actually wanting to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo") you'd really want to use Method 1 instead.
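
A small sketch of the difference between Method 1 and Method 3. The table and the name `without_foo` are made up for illustration; only `:=` and the `!"foo"` syntax come from the answer itself.

```r
library(data.table)

# Hypothetical example table, not taken from the question
df3 <- data.table(id = 1:5, foo = rnorm(5))

# Method 3 returns a new data.table and leaves df3 untouched
without_foo <- df3[, !"foo", with = FALSE]
names(without_foo)
names(df3)          # df3 still has both columns at this point

# Method 1 removes the column from df3 itself, by reference
df3[, foo := NULL]
names(df3)
```

Method 3 is handy when the original table must stay intact; Method 1 avoids building a second table at all.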

(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo", if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
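
To make the anchoring point concrete, here is a minimal sketch with a made-up table whose column names all contain "foo" as a substring. The `length()` guard is an extra precaution added for this example, not part of the answer.

```r
library(data.table)

# Hypothetical table; the column names are chosen to show substring matches
dt <- data.table(id = 1:3, foo = 1:3, fool = 4:6, buffoon = 7:9)

# Unanchored pattern: matches every name containing "foo"
grep("foo", names(dt), value = TRUE)

# Anchored pattern: matches only the column named exactly "foo"
cols <- grep("^foo$", names(dt), value = TRUE)

# Remove the exact match by reference, only if something matched
if (length(cols)) dt[, (cols) := NULL]
names(dt)
```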

Less safe options, fine for interactive use:

The next two idioms will also work -- if df3 contains a column matching "foo" -- but will fail in a probably-unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.

As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you are wanting to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.

# Method 4a:
df3[, -grep("^foo$", colnames(df3)), with=FALSE]

# Method 4b: 
df3[, !grepl("^foo$", colnames(df3)), with=FALSE]
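
The root of the pitfall with negative indices can be shown in base R: negating an empty index still gives an empty index. The sketch below uses a plain data.frame for that part and does not assert what any particular data.table version returns for Method 4a. The `setdiff()` alternative and the names `idx`, `keep` and `kept` are assumptions added for illustration.

```r
library(data.table)

# No column is called "bar", so grep() returns an empty index
idx <- grep("^bar$", c("id", "foo"))
length(-idx)                        # 0: "drop nothing" became "select nothing"

df <- data.frame(id = 1:3, foo = 4:6)
ncol(df[-idx])                      # 0: every column is gone

# A name-based alternative that does not depend on a match existing
dt <- data.table(id = 1:3, foo = 4:6)
keep <- setdiff(names(dt), "bar")   # "bar" is absent, so all names are kept
kept <- dt[, keep, with = FALSE]
names(kept)
```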

#2


25  

You can also use set() for this, which avoids the overhead of [.data.table in loops:

dt <- data.table( a=letters, b=LETTERS, c=seq(26), d=letters, e=letters )
set( dt, j=c(1L,3L,5L), value=NULL )
> dt[1:5]
   b d
1: A a
2: B b
3: C c
4: D d
5: E e

If you want to do it by column name, which(colnames(dt) %in% c("a","c","e")) should work for j.
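
A sketch of the by-name variant, reusing the table from this answer. The variable name `drop_idx` is made up for the example.

```r
library(data.table)

dt <- data.table(a = letters, b = LETTERS, c = seq(26), d = letters, e = letters)

# Translate the column names into positions, as suggested above
drop_idx <- which(colnames(dt) %in% c("a", "c", "e"))

# Remove those columns by reference
set(dt, j = drop_idx, value = NULL)
names(dt)
```

set() is also documented as accepting column names directly in j, which would make the which() step unnecessary; that is worth checking against the installed data.table version.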

#3


15  

I simply do it in the data frame kind of way:

DT$col = NULL

Works fast and as far as I could see doesn't cause any problems.

UPDATE: This is not the best method if your DT is very large, as using the $<- operator leads to object copying. It is better to use:

DT[, col:=NULL]
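
One way to see that := works in place is to bind a second name to the same table and check that it reflects the change. This is only a sketch with made-up data; the name `alias` is hypothetical.

```r
library(data.table)

DT <- data.table(id = 1:3, col = 4:6)
alias <- DT              # a second name for the same object, not a copy

DT[, col := NULL]        # removes the column in place

names(DT)
names(alias)             # the alias no longer has the column either
```

If an independent table is needed, copy(DT) creates one before the removal.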

#4


2  

A very simple option if you have many individual columns to delete from a data.table and want to avoid typing all the column names (use with care):

dt <- dt[, -c(1,4,6,17,83,104), with = FALSE]

This will remove columns based on column number instead.

It's obviously not as efficient, because it bypasses data.table's advantages, but if you're working with fewer than, say, 500,000 rows it works fine.

#5


0  

Suppose your DT has columns col1, col2, col3, col4, col5, coln.

To delete a subset of them:

vx <- as.character(bquote(c(col1, col2, col3, coln)))[-1]
DT[, paste0(vx):=NULL]

#6


-2  

Here is a function for removing a number of columns, given their names:

deleteColsFromDataTable <- function (train, toDeleteColNames) {
   # Remove each named column by reference; with= is not combined with :=
   for (myNm in toDeleteColNames)
      train[, (myNm) := NULL]
   return (train)
}

#7


-2  

DT[,c:=NULL] # remove column c

#8


-7  

For a data.table, assigning the column to NULL removes it:

DT[, c("col1", "col2", "col3", "col4")] <- NULL
   ^
   |---- Notice the extra comma if DT is a data.table

... which is the equivalent of:

DT$col1 <- NULL
DT$col2 <- NULL
DT$col3 <- NULL
DT$col4 <- NULL

The equivalent for a data.frame is:

DF[c("col1", "col2", "col3", "col4")] <- NULL
   ^
   |---- Notice the missing comma if DF is a data.frame

Q. Why is there a comma in the version for data.table, and no comma in the version for data.frame?

A. As data.frames are stored as a list of columns, you can skip the comma. You could also add it in, however then you will need to assign them to a list of NULLs, DF[, c("col1", "col2", "col3")] <- list(NULL).

