How do I delete rows in a data frame?

Time: 2022-04-04 01:21:52

I have a data frame named "mydata" that looks like this:

   A  B  C   D 
1. 5  4  4   4 
2. 5  4  4   4 
3. 5  4  4   4 
4. 5  4  4   4 
5. 5  4  4   4 
6. 5  4  4   4 
7. 5  4  4   4 

I'd like to delete rows 2, 4, and 6, so that the result looks like this:

   A  B  C   D
1. 5  4  4  4 
3. 5  4  4  4 
5. 5  4  4  4 
7. 5  4  4  4 

4 solutions

#1


225  

The key idea is to form the set of rows you want to remove and keep the complement of that set.

In R, a negative index selects that complement: -c(2, 4, 6) means "every row except 2, 4, and 6".

So, assuming the data.frame is called myData:

myData[-c(2, 4, 6), ]   # notice the -

Of course, don't forget to reassign myData if you want to drop those rows permanently; otherwise, R just prints the result.

myData <- myData[-c(2, 4, 6), ]
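
As a minimal sketch of negative indexing, the snippet below builds a stand-in for the data frame in the question (the construction is an assumption, since the original data is only shown as printed output) and drops rows 2, 4, and 6. It also shows one thing that can trip people up: with a single-column data frame, the result collapses to a vector unless drop = FALSE is given.

```r
# Hypothetical stand-in for the data frame in the question.
myData <- data.frame(A = rep(5, 7), B = rep(4, 7), C = rep(4, 7), D = rep(4, 7))

# Negative indices drop rows 2, 4, and 6; everything else is kept.
trimmed <- myData[-c(2, 4, 6), ]

# The original row names survive, so they show which rows remain.
print(rownames(trimmed))

# With a single-column data frame, add drop = FALSE to keep a data frame
# instead of collapsing to a plain vector.
oneCol <- myData["A"]
stillFrame <- oneCol[-c(2, 4, 6), , drop = FALSE]
```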

#2


58  

You can also work with a so-called boolean vector, known in R as a logical vector:

row_to_keep = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
myData = myData[row_to_keep,]

Note that the ! operator acts as NOT (!TRUE is FALSE), so negating the vector selects the rows that were marked FALSE instead:

myData = myData[!row_to_keep,]

This seems a bit cumbersome compared to @mrwab's answer (+1 btw :)), but a logical vector can be generated on the fly, e.g. from a condition such as a column value exceeding a certain threshold:

myData = myData[myData$A > 4,]     # keep rows where A exceeds 4
myData = myData[!(myData$A > 4),]  # or the opposite; equal to myData[myData$A <= 4,]

You can transform a logical vector into a vector of row indices with which():

row_to_keep = which(myData$A > 4)
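
Logical and which()-based indexing behave differently when the column contains missing values or when nothing matches. The sketch below uses a small made-up data frame (an assumption for illustration, not the data from the question) to show both cases.

```r
# Hypothetical data with a missing value in column A.
myData <- data.frame(A = c(5, 3, NA, 6), B = c(4, 4, 4, 4))

# A logical index containing NA produces an all-NA row in the result.
byLogical <- myData[myData$A > 4, ]

# which() returns only the positions that are TRUE, so the NA row is skipped.
byWhich <- myData[which(myData$A > 4), ]

# Pitfall: if nothing matches, which() returns integer(0), and a negated
# empty index selects zero rows rather than all of them.
noMatch <- which(myData$A > 100)
emptied <- myData[-noMatch, ]

# Negating a logical condition avoids that; wrapping the comparison in
# %in% TRUE turns NA results into FALSE before the negation.
kept <- myData[!((myData$A > 100) %in% TRUE), ]
```

So which() is convenient for keeping rows, but for removing rows it is usually safer to negate the logical condition than to negate the indices.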

Finally, a very neat trick is that you can use this kind of subsetting not only for extraction, but also for assignment:

myData$A[myData$A > 4] <- NA

where column A is set to NA (R's missing-value marker, "not available", which is distinct from NaN, "not a number") wherever A exceeds 4.
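
A minimal sketch of subset assignment, using made-up values (an assumption for illustration). It shows two equivalent forms: indexing the column vector with a single index, and indexing the data frame with a row condition plus a column name.

```r
myData <- data.frame(A = c(5, 3, 7, 2), B = c(4, 4, 4, 4))

# myData$A is a plain vector, so it takes a single index with no comma.
myData$A[myData$A > 4] <- NA

# The same assignment written against the data frame itself needs
# both a row index and a column name.
other <- data.frame(A = c(5, 3, 7, 2), B = c(4, 4, 4, 4))
other[other$A > 4, "A"] <- NA
```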

#3


23  

Problems with deleting by row number

For quick and dirty analyses, you can delete rows of a data.frame by number, as per the top answer, i.e.:

newdata <- myData[-c(2, 4, 6), ] 

However, if you are trying to write a robust data analysis script, you should generally avoid deleting rows by numeric position, because the order of the rows in your data may change in the future. A general principle of a data.frame or database table is that the order of the rows should not matter. If the order does matter, it should be encoded in an actual variable in the data.frame.

For example, imagine you imported a dataset, inspected it to identify the row numbers of the rows you wanted to delete, and then deleted them by numeric position. At some later point, you go back into the raw data, have a look around, and reorder it. Your row deletion code will now delete the wrong rows and, worse, you are unlikely to get any error or warning telling you that this has occurred.

Better strategy

A better strategy is to delete rows based on substantive and stable properties of the row. For example, if you had an id column variable that uniquely identifies each case, you could use that.

newdata <- myData[!(myData$id %in% c(2, 4, 6)), ]

Other times, you will have a formal exclusion criterion that can be specified, and you can use one of the many subsetting tools in R to exclude cases based on that rule.
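
To illustrate the difference, here is a small sketch that assumes the data has an id column (the column and its values are hypothetical). After the rows are reordered, positional deletion silently removes different cases, while deletion by id still removes cases 2, 4, and 6.

```r
# Hypothetical data with an id column that identifies each case.
myData <- data.frame(id = 1:7, A = rep(5, 7), B = rep(4, 7))

# Simulate the raw data being reordered: the last row moves to the top.
reordered <- myData[c(7, 1:6), ]

# Positional deletion now removes different cases than intended,
# without any error or warning.
byPosition <- reordered[-c(2, 4, 6), ]

# Deleting by id removes the same cases regardless of row order.
byId <- reordered[!(reordered$id %in% c(2, 4, 6)), ]

print(sort(byPosition$id))
print(sort(byId$id))
```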

#4


3  

Create an id column in your data frame, or use any existing column that identifies the row. Deleting by index is not reliable.

Use the subset function to create a new data frame.

updated_myData <- subset(myData, id != 6)
print(updated_myData)

updated_myData <- subset(myData, id %in% c(1, 3, 5, 7))
print(updated_myData)
