How do I remove all duplicated rows so that none of them are left in the data frame?

Time: 2021-05-18 07:42:32

There is a similar question for PHP, but I'm working with R and am unable to translate the solution to my problem.

I have this data frame with 10 rows and 50 columns, where some of the rows are absolutely identical. If I use unique on it, I get one row per - let's say - "type", but what I actually want is to get only those rows which only appear once. Does anyone know how I can achieve this?

I can have a look at clusters and heatmaps to sort it out manually, but I have bigger data frames than the one mentioned above (with up to 100 rows) where this gets a bit tricky.

1 solution

#1


37  

This will extract the rows which appear only once (assuming your data frame is named df):

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]

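How it works: The function duplicated scans the rows from the first to the last and returns TRUE for every row that has already appeared earlier, so the first occurrence of each row is marked FALSE. With the argument fromLast = TRUE it scans from the last row instead, so the last occurrence is the one marked FALSE.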
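To make the two scan directions concrete, here is a minimal sketch on a made-up three-row data frame (the data and the names dup_forward and dup_backward are illustrative assumptions, not part of the original question):

```r
# Hypothetical toy data: rows 1 and 3 are identical, row 2 is unique.
df <- data.frame(x = c(1, 2, 1), y = c("a", "b", "a"))

# Scanning from the top, only the later copy (row 3) is flagged.
dup_forward <- duplicated(df)

# Scanning from the bottom, only the earlier copy (row 1) is flagged.
dup_backward <- duplicated(df, fromLast = TRUE)

print(dup_forward)   # FALSE FALSE  TRUE
print(dup_backward)  #  TRUE FALSE FALSE
```

Neither vector alone flags both copies, which is why the answer needs both of them.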

Both logical vectors are combined with | (element-wise logical 'or') into a new vector that marks every row appearing more than once. Negating that vector with ! gives a logical vector that is TRUE only for rows appearing exactly once, and it is used as the row index into df.
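The whole expression can be checked on a small example. This is only a sketch: the data frame and the names in_dup_group and singletons are assumptions chosen for illustration.

```r
# Hypothetical data: id 2 appears twice, id 4 three times, ids 1 and 3 once.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4, 4, 4),
  label = c("a", "b", "b", "c", "d", "d", "d")
)

# TRUE for every row that belongs to a group of identical rows,
# including the first and the last copy of each group.
in_dup_group <- duplicated(df) | duplicated(df, fromLast = TRUE)

# Keep only the rows that occur exactly once.
singletons <- df[!in_dup_group, ]

print(singletons$id)  # 1 3
```

If the data frame has only one column, df[!in_dup_group, ] returns a plain vector; adding drop = FALSE keeps the result a data frame.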
