How to delete rows whose value is not duplicated (R)

I've done a quick search on this topic but haven't found anything in previous posts that addresses my question. It seems very straightforward, but I still haven't figured out how to do it efficiently.

In the data frame below, I'd like to delete all rows whose x_1 value occurs only once (in this case B500 and D40).

x_1 <- c("A1", "A1", "A1", "B10", "B10", "B10", "B10",
         "B500", "C100", "C100", "C100", "D40", "G100", "G100")
z_1 <- rnorm(14, 70)
z_2 <- rnorm(14, 1.7)
A <- data.frame(x_1, z_1, z_2)

        x_1      z_1       z_2
1        A1 69.65033 1.5308858
2        A1 68.72687 2.2859416
3        A1 68.32700 0.7994794
4       B10 68.68382 0.5212132
5       B10 70.23359 1.3266729
6       B10 70.68604 4.3823605
7       B10 70.52774 2.2430322
8       B500 69.62868 3.0121398
9       C100 69.41412 2.1895905
10      C100 69.10745 1.7599065
11      C100 69.70876 1.6001099
12      D40 68.96542 0.7485665
13      G100 70.21754 1.9635395
14      G100 72.70583 3.0645247

I can do this manually by using:

A[!A$x_1 %in% c("B500", "D40"), ]

Another way of doing this is using the table function below:

 table(A$x_1)

   A1  B10 B500 C100  D40 G100 
   3    4    1    3    1    2

Now, my problem is: how do I select the entries that have a count of 1? If I can do this, I should be able to get their names and then delete them from the data frame.

Any useful ideas or code would be highly appreciated.

2 solutions

#1

Continuing with your table approach: assign the table to an object, then extract the names of the entries to drop (or to keep) and use them to subset the data frame.

tt <- table(A$x_1)
A[!A$x_1 %in% names(tt[tt == 1]), ]

# or
A[A$x_1 %in% names(tt[tt > 1]), ]

#     x_1      z_1       z_2
# 1    A1 69.18667 0.8578626
# 2    A1 71.36819 2.8482506
# 3    A1 69.71246 1.9528315
# 4   B10 69.47145 1.7852872
# 5   B10 69.12699 0.7663739
# 6   B10 70.93589 1.1431804
# 7   B10 68.72273 0.6836297
# 9  C100 70.31252 2.4651336
# 10 C100 69.89168 1.9991948
# 11 C100 70.25079 1.0823843
# 13 G100 69.56992 2.0879085
# 14 G100 68.29589 2.5432109
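
To make the intermediate steps visible, here is a minimal sketch on a smaller, deterministic data frame. The data values and the helper names singles and keep are made up for illustration and are not part of the answer above.

```r
# Small stand-in for the question's data frame (values are made up).
A <- data.frame(
  x_1 = c("A1", "A1", "B500", "C100", "C100", "D40"),
  z_1 = c(1, 2, 3, 4, 5, 6)
)

tt <- table(A$x_1)             # counts per distinct x_1 value
singles <- names(tt[tt == 1])  # labels that occur exactly once
keep <- !A$x_1 %in% singles    # TRUE for rows whose label is repeated

result <- A[keep, ]
print(singles)
print(result)
```

Here tt == 1 is a logical vector over the table entries, so tt[tt == 1] keeps only the counts equal to 1, and names() returns the corresponding labels.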

#2

You can use duplicated() twice, once scanning from the start and once from the end, and keep every row flagged by either pass:

A[duplicated(A$x_1) | duplicated(A$x_1, fromLast = TRUE), ]

    x_1      z_1       z_2
1    A1 70.32176 2.5074802
2    A1 70.28238 1.8819723
3    A1 67.93057 2.1899037
4   B10 69.75905 1.8493991
5   B10 70.25713 2.6948229
6   B10 69.33121 0.2793853
7   B10 70.82879 2.2831781
9  C100 70.14587 1.0332913
10 C100 69.51571 0.2590098
11 C100 70.48928 1.8471024
13 G100 72.11057 0.6914086
14 G100 69.93814 2.4245214

For more information on how this works, see this answer.
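
As a rough illustration of why two passes are needed, the sketch below applies both calls to a small made-up vector. The names fwd, bwd and in_group are hypothetical and used only here.

```r
# Made-up vector: B500 and D40 occur once, the others are repeated.
x <- c("A1", "A1", "B500", "C100", "C100", "C100", "D40")

fwd <- duplicated(x)                   # TRUE for each occurrence after the first
bwd <- duplicated(x, fromLast = TRUE)  # TRUE for each occurrence before the last
in_group <- fwd | bwd                  # TRUE for every member of a repeated value

print(rbind(fwd, bwd, in_group))
print(x[in_group])
```

The forward pass alone would miss the first member of each group and the backward pass alone would miss the last, so combining them with | flags all members of any value that occurs more than once.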
