After reading almost all the questions related to duplicate pairs, I found that none of them addresses the following issue:
Given a DataFrame:
Letter
0 a
1 b
2 c
3 d
4 a
5 b
6 a
7 a
8 a
Eliminate only pairs of duplicates. For example, since the DataFrame has five a's, the first two pairs of a's should be removed and the last a kept (order matters). The two b's are removed entirely because they form a single pair. The resulting DataFrame would look like this:
Letter
2 c
3 d
8 a
I hope the issue is clear. Thanks!
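To make the pairing rule concrete, here is a minimal loop-based sketch (an editorial illustration, not part of the original question). It assumes the column is named `Letter` and that occurrences of each value are paired in order of appearance (1st with 2nd, 3rd with 4th, and so on); `drop_pairs_literal` is a hypothetical helper name.

```python
import pandas as pd


def drop_pairs_literal(df: pd.DataFrame, col: str = "Letter") -> pd.DataFrame:
    """Pair up occurrences of each value in order of appearance and drop every
    complete pair; an unpaired trailing occurrence is kept."""
    pending = {}    # value -> index label of an occurrence still waiting for a partner
    dropped = set()
    for idx, value in df[col].items():
        if value in pending:
            # Second member of a pair: drop it together with its waiting partner.
            dropped.add(pending.pop(value))
            dropped.add(idx)
        else:
            pending[value] = idx
    # Keep every row that was never part of a complete pair, in original order.
    return df[~df.index.isin(list(dropped))]


# The example DataFrame from the question (index 0..8).
df = pd.DataFrame({"Letter": list("abcdabaaa")})
print(drop_pairs_literal(df))
```

This walks the rows once, so it is easy to check by hand, but it is a plain Python loop; a vectorized approach is preferable for large frames.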
1 answer
#1
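You can first drop every letter that occurs an even number of times (all of its rows pair off), then use drop_duplicates(keep="last") so that each remaining letter keeps only its last, unpaired occurrence.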
df.groupby('Letter').filter(lambda x: len(x)%2>0).drop_duplicates(keep="last")
Out[174]:
Letter
2 c
3 d
8 a
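For reference, here is a self-contained sketch that rebuilds the question's DataFrame and runs the answer's one-liner, with `subset="Letter"` added to `drop_duplicates` as an assumption so it still works if the frame has other columns. It also shows an alternative (not from the answer) based on `groupby().cumcount()`: the first `2 * (n // 2)` occurrences of a letter that appears `n` times form complete pairs, so only rows at or beyond that position survive.

```python
import pandas as pd

# The example DataFrame from the question (index 0..8).
df = pd.DataFrame({"Letter": list("abcdabaaa")})

# Answer's approach: letters with an even count pair off completely, so drop
# those groups; for odd counts keep only the last occurrence.
answer = (
    df.groupby("Letter")
      .filter(lambda g: len(g) % 2 == 1)
      .drop_duplicates(subset="Letter", keep="last")
)

# Alternative: number each occurrence within its letter (0, 1, 2, ...) and
# keep rows whose position is past the last complete pair.
size = df.groupby("Letter")["Letter"].transform("size")
position = df.groupby("Letter").cumcount()
alternative = df[position >= size - size % 2]

print(answer)
print(alternative.equals(answer))
```

Both versions keep the original index labels and row order; the `cumcount` variant avoids calling a Python lambda once per group.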