pandas
The code is as follows:
import pandas as pd
import numpy as np

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'Bonus': [2, 2, 2, 2, 3, 4, 5, 6]
})
print(salaries)

# keep='first': the first occurrence of a value is not flagged, later repeats are
print(salaries['Bonus'].duplicated(keep='first'))
print(salaries[salaries['Bonus'].duplicated(keep='first')].index)
print(salaries[salaries['Bonus'].duplicated(keep='first')])

# keep='last': the last occurrence of a value is not flagged, earlier repeats are
print(salaries['Bonus'].duplicated(keep='last'))
print(salaries[salaries['Bonus'].duplicated(keep='last')].index)
print(salaries[salaries['Bonus'].duplicated(keep='last')])
The output is as follows:
   Bonus  Salary  Year   name
0      2       1  2016   BOSS
1      2       2  2016  Lilei
2      2       3  2016  Lilei
3      2       4  2016    Han
4      3       5  2017   BOSS
5      4       6  2017   BOSS
6      5       7  2017    Han
7      6       8  2017   BOSS
0    False
1     True
2     True
3     True
4    False
5    False
6    False
7    False
Name: Bonus, dtype: bool
Int64Index([1, 2, 3], dtype='int64')
   Bonus  Salary  Year   name
1      2       2  2016  Lilei
2      2       3  2016  Lilei
3      2       4  2016    Han
0     True
1     True
2     True
3    False
4    False
5    False
6    False
7    False
Name: Bonus, dtype: bool
Int64Index([0, 1, 2], dtype='int64')
   Bonus  Salary  Year   name
0      2       1  2016   BOSS
1      2       2  2016  Lilei
2      2       3  2016  Lilei
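The calls above flag only the later (or only the earlier) occurrences of a repeated value. To mark every row whose `Bonus` value is repeated, or to count how often each value occurs, `keep=False` and `value_counts()` can be used. This is a minimal sketch on the same data; the names `all_dupes`, `counts`, and `repeated` are illustrative and do not come from the original post.

```python
import pandas as pd

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'Bonus': [2, 2, 2, 2, 3, 4, 5, 6]
})

# keep=False flags every occurrence of a repeated value, not just the later ones.
all_dupes = salaries['Bonus'].duplicated(keep=False)
print(salaries[all_dupes].index.tolist())  # [0, 1, 2, 3]

# value_counts() tallies how many times each Bonus value appears;
# filtering on > 1 leaves only the values that are actually repeated.
counts = salaries['Bonus'].value_counts()
repeated = counts[counts > 1]
print(repeated.to_dict())  # {2: 4}
```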
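Without pandas
In NumPy, the equivalent operations are mainly the following.
Suppose we have the array
a = np.array([1, 2, 1, 3, 3, 3, 0])
and want to find the duplicated values [1 3].
Then we can use: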
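Method 1
# Mark the first occurrence of each value, then keep everything else
m = np.zeros_like(a, dtype=bool)
m[np.unique(a, return_index=True)[1]] = True
a[~m]  # array([1, 3, 3]): each extra occurrence is returned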
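Method 2
# Same idea as Method 1 in a single expression
# (np.in1d is deprecated in NumPy 2.0 in favor of np.isin)
a[~np.in1d(np.arange(len(a)), np.unique(a, return_index=True)[1], assume_unique=True)]  # array([1, 3, 3])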
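Method 3
# Caution: a is not unique, so assume_unique=True breaks the function's contract.
# The result depends on the NumPy version; recent versions may return an empty array.
np.setxor1d(a, np.unique(a), assume_unique=True)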
Method 4
# Count the occurrences of each unique value and keep those seen more than once
u, i = np.unique(a, return_inverse=True)
u[np.bincount(i) > 1]  # array([1, 3])
Method 5
# Sort, then keep every element that equals its right-hand neighbor
s = np.sort(a, axis=None)
s[:-1][s[1:] == s[:-1]]  # array([1, 3, 3])
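Since NumPy 1.9, `np.unique` also accepts `return_counts=True`, which yields the same result as Method 4 without the separate `bincount` step. This is a short sketch and not one of the methods from the referenced answer; `values`, `counts`, and `dupes` are illustrative names.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

# values holds the sorted unique elements; counts holds how often each occurs.
values, counts = np.unique(a, return_counts=True)

# Keep only the values that occur more than once.
dupes = values[counts > 1]
print(dupes)  # [1 3]
```

Each duplicated value appears exactly once in the result, which matches the stated goal of finding `[1 3]`.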
Reference: https://*.com/questions/11528078/determining-duplicate-values-in-an-array
Original article: https://blog.csdn.net/hguo11/article/details/82556171