I know how to get the most frequent value of each column in a DataFrame using "mode". For example:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1, 2, 2, 3]})
df.mode()
A
0 2
But I can't find a way to get the n most frequent values of each column of a DataFrame. For example, for the DataFrame above, I would like the following output for n=2:
A
0 2
1 1
Any pointers?
2 answers
#1
One way is to use pd.Series.value_counts and extract the index:
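import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 2, 3]})
# value_counts() sorts by frequency (descending); keep the index of the top 2
res = pd.DataFrame({col: df[col].value_counts().head(2).index for col in df})
print(res)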
# A
# 0 2
# 1 1
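This dict comprehension assumes every column has at least 2 distinct values. A minimal sketch of what happens otherwise, using a hypothetical column B that holds a single repeated value:

```python
import pandas as pd

# Hypothetical data: B has only one distinct value, so its
# value_counts().head(2).index has length 1 while A's has length 2.
df = pd.DataFrame({'A': [1, 2, 1, 2, 2, 3],
                   'B': [1, 1, 1, 1, 1, 1]})

raised = False
try:
    pd.DataFrame({col: df[col].value_counts().head(2).index for col in df})
except ValueError as exc:
    # Index objects are treated as plain arrays (not aligned like Series),
    # so arrays of unequal length are rejected by the constructor.
    raised = True
    print("ValueError:", exc)
```

Wrapping each result in pd.Series, as the next answer does, lets the constructor align on the index and pad the shorter column with NaN instead.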
#2
Use value_counts and select the index values by slicing. This works on one column at a time, so you need either apply or a dict comprehension passed to the DataFrame constructor. Casting each result to a Series is necessary for a more general solution, because some columns may have fewer than N distinct values (the missing positions become NaN), e.g.:
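import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 2, 3],
                   'B': [1, 1, 1, 1, 1, 1]})
N = 2
# apply runs the lambda on each column; the Series wrapper pads short results with NaN
res = df.apply(lambda x: pd.Series(x.value_counts().index[:N]))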
Or:
N = 2
res = pd.DataFrame({x: pd.Series(df[x].value_counts().index[:N]) for x in df.columns})
print(res)
   A    B
0  2  1.0
1  1  NaN
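If you need this in several places, the dict comprehension can be wrapped in a small function. A minimal sketch; the name top_n_values is hypothetical and not part of pandas:

```python
import pandas as pd


def top_n_values(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the n most frequent values of each column (hypothetical helper).

    Each column's result is wrapped in pd.Series so that columns with fewer
    than n distinct values are padded with NaN instead of raising.
    """
    return pd.DataFrame(
        {col: pd.Series(frame[col].value_counts().index[:n]) for col in frame.columns}
    )


df = pd.DataFrame({'A': [1, 2, 1, 2, 2, 3],
                   'B': [1, 1, 1, 1, 1, 1]})
res = top_n_values(df, 2)
print(res)
```

Note that B becomes a float column because NaN cannot be stored in an integer column.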
For a more general solution, first select only the numeric columns with select_dtypes:
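import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 2, 2, 3],
                   'B': [1, 1, 1, 1, 1, 1],
                   'C': list('abcdef')})
N = 2
# keep only numeric columns, then take the top N values of each
res = df.select_dtypes([np.number]).apply(lambda x: pd.Series(x.value_counts().index[:N]))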
Or:

N = 2
cols = df.select_dtypes([np.number]).columns
res = pd.DataFrame({x: pd.Series(df[x].value_counts().index[:N]) for x in cols})
print(res)
   A    B
0  2  1.0
1  1  NaN
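If you also want to see how often each value occurs, the same value_counts result already holds the counts. A minimal sketch of that variant; the helper name top_n_with_counts and the `<col>_value` / `<col>_count` column naming are hypothetical choices:

```python
import numpy as np
import pandas as pd


def top_n_with_counts(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    # Hypothetical helper: for each numeric column, keep the n most frequent
    # values together with how often each one occurs.
    out = {}
    for col in frame.select_dtypes([np.number]).columns:
        vc = frame[col].value_counts().head(n)  # sorted by count, descending
        out[f'{col}_value'] = pd.Series(vc.index)
        out[f'{col}_count'] = pd.Series(vc.to_numpy())
    return pd.DataFrame(out)


df = pd.DataFrame({'A': [1, 2, 1, 2, 2, 3],
                   'B': [1, 1, 1, 1, 1, 1],
                   'C': list('abcdef')})
res = top_n_with_counts(df, 2)
print(res)
```

As with the answers above, columns with fewer than n distinct values are padded with NaN.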