Frequency plot in Python / Pandas DataFrame

Time: 2022-07-14 23:48:16

I have parsed a very large dataframe with several columns and values like this:

Name Age Points ...
XYZ  42  32pts  ...
ABC  41  32pts  ...
DEF  32  35pts
GHI  52  35pts
JHK  72  35pts
MNU  43  42pts
LKT  32  32pts
LKI  42  42pts
JHI  42  35pts
JHP  42  42pts
XXX  42  42pts
XYY  42  35pts

I have imported numpy and matplotlib.


I need to plot how many times each value in the 'Points' column occurs. I don't need any bins for the plot; it is more of a plot to see how many times the same score occurs over a large dataset.

So essentially the bar plot (or histogram, if you can call it that) should show that 32pts occurs 3 times, 35pts occurs 5 times and 42pts occurs 4 times. If I can plot the values in sorted order, all the better. I have tried df.hist(), but it is not working for me. Any clues? Thanks.

1 solution

#1


14  

Just plot the result of the column's value_counts method directly:

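import matplotlib.pyplot as plt
import pandas as pd

# load_my_data() is a placeholder for however you build your DataFrame
df = load_my_data()
fig, ax = plt.subplots()
# count each distinct value in 'Points' and draw one bar per value
df['Points'].value_counts().plot(ax=ax, kind='bar')
plt.show()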
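Since the question also asks for the bars in sorted order, here is a minimal sketch of one way to get that. It rebuilds the sample rows from the question by hand (your real data would come from wherever you load it) and adds sort_index() after value_counts(). The axis labels and the headless backend line are my own additions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Name": ["XYZ", "ABC", "DEF", "GHI", "JHK", "MNU",
             "LKT", "LKI", "JHI", "JHP", "XXX", "XYY"],
    "Age": [42, 41, 32, 52, 72, 43, 32, 42, 42, 42, 42, 42],
    "Points": ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
               "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"],
})

# value_counts() orders by frequency; sort_index() reorders by the label instead
counts = df["Points"].value_counts().sort_index()

fig, ax = plt.subplots()
counts.plot(ax=ax, kind="bar")
ax.set_xlabel("Points")
ax.set_ylabel("Occurrences")
# plt.show() or fig.savefig("points.png") would display or save the figure
```

Note that the labels here are strings, so sort_index() sorts them lexicographically. That happens to match numeric order for this sample because every score has two digits; with mixed widths (say '9pts' and '10pts') you would probably want to convert the column to integers first.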

If you want to remove the string 'pts' from all of the elements in your column, you can do something like this:

df['points_int'] = df['Points'].str.replace('pts', '').astype(int)

That assumes they all end with 'pts'. If it varies from row to row, you need to look into regular expressions like this: Split columns using pandas
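As a rough illustration of the regular-expression route, the sketch below uses str.extract to pull out the leading digits regardless of the suffix. The mixed-suffix values are made up for the example and are not from the question's data.

```python
import pandas as pd

# Hypothetical column with inconsistent suffixes (not from the question's data)
df = pd.DataFrame({"Points": ["32pts", "35 pnts", "42points", "35pts"]})

# Capture the leading run of digits; expand=False returns a Series
df["points_int"] = df["Points"].str.extract(r"^(\d+)", expand=False).astype(int)

# Integer labels now sort numerically
counts = df["points_int"].value_counts().sort_index()
```

If some rows contain no leading digits, str.extract yields NaN for them and astype(int) will raise, so you may need to drop or fill those rows first.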

And the official docs: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods

