I have parsed a very large dataframe with several columns and values like this:
Name Age Points ...
XYZ 42 32pts ...
ABC 41 32pts ...
DEF 32 35pts
GHI 52 35pts
JHK 72 35pts
MNU 43 42pts
LKT 32 32pts
LKI 42 42pts
JHI 42 35pts
JHP 42 42pts
XXX 42 42pts
XYY 42 35pts
I have imported numpy and matplotlib.
I need to plot how many times each value in the 'Points' column occurs. I don't need any bins for the plot; it is more a way to see how many times the same score occurs across a large dataset.
So essentially the bar plot (or histogram, if you can call it that) should show that 32pts occurs 3 times, 35pts occurs 5 times and 42pts occurs 4 times. If I can plot the values in sorted order, even better. I have tried df.hist() but it is not working for me. Any clues? Thanks.
1 Answer
#1
14
Just plot the result of the column's value_counts method directly:
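import matplotlib.pyplot as plt
import pandas
data = load_my_data()  # placeholder for however you load your dataframe
fig, ax = plt.subplots()
# value_counts() tallies how often each distinct value occurs in the column
data['Points'].value_counts().plot(ax=ax, kind='bar')
plt.show()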
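For the sorted order the question asks about, here is a minimal self-contained sketch. It assumes the 'Points' values from the sample rows in the question, and the output filename is just a placeholder. value_counts() orders by frequency, so chaining sort_index() reorders the bars by label instead:

```python
import pandas as pd
import matplotlib.pyplot as plt

# 'Points' values copied from the sample rows in the question
# (Name and Age are left out for brevity).
points = ["32pts", "32pts", "35pts", "35pts", "35pts", "42pts",
          "32pts", "42pts", "35pts", "42pts", "42pts", "35pts"]
data = pd.DataFrame({"Points": points})

# value_counts() sorts by frequency; sort_index() reorders by the label itself.
counts = data["Points"].value_counts().sort_index()

fig, ax = plt.subplots()
counts.plot(ax=ax, kind="bar")
ax.set_xlabel("Points")
ax.set_ylabel("Occurrences")
# Placeholder filename; call plt.show() instead when working interactively.
fig.savefig("points_counts.png")
```

Note that the labels are strings here, so sort_index() sorts them lexicographically. That happens to match numeric order for two-digit scores, but converting the column to integers first is safer if the scores vary in length.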
If you want to remove the string 'pts' from all of the elements in your column, you can do something like this:
data['points_int'] = data['Points'].str.replace('pts', '').astype(int)
That assumes they all end with 'pts'. If the suffix varies from line to line, you need to look into regular expressions, as in this question: Split columns using pandas
And the official docs: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
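If the suffix is not consistent, one possible approach is to pull out the leading digits with a regular expression rather than stripping a fixed string. This is only a sketch: the mixed suffixes below are made up for illustration, and it assumes every value contains at least one run of digits.

```python
import pandas as pd

# Made-up values with inconsistent suffixes, for illustration only.
data = pd.DataFrame({"Points": ["32pts", "35 points", "42pnts"]})

# str.extract returns the first capture group; expand=False yields a Series.
data["points_int"] = data["Points"].str.extract(r"(\d+)", expand=False).astype(int)
```

With an integer column, value_counts().sort_index() then sorts the scores numerically rather than as text.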