I'd like to use Matplotlib to plot a histogram over data that's been pre-counted. For example, say I have the raw data
data = [1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 10]
Given this data, I can use
pylab.hist(data, bins=[...])
to plot a histogram.
In my case, the data has been pre-counted and is represented as a dictionary:
counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}
Ideally, I'd like to pass this pre-counted data to a histogram function that lets me control the bin widths, plot range, etc, as if I had passed it the raw data. As a workaround, I'm expanding my counts into the raw data:
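from itertools import chain, repeat
# Expand each (value, count) pair back into `count` copies of `value`.
data = list(chain.from_iterable(repeat(value, count)
                                for (value, count) in counted_data.items()))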
This is inefficient when counted_data contains counts for millions of data points.
Is there an easier way to use Matplotlib to produce a histogram from my pre-counted data?
Alternatively, if it's easiest to just bar-plot data that's been pre-binned, is there a convenience method to "roll-up" my per-item counts into binned counts?
4 Solutions
#1
19
You can use the weights keyword argument to np.histogram (which plt.hist calls underneath):
val, weight = zip(*[(k, v) for k,v in counted_data.items()])
plt.hist(val, weights=weight)
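As a quick sanity check that weighting gives the same result as expanding the counts, here is a minimal sketch that calls np.histogram directly. The bin edges are arbitrary values chosen for illustration and are not part of the question.

```python
import numpy as np

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}

# Split the dictionary into parallel arrays of values and their counts.
values = np.array(list(counted_data.keys()))
counts = np.array(list(counted_data.values()))

# Illustrative bin edges (an assumption; pick whatever suits your data).
edges = [0, 2, 4, 6, 8, 10, 12]

# Histogram of the pre-counted data: each value contributes its count.
weighted, _ = np.histogram(values, bins=edges, weights=counts)

# Histogram of the expanded raw data, built here only for comparison.
raw = np.repeat(values, counts)
unweighted, _ = np.histogram(raw, bins=edges)

print(weighted)
print(unweighted)
```

Both arrays should hold the same per-bin totals, so the expansion step is not needed once weights is used.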
Assuming you only have integers as the keys, you can also use bar directly:
# min()/max() on the dict iterate over its keys (np.min on a Python 3 keys view does not work)
min_bin = min(counted_data)
max_bin = max(counted_data)
bins = np.arange(min_bin, max_bin + 1)
vals = np.zeros(max_bin - min_bin + 1)
for k, v in counted_data.items():
    vals[k - min_bin] = v
plt.bar(bins, vals, ...)
where ... is whatever arguments you want to pass to bar (doc).
If you want to re-bin your data see Histogram with separate list denoting frequency
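For the "roll-up" part of the question, one possible approach is to let np.histogram sum the per-value counts into bins and then draw the result with bar. This is only a sketch: the uneven bin edges below are hypothetical values chosen to show that bar widths can follow the bins.

```python
import matplotlib.pyplot as plt
import numpy as np

counted_data = {1: 1, 2: 2, 3: 1, 4: 1, 5: 4, 6: 1, 10: 1}
values = list(counted_data.keys())
counts = list(counted_data.values())

# Hypothetical, unevenly spaced bin edges chosen for illustration.
edges = np.array([0, 3, 6, 12])

# Roll the per-value counts up into per-bin totals.
binned, _ = np.histogram(values, bins=edges, weights=counts)

# Draw one bar per bin, left-aligned on its lower edge and as wide as the bin.
fig, ax = plt.subplots()
bars = ax.bar(edges[:-1], binned, width=np.diff(edges), align="edge",
              edgecolor="black")
ax.set_xlim(edges[0], edges[-1])
```

Call plt.show() or fig.savefig(...) afterwards to view the figure.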
#2
16
I used pyplot.hist's weights option to weight each key by its value, producing the histogram that I wanted:
# list() turns the Python 3 dict views into plain sequences
pylab.hist(list(counted_data.keys()), weights=list(counted_data.values()), bins=range(50))
This allows me to rely on hist to re-bin my data.
#3
1
The "bins" array holds bin edges, so it should be one element longer than the "counts" array. Here's a way to fully reconstruct the histogram:
import numpy as np
import matplotlib.pyplot as plt
bins = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).astype(float)
counts = np.array([5, 3, 4, 5, 6, 1, 3, 7]).astype(float)
centroids = (bins[1:] + bins[:-1]) / 2
counts_, bins_, _ = plt.hist(centroids, bins=len(counts),
                             weights=counts, range=(min(bins), max(bins)))
plt.show()
assert np.allclose(bins_, bins)
assert np.allclose(counts_, counts)
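Note that passing bins=len(counts) together with range produces equal-width bins, so the reconstruction above relies on the edges being evenly spaced. When they are not, one option is to pass the edges themselves as bins and use each bin's left edge as a stand-in sample. The numbers below are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up example with uneven bin widths (not from the answer above).
edges = np.array([0.0, 1.0, 2.0, 5.0, 10.0])
counts = np.array([3.0, 1.0, 4.0, 2.0])

# Each left edge falls inside its own bin, so weighting it by the bin's
# count reproduces the pre-binned histogram exactly.
counts_, edges_, _ = plt.hist(edges[:-1], bins=edges, weights=counts)

assert np.allclose(edges_, edges)
assert np.allclose(counts_, counts)
```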
#4
0
You can also use seaborn to plot the histogram:
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(list(counted_data.keys()), hist_kws={"weights":list(counted_data.values())})