How to normalize a histogram in Python?

Time: 2021-09-03 04:14:47

I'm trying to plot a normed histogram, but instead of getting 1 as the maximum value on the y axis, I'm getting different numbers.

For the array k=(1,4,3,1):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)

    # normed=1 was deprecated in favour of density=True; newer Matplotlib releases no longer accept normed
    plt.hist(k, density=True)

    plt.xticks(np.arange(10))  # 10 ticks on x axis

    plt.show()

plotGraph()

I get this histogram, which doesn't look normed:

[image: histogram]

For a different array k=(3,3,3,3):

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (3, 3, 3, 3)

    # normed=1 was deprecated in favour of density=True; newer Matplotlib releases no longer accept normed
    plt.hist(k, density=True)

    plt.xticks(np.arange(10))  # 10 ticks on x axis

    plt.show()

plotGraph()

I get this histogram, whose maximum y value is 10:

[image: histogram]

For different k I get a different maximum y value, even though I pass normed=1 or normed=True.

Why does the normalization (if it works) change with the data, and how can I make the maximum value of y equal to 1?

UPDATE:

I am trying to implement Carsten König's answer from "plotting histograms whose bar heights sum to 1 in matplotlib" and am getting a very weird result:

import numpy as np
import matplotlib.pyplot as plt

def plotGraph():
    k = (1, 4, 3, 1)

    weights = np.ones_like(k)/len(k)
    plt.hist(k, weights=weights)

    plt.xticks(np.arange(10))  # 10 ticks on x axis

    plt.show()

plotGraph()

Result:

[image: histogram]

What am I doing wrong?

Thanks

4 answers

#1


7  

When you plot a normalized histogram, it is not the heights that sum to one but the area underneath the curve:

In [44]:

import numpy as np
import matplotlib.pyplot as plt
k = (3, 3, 3, 3)
x, bins, p = plt.hist(k, density=True)  # density=True is the newer spelling of normed=1
plt.xticks(np.arange(10))  # 10 ticks on x axis
plt.show()
In [45]:

print(bins)
[ 2.5  2.6  2.7  2.8  2.9  3.   3.1  3.2  3.3  3.4  3.5]

In this example the bin width is 0.1 and all four values fall into the same bin, so that bar has to be 10 high for the area underneath the curve to be one (0.1*10 = 1).
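The following NumPy sketch (not part of the original answer) recomputes what `density=True` does: each height is the bin's count divided by (number of samples × bin width). The helper name `density_heights` is hypothetical. For k=(3,3,3,3) the only occupied bin is about 0.1 wide, so its height comes out as about 10 while the total area stays 1.

```python
import numpy as np

def density_heights(data, bins=10):
    """Return (heights, edges) the way a density-normalized histogram scales them."""
    counts, edges = np.histogram(data, bins=bins)
    widths = np.diff(edges)
    # Scale so that sum(height * width) == 1: the area, not the tallest bar, is 1.
    heights = counts / (counts.sum() * widths)
    return heights, edges

heights, edges = density_heights((3, 3, 3, 3))
area = np.sum(heights * np.diff(edges))
print(heights.max(), area)  # max is about 10, area is about 1
```

The same function applied to k=(1,4,3,1) gives a different maximum (bins are 0.3 wide there), which is why the peak height depends on the data.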

To make the heights sum to 1 instead, add the following before plt.show():

for item in p:
    item.set_height(item.get_height() / sum(x))
plt.ylim(0, 1)  # the bars are now fractions, so rescale the y axis to match

[image: resulting histogram]
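To check what the set_height loop produces, this sketch (an illustration, not the answerer's code) divides the density heights by their sum and compares the result with the plain fraction of samples in each bin. It assumes equal-width bins, as with the default `bins=10`; with unequal widths the two would differ.

```python
import numpy as np

k = (1, 4, 3, 1)
density, edges = np.histogram(k, bins=10, density=True)
counts, _ = np.histogram(k, bins=10)

# Dividing each density height by sum(x), as the loop does, cancels the
# common bin width and leaves the fraction of samples in each bin.
fractions = density / density.sum()
print(fractions.sum())
```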

#2


3  

One way is to get the probabilities on your own, and then plot with plt.bar:

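In [91]: from collections import Counter
    ...: c = Counter(k)
    ...: print(c)
Counter({1: 2, 4: 1, 3: 1})

In [92]: total = sum(c.values())
    ...: prob = {value: count / total for value, count in c.items()}  # relative frequencies
    ...: plt.bar(list(prob.keys()), list(prob.values()))
    ...: plt.show()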

Result: [image: bar chart]
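A NumPy-only variant of the same idea, swapping `Counter` for `np.unique(..., return_counts=True)`. This is an alternative sketch rather than the answerer's code, and it assumes the question's k=(1,4,3,1).

```python
import numpy as np
import matplotlib.pyplot as plt

k = (1, 4, 3, 1)
# np.unique returns the sorted distinct values and how often each occurs.
values, counts = np.unique(k, return_counts=True)
prob = counts / counts.sum()  # relative frequencies, summing to 1

plt.bar(values, prob)
plt.xticks(values)
plt.show()
```

Unlike `plt.hist`, this puts one bar on each distinct value, so no bin width enters the heights.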

#3


1  

A normed histogram is defined such that the sum of the products of width and height of all columns equals 1 (the area, not the tallest column, is normalized). That's why you are not getting your max equal to one.

However, if you still want to force the maximum to be 1, you can compute the histogram with numpy and draw it with matplotlib.pyplot.bar in the following way:

import numpy as np
import matplotlib.pyplot as plt

sample = np.random.normal(0, 10, 100)
# generate bin boundaries and heights
bin_height, bin_boundary = np.histogram(sample, bins=10)
# define the width of each column
width = bin_boundary[1] - bin_boundary[0]
# scale each column by dividing by the maximum height
bin_height = bin_height / float(max(bin_height))
# plot; align='edge' puts each bar's left side on its bin boundary
plt.bar(bin_boundary[:-1], bin_height, width=width, align='edge')
plt.show()
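To see how dividing by the maximum differs from normalizing, this sketch applies both scalings to the same counts. The helper names `scale_to_max` and `scale_to_sum` are illustrative only. Scaling by the maximum makes the tallest bar exactly 1, but the bars then no longer sum to 1; scaling by the sum does the reverse.

```python
import numpy as np

def scale_to_max(counts):
    """Tallest bar becomes exactly 1; the others are relative to it."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.max()

def scale_to_sum(counts):
    """Bars become fractions of the sample and add up to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

counts, _ = np.histogram((1, 4, 3, 1), bins=10)
print(scale_to_max(counts).max(), scale_to_sum(counts).sum())
```

For k=(3,3,3,3) the two coincide because every value lands in a single bin; for k=(1,4,3,1) they do not.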

#4


1  

You could use the solution outlined here:

import numpy as np
import matplotlib.pyplot as plt
weights = np.ones_like(myarray)/float(len(myarray))  # float() avoids Python 2 integer division
plt.hist(myarray, weights=weights)
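A sketch of why the update in the question produced a strange plot, assuming it ran under Python 2 (the `print bins` style in the answers suggests that era): `np.ones_like(k)` keeps k's integer dtype, and Python 2's `/` on integers floors, so every weight becomes 0. Here `//` stands in for Python 2's `/`; the variable `py2_style` is just an illustrative name.

```python
import numpy as np

k = (1, 4, 3, 1)

# np.ones_like(k) inherits k's integer dtype; floor division (what Python 2's
# "/" did for integers) turns every weight into 0.
py2_style = np.ones_like(k) // len(k)

# A float dtype (or Python 3's true division) gives each sample weight 1/N.
weights = np.ones_like(k, dtype=float) / len(k)

# With these weights the bar heights are fractions of the sample and sum to 1.
counts, edges = np.histogram(k, bins=10, weights=weights)
print(py2_style, counts.sum())
```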