How to normalize a histogram in MATLAB?

Time: 2023-01-21 14:57:08

How to normalize a histogram such that the area under the probability density function is equal to 1?

7 solutions

#1


114  

My answer to this is the same as in an answer to your earlier question. For a probability density function, the integral over the entire space is 1. Dividing by the sum will not give you the correct density. To get the right density, you must divide by the area. To illustrate my point, try the following example.

[f, x] = hist(randn(10000, 1), 50); % Create histogram from a normal distribution.
g = 1 / sqrt(2 * pi) * exp(-0.5 * x .^ 2); % pdf of the normal distribution

% METHOD 1: DIVIDE BY SUM
figure(1)
bar(x, f / sum(f)); hold on
plot(x, g, 'r'); hold off

% METHOD 2: DIVIDE BY AREA
figure(2)
bar(x, f / trapz(x, f)); hold on
plot(x, g, 'r'); hold off

You can see for yourself which method agrees with the correct answer (red curve).

Another way to normalize the histogram (more straightforward than METHOD 2) is to divide by sum(f * dx), which is the total area of the bars, i.e. the rectangle-rule integral of the unnormalized density:

% METHOD 3: DIVIDE BY AREA USING sum()
figure(3)
dx = diff(x(1:2))
bar(x, f / sum(f * dx)); hold on
plot(x, g, 'r'); hold off
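
A small sketch (not part of the original answer) that compares the three methods numerically instead of visually, by computing the total bar area, i.e. the sum of the bar heights times the bin width. It reuses the same `hist` call; the variable names are illustrative only.

```matlab
% Histogram of 10000 standard-normal samples in 50 equal-width bins
[f, x] = hist(randn(10000, 1), 50);
dx = x(2) - x(1);                 % hist uses equally spaced bin centers

p_sum   = f / sum(f);             % METHOD 1: bins sum to 1 (probability per bin)
p_trapz = f / trapz(x, f);        % METHOD 2: divide by the trapezoidal area
p_area  = f / (sum(f) * dx);      % METHOD 3: divide by the exact bar area

% Area covered by the bars = sum(height) * width
area_sum   = sum(p_sum)   * dx;   % equals dx, not 1
area_trapz = sum(p_trapz) * dx;   % slightly above 1: trapz halves the two end bins
area_area  = sum(p_area)  * dx;   % 1 up to rounding

fprintf('divide by sum:   %.4f\n', area_sum);
fprintf('divide by trapz: %.4f\n', area_trapz);
fprintf('divide by area:  %.4f\n', area_area);
```

Only the sum-based and area-based versions differ by a constant factor (dx); METHOD 2 and METHOD 3 differ only through how the two outermost bins are weighted.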

#2


19  

Since R2014b, MATLAB has these normalization routines built into the histogram function (see its documentation for the 6 normalization options it offers). Here is an example using the PDF normalization (the total area of all the bars is 1).

data = 2*randn(5000,1) + 5;             % generate normal random (m=5, std=2)
h = histogram(data,'Normalization','pdf')   % PDF normalization

The corresponding normal PDF (fitted with the sample mean and standard deviation, evaluated at the bin centers) is

edges = h.BinEdges;
x = edges(1:end-1) + diff(edges)/2;   % bin centers: midpoint of each pair of consecutive edges

mu = mean(data);
sigma = std(data);

f = exp(-(x-mu).^2./(2*sigma^2))./(sigma*sqrt(2*pi));

Plotting the two together gives:

hold on;
plot(x,f,'LineWidth',1.5)
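
For reference, the MATLAB documentation describes the 'pdf' option as setting each bar height to (count in bin) / (total number of observations × bin width), so the bar areas, not the bar heights, add up to 1 (to at most 1 if explicit edges leave some data outside). The sketch below reproduces that formula by hand with `histc` on deliberately unequal bins, which is exactly where dividing by a single `dx` would go wrong. The edge vector is an arbitrary, hypothetical choice for illustration, and `histc` is used only to make the arithmetic explicit.

```matlab
data  = 2*randn(5000, 1) + 5;                  % same kind of data as above (mean 5, std 2)
edges = [-5, 0, 2, 3, 4, 5, 6, 7, 8, 10, 15];  % hypothetical edges with unequal widths

% histc counts edges(k) <= v < edges(k+1); its last element counts v == edges(end).
c = histc(data, edges);
c = c(:)';                                     % force a row vector
c(end-1) = c(end-1) + c(end);                  % close the last bin on the right, like histogram
c(end) = [];

N       = numel(data);                         % all samples, including any outside the edges
widths  = diff(edges);
pdfVals = c ./ (N * widths);                   % 'pdf' normalization: count / (N * bin width)

area = sum(pdfVals .* widths);                 % fraction of the samples inside the edges
fprintf('area under the bars: %.4f\n', area);
```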

An improvement that may well be due to the success of this very question and its accepted answer!

EDIT - Using hist and histc is no longer recommended; histogram should be used instead. Beware that none of the 6 binning methods of this new function produces the same bins as hist and histc. There is a MATLAB script for updating old code to the way histogram is called (bin edges instead of bin centers - link). With it, one can compare the pdf normalization methods of @abcd (trapz and sum) with MATLAB's own ('pdf').

The 3 pdf normalization methods give nearly identical results (within the range of eps).

TEST:

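A = randn(10000,1);
centers = -6:0.5:6;
d = diff(centers)/2;
edges = [centers(1)-d(1), centers(1:end-1)+d, centers(end)+d(end)];  % edges halfway between the centers
edges(2:end) = edges(2:end)+eps(edges(2:end));  % shift by eps to reproduce how hist bins values lying on an edge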

figure;
subplot(2,2,1);
hist(A,centers);
title('HIST not normalized');

subplot(2,2,2);
h = histogram(A,edges);
title('HISTOGRAM not normalized');

subplot(2,2,3)
[counts, centers] = hist(A,centers); %get the count with hist
bar(centers,counts/trapz(centers,counts))
title('HIST with PDF normalization');


subplot(2,2,4)
h = histogram(A,edges,'Normalization','pdf')
title('HISTOGRAM with PDF normalization');

dx = diff(centers(1:2))
normalization_difference_trapz = abs(counts/trapz(centers,counts) - h.Values);
normalization_difference_sum = abs(counts/sum(counts*dx) - h.Values);

max(normalization_difference_trapz)
max(normalization_difference_sum)

The maximum difference between the new PDF normalization and the former one is 5.5511e-17.
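
Why the trapz and sum versions agree so closely can be seen from the trapezoidal rule on equally spaced centers $x_1,\dots,x_n$ with spacing $\Delta x$ and counts $f_i$ (a short derivation added for clarity):

```latex
\operatorname{trapz}(x, f)
  = \Delta x\left(\tfrac{f_1}{2} + f_2 + \dots + f_{n-1} + \tfrac{f_n}{2}\right)
  = \Delta x\left(\sum_{i=1}^{n} f_i - \frac{f_1 + f_n}{2}\right)
```

So f/trapz(x,f) and f/sum(f*dx) coincide exactly when the two outermost bins are empty. In the test above, hist sends every sample beyond about ±5.75 into the end bins, and for 10000 standard-normal samples those bins are essentially always empty, so only rounding differences remain. With automatically chosen bins (hist(data, nbins)) the end bins always contain the minimum and maximum samples, so dividing by trapz gives bars whose total area is slightly above 1.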

#3


10  

hist can not only plot a histogram but also return the count of elements in each bin, so you can get those counts, normalize them by dividing each bin by the total, and plot the result using bar. Example:

Y = rand(10,1);
C = hist(Y);
C = C ./ sum(C);
bar(C)

or if you want a one-liner:

bar(hist(Y) ./ sum(hist(Y)))

Edit: This solution answers the question "How do I make all the bins sum to 1?". It approximates the integral well only when the bins are narrow relative to the spread of your data. The sum used here corresponds to a simple quadrature rule; more elaborate ones, such as trapz as proposed by R. M., can also be used.
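
A sketch of the same idea that keeps the bin centers returned by hist (so the bars can be drawn at the data values rather than at 1..10) and shows what changes with the number of bins: the per-bin probabilities always sum to 1, but their heights shrink as the bins get narrower, whereas dividing by the bin width gives heights that estimate the density. The sample size and the bin counts 10 and 40 are arbitrary choices.

```matlab
Y = rand(1000, 1);                        % uniform samples on (0, 1)

for nbins = [10, 40]
    [C, xc] = hist(Y, nbins);             % counts and bin centers
    p  = C / sum(C);                      % probability per bin: sums to 1
    dx = xc(2) - xc(1);                   % equal bin width
    d  = p / dx;                          % density estimate: total bar area is 1
    fprintf('%2d bins: sum(p) = %.3f, mean height p = %.3f, mean height d = %.3f\n', ...
            nbins, sum(p), mean(p), mean(d));
end
% To plot, e.g.: bar(xc, p) or bar(xc, d)
```

The mean height of p is 1/nbins, while the mean height of d stays near 1, the density of a uniform variable on (0, 1).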

#4


5  

[f,x]=hist(data)

The area of each individual bar is height*width. Since MATLAB chooses equidistant bin centers, the width is:

delta_x = x(2) - x(1)

Summing the areas of all the bars, the total area comes out as

A=sum(f)*delta_x

So the correctly scaled plot is obtained with

bar(x, f/sum(f)/(x(2)-x(1)))
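
Because hist assigns every sample to some bin (its outer bins are open-ended), sum(f) equals the number of samples N, so this scaling is the same count / (N × bin width) rule that the 'pdf' option of histogram uses. A quick check, assuming data is any real vector (random normal data here, purely as a placeholder):

```matlab
data = randn(2000, 1);                 % hypothetical example data
[f, x] = hist(data);                   % default: 10 equally spaced bins
delta_x = x(2) - x(1);

N = numel(data);
heights_sum = f / sum(f) / delta_x;    % scaling from this answer
heights_N   = f / (N * delta_x);       % count / (N * bin width)

fprintf('sum(f) = %d, N = %d\n', sum(f), N);
fprintf('total bar area = %.4f\n', sum(heights_sum) * delta_x);
```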

#5


3  

The area under abcd's PDF is not one, which is impossible for a PDF, as pointed out in many comments. Assumptions made in many answers here:

  1. Assume constant distance between consecutive edges.
  2. The probability under the pdf should be 1. The normalization should therefore be done as a probability normalization, not as a pdf normalization, both in histogram() (the 'probability' rather than the 'pdf' option) and in hist().

Fig. 1 Output of hist() approach, Fig. 2 Output of histogram() approach

The maximum amplitude differs between the two approaches, which suggests that there is some mistake in the hist() approach, because the histogram() approach uses the standard normalization. I assume the mistake in the hist() approach here is that it normalizes partially as a pdf rather than completely as a probability.

Code with hist() [deprecated]

Some remarks

  1. First check: sum(f)/N gives 1 if Nbins is set manually.
  2. The pdf requires the bin width (dx) in the curve g.

Code

%http://*.com/a/5321546/54964
N=10000;
Nbins=50;
[f,x]=hist(randn(N,1),Nbins); % create histogram from ND

%METHOD 4: Count Densities, not Sums!
figure(3)
dx=diff(x(1:2)); % width of bin
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND with dx
% 1.0000
bar(x, f/sum(f));hold on
plot(x,g,'r');hold off

Output is in Fig. 1.
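
In METHOD 4, g = pdf × dx approximates the probability of each bin by the midpoint rule. For the standard normal, the exact bin probability is Phi(b) - Phi(a), with Phi(z) = (1 + erf(z/sqrt(2)))/2, so the approximation can be checked directly. This sketch uses fixed, hypothetical edges from -4 to 4 (50 bins) so that the result does not depend on random data.

```matlab
edges = linspace(-4, 4, 51);                 % hypothetical fixed edges: 50 bins of width 0.16
x  = edges(1:end-1) + diff(edges)/2;         % bin centers
dx = edges(2) - edges(1);

g_mid   = 1/sqrt(2*pi) * exp(-0.5*x.^2) * dx;  % pdf at the center times bin width (as in METHOD 4)
Phi     = @(z) 0.5 * (1 + erf(z / sqrt(2)));   % standard normal CDF
p_exact = Phi(edges(2:end)) - Phi(edges(1:end-1));

fprintf('sum of exact bin probabilities: %.6f\n', sum(p_exact));
fprintf('max |midpoint - exact|:         %.2e\n', max(abs(g_mid - p_exact)));
```

The midpoint error per bin shrinks with the cube of the bin width, so with narrow bins g and the exact bin probabilities are practically indistinguishable on a plot.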

Code with histogram()

Some remarks

  1. First check: a) sum(f) is 1 if the bins are chosen by histogram() with 'Normalization','probability'; b) sum(f)/N is 1 if Nbins is set manually without normalization.
  2. The pdf requires the bin width (dx) in the curve g.

Code

%%METHOD 5: with histogram()
% http://*.com/a/38809232/54964
N=10000;

figure(4);
h = histogram(randn(N,1), 'Normalization', 'probability') % hist() deprecated!
Nbins=h.NumBins;
edges=h.BinEdges; 
x=zeros(1,Nbins);
f=h.Values;
for counter=1:Nbins
    midPointShift=abs(edges(counter)-edges(counter+1))/2; % same constant for all
    x(counter)=edges(counter)+midPointShift;
end
dx=diff(x(1:2)); % constant for all
g=1/sqrt(2*pi)*exp(-0.5*x.^2) .* dx; % pdf of ND
% Use if Nbins manually set
%new_area=sum(f)/N % diff of consecutive edges constant
% Use if histogram() Normalization probability
new_area=sum(f)
% 1.0000
% No bar() needed here with histogram() Normalization probability
hold on;
plot(x,g,'r');hold off

The output is in Fig. 2, and the expected result is met: area 1.0000.

Matlab: 2016a
System: Linux Ubuntu 16.04 64 bit
Linux kernel 4.6
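
The relation between the two histogram() options used in this thread can be written out explicitly. Writing $c_i$ for the count in bin $i$, $w_i$ for its width and $N$ for the number of samples (notation introduced here for illustration, assuming every sample lies inside the bin edges):

```latex
\begin{aligned}
\text{'probability':}\quad & p_i = \frac{c_i}{N}, && \sum_i p_i = 1,\\
\text{'pdf':}\quad & \hat f_i = \frac{c_i}{N\,w_i} = \frac{p_i}{w_i}, && \sum_i \hat f_i\, w_i = 1.
\end{aligned}
```

Dividing the 'probability' bars by the bin width gives the 'pdf' bars, which is why g is multiplied by dx above when it is compared with probability-normalized bars.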

#6


1  

For some distributions (Cauchy, I think), I have found that trapz overestimates the area, so the pdf changes depending on the number of bins you select. In that case I do:

[N,h]=hist(q_f./theta,30000); % there is a large range, but most of the bins will be empty
plot(h,N/(sum(N)*mean(diff(h))),'+r')
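
A sketch of why this scaling does not depend on the bin count: hist puts every sample into some bin, so sum(N)*mean(diff(h)) is exactly the total bar area, whereas trapz weights the two end bins by one half, and for heavy-tailed data those end bins always hold the extreme samples. Cauchy samples are generated here by the inverse-CDF transform tan(pi*(u - 1/2)); q is a hypothetical stand-in for the answer's q_f./theta.

```matlab
q = tan(pi * (rand(10000, 1) - 0.5));           % hypothetical standard Cauchy samples

for nbins = [300, 3000, 30000]
    [N, h] = hist(q, nbins);
    dx = mean(diff(h));                         % bin width, as in the answer
    area_sum   = sum(N / (sum(N) * dx)) * dx;   % scaling used in this answer
    area_trapz = sum(N / trapz(h, N)) * dx;     % bar area after dividing by trapz
    fprintf('%5d bins: area (sum) = %.6f, area (trapz) = %.6f\n', nbins, area_sum, area_trapz);
end
```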

#7


1  

There is an excellent three-part guide to histogram adjustments in MATLAB (broken original link, archive.org link); the first part is on histogram stretching.
