将一个拟合的威布尔分布(fitdistr)添加到一个地_bar (ggplot2)分类图中。

时间:2021-04-12 15:01:08

I have created a barplot of Age vs. Population size (by gender) from Census data in ggplot2. Similarly, I have used the 'fitdist' function from the fitdistrplus package to derive Weibull parameters for the normalised (by maximum observed population across all Age bins) population data.

我用ggplot2的人口普查数据创建了一个年龄与人口规模(按性别)的条形图。类似地,我使用了fitdist函数,从fitdetplus包中获得了标准化(按所有年龄组的最大观察人口)人口数据的威布尔参数。

What I would like to do is to overlay the plotted data with the distribution as a line plot. I have tried

我想要做的是将绘制的数据与分布叠加成一条线。我有试过

+ geom_line (denscomp(malefit.w))

Plus other numerous (unsuccessful) strategies.

加上其他许多(不成功的)策略。

Any help that could be provided would be much appreciated! Please find the syntax appended below:

如能提供任何帮助,我们将不胜感激!请查找下面的语法:

Data Structure

数据结构

    Order     Age    Male  Female   Total  male.norm
1      1   0 - 5 2870000 2820000 5690000 1.00000000
2      2   5 - 9 2430000 2390000 4820000 0.84668990
3      3 10 - 14 2340000 2250000 4590000 0.81533101
4      4 15 - 19 2500000 2500000 5000000 0.87108014
5      5 20 - 24 2690000 2680000 5370000 0.93728223
6      6 25 - 29 2540000 2520000 5060000 0.88501742
7      7 30 - 34 2040000 1990000 4030000 0.71080139
8      8 35 - 39 1710000 1760000 3470000 0.59581882
9      9 40 - 44 1400000 1550000 2950000 0.48780488
10    10 45 - 49 1200000 1420000 2620000 0.41811847
11    11 50 - 54 1010000 1210000 2220000 0.35191638
12    12 55 - 59  812000  985000 1800000 0.28292683
13    13 60 - 64  612000  773000 1390000 0.21324042
14    14 65 - 69  402000  556000  958000 0.14006969
15    15 70 - 74  293000  455000  748000 0.10209059
16    16 75 - 79  165000  316000  481000 0.05749129
17    17 80 - 84  101000  222000  323000 0.03519164
18    18 85 plus   75500  180000  256000 0.02630662
   female.norm 
1   1.00000000  
2   0.84751773   
3   0.79787234    
4   0.88652482    
5   0.95035461    
6   0.89361702    
7   0.70567376    
8   0.62411348   
9   0.54964539    
10  0.50354610    
11  0.42907801    
12  0.34929078    
13  0.27411348   
14  0.19716312    
15  0.16134752   
16  0.11205674    
17  0.07872340   
18  0.06382979 

1 个解决方案

#1


1  

This is the answer to the original question I posed above. In conjunction with the data posted in the question it is a beginning to end solution (i.e. raw data to plot).

这就是我刚才提出的问题的答案。结合问题中公布的数据,这是一个开始结束的解决方案(即绘制原始数据)。

Fitting of South-African age-population data (by gender) to a Weibull distribution (Theresa Cain and Ben Small)

将南非年龄-人口数据(按性别)拟合为威布尔分布(Theresa Cain和Ben Small)

load libraries

加载库

library(MASS)
library(ggplot2)  

Import dataset

导入数据集

age_gender2 <- read.csv("age_gender2.csv", sep=",", header = T)

Define total population size by gender - that is sum the entire male / female population across all age bins and place in an objects 'total.male' and 'total.female' respectively

按性别定义总体人口数量——即所有年龄层的所有男性/女性人口的总和,并将其放在一个对象的总数中。男’和‘。女性的分别

total.male <- sum(age_gender2$Male)
total.female <- sum(age_gender2$Female)

The object 'age.groups' is a single row, single column vector describing the number of age bins for the 'age_gender2' df

对象的年龄。组是一个单行,单列向量描述了“age_gender2”df的年龄箱数。

age.groups <- length(age_gender2$Age) 

The object 'age.all' is a 1 row, 18 column empty matrix that will describe the minimum age range extracted from the age bins (categories) in the 'Age' column from age_gender2 df

对象的年龄。all'是1行18列空矩阵,描述从age_gender2 df的“age”列(类别)中提取的最小年龄范围

age.all <- matrix(0,1,age.groups)

Next line assigns min age to each element of matrix (1 X 18) for first column in each age group. So 'for' loop assigns each column of matrix as an age (HELP: writing a for loop in R).

下一行为每个年龄组第一列的矩阵(1 X 18)的每个元素分配最小年龄。因此,for循环将矩阵的每一列分配为一个年龄(HELP:在R中编写for循环)。

Structure of the 'for' loop # RULE (given in parentheses()): for each element (i) loop from 2 to the value presented in the 'age.groups' object (i.e. 18) # COMMAND (given in curly brackets {}): taking each element (i) in the 'age.male' matrix and starting at the first row (i.e. [1, by each element (i.e. [1,i], perform / assign ('<-') the following operation: ((5 X (ith element - 1)) - 2.5). This operation provides the 'middle' age for the bin

“for”循环#规则的结构(在括号()中给出):对于每个元素(i)循环,从2循环到在“age”中显示的值。组的对象(例如18)#命令(在括号{}中给出):在“age”中取每个元素(i)。male' matrix从第一行开始(即[1],由每个元素(即[1,i],执行/分配('<-')以下操作:(5 X (ith element - 1) - 2.5)。这个操作为垃圾箱提供了“中年”

this assigns the first element (row, column) in the 'age.all' matrix the value 2.5

这将分配“age”中的第一个元素(行、列)。所有的矩阵值是2。5

age.all[1,1] <- 2.5 

for(i in 2:age.groups){ 

age.all[1,i] <- ((5*(i)) - 2.5)  

}

This next command 'rep' creates a (1 X 25190500) vector of all the ages within a particular bin

下一个命令“rep”创建一个(1 X 25190500)向量,该向量包含特定bin中的所有年龄

male.data <- rep(age.all,age_gender2$Male) 
female.data <- rep(age.all,age_gender2$Female)

Fit weibull distribution to age for male and female

适合男性和女性的威布尔分布

male.weib <- fitdistr(male.data, "weibull")
female.weib <- fitdistr(female.data, "weibull")


male.shape <- male.weib$estimate[1] 
male.scale <- male.weib$estimate[2] 

female.shape <- female.weib$estimate[1] 
female.scale <- female.weib$estimate[2] 

Add column "Age_Median" to 'age_gender2' df with median age. Need to transpose as 'age.all' is an 1 row X 18 column vector.

将“年龄中位数”列添加到“年龄中位数2”df中。需要转置为'年龄。所有'都是1行X 18列向量。

age_gender2["Age_Median"] <- t(age.all)

Fit weibull distribution

符合威布尔分布

The function 'pweibull' is a PDF and finds the cumulative probability over all ages, therefore we need to subtract the previous age bin(s) from the present bin to find the probability for that bin and hence (by multiplying by the total male population) the expected population for that bin.

函数“pweibull”是一个PDF,它可以找到所有年龄段的累计概率,因此我们需要从当前的bin中减去先前的age bin(s)来找到该bin的概率,从而(通过乘以总男性人口)得到该bin的期望总体。

male.p.weibull <- matrix(0,1,age.groups)
female.p.weibull <- matrix(0,1,age.groups)

for (i in 1:age.groups){

male.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, male.shape, male.scale) -  pweibull(age.all[1,i]-2.5, male.shape, male.scale)

 }

for (i in 1:age.groups){

female.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, female.shape, female.scale) - pweibull(age.all[1,i]-2.5, female.shape, female.scale)

 }

Add column to list calculated population per age bin - 'transpose' to 1 x 18 -> 18 row x 1 column vector

将列添加到列表中,计算出每个年龄组的总体——“转置”到1 x 18 -> 18行x 1列向量

age_gender2["male.prob"] <- t(male.p.weibull * total.male)
age_gender2["female.prob"] <- t(female.p.weibull * total.female)

Create bar plots describing Age-Gender population distributions

创建描述年龄-性别人口分布的条形图

Males (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)

雄性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)

agp.male <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Male, fill=Male)) +   geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Male Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$male.prob, group=1))

Females (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)

女性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)

agp.female <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Female, fill=Female)) + geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Female Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$female.prob, group=1))

#1


1  

This is the answer to the original question I posed above. In conjunction with the data posted in the question it is a beginning to end solution (i.e. raw data to plot).

这就是我刚才提出的问题的答案。结合问题中公布的数据,这是一个开始结束的解决方案(即绘制原始数据)。

Fitting of South-African age-population data (by gender) to a Weibull distribution (Theresa Cain and Ben Small)

将南非年龄-人口数据(按性别)拟合为威布尔分布(Theresa Cain和Ben Small)

load libraries

加载库

library(MASS)
library(ggplot2)  

Import dataset

导入数据集

age_gender2 <- read.csv("age_gender2.csv", sep=",", header = T)

Define total population size by gender - that is sum the entire male / female population across all age bins and place in an objects 'total.male' and 'total.female' respectively

按性别定义总体人口数量——即所有年龄层的所有男性/女性人口的总和,并将其放在一个对象的总数中。男’和‘。女性的分别

total.male <- sum(age_gender2$Male)
total.female <- sum(age_gender2$Female)

The object 'age.groups' is a single row, single column vector describing the number of age bins for the 'age_gender2' df

对象的年龄。组是一个单行,单列向量描述了“age_gender2”df的年龄箱数。

age.groups <- length(age_gender2$Age) 

The object 'age.all' is a 1 row, 18 column empty matrix that will describe the minimum age range extracted from the age bins (categories) in the 'Age' column from age_gender2 df

对象的年龄。all'是1行18列空矩阵,描述从age_gender2 df的“age”列(类别)中提取的最小年龄范围

age.all <- matrix(0,1,age.groups)

Next line assigns min age to each element of matrix (1 X 18) for first column in each age group. So 'for' loop assigns each column of matrix as an age (HELP: writing a for loop in R).

下一行为每个年龄组第一列的矩阵(1 X 18)的每个元素分配最小年龄。因此,for循环将矩阵的每一列分配为一个年龄(HELP:在R中编写for循环)。

Structure of the 'for' loop # RULE (given in parentheses()): for each element (i) loop from 2 to the value presented in the 'age.groups' object (i.e. 18) # COMMAND (given in curly brackets {}): taking each element (i) in the 'age.male' matrix and starting at the first row (i.e. [1, by each element (i.e. [1,i], perform / assign ('<-') the following operation: ((5 X (ith element - 1)) - 2.5). This operation provides the 'middle' age for the bin

“for”循环#规则的结构(在括号()中给出):对于每个元素(i)循环,从2循环到在“age”中显示的值。组的对象(例如18)#命令(在括号{}中给出):在“age”中取每个元素(i)。male' matrix从第一行开始(即[1],由每个元素(即[1,i],执行/分配('<-')以下操作:(5 X (ith element - 1) - 2.5)。这个操作为垃圾箱提供了“中年”

this assigns the first element (row, column) in the 'age.all' matrix the value 2.5

这将分配“age”中的第一个元素(行、列)。所有的矩阵值是2。5

age.all[1,1] <- 2.5 

for(i in 2:age.groups){ 

age.all[1,i] <- ((5*(i)) - 2.5)  

}

This next command 'rep' creates a (1 X 25190500) vector of all the ages within a particular bin

下一个命令“rep”创建一个(1 X 25190500)向量,该向量包含特定bin中的所有年龄

male.data <- rep(age.all,age_gender2$Male) 
female.data <- rep(age.all,age_gender2$Female)

Fit weibull distribution to age for male and female

适合男性和女性的威布尔分布

male.weib <- fitdistr(male.data, "weibull")
female.weib <- fitdistr(female.data, "weibull")


male.shape <- male.weib$estimate[1] 
male.scale <- male.weib$estimate[2] 

female.shape <- female.weib$estimate[1] 
female.scale <- female.weib$estimate[2] 

Add column "Age_Median" to 'age_gender2' df with median age. Need to transpose as 'age.all' is an 1 row X 18 column vector.

将“年龄中位数”列添加到“年龄中位数2”df中。需要转置为'年龄。所有'都是1行X 18列向量。

age_gender2["Age_Median"] <- t(age.all)

Fit weibull distribution

符合威布尔分布

The function 'pweibull' is a PDF and finds the cumulative probability over all ages, therefore we need to subtract the previous age bin(s) from the present bin to find the probability for that bin and hence (by multiplying by the total male population) the expected population for that bin.

函数“pweibull”是一个PDF,它可以找到所有年龄段的累计概率,因此我们需要从当前的bin中减去先前的age bin(s)来找到该bin的概率,从而(通过乘以总男性人口)得到该bin的期望总体。

male.p.weibull <- matrix(0,1,age.groups)
female.p.weibull <- matrix(0,1,age.groups)

for (i in 1:age.groups){

male.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, male.shape, male.scale) -  pweibull(age.all[1,i]-2.5, male.shape, male.scale)

 }

for (i in 1:age.groups){

female.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, female.shape, female.scale) - pweibull(age.all[1,i]-2.5, female.shape, female.scale)

 }

Add column to list calculated population per age bin - 'transpose' to 1 x 18 -> 18 row x 1 column vector

将列添加到列表中,计算出每个年龄组的总体——“转置”到1 x 18 -> 18行x 1列向量

age_gender2["male.prob"] <- t(male.p.weibull * total.male)
age_gender2["female.prob"] <- t(female.p.weibull * total.female)

Create bar plots describing Age-Gender population distributions

创建描述年龄-性别人口分布的条形图

Males (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)

雄性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)

agp.male <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Male, fill=Male)) +   geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Male Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$male.prob, group=1))

Females (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)

女性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)

agp.female <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Female, fill=Female)) + geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Female Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$female.prob, group=1))