I have created a barplot of Age vs. Population size (by gender) from Census data in ggplot2. Similarly, I have used the 'fitdist' function from the fitdistrplus package to derive Weibull parameters for the normalised (by maximum observed population across all Age bins) population data.
我用ggplot2的人口普查数据创建了一个年龄与人口规模(按性别)的条形图。类似地,我使用了fitdist函数,从fitdetplus包中获得了标准化(按所有年龄组的最大观察人口)人口数据的威布尔参数。
What I would like to do is to overlay the plotted data with the distribution as a line plot. I have tried
我想要做的是将绘制的数据与分布叠加成一条线。我有试过
+ geom_line (denscomp(malefit.w))
Plus other numerous (unsuccessful) strategies.
加上其他许多(不成功的)策略。
Any help that could be provided would be much appreciated! Please find the syntax appended below:
如能提供任何帮助,我们将不胜感激!请查找下面的语法:
Data Structure
数据结构
Order Age Male Female Total male.norm
1 1 0 - 5 2870000 2820000 5690000 1.00000000
2 2 5 - 9 2430000 2390000 4820000 0.84668990
3 3 10 - 14 2340000 2250000 4590000 0.81533101
4 4 15 - 19 2500000 2500000 5000000 0.87108014
5 5 20 - 24 2690000 2680000 5370000 0.93728223
6 6 25 - 29 2540000 2520000 5060000 0.88501742
7 7 30 - 34 2040000 1990000 4030000 0.71080139
8 8 35 - 39 1710000 1760000 3470000 0.59581882
9 9 40 - 44 1400000 1550000 2950000 0.48780488
10 10 45 - 49 1200000 1420000 2620000 0.41811847
11 11 50 - 54 1010000 1210000 2220000 0.35191638
12 12 55 - 59 812000 985000 1800000 0.28292683
13 13 60 - 64 612000 773000 1390000 0.21324042
14 14 65 - 69 402000 556000 958000 0.14006969
15 15 70 - 74 293000 455000 748000 0.10209059
16 16 75 - 79 165000 316000 481000 0.05749129
17 17 80 - 84 101000 222000 323000 0.03519164
18 18 85 plus 75500 180000 256000 0.02630662
female.norm
1 1.00000000
2 0.84751773
3 0.79787234
4 0.88652482
5 0.95035461
6 0.89361702
7 0.70567376
8 0.62411348
9 0.54964539
10 0.50354610
11 0.42907801
12 0.34929078
13 0.27411348
14 0.19716312
15 0.16134752
16 0.11205674
17 0.07872340
18 0.06382979
1 个解决方案
#1
1
This is the answer to the original question I posed above. In conjunction with the data posted in the question it is a beginning to end solution (i.e. raw data to plot).
这就是我刚才提出的问题的答案。结合问题中公布的数据,这是一个开始结束的解决方案(即绘制原始数据)。
Fitting of South-African age-population data (by gender) to a Weibull distribution (Theresa Cain and Ben Small)
将南非年龄-人口数据(按性别)拟合为威布尔分布(Theresa Cain和Ben Small)
load libraries
加载库
library(MASS)
library(ggplot2)
Import dataset
导入数据集
age_gender2 <- read.csv("age_gender2.csv", sep=",", header = T)
Define total population size by gender - that is sum the entire male / female population across all age bins and place in an objects 'total.male' and 'total.female' respectively
按性别定义总体人口数量——即所有年龄层的所有男性/女性人口的总和,并将其放在一个对象的总数中。男’和‘。女性的分别
total.male <- sum(age_gender2$Male)
total.female <- sum(age_gender2$Female)
The object 'age.groups' is a single row, single column vector describing the number of age bins for the 'age_gender2' df
对象的年龄。组是一个单行,单列向量描述了“age_gender2”df的年龄箱数。
age.groups <- length(age_gender2$Age)
The object 'age.all' is a 1 row, 18 column empty matrix that will describe the minimum age range extracted from the age bins (categories) in the 'Age' column from age_gender2 df
对象的年龄。all'是1行18列空矩阵,描述从age_gender2 df的“age”列(类别)中提取的最小年龄范围
age.all <- matrix(0,1,age.groups)
Next line assigns min age to each element of matrix (1 X 18) for first column in each age group. So 'for' loop assigns each column of matrix as an age (HELP: writing a for loop in R).
下一行为每个年龄组第一列的矩阵(1 X 18)的每个元素分配最小年龄。因此,for循环将矩阵的每一列分配为一个年龄(HELP:在R中编写for循环)。
Structure of the 'for' loop # RULE (given in parentheses()): for each element (i) loop from 2 to the value presented in the 'age.groups' object (i.e. 18) # COMMAND (given in curly brackets {}): taking each element (i) in the 'age.male' matrix and starting at the first row (i.e. [1, by each element (i.e. [1,i], perform / assign ('<-') the following operation: ((5 X (ith element - 1)) - 2.5). This operation provides the 'middle' age for the bin
“for”循环#规则的结构(在括号()中给出):对于每个元素(i)循环,从2循环到在“age”中显示的值。组的对象(例如18)#命令(在括号{}中给出):在“age”中取每个元素(i)。male' matrix从第一行开始(即[1],由每个元素(即[1,i],执行/分配('<-')以下操作:(5 X (ith element - 1) - 2.5)。这个操作为垃圾箱提供了“中年”
this assigns the first element (row, column) in the 'age.all' matrix the value 2.5
这将分配“age”中的第一个元素(行、列)。所有的矩阵值是2。5
age.all[1,1] <- 2.5
for(i in 2:age.groups){
age.all[1,i] <- ((5*(i)) - 2.5)
}
This next command 'rep' creates a (1 X 25190500) vector of all the ages within a particular bin
下一个命令“rep”创建一个(1 X 25190500)向量,该向量包含特定bin中的所有年龄
male.data <- rep(age.all,age_gender2$Male)
female.data <- rep(age.all,age_gender2$Female)
Fit weibull distribution to age for male and female
适合男性和女性的威布尔分布
male.weib <- fitdistr(male.data, "weibull")
female.weib <- fitdistr(female.data, "weibull")
male.shape <- male.weib$estimate[1]
male.scale <- male.weib$estimate[2]
female.shape <- female.weib$estimate[1]
female.scale <- female.weib$estimate[2]
Add column "Age_Median" to 'age_gender2' df with median age. Need to transpose as 'age.all' is an 1 row X 18 column vector.
将“年龄中位数”列添加到“年龄中位数2”df中。需要转置为'年龄。所有'都是1行X 18列向量。
age_gender2["Age_Median"] <- t(age.all)
Fit weibull distribution
符合威布尔分布
The function 'pweibull' is a PDF and finds the cumulative probability over all ages, therefore we need to subtract the previous age bin(s) from the present bin to find the probability for that bin and hence (by multiplying by the total male population) the expected population for that bin.
函数“pweibull”是一个PDF,它可以找到所有年龄段的累计概率,因此我们需要从当前的bin中减去先前的age bin(s)来找到该bin的概率,从而(通过乘以总男性人口)得到该bin的期望总体。
male.p.weibull <- matrix(0,1,age.groups)
female.p.weibull <- matrix(0,1,age.groups)
for (i in 1:age.groups){
male.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, male.shape, male.scale) - pweibull(age.all[1,i]-2.5, male.shape, male.scale)
}
for (i in 1:age.groups){
female.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, female.shape, female.scale) - pweibull(age.all[1,i]-2.5, female.shape, female.scale)
}
Add column to list calculated population per age bin - 'transpose' to 1 x 18 -> 18 row x 1 column vector
将列添加到列表中,计算出每个年龄组的总体——“转置”到1 x 18 -> 18行x 1列向量
age_gender2["male.prob"] <- t(male.p.weibull * total.male)
age_gender2["female.prob"] <- t(female.p.weibull * total.female)
Create bar plots describing Age-Gender population distributions
创建描述年龄-性别人口分布的条形图
Males (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)
雄性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)
agp.male <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Male, fill=Male)) + geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Male Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$male.prob, group=1))
Females (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)
女性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)
agp.female <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Female, fill=Female)) + geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Female Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$female.prob, group=1))
#1
1
This is the answer to the original question I posed above. In conjunction with the data posted in the question it is a beginning to end solution (i.e. raw data to plot).
这就是我刚才提出的问题的答案。结合问题中公布的数据,这是一个开始结束的解决方案(即绘制原始数据)。
Fitting of South-African age-population data (by gender) to a Weibull distribution (Theresa Cain and Ben Small)
将南非年龄-人口数据(按性别)拟合为威布尔分布(Theresa Cain和Ben Small)
load libraries
加载库
library(MASS)
library(ggplot2)
Import dataset
导入数据集
age_gender2 <- read.csv("age_gender2.csv", sep=",", header = T)
Define total population size by gender - that is sum the entire male / female population across all age bins and place in an objects 'total.male' and 'total.female' respectively
按性别定义总体人口数量——即所有年龄层的所有男性/女性人口的总和,并将其放在一个对象的总数中。男’和‘。女性的分别
total.male <- sum(age_gender2$Male)
total.female <- sum(age_gender2$Female)
The object 'age.groups' is a single row, single column vector describing the number of age bins for the 'age_gender2' df
对象的年龄。组是一个单行,单列向量描述了“age_gender2”df的年龄箱数。
age.groups <- length(age_gender2$Age)
The object 'age.all' is a 1 row, 18 column empty matrix that will describe the minimum age range extracted from the age bins (categories) in the 'Age' column from age_gender2 df
对象的年龄。all'是1行18列空矩阵,描述从age_gender2 df的“age”列(类别)中提取的最小年龄范围
age.all <- matrix(0,1,age.groups)
Next line assigns min age to each element of matrix (1 X 18) for first column in each age group. So 'for' loop assigns each column of matrix as an age (HELP: writing a for loop in R).
下一行为每个年龄组第一列的矩阵(1 X 18)的每个元素分配最小年龄。因此,for循环将矩阵的每一列分配为一个年龄(HELP:在R中编写for循环)。
Structure of the 'for' loop # RULE (given in parentheses()): for each element (i) loop from 2 to the value presented in the 'age.groups' object (i.e. 18) # COMMAND (given in curly brackets {}): taking each element (i) in the 'age.male' matrix and starting at the first row (i.e. [1, by each element (i.e. [1,i], perform / assign ('<-') the following operation: ((5 X (ith element - 1)) - 2.5). This operation provides the 'middle' age for the bin
“for”循环#规则的结构(在括号()中给出):对于每个元素(i)循环,从2循环到在“age”中显示的值。组的对象(例如18)#命令(在括号{}中给出):在“age”中取每个元素(i)。male' matrix从第一行开始(即[1],由每个元素(即[1,i],执行/分配('<-')以下操作:(5 X (ith element - 1) - 2.5)。这个操作为垃圾箱提供了“中年”
this assigns the first element (row, column) in the 'age.all' matrix the value 2.5
这将分配“age”中的第一个元素(行、列)。所有的矩阵值是2。5
age.all[1,1] <- 2.5
for(i in 2:age.groups){
age.all[1,i] <- ((5*(i)) - 2.5)
}
This next command 'rep' creates a (1 X 25190500) vector of all the ages within a particular bin
下一个命令“rep”创建一个(1 X 25190500)向量,该向量包含特定bin中的所有年龄
male.data <- rep(age.all,age_gender2$Male)
female.data <- rep(age.all,age_gender2$Female)
Fit weibull distribution to age for male and female
适合男性和女性的威布尔分布
male.weib <- fitdistr(male.data, "weibull")
female.weib <- fitdistr(female.data, "weibull")
male.shape <- male.weib$estimate[1]
male.scale <- male.weib$estimate[2]
female.shape <- female.weib$estimate[1]
female.scale <- female.weib$estimate[2]
Add column "Age_Median" to 'age_gender2' df with median age. Need to transpose as 'age.all' is an 1 row X 18 column vector.
将“年龄中位数”列添加到“年龄中位数2”df中。需要转置为'年龄。所有'都是1行X 18列向量。
age_gender2["Age_Median"] <- t(age.all)
Fit weibull distribution
符合威布尔分布
The function 'pweibull' is a PDF and finds the cumulative probability over all ages, therefore we need to subtract the previous age bin(s) from the present bin to find the probability for that bin and hence (by multiplying by the total male population) the expected population for that bin.
函数“pweibull”是一个PDF,它可以找到所有年龄段的累计概率,因此我们需要从当前的bin中减去先前的age bin(s)来找到该bin的概率,从而(通过乘以总男性人口)得到该bin的期望总体。
male.p.weibull <- matrix(0,1,age.groups)
female.p.weibull <- matrix(0,1,age.groups)
for (i in 1:age.groups){
male.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, male.shape, male.scale) - pweibull(age.all[1,i]-2.5, male.shape, male.scale)
}
for (i in 1:age.groups){
female.p.weibull[1,i] <- pweibull(age.all[1,i]+2.5, female.shape, female.scale) - pweibull(age.all[1,i]-2.5, female.shape, female.scale)
}
Add column to list calculated population per age bin - 'transpose' to 1 x 18 -> 18 row x 1 column vector
将列添加到列表中,计算出每个年龄组的总体——“转置”到1 x 18 -> 18行x 1列向量
age_gender2["male.prob"] <- t(male.p.weibull * total.male)
age_gender2["female.prob"] <- t(female.p.weibull * total.female)
Create bar plots describing Age-Gender population distributions
创建描述年龄-性别人口分布的条形图
Males (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)
雄性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)
agp.male <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Male, fill=Male)) + geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Male Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$male.prob, group=1))
Females (real data) and super-imposed curve showing Weibull calculated probabilities (ggplot2)
女性(真实数据)和超强加曲线显示威布尔计算概率(ggplot2)
agp.female <- ggplot(age_gender2, aes(x=reorder(Age, Order), y=Female, fill=Female)) + geom_bar(stat="identity") + theme (axis.text.x=element_text(angle=45, hjust=1)) + xlab("Age Group (5 yr bin)") + ylab("Female Population (M)") + geom_smooth(aes(age_gender2$Age,age_gender2$female.prob, group=1))