I am in the process of creating an epidemic curve (histogram of number of cases of a disease per day) using R, and am struggling a little with formatting the x-axis.
I am aware that ggplot gives very nice graphs and easily manipulable axes (see "Understanding dates and plotting a histogram with ggplot2 in R"), but in this case I prefer to use the hist() command, because I am describing two different patterns at the same time, as below (I don't think you can do something similar in ggplot):
The problem here is that the x-axis does not begin at the first case and has too many tick marks. I'd also like to have a systematic date marker, e.g. every 7 days, or every 1st of the month.
The data are stored in a data frame (dat.geo) with one row per suspected case, including the date of onset and the suburb (which determines whether the bar is black or white in the histogram), as below:
> head(dat.geo)
  number age sex suburb Date_of_Onset
1      1  12   F      x    2011-10-11
2      2  28   M      x    2011-10-10
3      3  15   F      x    2011-10-12
4      4  12   M      y    2011-10-25
5      5  10   F      x    2011-10-15
6      6   9   M      y    2011-10-20
Here is my code:
pdf(file='1.epi.curve.pdf')
# Draw suburb x first (solid black); this call creates the plot
hist(dat.geo$Date_of_Onset[(dat.geo$suburb=="x")], "days",
     format = "%d %b %y", freq=T, col=rgb(0,0,0,1), axes=T, main="")
# Overlay suburb y (semi-transparent white) on the existing plot
hist(dat.geo$Date_of_Onset[(dat.geo$suburb=="y")], "days",
     format = "%d %b %y", freq=T, main="", col=rgb(1,1,1,.6), add=T, axes=F)
dev.off()
I have tried suppressing the axis and adding a customised one afterwards using this code:
axis(1, labels=T)
axis(2)
but this is what I get (and I have no idea how to manipulate that):
Your help is greatly appreciated!
thanks
2 answers
#1
16
Since you effectively challenged us to provide a ggplot solution, here it is:
dates <- seq(as.Date("2011-10-01"), length.out=60, by="1 day")
set.seed(1)
# Use `=` inside data.frame() so the columns are named suburb and Date_of_Onset
dat <- data.frame(
  suburb = rep(LETTERS[24:26], times=c(100, 200, 300)),
  Date_of_Onset = c(
    sample(dates-30, 100, replace=TRUE),
    sample(dates, 200, replace=TRUE),
    sample(dates+30, 300, replace=TRUE)
  )
)
library(scales)
library(ggplot2)
ggplot(dat, aes(x=Date_of_Onset, fill=suburb)) +
stat_bin(binwidth=1, position="identity") +
scale_x_date(breaks=date_breaks(width="1 month"))
Note the use of position="identity" to force each bar to originate on the axis; otherwise you get a stacked chart by default.
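To make the difference between the two positions concrete, here is a minimal sketch that inspects the bar coordinates ggplot2 computes. The `toy` data frame is hypothetical, and `geom_histogram()` is used here as the usual wrapper around `stat_bin()`; `layer_data()` returns the computed bars for the first layer.

```r
library(ggplot2)

# Hypothetical toy data: both suburbs have cases on the same two days
toy <- data.frame(
  suburb = rep(c("x", "y"), each = 3),
  Date_of_Onset = as.Date(c("2011-10-10", "2011-10-10", "2011-10-11",
                            "2011-10-10", "2011-10-11", "2011-10-11"))
)

base_plot <- ggplot(toy, aes(x = Date_of_Onset, fill = suburb))

# Computed bar rectangles for each position adjustment
identity_bars <- layer_data(
  base_plot + geom_histogram(binwidth = 1, position = "identity")
)
stacked_bars <- layer_data(
  base_plot + geom_histogram(binwidth = 1, position = "stack")
)

# With "identity", every bar starts at the axis (ymin is 0)
all(identity_bars$ymin == 0)

# With "stack", some bars start on top of another group's bar (ymin above 0)
any(stacked_bars$ymin > 0)
```

Because identity-positioned bars overlap, a semi-transparent fill (for example `alpha = 0.5`) usually helps keep both groups visible.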
#2
8
There are two available solutions, one using hist() and the other using ggplot(). The hist() version is:
# hist() for Date objects and axis.Date() are part of base R graphics
hist(dat.geo$Date_of_Onset[(dat.geo$suburb=="x")], "weeks",
     format = "%d %b %y", freq=T, col=rgb(0,0,0,1), axes=F, main="")
hist(dat.geo$Date_of_Onset[(dat.geo$suburb=="y")], "weeks",
     format = "%d %b %y", freq=T, main="", col=rgb(1,1,1,.6), add=T, axes=F)
# Labelled ticks every 2 weeks
axis.Date(1, at=seq(as.Date("2011-10-10"), as.Date("2012-03-19"), by="2 weeks"),
          format="%d %b %y")
# Unlabelled ticks every week
axis.Date(1, at=seq(as.Date("2011-10-10"), as.Date("2012-03-19"), by="weeks"),
          labels=F, tcl= -0.5)
This epidemic curve is as below:
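The tick positions in the hist() code are hard-coded dates. As a rough sketch, they could instead be derived from the data, which also covers the "every 7 days" and "every 1st of the month" cases from the question. The `onset` vector below is a made-up stand-in for `dat.geo$Date_of_Onset`, and the variable names are illustrative only.

```r
# Hypothetical stand-in for dat.geo$Date_of_Onset
onset <- as.Date(c("2011-10-10", "2011-10-11", "2011-10-25",
                   "2011-11-20", "2011-12-05"))

first_case <- min(onset)
last_case  <- max(onset)

# One tick every 7 days, starting at the first case
weekly_ticks <- seq(first_case, last_case, by = "week")

# One tick on the 1st of each month within the range of the data
month_start   <- as.Date(format(first_case, "%Y-%m-01"))
monthly_ticks <- seq(month_start, last_case, by = "month")
monthly_ticks <- monthly_ticks[monthly_ticks >= first_case]

# Weekly histogram with the default axes suppressed, then custom axes
hist(onset, breaks = "weeks", freq = TRUE, axes = FALSE,
     main = "", xlab = "Date of onset")
axis(2)
axis.Date(1, at = weekly_ticks, format = "%d %b %y")
```

Swapping `weekly_ticks` for `monthly_ticks` in the `axis.Date()` call would label the 1st of each month instead.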
A solution using ggplot, suggested by Andrie above, is as follows:
library(scales)
library(ggplot2)
ggplot(dat.geo,aes(x=Date_of_Onset, group=suburb, fill=suburb))+
stat_bin(colour="black", binwidth=1, alpha=0.5,
position="identity") + theme_bw()+
xlab("Date of onset of symptoms")+
ylab("Number of cases")+
scale_x_date(breaks=date_breaks("1 month"), labels=date_format("%b %y"))
which gives an epidemic curve as below:
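If weekly markers anchored at the first case are preferred over monthly ones, one possible variation is to pass explicit break dates to `scale_x_date()`. This is only a sketch: the small `dat.geo` below is a hypothetical stand-in built from the rows shown in the question, and `geom_histogram()` replaces the `stat_bin()` call.

```r
library(ggplot2)

# Hypothetical stand-in with the same columns as the question's dat.geo
dat.geo <- data.frame(
  suburb = c("x", "x", "x", "y", "x", "y"),
  Date_of_Onset = as.Date(c("2011-10-11", "2011-10-10", "2011-10-12",
                            "2011-10-25", "2011-10-15", "2011-10-20"))
)

# One break every 7 days, starting at the first date of onset
weekly_breaks <- seq(min(dat.geo$Date_of_Onset),
                     max(dat.geo$Date_of_Onset), by = "week")

p <- ggplot(dat.geo, aes(x = Date_of_Onset, fill = suburb)) +
  geom_histogram(binwidth = 1, position = "identity",
                 alpha = 0.5, colour = "black") +
  scale_x_date(breaks = weekly_breaks, date_labels = "%d %b %y") +
  xlab("Date of onset of symptoms") +
  ylab("Number of cases") +
  theme_bw()

print(p)
```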