I have a large dataset (over 5000 records in a CSV) that I would prefer not to split up by hand, because that would be rather time consuming. One column contains park names, and I want a separate plot for each park, because each plot is destined for a different place. Within each park, the data should be grouped by Zone and Year and shown as a time series. I also need the mean of Height_mm, with standard errors. There are 5 parks, each with 3 zones and 10 years.
head(data)
  Park_name  Zone Year Height_mm
1     Park1 Zone1 2011       380
2     Park1 Zone1 2011       510
3     Park1 Zone1 2011       270
4     Park1 Zone2 2011       270
5     Park1 Zone2 2011       230
6     Park1 Zone2 2011       330
I would like to be able to manipulate the code below to make this work though I just can't figure it out. I'll gladly take any other suggestions though.
library(ggplot2)
library(plyr)
data=read.table("C:/data.csv", sep=",", header=TRUE)
ggplot(data, aes(x=Year, y=Height_mm)) +
    #geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.05, colour="black", position=pd) +
    geom_line() +
    geom_point(size=3, fill="black") +
    xlab("Year") +
    ylab("Mean height (mm)") +
    #facet_wrap(~Park_name, scales = "free", ncol=2) + #I'd like something like this but with all plots as separate figures
    theme_bw() +
    theme(axis.text.x=theme_text(),
          #axis.title.x=theme_blank(),
          #axis.title.y=theme_blank(),
          axis.line=theme_segment(colour="black"),
          panel.grid.minor = theme_blank(),
          panel.grid.major = theme_blank(),
          panel.border=theme_blank(),
          panel.background=theme_blank(),
          legend.justification=c(10,10), legend.position=c(10,10),
          legend.title = theme_text(),
          legend.key = theme_blank()
    )
I'm assuming I need a 'for' loop of some kind though I don't know where to put it or how to use it. Thanks
1 Answer
It seems that you want something like the following. If I have misunderstood, please revise your question. It would also help to provide sample data covering more than one park, zone and year.
# load packages (mean_cl_boot below also needs the Hmisc package installed)
library(ggplot2)
# read data
Y <- read.table("C:/data.csv", sep=",", header=TRUE)
# define the theme
th <- theme_bw() +
    theme(axis.text.x=element_text(),
          axis.line=element_line(colour="black"),
          panel.grid.minor = element_blank(),
          panel.grid.major = element_blank(),
          panel.background=element_blank(),
          legend.justification=c(10,10), legend.position=c(10,10),
          legend.title = element_text(),
          legend.key = element_blank()
    )
# determine the park names; unique() works whether Park_name was read
# as a factor or as a character column (levels() returns NULL for the latter)
parks <- unique(as.character(Y[, "Park_name"]))
# build one plot per park
p <- lapply(parks, function(park) {
    ggplot(Y[Y[, "Park_name"]==park,], aes(x=as.factor(Year), y=Height_mm)) +
        facet_grid(Zone~.) +   # show each zone in a separate facet
        geom_point() +         # plot the actual heights (if desired)
        # plot the mean and its bootstrapped confidence interval
        stat_summary(fun.data="mean_cl_boot", color="red")
})
# finally print each plot as its own figure, with the theme applied
invisible(lapply(p, function(x) print(x+th)))
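The code above shows a bootstrapped confidence interval rather than the standard error the question asks for. If you want mean ± SE as a time series with one line per zone, a minimal sketch could look like the following. It is only an illustration under some assumptions: the data frame is a synthetic stand-in with the same four columns as head(data), and the names se, S, mean_mm, se_mm, pd and plots are made up for this example.

```r
library(ggplot2)

# Synthetic stand-in for the CSV (assumption: same columns as head(data));
# replace this with your own read.table() call.
set.seed(1)
Y <- expand.grid(Park_name = paste0("Park", 1:2),
                 Zone = paste0("Zone", 1:3),
                 Year = 2011:2013,
                 rep = 1:5,
                 stringsAsFactors = FALSE)
Y$Height_mm <- round(rnorm(nrow(Y), mean = 350, sd = 80))

# standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# mean and SE of Height_mm for every park/zone/year combination
S <- aggregate(Height_mm ~ Park_name + Zone + Year, data = Y,
               FUN = function(x) c(mean = mean(x), se = se(x)))
S <- do.call(data.frame, S)               # flatten the matrix column
names(S)[4:5] <- c("mean_mm", "se_mm")

# small horizontal offset so error bars of different zones do not overlap
pd <- position_dodge(width = 0.2)

# one plot object per park, stored in a named list
plots <- lapply(split(S, S$Park_name), function(d) {
    ggplot(d, aes(x = Year, y = mean_mm, colour = Zone, group = Zone)) +
        geom_errorbar(aes(ymin = mean_mm - se_mm, ymax = mean_mm + se_mm),
                      width = 0.2, position = pd) +
        geom_line(position = pd) +
        geom_point(size = 3, position = pd) +
        scale_x_continuous(breaks = sort(unique(d$Year))) +
        labs(title = as.character(d$Park_name[1]),
             x = "Year", y = "Mean height (mm)") +
        theme_bw()
})

# draw each park as a separate figure
invisible(lapply(plots, print))
```

Because plots is a named list, you can also write each park to its own file, for example with ggsave(paste0(park, ".png"), plots[[park]]) inside a for loop over names(plots). No explicit 'for' loop is needed to build the plots themselves; split() plus lapply() does the per-park work.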