Using ggplot2, I want to create a histogram where anything above X is grouped into the final bin. For example, if most of my distribution is between 100 and 200 and I want to bin by 10, I want anything above 200 to be binned as "200+".
library(ggplot2)
# create some fake data
id <- sample(1:100000, 10000, replace = TRUE)
visits <- sample(1:1200, 10000, replace = TRUE)
# combine into a data frame
df <- data.frame(id, visits)
# plot the data
hist <- ggplot(df, aes(x = visits)) + geom_histogram(binwidth = 50)
How can I limit the x-axis while still representing the data that falls beyond the limit?
2 Answers
#1
5
Perhaps you're looking for the breaks argument of geom_histogram:
# create some fake data
id <- sample(1:100000, 10000, replace = TRUE)
visits <- sample(1:1200, 10000, replace = TRUE)
# combine into a data frame
df <- data.frame(id, visits)
# plot the data: 10-wide bins up to 200, then one bin up to the maximum
library(ggplot2)
ggplot(df, aes(x = visits)) +
  geom_histogram(breaks = c(seq(0, 200, by = 10), max(df$visits)), position = "identity") +
  coord_cartesian(xlim = c(0, 210))
This gives a histogram with 10-wide bins up to 200 and a single wide bin for everything above (with the caveats that the fake data looks pretty bad here and that the axis still needs to be adjusted to match the breaks).
Edit: Maybe someone else can weigh in here:
# create breaks and labels
brks <- c(seq(0, 200, by=10), max(visits))
lbls <- c(as.character(seq(0, 190, by=10)), "200+", "")
# true
length(brks)==length(lbls)
# hmmm
ggplot(df, aes(x=visits)) +
geom_histogram(breaks=brks, position = "identity") +
coord_cartesian(xlim=c(0,220)) +
scale_x_continuous(labels=lbls)
The plot fails with this error:
Error in scale_labels.continuous(scale) :
Breaks and labels are different lengths
This resembles a previously reported issue, but that one was fixed 8 months ago.
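One possible explanation, offered as an assumption rather than a confirmed diagnosis: labels were supplied without breaks, so the scale computes its own default breaks, and their number need not match length(lbls). A minimal sketch that passes both to scale_x_continuous (the seed and the variable name p are added here for illustration only):

```r
library(ggplot2)

set.seed(1)  # illustrative seed, not part of the original answer
df <- data.frame(visits = sample(1:1200, 10000, replace = TRUE))

# same breaks and labels as in the attempt above
brks <- c(seq(0, 200, by = 10), max(df$visits))
lbls <- c(as.character(seq(0, 190, by = 10)), "200+", "")

# give the scale the breaks as well, so breaks and labels line up one to one
p <- ggplot(df, aes(x = visits)) +
  geom_histogram(breaks = brks, position = "identity") +
  scale_x_continuous(breaks = brks, labels = lbls) +
  coord_cartesian(xlim = c(0, 220))
p
```

Note that the last bin still spans 200 to the maximum, so its bar is far taller than the others; coord_cartesian only crops the view.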
#2
3
If you want to fudge it a little to get around the bin-labelling issues, copy your data into a new sacrificial data frame and cap the values there:
id <- sample(1:100000, 10000, replace = TRUE)
visits <- sample(1:1200, 10000, replace = TRUE)
# combine into a data frame
df <- data.frame(id, visits)
# create a sacrificial data frame and cap visits at 200
dfsac <- df
dfsac$visits[dfsac$visits > 200] <- 200
Then use the breaks and labels arguments of scale_x_continuous to define your bin labels easily:
library(ggplot2)
ggplot(data = dfsac, aes(x = visits)) +
  geom_histogram(breaks = seq(0, 200, by = 10),
                 col = "black",
                 fill = "red") +
  labs(x = "Visits", y = "Count") +
  scale_x_continuous(limits = c(0, 200), breaks = seq(0, 200, by = 10), labels = c(seq(0, 190, by = 10), "200+"))
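One thing to watch with capping at exactly 200: the capped values land in the same bin as the genuine 190 to 200 values, so the last bar mixes both. A variation on the same idea, sketched under the assumption that a separate overflow bar is wanted, moves everything above the cap into one extra bin instead. The names cap, width, visits_capped, and p are hypothetical and introduced only for this example:

```r
library(ggplot2)

set.seed(42)  # illustrative seed
df <- data.frame(visits = sample(1:1200, 10000, replace = TRUE))

cap <- 200    # values above this go to the overflow bin
width <- 10   # bin width

# place every value above the cap in the middle of one extra bin (200, 210]
df$visits_capped <- ifelse(df$visits > cap, cap + width / 2, df$visits)

p <- ggplot(df, aes(x = visits_capped)) +
  geom_histogram(breaks = seq(0, cap + width, by = width),
                 colour = "black", fill = "red") +
  # tick marks at the bin edges up to the cap; the last one is labelled "200+"
  scale_x_continuous(breaks = seq(0, cap, by = width),
                     labels = c(seq(0, cap - width, by = width), "200+")) +
  labs(x = "Visits", y = "Count")
p
```

The "200+" label sits at the left edge of the overflow bar, and the 190 to 200 bar keeps only its own counts.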