Say we have the following simple data frame of date-value pairs, where some dates are missing from the sequence (i.e. Jan 12 through Jan 14). When I plot the points, the x-axis still shows these missing dates, even though there are no points for them. I want to keep the missing dates off the x-axis so that the sequence of points has no breaks. Any suggestions on how to do this? Thanks!
library(ggplot2)
dts <- as.Date(c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
df <- data.frame(dt = dts, val = seq_along(dts))
ggplot(df, aes(dt, val)) + geom_point() +
  scale_x_date(date_labels = '%d%b', date_breaks = '1 day')
3 answers
#1 (8 votes)
Then turn the dates into a factor. At the moment, ggplot is interpreting the data the way you told it to: as values on a continuous date scale. You don't want that scale; you want a categorical (discrete) scale:
library(ggplot2)
dts <- as.Date(c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
df <- data.frame(dt = dts, val = seq_along(dts))
ggplot(df, aes(dt, val)) + geom_point() +
  scale_x_date(date_labels = '%d%b', date_breaks = '1 day')
versus
df <- data.frame(dt = factor(format(dts, format = '%d%b')),
val = seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
which produces a plot with the four points evenly spaced along the x-axis and no gap between 11Jan and 15Jan.
Is that what you wanted?
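To see why the factor conversion removes the gap, here is a small base-R sketch comparing the x positions implied by a continuous date scale with the integer slots a discrete scale gives to factor levels. It does not call ggplot2; the assumption that a discrete scale places its levels at 1, 2, 3, ... is how ggplot2 positions discrete values, but treat this as an illustration rather than a look at ggplot2 internals.

```r
# Dates from the question, with 12-14 Jan missing
dts <- as.Date(c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))

# Continuous date scale: positions are day counts, so the 4-day jump is preserved
continuous_pos <- as.numeric(dts) - as.numeric(dts[1])

# Discrete scale: each factor level occupies the next integer slot, so the jump disappears
labs <- format(dts, '%d%b')
discrete_pos <- as.integer(factor(labs, levels = labs))

print(continuous_pos)  # 0 1 5 6
print(discrete_pos)    # 1 2 3 4
```

The trade-off is that a discrete axis no longer encodes elapsed time: 11Jan and 15Jan are drawn exactly as far apart as 10Jan and 11Jan.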
#2 (7 votes)
I made a package that does this. It's called bdscale, and it's on CRAN and GitHub. Shameless plug.
To replicate your example:
> library(bdscale)
> library(ggplot2)
> library(scales)
> dts <- as.Date(c('2011-01-10', '2011-01-11', '2011-01-15', '2011-01-16'))
> df <- data.frame(dt=dts, val=seq_along(dts))
> ggplot(df, aes(x=dt, y=val)) + geom_point() +
scale_x_bd(business.dates=dts, labels=date_format('%d%b'))
But what you probably want is to load known valid dates, then plot your data using the valid dates on the x-axis:
> nyse <- bdscale::yahoo('SPY') # get valid dates from SPY prices
> dts <- as.Date('2011-01-10') + 1:10
> df <- data.frame(dt=dts, val=seq_along(dts))
> ggplot(df, aes(x=dt, y=val)) + geom_point() +
scale_x_bd(business.dates=nyse, labels=date_format('%d%b'), max.major.breaks=10)
Warning message:
Removed 3 rows containing missing values (geom_point).
The warning is telling you that it removed three dates:
- 15th = Saturday
- 16th = Sunday
- 17th = MLK Day
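The idea behind a business-day scale can be sketched in base R: replace each date with its position within a sorted vector of valid business dates, so consecutive trading days sit one unit apart and non-business dates have no position at all. This is not bdscale's implementation; `bd_position()` is a hypothetical helper, and the hand-built calendar (weekdays minus 17 Jan 2011) stands in for the dates that `bdscale::yahoo('SPY')` would download.

```r
# Hypothetical helper (not part of bdscale): position of each date within a
# sorted vector of valid business dates; non-business dates become NA
bd_position <- function(x, business_dates) {
  match(x, sort(unique(business_dates)))
}

# Hypothetical calendar: weekdays from 10 to 21 Jan 2011, minus MLK Day (17 Jan)
all_days <- seq(as.Date('2011-01-10'), as.Date('2011-01-21'), by = 'day')
wday <- as.POSIXlt(all_days)$wday  # 0 = Sunday, 6 = Saturday
business <- all_days[!(wday %in% c(0, 6)) & all_days != as.Date('2011-01-17')]

# Same dates as the example above: 11 Jan through 20 Jan
dts <- as.Date('2011-01-10') + 1:10
pos <- bd_position(dts, business)
print(data.frame(dt = dts, pos = pos))

# The 15th, 16th and 17th have no position; the remaining positions are consecutive
sum(is.na(pos))         # 3
diff(pos[!is.na(pos)])  # all 1
```

The NA positions correspond to the three rows ggplot2 drops with the "Removed 3 rows" warning shown above.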
#3 (5 votes)
The first question is: why do you want to do that? There is no point in showing a coordinate-based plot if your axes are not coordinates. If you really want to do this, you can convert the dates to a factor. Be careful with the order, though:
library(ggplot2)
dts <- as.Date(c('31-10-2011', '01-11-2011', '02-11-2011', '05-11-2011'),
               format = "%d-%m-%Y")
dtsf <- format(dts, format = '%d%b')
df <- data.frame(dt = ordered(dtsf, levels = dtsf), val = seq_along(dts))
ggplot(df, aes(dt, val)) + geom_point()
With factors you have to be careful about the order of the levels, because that order determines the order along the axis. By default, factor() sorts the levels alphabetically (as strings), so some date formats, such as '%d%b' here, come out in the wrong order unless you set the levels explicitly, as with ordered(dtsf, levels = dtsf) above. So be careful what you do. If you don't take the order into account, you get:
df <- data.frame(dt=factor(dtsf),val=seq_along(dts))
ggplot(df, aes(dt,val)) + geom_point()
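A quick way to see the ordering problem is to compare the levels R creates by default with levels taken in date order. The sketch below uses only base R; building the levels from `unique(dtsf[order(dts)])` is one option that also works when the input dates are not already sorted or when several rows share a label.

```r
# Dates from the answer above, already in chronological order
dts <- as.Date(c('31-10-2011', '01-11-2011', '02-11-2011', '05-11-2011'),
               format = '%d-%m-%Y')
dtsf <- format(dts, format = '%d%b')

# Default factor(): levels are the sorted strings, so the 31 October label comes last
default_levels <- levels(factor(dtsf))
print(default_levels)

# Chronological levels: sort the labels by the underlying dates
chrono <- factor(dtsf, levels = unique(dtsf[order(dts)]))
print(levels(chrono))
```

Because labels starting with "0" sort before labels starting with "31", the October date lands last however the month abbreviations are spelled in your locale.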