I am trying to format Cost and Revenue (both in thousands) and Impressions (in millions) data for a ggplot graph's y-axis labels.
My plot runs from 31 days ago to 'yesterday' and uses the min and max values over that period for the ylim(c(min, max)) option. Showing just the Cost example:
library(ggplot2)
library(TTR)
set.seed(1984)
#make series
start <- as.Date('2016-01-01')
end <- Sys.Date()
days <- as.numeric(end - start)
#make cost and moving averages
cost <- rnorm(days, mean = 45400, sd = 11640)
date <- seq.Date(from = start, to = end - 1, by = 'day')
cost_7 <- SMA(cost, 7)
cost_30 <- SMA(cost, 30)
df <- data.frame(Date = date, Cost = cost, Cost_7 = cost_7, Cost_30 = cost_30)
# set parameters for window
left <- end - 31
right <- end - 1
# plot series
ggplot(df, aes(x = Date, y = Cost))+
geom_line(lwd = 0.5) +
geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
xlim(c(left, right)) +
ylim(c(min(df$Cost[df$Date > left]), max(df$Cost[df$Date > left]))) +
xlab("")
I would like to a) show thousands and millions on the y-axis with comma separators, and b) abbreviate those numbers, with 'K' for thousands or 'MM' for millions. I realize b) may be a tall order, but for now even a) cannot be accomplished with
ggplot(...) + ... + ylim(c(min, max)) + scale_y_continuous(labels = comma)
Because the following error is thrown:
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
I have tried putting the scale_y_continuous(labels = comma) section after the geom_line() layer (which throws the error above) or at the end of all the ggplot layers, which overrides my limits in the ylim call and then throws the error above anyway.
Any ideas?
2 Solutions
#1
11
For the comma formatting, you need to load the scales library in order to use labels = comma. The "error" you described is actually just a warning, raised because you used both ylim and then scale_y_continuous. The second call overrides the first. You can instead set the limits and specify comma-separated labels in a single call to scale_y_continuous:
library(scales)
ggplot(df, aes(x = Date, y = Cost))+
geom_line(lwd = 0.5) +
geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
xlim(c(left, right)) +
xlab("") +
scale_y_continuous(labels=comma, limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left])))
Another option would be to melt your data to long format before plotting, which reduces the amount of code needed and streamlines aesthetic mappings:
library(reshape2)
ggplot(melt(df, id.var="Date"),
aes(x = Date, y = value, color=variable, linetype=variable))+
geom_line() +
xlim(c(left, right)) +
labs(x="", y="Cost") +
scale_y_continuous(labels=comma, limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left])))
Either way, to put the y values in terms of thousands or millions, you could divide the y values by 1,000 or 1,000,000. I've used dollar_format() below, but I think you'll also need to divide by the appropriate power of ten if you use unit_format (per @joran's suggestion). For example:
div=1000
ggplot(melt(df, id.var="Date"),
aes(x = Date, y = value/div, color=variable, linetype=variable))+
geom_line() +
xlim(c(left, right)) +
labs(x="", y="Cost (Thousands)") +
scale_y_continuous(labels=dollar_format(),
limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left]))/div)
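As a possible alternative to dividing the data, the rescaling can happen inside the label function. The sketch below assumes a scales version (1.0.0 or later, as far as I know) in which unit_format() has a scale argument; the names thousands and millions are made up for illustration. With this approach the data and the limits would stay in their original units.

```r
library(scales)

# Assumption: unit_format() in the installed scales version accepts `scale`,
# a factor applied to the values before they are formatted.
thousands <- unit_format(unit = "K", scale = 1e-3, sep = "")
millions  <- unit_format(unit = "MM", scale = 1e-6, sep = "")

# Each call returns a character vector of labels with the unit as a suffix
thousands(c(30000, 45400, 60000))
millions(c(1500000, 2500000))

# Hypothetical use in a plot, with y and its limits left undivided:
#   scale_y_continuous(labels = thousands, limits = range(df$Cost[df$Date > left]))
```

Whether the labels show a decimal place depends on the accuracy that scales picks, so set accuracy explicitly if a fixed number of decimals matters.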
Use scale_color_manual and scale_linetype_manual to set custom colors and linetypes, if desired.
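A minimal sketch of that last suggestion follows. It uses a small made-up long-format data frame in place of melt(df, id.var="Date"), and the color and linetype choices simply mirror the ones from the question; treat the object names (long, series_colors, series_linetypes) as illustrative assumptions.

```r
library(ggplot2)

# Made-up long-format data standing in for melt(df, id.var = "Date")
set.seed(1)
dates <- seq.Date(as.Date("2016-01-01"), by = "day", length.out = 10)
long <- data.frame(
  Date = rep(dates, times = 3),
  variable = factor(rep(c("Cost", "Cost_7", "Cost_30"), each = 10),
                    levels = c("Cost", "Cost_7", "Cost_30")),
  value = rnorm(30, mean = 45400, sd = 11640)
)

# Named vectors map each level of `variable` to a color and a linetype
series_colors <- c(Cost = "black", Cost_7 = "red", Cost_30 = "blue")
series_linetypes <- c(Cost = 1, Cost_7 = 3, Cost_30 = 5)

p <- ggplot(long, aes(x = Date, y = value,
                      color = variable, linetype = variable)) +
  geom_line() +
  scale_color_manual(values = series_colors) +
  scale_linetype_manual(values = series_linetypes) +
  labs(x = "", y = "Cost")
```

Because both scales are keyed on the same variable, ggplot2 should merge them into a single legend.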
#2
0
Here is a possible solution for part b).
In this blog post, a solution in the form of a function is proposed.
format_si <- function(...) {
  # Returns a labelling function; arguments in ... are passed on to format()
  function(x) {
    limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12,
                1e-9, 1e-6, 1e-3, 1e0, 1e3,
                1e6, 1e9, 1e12, 1e15, 1e18,
                1e21, 1e24)
    prefix <- c("y", "z", "a", "f", "p",
                "n", "µ", "m", " ", "k",
                "M", "G", "T", "P", "E",
                "Z", "Y")
    # Vector with array indices according to position in intervals
    i <- findInterval(abs(x), limits)
    # Set prefix to " " for very small values < 1e-24
    i <- ifelse(i==0, which(limits == 1e0), i)
    paste(format(round(x/limits[i], 1),
                 trim=TRUE, scientific=FALSE, ...),
          prefix[i])
  }
}
ggplot(df, aes(x = Date, y = Cost))+
geom_line(lwd = 0.5) +
geom_line(aes(y = Cost_7), col = 'red', linetype = 3, lwd = 1) +
geom_line(aes(y = Cost_30), col = 'blue', linetype = 5, lwd = 0.75) +
xlim(c(left, right)) +
xlab("") +
scale_y_continuous(labels=format_si(), limits=c(min(df$Cost[df$Date > left]),
max(df$Cost[df$Date > left])))
Needless to say, prefix can be adapted as one pleases. Here is what the output looks like (dates are in French because R is set to FR on my PC).
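As one illustration of adapting the prefixes, here is a cut-down sketch of the same findInterval idea with only the 'K' and 'MM' suffixes asked for in the question. The function name format_k_mm and the choice of three ranges are assumptions for this example, not part of the blog post's code.

```r
# Hypothetical adaptation: three ranges only, with "K" and "MM" suffixes
format_k_mm <- function(x) {
  limits <- c(1, 1e3, 1e6)
  suffix <- c("", "K", "MM")
  # Index of the largest limit not exceeding |x|; values below 1 use index 1
  i <- pmax(findInterval(abs(x), limits), 1)
  out <- paste0(round(x / limits[i], 1), suffix[i])
  # Keep missing breaks as NA instead of pasting them into text
  out[is.na(x)] <- NA_character_
  out
}

format_k_mm(c(950, 45400, 2500000))
# "950" "45.4K" "2.5MM"
```

Since this is a plain function of the break values, it could be passed directly as labels = format_k_mm in scale_y_continuous.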