Plotting a smooth line through all data points

Time: 2021-12-12 00:06:05

I'm trying to plot a smooth line that runs directly through all my data points and has a gradient based on another variable. Theoretically polynomial interpolation would get the job done but I'm not sure how I would do that with ggplot. This is what I've come up with so far:

DATA:

 dayofweek hour impressions conversions      cvr
         1    0     3997982       352.0 8.80e-05
         1    1     3182678       321.2 1.01e-04
         1    2     2921004       248.6 8.51e-05
         1    3     1708627       115.6 6.77e-05
         1    4     1225059        98.4 8.03e-05
         1    5     1211708        62.0 5.12e-05
         1    6     1653280       150.0 9.07e-05
         1    7     2511577       309.4 1.23e-04
         1    8     3801969       397.8 1.05e-04
         1    9     5144399       573.0 1.11e-04
         1   10     5770269       675.6 1.17e-04
         1   11     6936943       869.8 1.25e-04
         1   12     7953053       996.4 1.25e-04
         1   13     8711737      1117.8 1.28e-04
         1   14     9114872      1217.4 1.34e-04
         1   15     9257161      1155.2 1.25e-04
         1   16     8437068      1082.0 1.28e-04
         1   17     8688057      1047.2 1.21e-04
         1   18     9200450      1114.0 1.21e-04
         1   19     8494295      1086.8 1.28e-04
         1   20     9409142      1092.6 1.16e-04
         1   21    10500000      1266.8 1.21e-04
         1   22     9783073      1196.4 1.22e-04
         1   23     8225267       812.0 9.87e-05

R CODE:

ggplot(d) + 
  geom_line(aes(y=impressions, x=hour, color=cvr)) +
  stat_smooth(aes(y=impressions, x=hour), method = lm, formula = y ~ poly(x, 10), se = FALSE)

So I can get the gradient I want using geom_line, but it's not smooth. With stat_smooth I get a smooth line, but it doesn't run through all the data points and doesn't have the gradient I want. Any ideas on how to accomplish this?

1 solution

#1


19  

Fitting a single polynomial, as you are doing, is probably not the best idea if you want the curve to pass through all of your points. You have 24 points, so a polynomial that passes through all of them would need degree 23. I can't seem to use poly with degree 23, but a lower degree is already enough to show why this won't work:

ggplot(d) + 
  geom_point(aes(x = hour, y = impressions, colour = cvr), size = 3) +
  stat_smooth(aes(x = hour, y = impressions), method = "lm",
              formula = y ~ poly(x, 21), se = FALSE) +
  coord_cartesian(ylim = c(0, 1.5e7))

This more or less passes through all the points (and it would do so exactly if I managed to use an even higher-degree polynomial), but it's probably not the kind of smooth curve you want. A better option is spline interpolation. It also uses polynomials, but instead of just one (as you tried), it uses many. They are constrained to pass through all the data points in such a way that the curve is continuous.
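
To make the contrast between one polynomial and many concrete, here is a minimal sketch using only the first six rows of the question's data (a subset chosen here to keep the numbers small, not something the original answer does). It fits a single degree-5 polynomial with lm and poly, and a cubic spline with base R's splinefun, then checks that both reproduce the observed values.

```r
# First six rows of the question's data (hours 0 to 5)
hour <- 0:5
impressions <- c(3997982, 3182678, 2921004, 1708627, 1225059, 1211708)

# One polynomial: six points need degree 6 - 1 = 5 for an exact fit
fit <- lm(impressions ~ poly(hour, 5))
poly_at_data <- unname(fitted(fit))

# Many polynomials: a cubic spline joins one cubic piece per interval
f <- splinefun(hour, impressions)  # default method "fmm"
spline_at_data <- f(hour)

# Both reproduce the data at the observed hours (up to rounding error)
all.equal(poly_at_data, impressions)
all.equal(spline_at_data, impressions)

# They may differ between the observed hours, e.g. at hour 4.5
predict(fit, newdata = data.frame(hour = 4.5))
f(4.5)
```

Both curves interpolate the data, so the choice between them is about how they behave between the points, which is where a single high-degree polynomial tends to oscillate.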

As far as I know, this can't be done directly with ggplot, but it can be done using ggalt::geom_xspline.

Here I show a base R solution, where the spline interpolation is produced in a separate step:

spline_int <- as.data.frame(spline(d$hour, d$impressions))

You need as.data.frame because spline returns a list. Now you can use that new data in the plot with geom_line():
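
As a quick illustration of what spline returns, the sketch below runs it on the first six rows of the question's data (a subset used here for brevity). The value n = 101 is an arbitrary choice for this example. If n is omitted, spline evaluates the curve at three times as many points as the input has.

```r
# First six rows of the question's data (hours 0 to 5)
hour <- 0:5
impressions <- c(3997982, 3182678, 2921004, 1708627, 1225059, 1211708)

# spline() returns a plain list with components x and y
s <- spline(hour, impressions, n = 101)  # n = 101 is an arbitrary choice
str(s)

# ggplot needs a data frame, hence the conversion
spline_int <- as.data.frame(s)
head(spline_int)

# The grid runs from the smallest to the largest hour
range(spline_int$x)
```

A larger n gives a visually smoother line in the plot at the cost of more rows in spline_int.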

ggplot(d) + 
  geom_point(aes(x = hour, y = impressions, colour = cvr), size = 3) +
  geom_line(data = spline_int, aes(x = x, y = y))
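
The plot above colours only the points. If the line itself should carry the cvr gradient, as the question asks, one possible approach is to interpolate cvr on the same grid and map it to colour. This is a sketch, not part of the original answer: the grid size of 200, the name smooth_d, and the choice of linear interpolation with approx for cvr are all assumptions made here, and only six rows of the data are reproduced.

```r
# Assumption: `d` holds the question's data; only six rows are reproduced here
d <- data.frame(
  hour = 0:5,
  impressions = c(3997982, 3182678, 2921004, 1708627, 1225059, 1211708),
  cvr = c(8.80e-05, 1.01e-04, 8.51e-05, 6.77e-05, 8.03e-05, 5.12e-05)
)

# Interpolate both variables on the same fine grid of hours
grid <- seq(min(d$hour), max(d$hour), length.out = 200)
smooth_d <- data.frame(
  hour = grid,
  impressions = spline(d$hour, d$impressions, xout = grid)$y,
  # Linear interpolation keeps the colour values within the observed range
  cvr = approx(d$hour, d$cvr, xout = grid)$y
)

# Draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(smooth_d, aes(x = hour, y = impressions, colour = cvr)) +
    geom_line() +
    geom_point(data = d, size = 3)
  print(p)
}
```

Each short segment of the line takes the colour of the interpolated cvr value at its start, so a finer grid gives a smoother-looking gradient.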
