Efficiently plotting millions of data points in R

Time: 2021-12-12 00:05:59

I'm trying to plot several million data points in R. I'm currently using ggplot2, but I'm open to suggestions of alternative packages. The problem is that the graph takes too long to render, often more than a minute. I'm looking for ways to do this faster, ideally in real time. I would appreciate any help; the code is attached below for clarity.

Creating a random data frame with 100,000 rows:

# Candidate values; named letter.levels to avoid masking base R's built-in `letters`
letter.levels <- c("A", "B", "C", "D", "E", "F", "G")
myLetters <- sample(x = letter.levels, size = 100000, replace = TRUE)
direction <- c("x", "y", "z")
factor1 <- sample(x = direction, size = 100000, replace = TRUE)
factor2 <- runif(100000, 0, 20)
factor3 <- runif(100000, 0, 100)
decile <- sample(x = 1:10, size = 100000, replace = TRUE)

# One row per point: two categorical columns, two numeric columns, and a decile
new.plot.df <- data.frame(letters = myLetters, factor1 = factor1, factor2 = factor2,
                          factor3 = factor3, decile = decile)

Now, plotting the data:

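library(ggplot2)

# Scatter plot faceted into a 10 x 7 grid (decile rows, letter columns)
color.plot <- ggplot(new.plot.df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(aes(alpha = factor2)) +
  facet_grid(decile ~ letters)

# Printing the object is what triggers the slow rendering step
print(color.plot)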

How do I make the rendering faster?

1 answer

#1



In general there are two strategies that I use for this:

1) As described in the comments, plotting a reasonably sized representative sample of your data usually leaves the overall appearance of the plot unchanged while greatly reducing the number of points to render.

2) One trick that I use is to create the plot object without displaying it and instead save the plot to a PNG file. This speeds things up a lot, because the image you then open is a raster (a fixed grid of pixels) rather than a vector image in which every point has to be drawn individually.
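For the first strategy, a minimal sketch of plotting a random subset could look like the following. The data frame `plot.df`, the 10% fraction, and the fixed seed are assumptions made for illustration; they are not from the question, and a suitable fraction depends on how dense the data is in each facet.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the subset is reproducible

# Hypothetical stand-in for the question's data frame
n <- 100000
plot.df <- data.frame(
  letters = sample(c("A", "B", "C"), n, replace = TRUE),
  factor2 = runif(n, 0, 20),
  factor3 = runif(n, 0, 100)
)

# Draw a 10% simple random sample of row indices (without replacement)
sample.rows <- sample(nrow(plot.df), size = round(0.1 * nrow(plot.df)))
plot.sample <- plot.df[sample.rows, ]

# Build the plot from the subset only, so far fewer points are rendered
sample.plot <- ggplot(plot.sample, aes(x = factor3, y = factor2)) +
  geom_point(alpha = 0.3) +
  facet_wrap(~ letters)
```

A simple random sample can leave sparse facets nearly empty; if that matters, sampling within each facet group may be a better fit.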

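For the second strategy, one way to write the plot to a PNG without drawing it on screen is ggplot2's `ggsave()`. This is only a sketch: the stand-in data, the output path in `tempdir()`, and the size and resolution values are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical small stand-in for the question's data
n <- 20000
plot.df <- data.frame(
  factor1 = sample(c("x", "y", "z"), n, replace = TRUE),
  factor2 = runif(n, 0, 20),
  factor3 = runif(n, 0, 100)
)

# Build the plot object; assigning it does not display anything
p <- ggplot(plot.df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(alpha = 0.3)

# Write directly to a PNG file instead of the interactive graphics device
out.file <- file.path(tempdir(), "color_plot.png")
ggsave(filename = out.file, plot = p, width = 10, height = 8,
       units = "in", dpi = 150)
```

The resulting file can then be opened in any image viewer. Base R's `png()` followed by `print(p)` and `dev.off()` should achieve the same effect.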
