Speeding up the plot() function for large data sets

Date: 2021-11-29 02:43:52

I am using plot() on more than 1 million data points, and it turns out to be very slow.

Is there any way to improve the speed, through either programming or hardware solutions (more RAM, a graphics card, ...)?

Where is the data for a plot stored?

3 Answers

#1


23  

A hexbin plot actually shows you something (unlike the scatterplot @Roland proposes in the comments, which is likely to be just a giant, slow blob) and takes about 3.5 seconds on my machine for your example:

set.seed(101)
a<-rnorm(1E7,1,1)
b<-rnorm(1E7,1,1)
library(hexbin)
system.time(plot(hexbin(a,b)))
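
The idea behind binning can also be sketched in base R without extra packages. What follows is only a minimal illustration: it counts points on a rectangular grid rather than in hexagons, so it is not the hexbin algorithm itself, and the helper `bin_index` and the grid size `nbins` are assumptions made up for this example.

```r
# Rectangular 2D binning in base R (illustrative; hexbin uses hexagons instead).
set.seed(101)
n <- 1e6
a <- rnorm(n, 1, 1)
b <- rnorm(n, 1, 1)

nbins <- 100  # hypothetical grid resolution per axis

# Hypothetical helper: map each value to a bin number in 1..nbins.
bin_index <- function(v, nbins) {
  r <- range(v)
  idx <- floor((v - r[1]) / (r[2] - r[1]) * nbins) + 1
  pmin(idx, nbins)  # the maximum value would land in bin nbins + 1; clamp it
}

xi <- bin_index(a, nbins)
yi <- bin_index(b, nbins)

# Count points per cell; rows follow the x bins, columns follow the y bins.
counts <- matrix(tabulate((yi - 1) * nbins + xi, nbins = nbins^2),
                 nrow = nbins)

# Draw one coloured cell per bin instead of one symbol per point.
image(seq(min(a), max(a), length.out = nbins + 1),
      seq(min(b), max(b), length.out = nbins + 1),
      log1p(counts), xlab = "a", ylab = "b")
```

The plotting cost here depends on the number of cells (`nbins^2`), not on the number of points, which is the same reason a binned plot scales better than a raw scatterplot.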

#2


9  

An easy and fast way is to set pch='.'. The timings are shown below:

> x=rnorm(10^6)
> system.time(plot(x))
  user  system elapsed 
  2.87   15.32   18.74 
> system.time(plot(x,pch=20))
  user  system elapsed 
  3.59   22.20   26.16 
> system.time(plot(x,pch='.'))
  user  system elapsed 
  1.78    2.26    4.06 
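
If the goal is only to see the overall shape of the data, another option that combines well with pch='.' is to plot a random subset of the points. This is a sketch of a different technique from the one timed above, and the subset size `n_keep` is an arbitrary assumption for illustration.

```r
# Plot a random subsample instead of every point (illustrative sketch).
set.seed(1)
x <- rnorm(10^6)

n_keep <- 1e5  # hypothetical subset size; tune to taste
keep <- sort(sample.int(length(x), n_keep))  # indices drawn without replacement

# Keep the original index on the x axis so the picture matches plot(x).
plot(keep, x[keep], pch = ".", xlab = "Index", ylab = "x")
```

Subsampling discards information, so rare outliers may not appear in the plot; binning approaches avoid that trade-off.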

#3


2  

Have you looked at the tabplot package? It is designed specifically for large data: http://cran.r-project.org/web/packages/tabplot/ I use it, and it is faster than using hexbin (or even the default sunflower plots for overplotting).

Also, I think Hadley wrote something on DS's blog about modifying ggplot for big data, at http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html

"I'm currently working with another student, Yue Hu, to turn our research into a robust R package." (October 21, 2011)

Maybe we can ask Hadley whether the updated ggplot3 is ready.
