Wrapping R's plot function (or ggplot2) to prevent plotting of large data sets

Rather than ask how to plot big data sets, I want to wrap plot so that code that produces a lot of plots doesn't get hammered when it is asked to plot a large object. How can I wrap plot in a very simple manner so that all of its functionality is preserved, but it first tests whether the object being passed is too large?

This code works for very vanilla calls to plot, but it lacks the generality of plot (see below).

myPlot <- function(x, ...){
    isBad <- any( (length(x) > 10^6) || (object.size(x) > 8*10^6) || (nrow(x) > 10^6) )
    if(is.na(isBad)){isBad = FALSE}
    if(isBad){
        stop("No plots for you!")
    }
    return(plot(x, ...))
}

x <- rnorm(1000)
myPlot(x)              # small input: plots normally

x <- rnorm(10^6 + 1)
myPlot(x)              # large input: stops with "No plots for you!"

An example where this fails:

x = rnorm(1000)
y = rnorm(1000)
plot(y ~ x)
myPlot(y ~ x)

Is there some easy way to wrap plot to enable this checking of the data to be plotted, while still passing through all of the arguments? If not, then how about ggplot2? I'm an equal opportunity non-plotter. (In the cases where the dataset is large, I will use hexbin, sub-sampling, density plots, etc., but that's not the focus here.)

Note 1: When testing ideas, I recommend checking against a small threshold such as size > 100 (or setting a variable, e.g. myThreshold <- 1000) rather than size > 1M; otherwise you will spend a lot of time waiting on slow plots. :)
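
One way to make the threshold adjustable, as the note suggests, is to pass it as an argument. The sketch below is only an illustration: the helper name `tooBig` and the `threshold` argument are hypothetical and do not appear in the code above, and it checks only element and row counts rather than `object.size()`.

```r
# Hypothetical helper: `tooBig` and `threshold` are illustrative names.
# NROW() gives the length of a vector and the row count of a matrix or
# data frame, so it never returns NULL the way nrow() does for vectors.
tooBig <- function(x, threshold = 1000) {
    isTRUE(NROW(x) > threshold || length(x) > threshold)
}

# Same wrapper idea as myPlot above, with the cutoff exposed as an argument.
# Because `threshold` comes after `...`, it must be supplied by name.
myPlot <- function(x, ..., threshold = 1000) {
    if (tooBig(x, threshold)) {
        stop("No plots for you!")
    }
    invisible(plot(x, ...))
}
```

With a small default, a call such as `myPlot(rnorm(2000))` stops immediately, so ideas can be tested without waiting on a slow plot.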

1 Answer

#1

The problem is that, as currently coded, myPlot() assumes x is a data object, but you then pass it a formula. R's plot() handles both cases via S3 methods: when x is a formula, the plot.formula() method is dispatched instead of the basic plot.default() method.
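
As a small illustration of how this kind of dispatch works, the sketch below defines a toy generic. The name `describe` is hypothetical and exists only for this example; it is not part of base R or of the code in this thread.

```r
# `describe` is a hypothetical generic used only to show S3 dispatch.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no more specific method matches the class of x.
describe.default <- function(x, ...) "data object"

# Chosen automatically when x has class "formula".
describe.formula <- function(x, ...) "formula"

class(y ~ x)          # "formula"
describe(rnorm(10))   # "data object"
describe(y ~ x)       # "formula"
```

Note that creating the formula `y ~ x` does not evaluate `y` or `x`, which is why a size check applied to the formula object itself says nothing about the size of the data it refers to.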

You need to do the same:

myplot <- function(x, ...)
    UseMethod("myplot")

myplot.default <- function(x, ...) {
    isBad <- any((length(x) > 10^6) || (object.size(x) > 8*10^6) || 
                    (nrow(x) > 10^6))
    if(is.na(isBad)){isBad = FALSE}
    if(isBad){
        stop("No plots for you!")
    }
    invisible(plot(x, ...))
}

myplot.formula <- function(x, ...) {
    ## code here to process the formula into a data object for plotting
    ....
    myplot.default(processed_x, ...)
}

You can borrow code from plot.formula() to process x into a data object. Alternatively, you can roll your own, following the "standard non-standard evaluation rules" (PDF).
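
For what it's worth, here is one possible shape for the formula method. It is a rough sketch under stated assumptions, not the code from plot.formula(): it uses `model.frame()` to evaluate the variables named in the formula, checks the number of rows, and then hands the original formula back to plot(). The variable `myThreshold` and the row-count-only check are assumptions made for this example, and extra arguments passed through `...` may hit the usual non-standard evaluation pitfalls.

```r
myThreshold <- 1000   # assumed cutoff; pick whatever suits your data

myplot <- function(x, ...) UseMethod("myplot")

myplot.default <- function(x, ...) {
    # NROW() covers vectors, matrices and data frames alike.
    if (isTRUE(NROW(x) > myThreshold)) {
        stop("No plots for you!")
    }
    invisible(plot(x, ...))
}

myplot.formula <- function(x, data = parent.frame(), ...) {
    force(data)  # resolve the caller's environment (or data frame) now
    # Build a model frame so the size of the referenced data can be checked.
    mf <- model.frame(x, data = data)
    if (nrow(mf) > myThreshold) {
        stop("No plots for you!")
    }
    # Let plot.formula() do the real work on the original formula.
    invisible(plot(x, data = data, ...))
}
```

With this in place, `myplot(y ~ x)` on two short vectors should plot as usual, while a formula that refers to more than `myThreshold` rows stops before any drawing happens.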
