Calculate the area under a curve

I would like to calculate the area under a curve, i.e. integrate it, without having to define a function as integrate() requires.

My data looks like this:

Date          Strike     Volatility
2003-01-01    20         0.2
2003-01-01    30         0.3
2003-01-01    40         0.4
etc.

I plotted plot(strike, volatility) to look at the volatility smile. Is there a way to integrate this plotted "curve"?

7 answers

#1

The AUC is approximated pretty easily by summing a lot of trapezoids, each bounded by x_i, x_{i+1}, y_{i+1} and y_i. Using rollmean from the zoo package, you can do:

library(zoo)

x <- 1:10
y <- 3*x+25
id <- order(x)

AUC <- sum(diff(x[id])*rollmean(y[id],2))

Make sure you order the x values, or your outcome won't make sense. If you have negative values somewhere along the y axis, you'd have to figure out how exactly you want to define the area under the curve, and adjust accordingly (e.g. using abs()).
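
To make the point about negative y values concrete, here is a minimal base-R sketch. The data and the helper name trap are made up for illustration; it only contrasts the signed area with the area of abs(y).

```r
# Hypothetical data: a straight line that crosses zero at x = 2
x <- c(0, 1, 2, 3, 4)
y <- c(-2, -1, 0, 1, 2)

# Base-R trapezoidal rule (assumes x is already sorted)
trap <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

signed_area   <- trap(x, y)       # area below the axis counts as negative
absolute_area <- trap(x, abs(y))  # area below the axis counts as positive

signed_area    # 0
absolute_area  # 4
```

Note that abs(y) only gives the exact absolute area here because the zero crossing falls on a sampled x value; if the curve changes sign between two points, the trapezoid spanning the crossing is only approximate.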

Regarding your follow-up: if you don't have a formal function, how would you plot it? So if you only have values, the only thing you can approximate is a definite integral. Even if you have the function in R, you can only calculate definite integrals using integrate(). Plotting the formal function is only possible if you can also define it.
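
If you do want to go through integrate() with only values in hand, one possible route (a sketch, not the only way) is to build an interpolating function with approxfun() first. For linearly interpolated points the definite integral should agree with the trapezoidal sum:

```r
x <- 1:10
y <- 3 * x + 25

# Linear interpolation turns the points into a function that integrate() accepts
f <- approxfun(x, y)

# Definite integral over the observed range of x
area_integrate <- integrate(f, lower = min(x), upper = max(x))$value

# Plain trapezoidal sum over the same points
area_trapezoid <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

area_integrate  # 373.5
area_trapezoid  # 373.5
```

The limits have to stay inside the range of x, because approxfun() returns NA outside it by default.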

#2

Just add the following to your program and you will get the area under the curve:

require(pracma)
AUC = trapz(strike,volatility)

From ?trapz:

This approach matches exactly the approximation for integrating the function using the trapezoidal rule with basepoints x.

#3

Three more options, including one using a spline method and one using Simpson's rule...

# get data
n <- 100
mean <- 50
sd <- 50

x <- seq(20, 80, length=n)
y <- dnorm(x, mean, sd) *100

# using sintegral in Bolstad2
require(Bolstad2)
sintegral(x,y)$int

# using auc in MESS
require(MESS)
auc(x,y, type = 'spline')

# using integrate.xy in sfsmisc
require(sfsmisc)
integrate.xy(x,y)

The trapezoidal method is less accurate than the spline method, so MESS::auc (uses spline method) or Bolstad2::sintegral (uses Simpson's rule) should probably be preferred. DIY versions of these (and an additional approach using the quadrature rule) are here: http://www.r-bloggers.com/one-dimensional-integrals/
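
For a rough feel of the accuracy difference without installing anything, here is a base-R sketch on a coarser version of the same data. It is not what MESS::auc or Bolstad2::sintegral do internally; it hand-rolls a composite Simpson's rule and integrates a cubic spline from splinefun(), then compares each result with the exact value (known here only because y comes from dnorm).

```r
# Coarse grid (11 points, 10 equal intervals) so the differences are visible
x <- seq(20, 80, length.out = 11)
y <- dnorm(x, mean = 50, sd = 50) * 100

# Exact value, available only because y is generated from a known density
exact <- (pnorm(80, 50, 50) - pnorm(20, 50, 50)) * 100

# Trapezoidal rule
trap <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Composite Simpson's rule: needs an even number of equally spaced intervals
n <- length(x)
h <- (x[n] - x[1]) / (n - 1)
simpson <- h / 3 * (y[1] + y[n] +
                    4 * sum(y[seq(2, n - 1, by = 2)]) +
                    2 * sum(y[seq(3, n - 2, by = 2)]))

# Cubic spline through the points, integrated numerically
spl <- integrate(splinefun(x, y), lower = x[1], upper = x[n])$value

# Absolute error of each method
abs(c(trapezoid = trap, simpson = simpson, spline = spl) - exact)
```

On smooth data like this the trapezoidal error should come out clearly largest; on noisy measurements the ranking is less clear-cut, since a spline can overshoot between points.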

#4

OK, so I'm a bit late to the party, but going over the answers, a plain R solution to the problem is missing. Here goes, simple and clean:

sum(diff(x) * (head(y,-1)+tail(y,-1)))/2

The solution for OP then reads as:

sum(diff(strike) * (head(volatility,-1)+tail(volatility,-1)))/2

This effectively calculates the area using the trapezoidal method by taking the average of the "left" and "right" y-values.

NB: as @Joris already pointed out you could use abs(y) if that would make more sense.
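
As a worked example, here is the same calculation spelled out step by step on the three rows shown in the question (only those rows; the "etc." part of the data is not known):

```r
# The three rows shown in the question
strike     <- c(20, 30, 40)
volatility <- c(0.2, 0.3, 0.4)

# Width of each trapezoid along the x axis
widths <- diff(strike)                                        # 10 10

# Average of the "left" and "right" y-values of each trapezoid
heights <- (head(volatility, -1) + tail(volatility, -1)) / 2  # 0.25 0.35

area <- sum(widths * heights)
area  # 6
```

Since the data has a Date column, you would presumably want to do this separately for each date rather than across all rows at once.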

#5

In the pharmacokinetics (PK) world, calculating different types of AUC is a common and fundamental task. There are lots of different AUC calculations for pharmacokinetics, such as

AUC0-t = AUC from zero to time t
AUC0-last = AUC from zero to the last time point (may be same as above)
AUC0-inf = AUC from zero to time infinity
AUCint = AUC over a time interval
AUCall = AUC over the whole time period for which data exists

One of the best packages which does these calculations is the relatively new package PKNCA from the folks at Pfizer. Check it out.
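
To illustrate two of the variants listed above without any package, here is a plain trapezoidal sketch in base R. The concentration-time values and the helper name trap are invented for the example, and this does not reproduce PKNCA's own methods or options.

```r
# Hypothetical concentration-time profile (time in h, concentration in mg/L)
time <- c(0, 0.5, 1, 2, 4, 8, 12)
conc <- c(0, 4.0, 6.0, 5.0, 3.0, 1.0, 0.4)

# Base-R trapezoidal rule (assumes time is already sorted)
trap <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# AUC0-last: from time zero to the last sampled time point
auc_0_last <- trap(time, conc)

# AUCint: restricted to an interval whose ends are sampled times (here 1 to 8 h)
keep <- time >= 1 & time <= 8
auc_1_8 <- trap(time[keep], conc[keep])

auc_0_last  # 27.8
auc_1_8     # 21.5
```

AUC0-inf is left out on purpose: it needs an extrapolation beyond the last observation, which is where a dedicated package earns its keep.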

#6

Joris Meys's answer was great but I struggled to remove NAs from my samples. Here is the little function I wrote to deal with them :

library(zoo) #for the rollmean function

######
#' Calculate the Area Under Curve of y~x
#'
#'@param y Your y values (measures ?)
#'@param x Your x values (time ?)
#'@param start : The first x value 
#'@param stop : The last x value
#'@param na.stop : returns NA if one value is NA
#'@param ex.na.stop : returns NA if the first or the last value is NA
#'
#'@examples 
#'myX = 1:5
#'myY = c(17, 25, NA, 35, 56)
#'auc(myY, myX)
#'auc(myY, myX, na.stop=TRUE)
#'myY = c(17, 25, 28, 35, NA)
#'auc(myY, myX, ex.na.stop=FALSE)
auc = function(y, x, start=x[1], stop=x[length(x)], na.stop=FALSE, ex.na.stop=TRUE){
  # x[1] and x[length(x)] are used instead of first()/last(),
  # which are not in base R and are not loaded by library(zoo)
  if(all(is.na(y))) return(NA)
  # keep only the points between start and stop
  bounds = which(x==start):which(x==stop)
  x=x[bounds]
  y=y[bounds]
  r = which(is.na(y))
  if(length(r)>0){
    if(na.stop==TRUE) return(NA)
    first.na = is.na(y[1])
    last.na = is.na(y[length(y)])
    if(ex.na.stop==TRUE && (first.na || last.na)) return(NA)
    if(last.na) warning("Last value is NA, so this AUC is bad and you should feel bad", call. = FALSE)
    if(first.na) warning("First value is NA, so this AUC is bad and you should feel bad", call. = FALSE)
    # drop the NA points; the trapezoids then bridge the gaps linearly
    x = x[-r]
    y = y[-r]
  }
  sum(diff(x[order(x)])*rollmean(y[order(x)],2))
}

I then use it with an apply onto my dataframe : myDF$auc = apply(myDF, MARGIN=1, FUN=auc, x=c(0,5,10,15,20))
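
For reference, the row-wise pattern can be sketched in base R alone. Everything here is hypothetical: the matrix measures (one row per subject, one column per time point), the vector times and the helper auc_skip_na, which is a simplified stand-in for the function above and not a replacement for it.

```r
# Hypothetical data: one row per subject, one column per time point
times <- c(0, 5, 10, 15, 20)
measures <- rbind(
  a = c(1, 2, 3, 4, 5),
  b = c(1, NA, 3, 4, 5),
  c = c(NA, NA, NA, NA, NA)
)

# Trapezoidal AUC that drops NA points and bridges the gap linearly
auc_skip_na <- function(y, x) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(NA_real_)  # not enough points for a trapezoid
  x <- x[ok]
  y <- y[ok]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# MARGIN = 1 passes each row as y; x is forwarded to the helper
res <- apply(measures, MARGIN = 1, FUN = auc_skip_na, x = times)
res
```

Rows a and b both come out at 60 because the missing point in b lies on the straight line between its neighbours; row c has no usable points and returns NA.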

Hope it can help noobs like me :-)

EDIT : added bounds

#7

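You can use the ROCR package, where the following lines will give you the AUC (note that this is the area under a classifier's ROC curve, not a general integral of y over x):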

library(ROCR)

pred <- prediction(classifier.labels, actual.labs)
attributes(performance(pred, 'auc'))$y.values[[1]]
