I would like to calculate the area under a curve, i.e. integrate it, without having to define a function as integrate() requires.
My data looks like this:
Date Strike Volatility
2003-01-01 20 0.2
2003-01-01 30 0.3
2003-01-01 40 0.4
etc.
I plotted plot(strike, volatility) to look at the volatility smile. Is there a way to integrate this plotted "curve"?
7 answers
#1
33
The AUC is easy to approximate with a series of trapezoids, each bounded by x_i, x_{i+1}, y_{i+1} and y_i. Using rollmean() from the zoo package, you can do:
library(zoo)
x <- 1:10
y <- 3*x+25
id <- order(x)
AUC <- sum(diff(x[id])*rollmean(y[id],2))
Make sure you order the x values, or your outcome won't make sense. If you have negative values somewhere along the y axis, you'd have to figure out how exactly you want to define the area under the curve, and adjust accordingly (e.g. using abs()).
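To make the point about negative values concrete, here is a minimal base R sketch with made-up data (the vectors x and y and the helper trapezoid() are illustrative, not from the question). It compares the signed area with the area of abs(y). Note that applying abs() to the samples is only exact when the sign change falls on a sample point, as it does here.

```r
# Illustrative data: y crosses zero exactly at x = 2
x <- 0:4
y <- c(2, 1, 0, -1, -2)

# Trapezoidal rule: width of each interval times the mean of its two y values
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

signed_area   <- trapezoid(x, y)       # parts below the axis count as negative
absolute_area <- trapezoid(x, abs(y))  # every part counts as positive

c(signed = signed_area, absolute = absolute_area)
```

Which of the two numbers is "the" area depends on what the integral is supposed to mean for your data.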
Regarding your follow-up: if you don't have a formal function, how would you plot it? If you only have values, the only thing you can approximate is a definite integral. Even if you have the function in R, integrate() only calculates definite integrals. Plotting the formal function is only possible if you can also define it.
#2
29
Just add the following to your program and you will get the area under the curve:
require(pracma)
AUC = trapz(strike,volatility)
From ?trapz:
This approach matches exactly the approximation for integrating the function using the trapezoidal rule with basepoints x.
#3
17
Three more options, including one using a spline method and one using Simpson's rule...
# get data
n <- 100
mean <- 50
sd <- 50
x <- seq(20, 80, length=n)
y <- dnorm(x, mean, sd) *100
# using sintegral in Bolstad2
require(Bolstad2)
sintegral(x,y)$int
# using auc in MESS
require(MESS)
auc(x,y, type = 'spline')
# using integrate.xy in sfsmisc
require(sfsmisc)
integrate.xy(x,y)
The trapezoidal method is generally less accurate than the spline method for smooth curves, so MESS::auc (which uses the spline method) or Bolstad2::sintegral (which uses Simpson's rule) should probably be preferred. DIY versions of these (and an additional approach using the quadrature rule) are here: http://www.r-bloggers.com/one-dimensional-integrals/
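As a rough illustration of the spline idea without extra packages, the sketch below fits an interpolating spline with splinefun() and integrates it with integrate(), both from base R. The test data (sin(x) on [0, pi], exact integral 2) is made up for the comparison, and this is not meant to describe how MESS::auc is implemented internally.

```r
# Illustrative data: sin(x) sampled on [0, pi]; the exact integral is 2
x <- seq(0, pi, length.out = 21)
y <- sin(x)

# Trapezoidal rule, for comparison
trap <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Fit an interpolating cubic spline through the points, then integrate it
f   <- splinefun(x, y)
spl <- integrate(f, lower = min(x), upper = max(x))$value

c(trapezoid = trap, spline = spl, exact = 2)
```

On smooth data like this the spline result lands closer to the exact value; on noisy or sparsely sampled data the advantage is less clear.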
#4
10
OK, so I'm a bit late to the party, but going over the answers, a plain R solution to the problem is missing. Here goes, simple and clean:
sum(diff(x) * (head(y,-1)+tail(y,-1)))/2
The solution for OP then reads as:
sum(diff(strike) * (head(volatility,-1)+tail(volatility,-1)))/2
This effectively calculates the area using the trapezoidal method by taking the average of the "left" and "right" y-values.
NB: as @Joris already pointed out, you could use abs(y) if that would make more sense.
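Since the data in the question carries a Date column, one area per date is probably what is wanted. The following is a minimal sketch under that assumption: the data frame smile, its second date, and the helper trapezoid() are made up for illustration.

```r
# Illustrative data in the shape shown in the question (second date is made up)
smile <- data.frame(
  Date       = rep(c("2003-01-01", "2003-01-02"), each = 3),
  Strike     = c(20, 30, 40, 20, 30, 40),
  Volatility = c(0.2, 0.3, 0.4, 0.4, 0.3, 0.4)
)

# Trapezoidal rule; x is sorted first because the rule assumes increasing x
trapezoid <- function(x, y) {
  id <- order(x)
  x <- x[id]
  y <- y[id]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Split the rows by date and compute one area per smile
auc_by_date <- sapply(split(smile, smile$Date),
                      function(d) trapezoid(d$Strike, d$Volatility))
auc_by_date
```

The result is a named numeric vector, with the dates as names.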
#5
3
In the pharmacokinetics (PK) world, calculating different types of AUC is a common and fundamental task. There are lots of different AUC calculations for pharmacokinetics, such as
- AUC0-t = AUC from zero to time t
- AUC0-last = AUC from zero to the last time point (may be same as above)
- AUC0-inf = AUC from zero to time infinity
- AUCint = AUC over a time interval
- AUCall = AUC over the whole time period for which data exists
One of the best packages for these calculations is the relatively new package PKNCA from the folks at Pfizer. Check it out.
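To give a feel for what an interval AUC (AUCint above) involves, here is a base R sketch that does not use PKNCA. The helper auc_interval() and the concentration data are hypothetical; it assumes time is sorted, that from and to lie within the observed range, and that linear interpolation at the interval bounds is acceptable.

```r
# Illustrative concentration-time data (made up)
time <- c(0, 1, 2, 4, 8)
conc <- c(0, 10, 8, 4, 1)

# Hypothetical helper: linear trapezoidal AUC between `from` and `to`
auc_interval <- function(time, conc, from, to) {
  inside <- time > from & time < to
  pts  <- c(from, time[inside], to)           # interval bounds plus interior points
  vals <- approx(time, conc, xout = pts)$y    # linear interpolation at those points
  sum(diff(pts) * (head(vals, -1) + tail(vals, -1)) / 2)
}

auc_interval(time, conc, 0, 8)  # whole observed period
auc_interval(time, conc, 1, 3)  # upper bound falls between two samples
```

Extrapolation to infinity (AUC0-inf) needs an estimate of the terminal slope and is not covered by this sketch.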
#6
0
Joris Meys's answer was great, but I struggled to remove NAs from my samples. Here is the little function I wrote to deal with them:
library(zoo)  # for the rollmean function

#' Calculate the area under the curve of y ~ x
#'
#' @param y Your y values (measurements?)
#' @param x Your x values (time?)
#' @param start The first x value to include
#' @param stop The last x value to include
#' @param na.stop Return NA if any value is NA
#' @param ex.na.stop Return NA if the first or the last value is NA
#'
#' @examples
#' myX <- 1:5
#' myY <- c(17, 25, NA, 35, 56)
#' auc(myY, myX)
#' auc(myY, myX, na.stop = TRUE)
#' myY <- c(17, 25, 28, 35, NA)
#' auc(myY, myX, ex.na.stop = FALSE)
# Base R indexing replaces first()/last(), so no extra package is needed for them
auc <- function(y, x, start = x[1], stop = x[length(x)], na.stop = FALSE, ex.na.stop = TRUE) {
  if (all(is.na(y))) return(NA)
  # Restrict x and y to the requested range
  bounds <- which(x == start):which(x == stop)
  x <- x[bounds]
  y <- y[bounds]
  r <- which(is.na(y))
  if (length(r) > 0) {
    if (na.stop) return(NA)
    first.na <- is.na(y[1])
    last.na  <- is.na(y[length(y)])
    if (ex.na.stop && (first.na || last.na)) return(NA)
    if (last.na) warning("Last value is NA, so this AUC is bad and you should feel bad", call. = FALSE)
    if (first.na) warning("First value is NA, so this AUC is bad and you should feel bad", call. = FALSE)
    # Drop the NA points; the trapezoids then span the gaps
    x <- x[-r]
    y <- y[-r]
  }
  id <- order(x)
  sum(diff(x[id]) * rollmean(y[id], 2))
}
I then use it with apply() on my data frame: myDF$auc = apply(myDF, MARGIN=1, FUN=auc, x=c(0,5,10,15,20))
Hope it can help noobs like me :-)
EDIT: added bounds
#7
-2
You can use the ROCR package, where the following lines will give you the AUC:
pred <- prediction(classifier.labels, actual.labs)
attributes(performance(pred, 'auc'))$y.values[[1]]