R: Importance sampling for Monte Carlo integration

Time: 2022-06-02 21:08:30

I have an integral to evaluate:

      integral of x^(-0.5) dx for x in [0.01, 1]

for which I am using importance sampling Monte Carlo. The theory says that an approximating PDF has to be used to compute the expected value (which will almost surely converge to the mean value of the integrand).

After plotting the integrand and an exponential PDF, and based only on the plots, I chose rexp and dexp for the approximating PDF. My code looks like this:

# Without importance sampling
set.seed(1909)
X <- runif(1000, 0.01, 1)
Y <- X^(-0.5)
c(mean(Y), var(Y))

# Importance sampling Monte Carlo
w <- function(x) dunif(x, 0.01, 1)/dexp(x, rate=1.5)
f <- function(x) x^(-0.5)
X <- rexp(1000, rate=1.5)
Y <- w(X)*f(X)
c(mean(Y), var(Y))

Could someone please confirm whether my line of thought is correct? If it is wrong, how should I approach this instead? Please elucidate: I have understood the theory, but the implementation is proving problematic for me.

For integrals that are not so simple:

1.) f(x) = [1+sinh(2x)ln(x)]^(-1). I chose the normal PDF as g(x) (with mean = 0.5 and SD = 5) as the approximation only after looking at the plot. I wrote code similar to the above for it, but R reports that NaNs are produced in the importance sampling case. (I assume this means the function is undefined somewhere, but I don't know how to solve this.)

2.) f(x,y) = exp(-x^4 - y^4)

How do I choose g(x,y) for the above function?

1 Answer

#1

In general your approach seems correct, but you have to be more careful with the domain over which you integrate. In your original example, about 20% of the values from rexp(1000, 1.5) are above 1. The function dexp(x, rate=1.5) is not a density function on the interval [0,1]: there it integrates only to pexp(1, rate=1.5). To get a density on [0,1] you have to divide it by pexp(1, rate=1.5), which means the weight w gets multiplied by that factor, and you keep only the draws with X <= 1. So here is what I would do for the importance sampling example:

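# Importance sampling Monte Carlo (exponential proposal truncated to [0, 1])
# weight = target density / truncated proposal density
w <- function(x) dunif(x, 0.01, 1)/dexp(x, rate=1.5) * pexp(1, rate=1.5)
f <- function(x) x^(-0.5)
X <- rexp(1000, rate=1.5)
X <- X[X<=1]   # keep only the draws inside [0, 1]
Y <- w(X)*f(X)
c(mean(Y), var(Y))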
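
As a sanity check, the mean value of x^(-0.5) under the uniform density on [0.01, 1] is available in closed form, (2 - 2*sqrt(0.01))/0.99, so both the untruncated and the truncated estimator can be compared against it. The following is only a minimal sketch: the names exact, est_plain, est_trunc and the larger sample size are my own assumptions, not part of the original post.

```r
set.seed(1909)
n <- 1e5                                  # larger sample than in the post (assumption)
f <- function(x) x^(-0.5)
exact <- (2 * sqrt(1) - 2 * sqrt(0.01)) / 0.99   # mean value of f on [0.01, 1]

# Untruncated exponential proposal: draws outside [0.01, 1] get weight 0
X <- rexp(n, rate = 1.5)
Y_plain <- dunif(X, 0.01, 1) / dexp(X, rate = 1.5) * f(X)
est_plain <- mean(Y_plain)

# Exponential proposal truncated to [0, 1]: drop draws above 1, rescale the weight
Xt <- X[X <= 1]
Y_trunc <- dunif(Xt, 0.01, 1) / dexp(Xt, rate = 1.5) * pexp(1, rate = 1.5) * f(Xt)
est_trunc <- mean(Y_trunc)

# Monte Carlo standard errors: sd of the terms divided by sqrt(number of terms)
se_plain <- sd(Y_plain) / sqrt(length(Y_plain))
se_trunc <- sd(Y_trunc) / sqrt(length(Y_trunc))

print(c(exact = exact, plain = est_plain, truncated = est_trunc))
print(c(se_plain = se_plain, se_trunc = se_trunc))
```

Note that var(Y) in the snippets above is the variance of the individual terms; the uncertainty of mean(Y) itself is the standard error computed here.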

In your second example the same thing causes the problem. You get negative X and therefore NaN values for log(X). Furthermore, your normal density should be centered at 0.5 with a smaller variance. Here's my approach:

# Without importance sampling
set.seed(1909)
X <- runif(1000, 0.01, 1)
Y <- (1+sinh(2*X)*log(X))^(-1)
c(mean(Y), var(Y))

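# Importance sampling Monte Carlo (normal proposal truncated to [0, 1])
# 1 - 2*pnorm(0, ...) is the proposal mass on [0, 1], since the normal is symmetric about 0.5
w <- function(x) dunif(x, 0.01, 1)/dnorm(x, mean=0.5, sd=0.25) * (1-2*pnorm(0, mean=0.5, sd=0.25))
f <- function(x) (1+sinh(2*x)*log(x))^(-1)
X <- rnorm(1000, mean=0.5, sd=0.25)
X <- X[X>0 & X<=1]   # keep only draws inside the truncation interval; this also avoids log(X) for X <= 0
Y <- w(X)*f(X)
c(mean(Y), var(Y))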
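
An alternative to discarding draws is to sample the truncated normal directly by inverting its CDF, and to compare the result with a numerical reference from integrate(). This is a hedged sketch rather than the method used above: the variable names and the sample size are assumptions of mine.

```r
set.seed(1909)
n <- 1e5                                   # sample size (assumption)
f <- function(x) (1 + sinh(2 * x) * log(x))^(-1)
m <- 0.5
s <- 0.25
p_lo <- pnorm(0, mean = m, sd = s)         # CDF at the lower truncation point
p_hi <- pnorm(1, mean = m, sd = s)         # CDF at the upper truncation point

# Inverse-CDF draw from the normal truncated to [0, 1]
X <- qnorm(p_lo + runif(n) * (p_hi - p_lo), mean = m, sd = s)

# Density of the truncated normal proposal
g <- function(x) dnorm(x, mean = m, sd = s) / (p_hi - p_lo)

# Terms of the estimator; draws below 0.01 lie outside the target support and stay 0
inside <- X >= 0.01 & X <= 1
Y <- numeric(n)
Y[inside] <- f(X[inside]) * dunif(X[inside], 0.01, 1) / g(X[inside])

est <- mean(Y)
ref <- integrate(f, 0.01, 1)$value / 0.99  # mean value of f on [0.01, 1]
print(c(estimate = est, reference = ref, se = sd(Y) / sqrt(n)))
```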

In your last example, f(x,y), I don't really understand what y is. Is it just a constant? Then perhaps a Weibull distribution may work.
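
If y is instead a second integration variable and the domain is the whole plane (both are assumptions, since the question does not say), the integrand factorizes into exp(-x^4) * exp(-y^4), so one possible choice of g(x,y) is a product of two independent normal densities, whose tails are heavier than those of the integrand. The sketch below estimates the integral itself and compares it with the closed form (2*gamma(5/4))^2; the proposal standard deviation s = 0.7 is an arbitrary illustrative value.

```r
set.seed(1909)
n <- 1e5                                   # sample size (assumption)
s <- 0.7                                   # proposal sd (assumption, not tuned)

f <- function(x, y) exp(-x^4 - y^4)
# Proposal density: product of two independent N(0, s^2) densities
g <- function(x, y) dnorm(x, 0, s) * dnorm(y, 0, s)

X <- rnorm(n, 0, s)
Y <- rnorm(n, 0, s)

# Importance sampling terms f/g; their mean estimates the integral over the plane
Z <- f(X, Y) / g(X, Y)
est <- mean(Z)
se <- sd(Z) / sqrt(n)

# Closed form under the whole-plane assumption: each factor integrates to 2*gamma(5/4)
exact <- (2 * gamma(5 / 4))^2
print(c(estimate = est, exact = exact, se = se))
```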

EDIT: Regarding your additional questions in the comments. (1) Any probability density function must integrate to 1. Therefore dexp(x, rate=1.5) is not a density function on the interval [0,1]; there it integrates only to pexp(1, rate=1.5). However, the function

dexp01 <- function(x, rate){
  dexp(x, rate=rate)/pexp(1, rate=rate)
}

actually integrates to 1:

integrate(dexp, 0, 1, rate=1.5)
integrate(dexp01, 0, 1, rate=1.5)
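
The same normalizing constant can also be used to draw from the truncated exponential directly by inverting its CDF, so that no draws are thrown away. A minimal sketch, assuming a hypothetical helper named rexp01 (not part of base R), and checking the sample mean against the mean implied by dexp01:

```r
set.seed(1909)
rate <- 1.5
n <- 1e5                                   # sample size (assumption)

# Density of the exponential truncated to [0, 1], as defined above
dexp01 <- function(x, rate) dexp(x, rate = rate) / pexp(1, rate = rate)

# Hypothetical sampler: map uniforms on (0, pexp(1)) through the exponential quantile function
rexp01 <- function(n, rate) qexp(runif(n) * pexp(1, rate = rate), rate = rate)

X <- rexp01(n, rate)

# Mean of the truncated distribution, computed numerically from dexp01
mean_theory <- integrate(function(x) x * dexp01(x, rate), 0, 1)$value
print(c(sample = mean(X), theory = mean_theory))
```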

That is the rationale for including the cumulative distribution function pexp. If you have a different interval, e.g. [0.3,8], you have to adjust the function accordingly:

dexp0.3_8 <- function(x, rate){
  dexp(x, rate=rate)/(pexp(8, rate=rate)-pexp(0.3, rate=rate))
}
integrate(dexp0.3_8, 0.3, 8, rate=1.5)
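
The two special cases can be folded into one generic helper that works for any distribution with a d/p function pair and returns 0 outside the interval. This is a sketch under my own naming: dtrunc is a hypothetical function, not something from base R.

```r
# Hypothetical helper: density of a distribution truncated to [a, b].
# dfun/pfun are a matching density and CDF (e.g. dexp/pexp); ... is passed to both.
dtrunc <- function(x, dfun, pfun, a, b, ...) {
  mass <- pfun(b, ...) - pfun(a, ...)      # probability of landing in [a, b]
  ifelse(x >= a & x <= b, dfun(x, ...) / mass, 0)
}

# Exponential on [0.3, 8]
I_exp <- integrate(function(x) dtrunc(x, dexp, pexp, 0.3, 8, rate = 1.5), 0.3, 8)$value

# Normal on [0, 1], as used for the second example
I_norm <- integrate(function(x) dtrunc(x, dnorm, pnorm, 0, 1, mean = 0.5, sd = 0.25), 0, 1)$value

print(c(exp = I_exp, norm = I_norm))       # both should be close to 1
```

Returning 0 outside [a, b] means the helper can be used directly as the denominator of an importance weight only for draws inside the interval, which matches the filtering step in the examples above.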

(2) Here I chose the standard deviation so that approximately 95% of the values in rnorm(1000, .5, .25) fall in the interval from 0 to 1 (having many values outside this interval would certainly increase the variance). However, I am not certain that this is the best choice of distribution. Selecting the importance function is a problem I am not very familiar with. You could ask on CrossValidated. The same goes for your next question.
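
The 95% figure can be checked from the normal CDF, and the same calculation shows how little of the proposal lands in [0, 1] with the SD of 5 from the question. A small sketch; mass_in_unit is a name I made up for illustration.

```r
# Probability that a normal draw lands in [0, 1], computed from the CDF
mass_in_unit <- function(mean, sd) pnorm(1, mean, sd) - pnorm(0, mean, sd)

p_answer   <- mass_in_unit(0.5, 0.25)      # sd used in this answer
p_question <- mass_in_unit(0.5, 5)         # sd used in the question

print(round(c(sd_0.25 = p_answer, sd_5 = p_question), 4))
```

With sd = 0.25 about 95% of the draws fall in [0, 1]; with sd = 5 it is only about 8%, so most draws would contribute nothing to the estimate.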
