Possible Duplicate:
how to generate pseudo-random positive definite matrix with constraints on the off-diagonal elements?
The user wants to impose a unique, non-trivial upper/lower bound on the correlation between every pair of variables in a variance-covariance matrix.
For example: I want a covariance matrix in which every pair of variables satisfies 0.9 > |rho(x_i,x_j)| > 0.6, where rho(x_i,x_j) is the correlation between variables x_i and x_j.
Thanks.
3 solutions
#1
There are MANY issues here.
First of all, are the pseudo-random deviates assumed to be normally distributed? I'll assume they are, as any discussion of correlation matrices gets nasty if we diverge into non-normal distributions.
Next, it is rather simple to generate pseudo-random normal deviates, given a covariance matrix. Generate standard normal (independent) deviates, and then transform by multiplying by the Cholesky factor of the covariance matrix. Add in the mean at the end if the mean was not zero.
And, a covariance matrix is also rather simple to generate given a correlation matrix. Just pre and post multiply the correlation matrix by a diagonal matrix composed of the standard deviations. This scales a correlation matrix into a covariance matrix.
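As a minimal sketch of that scaling step: the correlation matrix C and the standard deviations s below are made-up illustration values, not numbers from the question.

```matlab
% Illustrative 3x3 correlation matrix (assumed values, positive definite)
C = [1.0  0.7  0.8;
     0.7  1.0  0.75;
     0.8  0.75 1.0];
s = [2; 0.5; 3];            % assumed standard deviations

D = diag(s);                % diagonal matrix of standard deviations
S = D*C*D;                  % covariance: S(i,j) = s(i)*s(j)*C(i,j)

% Round trip: dividing by the standard deviations recovers C
d = 1./sqrt(diag(S));
C_back = diag(d)*S*diag(d);
disp(max(abs(C_back(:) - C(:))))   % should be at rounding-error level
```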
I'm still not sure where the problem lies in this question, since it would seem easy enough to generate a "random" correlation matrix, with elements uniformly distributed in the desired range.
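One caveat worth checking before relying on this: a symmetric matrix with unit diagonal whose off-diagonal entries are drawn independently from the desired range is not guaranteed to be positive definite, so it may not be a valid correlation matrix at all. The sketch below counts how often such a candidate passes. The dimension p, the bounds lo and hi, the random signs, and the trial count are all assumptions made for illustration.

```matlab
p = 6; lo = 0.6; hi = 0.9;      % assumed dimension and bounds on |rho|
ntrials = 1000;
nvalid = 0;
for t = 1:ntrials
    M = lo + (hi - lo)*rand(p); % magnitudes uniform in (lo, hi)
    sgn = sign(rand(p) - 0.5);  % random signs
    U = triu(M.*sgn, 1);        % keep only the strict upper triangle
    C = U + U' + eye(p);        % symmetric candidate with unit diagonal
    % the second output of chol is 0 only when C is positive definite
    [~, flag] = chol(C);
    nvalid = nvalid + (flag == 0);
end
fprintf('%d of %d candidates were positive definite\n', nvalid, ntrials);
```

With magnitudes of at least 0.6, any triple of variables whose three correlations have a negative sign product already gives a 3x3 block with negative determinant, so with random signs one would expect nearly all candidates to be rejected.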
So all of the above is rather trivial by any reasonable standards, and there are many tools out there to generate pseudo-random normal deviates given the above information.
Perhaps the issue is the user insists that the resulting random matrix of deviates must have correlations in the specified range. You must recognize that a set of random numbers will only have the desired distribution parameters in an asymptotic sense. Thus, as the sample size goes to infinity, you should expect to see the specified distribution parameters. But any small sample set will not necessarily have the desired parameters, in the desired ranges.
For example, (in MATLAB) here is a simple positive definite 3x3 matrix. As such, it makes a very nice covariance matrix.
S = randn(3);
S = S'*S
S =
0.78863 0.01123 -0.27879
0.01123 4.9316 3.5732
-0.27879 3.5732 2.7872
I'll convert S into a correlation matrix.
s = sqrt(diag(S));
C = diag(1./s)*S*diag(1./s)
C =
1 0.0056945 -0.18804
0.0056945 1 0.96377
-0.18804 0.96377 1
Now, I can sample from a normal distribution using the statistics toolbox (mvnrnd should do the trick). It is just as easy to use a Cholesky factor.
L = chol(S)   % chol returns the upper-triangular factor, so L'*L equals S
L =
0.88805 0.012646 -0.31394
0 2.2207 1.6108
0 0 0.30643
Now, generate pseudo-random deviates, then transform them as desired.
X = randn(20,3)*L;   % 20 samples of 3 variables; the population covariance of X is L'*L = S
cov(X)
ans =
0.79069 -0.14297 -0.45032
-0.14297 6.0607 4.5459
-0.45032 4.5459 3.6549
corr(X)
ans =
1 -0.06531 -0.2649
-0.06531 1 0.96587
-0.2649 0.96587 1
If your desire was that the correlations must ALWAYS be greater than -0.188, then this sampling technique has failed, since the numbers are pseudo-random. In fact, that goal will be a difficult one to achieve unless your sample size is large enough.
You might employ a simple rejection scheme, whereby you do the sampling, then redo it repeatedly until the sample has the desired properties, with the correlations in the desired ranges. This may get tiring.
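A rough sketch of such a rejection loop follows. The target matrix S, the sample size n, the bounds lo and hi, and the cap maxtries are assumed values chosen for illustration. corrcoef is used so that only base MATLAB is needed.

```matlab
% Assumed target with unit variances, so S is also the correlation matrix
S = [1.0  0.7  0.8;
     0.7  1.0  0.75;
     0.8  0.75 1.0];
p = size(S, 1);
R = chol(S);                  % upper-triangular factor, R'*R = S
n = 200; lo = 0.6; hi = 0.9;  % assumed sample size and bounds on |rho|
maxtries = 10000;             % give up eventually instead of looping forever
offdiag = ~eye(p);            % logical mask of the off-diagonal entries
accepted = false;
for k = 1:maxtries
    X = randn(n, p)*R;        % candidate sample
    Chat = corrcoef(X);       % sample correlation matrix
    a = abs(Chat(offdiag));
    if all(a > lo & a < hi)   % keep the first sample inside the range
        accepted = true;
        break
    end
end
fprintf('accepted = %d after %d tries\n', accepted, k);
```

The number of tries needed depends strongly on n, on p, and on how close the target correlations sit to the bounds.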
An approach that might work (but one that I've not totally thought out at this point) is to use the standard scheme as above to generate a random sample. Compute the correlations. If they fail to lie in the proper ranges, then identify the perturbation one would need to make to the actual (measured) covariance matrix of your data, so that the correlations would be as desired. Now, find a zero mean random perturbation to your sampled data that would move the sample covariance matrix in the desired direction.
This might work, but unless I knew that this is actually the question at hand, I won't bother to go any more deeply into it. (Edit: I've thought some more about this problem, and it appears to be a quadratic programming problem, with quadratic constraints, to find the smallest perturbation to a matrix X, such that the resulting covariance (or correlation) matrix has the desired properties.)
#2
This is not a complete answer, but a suggestion of a possible constructive method:
Looking at the characterizations of positive definite matrices (http://en.wikipedia.org/wiki/Positive-definite_matrix), I think one of the most affordable approaches could be to use Sylvester's criterion.
You can start with a trivial 1x1 random matrix with positive determinant and expand it by one row and column at a time, while ensuring that each new matrix also has a positive determinant (how to achieve that is up to you ^_^).
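One possible way to fill in that step, offered only as a sketch and not as the method this answer had in mind: when a positive definite C is bordered with a new column b and a unit diagonal entry, det([C b; b' 1]) = det(C)*(1 - b'*inv(C)*b), so the new leading minor is positive exactly when 1 - b'*inv(C)*b > 0. The code below draws candidate columns until that holds. The dimension p, the bounds, the restriction to positive correlations, and the restart rule are all assumptions.

```matlab
p = 5; lo = 0.6; hi = 0.9;   % assumed dimension and bounds (positive correlations only)
maxtries = 1000;             % candidate columns to try before restarting
C = 1;                       % 1x1 start, determinant 1 > 0
k = 2;
while k <= p
    found = false;
    for t = 1:maxtries
        b = lo + (hi - lo)*rand(k-1, 1);   % candidate new column
        % new leading minor = det(C)*(1 - b'*(C\b)); require it to be positive
        if 1 - b'*(C\b) > 0
            C = [C b; b' 1];               % border C with the accepted column
            found = true;
            break
        end
    end
    if found
        k = k + 1;
    else
        C = 1; k = 2;        % dead end: restart from the 1x1 matrix
    end
end
disp(C)
```

Every leading principal minor is positive by construction, so the result satisfies Sylvester's criterion. The sketch makes no claim about the distribution of the resulting matrices, and the acceptance rate may drop as p grows.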
#3
Woodship,
"First of all, are the pseudo-random deviates assumed to be normally distributed?"
Yes.
"Perhaps the issue is the user insists that the resulting random matrix of deviates must have correlations in the specified range."
Yes, that's the whole difficulty.
"You must recognize that a set of random numbers will only have the desired distribution parameters in an asymptotic sense."
True, but this is not the problem here: your strategy works for p=2, but fails for p>2, regardless of sample size.
"If your desire was that the correlations must ALWAYS be greater than -0.188, then this sampling technique has failed, since the numbers are pseudo-random. In fact, that goal will be a difficult one to achieve unless your sample size is large enough."
It is not a sample size issue, because with p>2 you do not even observe convergence to the right range for the correlations as the sample size grows. I tried the technique you suggest before posting here; it is obviously flawed.
"You might employ a simple rejection scheme, whereby you do the sampling, then redo it repeatedly until the sample has the desired properties, with the correlations in the desired ranges. This may get tiring."
Not an option: for large p (say, larger than 10) this is intractable.
"Compute the correlations. If they fail to lie in the proper ranges, then identify the perturbation one would need to make to the actual (measured) covariance matrix of your data, so that the correlations would be as desired."
Ditto
As for the QP, I understand the constraints, but I'm not sure about the way you define the objective function. By taking the "smallest perturbation" of some initial matrix, you will always end up with the same (solution) matrix: all the off-diagonal entries will be exactly equal to one of the two bounds (i.e. not pseudo-random). Plus, it is kind of overkill, isn't it?
Come on people, there must be something simpler.