I am working with neuroimaging data, and because of the large amount of data I would like to use sparse matrices in my code (scipy.sparse.lil_matrix or csr_matrix).
In particular, I will need to compute the pseudo-inverse of my matrix to solve a least-squares problem. I have found the method scipy.sparse.linalg.lsqr, but it is not very efficient. Is there a method to compute the Moore-Penrose pseudo-inverse (corresponding to pinv for dense matrices)?
The size of my matrix A is about 600,000 x 2,000, and each row of the matrix has from 0 up to 4 non-zero values. The size of A is given by voxels x fiber bundles (white matter fiber tracts), and we expect at most 4 tracts to cross in a voxel. In most of the white-matter voxels we expect to have at least 1 tract, but I would say that around 20% of the rows could be all zeros.
The vector b is not sparse; b contains the measure for each voxel, which is in general not zero.
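To make this concrete, here is a small sketch of the kind of setup I mean (the shapes are scaled down and the data is random, so this is illustrative rather than my actual pipeline):

import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

# Scaled-down illustration; the real matrix is roughly 600,000 x 2,000.
n_voxels, n_bundles = 6000, 200
rng = np.random.default_rng(0)

# Assemble row by row in LIL format, then convert to CSR for fast products.
A = lil_matrix((n_voxels, n_bundles))
for i in range(n_voxels):
    nnz = rng.integers(0, 5)          # 0 to 4 crossing tracts per voxel
    if nnz == 0:
        continue
    cols = rng.choice(n_bundles, size=nnz, replace=False)
    A[i, cols] = rng.random(nnz)
A = A.tocsr()

b = rng.random(n_voxels)              # one (dense) measurement per voxel

# What I am doing now: iterative least squares via lsqr.
x_hat = lsqr(A, b)[0]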
I would need to minimize the error, but there are also some conditions on the vector x. When I tried the model on smaller matrices, I never needed to constrain the system in order to satisfy these conditions (in general 0
Is that of any help? Is there a way to avoid taking the pseudo-inverse of A?
Thanks
Update 1st June: thanks again for the help. I can't really show you anything about my data, because the code in Python is giving me some problems. However, in order to understand how I could choose a good k, I've tried to create a test function in Matlab.
The code is as follows:
% Build a random test matrix F (100000 x 1000) with up to 150000 non-zero entries.
F = zeros(100000, 1000);
for k = 1:150000
    p = rand(1);
    a = 0;
    b = 0;
    % Draw a valid (row, column) index pair; reject non-positive draws.
    while a <= 0 || b <= 0
        a = random('Binomial', 100000, p);
        b = random('Binomial', 1000, p);
    end
    F(a, b) = rand(1);
end

% Known solution, repeated to length 1000.
solution = repmat([0.5, 0.5, 0.8, 0.7, 0.9, 0.4, 0.7, 0.7, 0.9, 0.6], 1, 100);
size(solution)
solution = solution';

% Synthetic measurements.
measure = F * solution;
%check=pinvF*measure;

% Truncated SVD with the k largest singular triplets.
k = 250;
F = sparse(F);
[U, S, V] = svds(F, k);
s = diag(S);                  % singular values (avoids a second call to svds)
plot(s)
max(max(U * S * V' - F))      % error of the rank-k approximation

% Invert the non-zero singular values to build the truncated pseudo-inverse.
for i = 1:k
    if S(i, i) ~= 0
        S(i, i) = 1 / S(i, i);
    end
end
pinvF = V * S' * U';

% Recovered solution and its maximum deviation from the known one.
pinvF * measure
max(pinvF * measure - solution)
Do you have any idea of how k should compare to the size of F? I've taken 250 (out of 1000) and the results are not satisfactory (the waiting time is acceptable, but not short). Also, now I can compare the results with the known solution, but how could one choose k in general? I also attached the plot of the 250 singular values that I get and their normalized squares. I don't know exactly how to do a scree plot better in Matlab. I'm now proceeding with a bigger k to see whether the values suddenly become much smaller.
Thanks again, Jennifer
2 Answers
#1
6
You could study more of the alternatives offered in scipy.sparse.linalg.
Anyway, please note that the pseudo-inverse of a sparse matrix is most likely to be a (very) dense one, so it's not really a fruitful avenue (in general) to follow when solving sparse linear systems.
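A quick sketch to see this for yourself (the sizes and density are arbitrary, and the exact fractions vary with the random seed):

import numpy as np
from scipy.sparse import random as sparse_random

# Small random sparse matrix; sizes and density are arbitrary.
A = sparse_random(200, 50, density=0.05, random_state=0)

# Dense pseudo-inverse via numpy (fine at this toy size).
A_pinv = np.linalg.pinv(A.toarray())

print(np.mean(A.toarray() != 0))   # ~0.05: A is sparse
print(np.mean(A_pinv != 0))        # typically well above 0.9: pinv(A) is essentially dense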
You may like to describe your particular problem (dot(A, x) = b + e) in a slightly more detailed manner. At least specify:
- 'typical' size of A
- 'typical' percentage of non-zero entries in A
- least-squares implies that norm(e) is minimized, but please indicate whether your main interest is in x_hat or in b_hat, where e = b - b_hat and b_hat = dot(A, x_hat)
Update: If you have some idea of the rank of A (and it's much smaller than the number of columns), you could try the total least squares (TLS) method. Here is a simple implementation, where k is the number of leading singular values and vectors to use (i.e. the 'effective' rank).
from scipy.sparse import hstack
from scipy.sparse.linalg import svds

def tls(A, b, k=6):
    """A TLS solution of Ax = b, for sparse A."""
    # Stack b as an extra column and take the k leading singular triplets.
    u, s, v = svds(hstack([A, b]), k)
    # Read the solution off the last right singular vector.
    return v[-1, :-1] / -v[-1, -1]
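A hypothetical usage sketch (the sizes, density, and k below are made up for illustration, and how close x_hat gets to the true x depends strongly on the chosen k):

import numpy as np
from scipy.sparse import random as sparse_random, csr_matrix

# Toy problem; sizes, density and k are illustrative only.
m, n = 2000, 100
A = sparse_random(m, n, density=0.02, format="csr", random_state=0)
x_true = np.random.default_rng(0).random(n)
b = A @ x_true

# b has to enter hstack as a column, so make it a sparse (m, 1) matrix.
b_col = csr_matrix(b.reshape(-1, 1))

x_hat = tls(A, b_col, k=20)
print(np.max(np.abs(x_hat - x_true)))   # quality depends strongly on k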
#2
3
Regardless of the answer to my comment, I would think you could accomplish this fairly easily using the Moore-Penrose SVD representation. Find the SVD with scipy.sparse.linalg.svds, replace Sigma by its pseudoinverse, and then multiply V*Sigma_pi*U' to find the pseudoinverse of your original matrix.
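A rough sketch of that recipe with scipy (the matrix, b, and the choice of k here are placeholders, not taken from the question):

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Placeholder data; in the real problem A would be ~600,000 x 2,000.
A = sparse_random(5000, 200, density=0.01, format="csr", random_state=0)
b = np.random.default_rng(0).random(5000)

k = 50                       # number of singular triplets to keep (problem dependent)
U, s, Vt = svds(A, k)        # truncated SVD: A ~= U @ diag(s) @ Vt

# Pseudo-inverse of the truncated Sigma: invert only the non-zero singular values.
s_pinv = np.zeros_like(s)
nz = s > 1e-10
s_pinv[nz] = 1.0 / s[nz]

# x_hat = V @ Sigma_pinv @ U' @ b, applied without forming the full dense pseudo-inverse.
x_hat = Vt.T @ (s_pinv * (U.T @ b))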