Is there a way to form a sparse n-dimensional array in Python 3?

I am pretty new to Python and have been wondering whether there is an easy way to form a sparse n-dimensional array M in Python 3 that meets the following two conditions (along the lines of SciPy's coo_matrix):

M[dim1,dim2,dim3,...] = 1.0

As with a SciPy coo_matrix M, where M.row and M.col give the row and column indices of all non-zero entries, I should be able to get the indices along every dimension. In N dimensions, this generalizes to something like M.1 for the 1st dimension, M.2 for the 2nd dimension, and so on.

In 2 dimensions, the two conditions look like this:

 1. for u, i in data:
        mat[u, i] = 1.0

 2. def get_triplets(mat):
        return mat.row, mat.col

Can these 2 conditions be generalized to N dimensions? I searched and came across this:

sparse 3d matrix/array in Python?

But there the 2nd condition is not satisfied. In other words, I can't get all the indices along the nth dimension in a vectorized format.

There is also this: http://www.janeriksolem.net/sparray-sparse-n-dimensional-arrays-in.html, which works for Python 2 but not Python 3.

Is there a way to implement n-dimensional arrays that satisfy the 2 conditions mentioned above? Or am I over-complicating things? I appreciate any help with this :)

1 solution

#1

In the spirit of the coo format, I could generate a 3d sparse array representation:

In [106]: dims = 2,4,6
In [107]: data = np.zeros((10,4),int)
In [108]: data[:,-1] = 1
In [112]: for i in range(3):
     ...:     data[:,i] = np.random.randint(0,dims[i],10)

In [113]: data
Out[113]: 
array([[0, 2, 3, 1],
       [0, 3, 4, 1],
       [0, 0, 1, 1],
       [0, 3, 0, 1],
       [1, 1, 3, 1],
       [1, 0, 2, 1],
       [1, 1, 2, 1],
       [0, 2, 5, 1],
       [0, 1, 5, 1],
       [0, 1, 2, 1]])

Does that meet your requirements? There may be some duplicates. scipy's sparse coo format sums duplicates when it converts the array to dense for display, or to csr for calculations.

The corresponding dense array is:

In [130]: A=np.zeros(dims, int)
In [131]: for row in data:
     ...:     A[tuple(row[:3])] += row[-1]

In [132]: A
Out[132]: 
array([[[0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1],
        [0, 0, 0, 1, 0, 1],
        [1, 0, 0, 0, 1, 0]],

       [[0, 0, 1, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]]])

(no duplicates in this case).
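
The explicit loop above can also be written as one vectorized call. This is a minimal sketch, assuming the same `dims` and `data` as in the session above; `np.add.at` accumulates repeated coordinates, which mirrors how the coo format sums duplicates. The names `coords`, `values`, and `has_duplicates` are only illustrative.

```python
import numpy as np

dims = (2, 4, 6)
# Rows of (i, j, k, value), copied from the `data` array shown above.
data = np.array([[0, 2, 3, 1],
                 [0, 3, 4, 1],
                 [0, 0, 1, 1],
                 [0, 3, 0, 1],
                 [1, 1, 3, 1],
                 [1, 0, 2, 1],
                 [1, 1, 2, 1],
                 [0, 2, 5, 1],
                 [0, 1, 5, 1],
                 [0, 1, 2, 1]])

coords = data[:, :3]   # one column of indices per dimension
values = data[:, 3]    # the stored (non-zero) values

A = np.zeros(dims, dtype=int)
# Unbuffered in-place add: repeated coordinates are summed, not overwritten.
np.add.at(A, tuple(coords.T), values)

# Check whether any coordinate appears more than once.
_, counts = np.unique(coords, axis=0, return_counts=True)
has_duplicates = bool((counts > 1).any())
print(A.shape, has_duplicates)
```

Because `tuple(coords.T)` yields one index array per dimension, the same call should work unchanged for any number of dimensions.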

A 2d sparse matrix using a subset of this data is:

In [118]: sparse.coo_matrix((data[:,3],(data[:,1],data[:,2])),(4,6)).toarray()
Out[118]: 
array([[0, 1, 1, 0, 0, 0],
       [0, 0, 2, 1, 0, 1],
       [0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0]])

That's in effect the sum over the first dimension.
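
That claim can be checked with plain NumPy. The sketch below assumes the same `dims` and `data` as above, rebuilds the dense 3d array, and compares its sum over axis 0 with the 2d output printed above; `expected_2d` is simply that output copied by hand.

```python
import numpy as np

dims = (2, 4, 6)
# Rows of (i, j, k, value), copied from the `data` array shown above.
data = np.array([[0, 2, 3, 1],
                 [0, 3, 4, 1],
                 [0, 0, 1, 1],
                 [0, 3, 0, 1],
                 [1, 1, 3, 1],
                 [1, 0, 2, 1],
                 [1, 1, 2, 1],
                 [0, 2, 5, 1],
                 [0, 1, 5, 1],
                 [0, 1, 2, 1]])

# Dense 3d array built from the coordinate rows.
A = np.zeros(dims, dtype=int)
np.add.at(A, tuple(data[:, :3].T), data[:, 3])

# The 2d result printed above, copied by hand.
expected_2d = np.array([[0, 1, 1, 0, 0, 0],
                        [0, 0, 2, 1, 0, 1],
                        [0, 0, 0, 1, 0, 1],
                        [1, 0, 0, 0, 1, 0]])

# Collapsing the first dimension should reproduce the 2d matrix.
projected = A.sum(axis=0)
print(np.array_equal(projected, expected_2d))
```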

I'm assuming that

M[dim1,dim2,dim3,...] = 1.0

means the non-zero elements of the array must have a data value of 1.
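
Under that reading, both conditions from the question can be sketched with a small dict-backed container. `SparseND` and its `coords` method are hypothetical names made up for this illustration, not part of NumPy or SciPy, and the class expects a full index tuple with one index per dimension.

```python
import numpy as np


class SparseND:
    """Hypothetical dict-backed sparse N-d array (illustration only)."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        self._entries = {}  # maps a full index tuple to its stored value

    def _check(self, index):
        index = tuple(int(i) for i in index)
        if len(index) != len(self.shape):
            raise IndexError("expected %d indices" % len(self.shape))
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index %r out of bounds" % (index,))
        return index

    def __setitem__(self, index, value):
        index = self._check(index)
        if value == 0:
            self._entries.pop(index, None)  # zeros are not stored
        else:
            self._entries[index] = value

    def __getitem__(self, index):
        return self._entries.get(self._check(index), 0.0)

    def coords(self):
        """Return one index array per dimension, like coo's row and col."""
        if not self._entries:
            return tuple(np.empty(0, dtype=int) for _ in self.shape)
        return tuple(np.array(list(self._entries), dtype=int).T)


# Condition 1: item assignment with an N-d index.
M = SparseND((2, 4, 6))
for idx in [(0, 2, 3), (1, 1, 2), (0, 3, 0)]:
    M[idx] = 1.0

# Condition 2: the indices of all stored entries, one array per dimension.
dim0, dim1, dim2 = M.coords()
print(dim0, dim1, dim2)
```

A dict keeps assignment cheap and overwrites repeated indices instead of summing them, which differs from the coo behaviour described earlier; whether that is acceptable depends on the use case.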

Pandas has sparse Series and DataFrame formats, which allow a non-zero 'fill' value. I don't know whether the multi-index version can be thought of as higher than 2d. There have been a few SO questions about converting Pandas sparse arrays to and from scipy sparse matrices:

Convert Pandas SparseDataframe to Scipy sparse csc_matrix

http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse
