I noticed Pandas now has support for Sparse Matrices and Arrays. Currently, I create DataFrame()s like this:
return DataFrame(matrix.toarray(), columns=features, index=observations)
Is there a way to create a SparseDataFrame() with a scipy.sparse.csc_matrix() or csr_matrix()? Converting to dense format kills RAM badly. Thanks!
3 solutions
#1
25
A direct conversion is not supported at the moment. Contributions are welcome!
Try this. It should be OK on memory, as a SparseSeries is much like a csc_matrix (for 1 column) and is pretty space-efficient:
In [35]: import numpy as np; import pandas as pd; from scipy.sparse import csc_matrix
In [36]: row = np.array([0,2,2,0,1,2])
In [37]: col = np.array([0,0,1,2,2,2])
In [38]: data = np.array([1,2,3,4,5,6],dtype='float64')
In [39]: m = csc_matrix( (data,(row,col)), shape=(3,3) )
In [40]: m
Out[40]:
<3x3 sparse matrix of type '<type 'numpy.float64'>'
        with 6 stored elements in Compressed Sparse Column format>
In [46]: pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel())
                              for i in np.arange(m.shape[0]) ])
Out[46]:
   0  1  2
0  1  0  4
1  0  0  5
2  2  3  6
In [47]: df = pd.SparseDataFrame([ pd.SparseSeries(m[i].toarray().ravel())
                                   for i in np.arange(m.shape[0]) ])
In [48]: type(df)
Out[48]: pandas.sparse.frame.SparseDataFrame
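For readers on a recent pandas release, where SparseSeries and SparseDataFrame are no longer available (they were removed in pandas 1.0), the same one-slice-at-a-time idea can be sketched with pd.arrays.SparseArray. This is only an illustration and not the code from the answer above: the helper name sparse_frame_by_column is made up here, and it assumes that densifying a single column at a time is affordable.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix


def sparse_frame_by_column(matrix, columns=None, index=None):
    """Hypothetical helper: build a sparse-backed DataFrame one column at a time."""
    matrix = csc_matrix(matrix)  # CSC layout makes column slicing cheap
    n_cols = matrix.shape[1]
    if columns is None:
        columns = range(n_cols)
    data = {}
    for j, name in enumerate(columns):
        # Only this single column is ever held in dense form.
        dense_col = matrix[:, j].toarray().ravel()
        data[name] = pd.arrays.SparseArray(dense_col, fill_value=0)
    return pd.DataFrame(data, index=index)


# Same 3x3 example matrix as in the session above.
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6], dtype="float64")
m = csc_matrix((data, (row, col)), shape=(3, 3))

df = sparse_frame_by_column(m)
```

Each column of df is stored as a SparseArray, so only the non-zero entries are kept once the loop has finished.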
#2
11
As of pandas v0.20.0 you can use the SparseDataFrame constructor.
An example from the pandas docs:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
arr = np.random.random(size=(1000, 5))
arr[arr < .9] = 0
sp_arr = csr_matrix(arr)
sdf = pd.SparseDataFrame(sp_arr)
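On newer pandas releases the SparseDataFrame class no longer exists (it was deprecated in 0.25 and removed in 1.0). A rough equivalent, sketched here under the assumption that pandas 0.25 or later is installed, is the DataFrame.sparse.from_spmatrix constructor. The features and observations names echo the question; their values below are invented purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Seeded generator so the example is reproducible.
rng = np.random.RandomState(0)
arr = rng.random_sample(size=(1000, 5))
arr[arr < 0.9] = 0  # roughly 90% of the entries become zero
sp_arr = csr_matrix(arr)

# Illustrative labels (assumed, not from the original post).
features = ["f%d" % i for i in range(sp_arr.shape[1])]
observations = ["obs%d" % i for i in range(sp_arr.shape[0])]

# Builds a DataFrame whose columns are sparse arrays, without calling toarray().
sdf = pd.DataFrame.sparse.from_spmatrix(sp_arr, index=observations, columns=features)

# The .sparse accessor can convert back to a scipy matrix when needed.
back = sdf.sparse.to_coo().tocsr()
```

The result is an ordinary DataFrame with sparse column dtypes, so the usual indexing and label handling apply.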
#3
-10
A much shorter version:
df = pd.DataFrame(m.toarray())
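Note that toarray() materialises every zero, which is the cost the question is trying to avoid. A small sketch of how the difference can be measured, assuming pandas 0.25 or later; the matrix size and density are arbitrary values chosen for illustration:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Build a mostly-zero matrix (about 1% non-zero entries).
rng = np.random.RandomState(0)
arr = rng.random_sample(size=(2000, 50))
arr[arr < 0.99] = 0
m = csr_matrix(arr)

# Dense route: every zero is stored explicitly.
dense_df = pd.DataFrame(m.toarray())

# Sparse route: only the non-zero values and their positions are stored.
sparse_df = pd.DataFrame.sparse.from_spmatrix(m)

dense_bytes = int(dense_df.memory_usage(index=False).sum())
sparse_bytes = int(sparse_df.memory_usage(index=False).sum())
print(dense_bytes, sparse_bytes)
```

Both frames hold the same values; only the storage differs.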