The data in the CSV file has the format ("user_id", "group_id", "group_value"). "group_id" ranges from 0 to 100.
For a given user_id, the group_value for a particular group_id may not be available.
I want to create a sparse matrix representation of this data, with one row per user_id and the columns ("group_id_0", "group_id_1", ... , "group_id_100").
What is the best way to achieve this in Python?
Edit: Data is too big to iterate over.
1 answer
#1
You could do this with pandas.
Update 08.08.2018:
As noticed by Can Kavaklıoğlu, as_matrix() is deprecated as of pandas version 0.23.0. Changed to values.
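import pandas as pd
# names= supplies the column labels, assuming the file has no header row.
df = pd.read_csv('csv_file.csv', names=['user_id', 'group_id', 'group_value'])
# One row per user_id, one column per group_id; missing pairs become NaN.
df = df.pivot(index='user_id', columns='group_id', values='group_value')
# Dense NumPy array (values replaces the deprecated as_matrix()).
mat = df.values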
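Note that the pivot above produces a dense array with NaN for missing pairs. If an actual sparse matrix is needed, one alternative (not part of the original answer) is to pass the three columns straight to scipy.sparse without any Python-level loop. The following is a minimal sketch under some assumptions: the helper name `to_sparse` and the sample data are hypothetical, the file has no header row, missing pairs are treated as 0 rather than NaN, and duplicate (user_id, group_id) pairs would be summed by the constructor.

```python
import io

import pandas as pd
from scipy.sparse import csr_matrix

N_GROUPS = 101  # group_id ranges from 0 to 100 inclusive


def to_sparse(csv_source):
    """Build an (n_users x 101) CSR matrix from (user_id, group_id, group_value) rows.

    Hypothetical helper for illustration; returns the matrix and the
    user_id label for each row.
    """
    df = pd.read_csv(csv_source, names=['user_id', 'group_id', 'group_value'])
    # Map each distinct user_id to a row index 0..n_users-1 (vectorized).
    row_idx, user_ids = pd.factorize(df['user_id'], sort=True)
    col_idx = df['group_id'].to_numpy()
    values = df['group_value'].to_numpy(dtype=float)
    # Coordinate-style construction: values[i] goes to (row_idx[i], col_idx[i]).
    mat = csr_matrix((values, (row_idx, col_idx)),
                     shape=(len(user_ids), N_GROUPS))
    return mat, user_ids


# Tiny in-memory stand-in for the real file (made-up data).
sample = io.StringIO("7,0,1.5\n7,100,2.0\n3,5,4.0\n")
mat, user_ids = to_sparse(sample)
```

Here `user_ids[i]` is the user_id stored in row `i` of `mat`. Fixing the shape to 101 columns keeps the layout stable even when some group_id never occurs in the file, which the pivot approach does not guarantee.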