Create a sparse matrix from CSV file data

Data in the csv file is of the format ("user_id", "group_id", "group_value"). "group_id" ranges from 0 to 100.

For a given user_id, the group_value for a particular group_id may be missing.

I want to create a sparse matrix representation of the above data, with one column per group: ("group_id_0", "group_id_1", ... , "group_id_100")

What is the best way to achieve this in Python?

Edit: Data is too big to iterate over.

1 Solution

#1

You could do this with pandas.

Update 08.08.2018:

As noticed by Can Kavaklıoğlu, as_matrix() is deprecated as of pandas version 0.23.0. The code now uses the values attribute instead.

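import pandas as pd

# Passing names= makes pandas read the file as having no header row.
df = pd.read_csv('csv_file.csv', names=['user_id', 'group_id', 'group_value'])
# One row per user_id, one column per group_id; missing pairs become NaN.
df = df.pivot(index='user_id', columns='group_id', values='group_value')
# Dense NumPy array (as_matrix() is deprecated since pandas 0.23.0).
mat = df.values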
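
Note that the pivot above yields a dense array with NaN for missing entries. If a true sparse structure is needed, one option (not part of the original answer) is to build a SciPy sparse matrix directly from the three columns. The following is a minimal sketch, assuming SciPy is installed and that group_id holds integers from 0 to 100; the in-memory sample data and the names sample_csv, row_idx, users, and sparse_mat are hypothetical stand-ins for illustration.

```python
import io

import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical sample standing in for csv_file.csv (no header row).
sample_csv = io.StringIO("10,0,1.5\n10,3,2.0\n42,100,7.0\n")

df = pd.read_csv(sample_csv, names=['user_id', 'group_id', 'group_value'])

# Map each distinct user_id to a row index (in order of first appearance).
row_idx, users = pd.factorize(df['user_id'])

# Assumption from the question: group_id ranges from 0 to 100,
# so it can be used directly as the column index.
n_groups = 101

# Build from (value, (row, col)) triplets without any Python-level loop,
# then convert to CSR for efficient row slicing and arithmetic.
sparse_mat = coo_matrix(
    (df['group_value'].to_numpy(), (row_idx, df['group_id'].to_numpy())),
    shape=(len(users), n_groups),
).tocsr()
```

Here users[i] gives the user_id for row i. Unlike the pivot, missing (user_id, group_id) pairs are simply not stored and read back as 0 rather than NaN, and duplicate pairs are summed when converting to CSR, so this only fits if those semantics are acceptable.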
