Pandas - make a column dtype object or Factor

In pandas, how can I convert a column of a DataFrame into dtype object? Or better yet, into a factor? (For those who speak R, in Python, how do I as.factor()?)

Also, what's the difference between pandas.Factor and pandas.Categorical?

3 solutions

#1

You can use the astype method to cast a Series (one column):

df['col_name'] = df['col_name'].astype(object)

Or the entire DataFrame:

df = df.astype(object)

Update

Since version 0.15, you can use the category datatype in a Series/column:

df['col_name'] = df['col_name'].astype('category')

Note: pd.Factor has been deprecated and removed in favor of pd.Categorical.
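As a minimal sketch of the two casts above, the snippet below builds a small DataFrame (the column names and values are illustrative assumptions) and inspects the result. For the category dtype, pandas infers the categories from the unique values and sorts them, which is roughly what R's as.factor() does with levels.

```python
import pandas as pd

# Illustrative data; the column names "a" and "b" are assumptions for this sketch.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": ["yes", "no", "yes", "no", "absent"]})

# Cast an integer column to the generic object dtype.
as_object = df["a"].astype(object)

# Cast a string column to the category dtype (available since pandas 0.15).
as_category = df["b"].astype("category")

# The inferred categories (sorted unique values) and the integer code of each row.
levels = list(as_category.cat.categories)
codes = list(as_category.cat.codes)

print(as_object.dtype)
print(as_category.dtype)
print(levels)
print(codes)
```

The .cat accessor is only available on Series that already have the category dtype.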

#2

Factor and Categorical are the same, as far as I know. I think it was initially called Factor and later renamed to Categorical. To convert to Categorical, you can pass the column to the pandas.Categorical constructor and read its integer codes (older pandas versions spelled this pandas.Categorical.from_array(...).labels, which has since been removed), something like this:

In [27]: df = pd.DataFrame({'a' : [1, 2, 3, 4, 5], 'b' : ['yes', 'no', 'yes', 'no', 'absent']})

In [28]: df
Out[28]: 
   a       b
0  1     yes
1  2      no
2  3     yes
3  4      no
4  5  absent

In [29]: df['c'] = pd.Categorical(df.b).codes

In [30]: df
Out[30]: 
   a       b  c
0  1     yes  2
1  2      no  1
2  3     yes  2
3  4      no  1
4  5  absent  0
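The integer codes in column c only make sense together with the list of categories they index into. A small sketch, assuming the same example data, of how the codes can be mapped back to the original values:

```python
import pandas as pd

# Same illustrative data as in the answer above.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": ["yes", "no", "yes", "no", "absent"]})

# Build the Categorical; categories are the sorted unique values.
cat = pd.Categorical(df["b"])
df["c"] = cat.codes

# Each code is a position in cat.categories, so indexing recovers the label.
decoded = [cat.categories[code] for code in df["c"]]

print(list(cat.categories))
print(decoded)
```

Because the categories are sorted, 'absent' gets code 0 even though it appears last in the column.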

#3

There's also the pd.factorize function:

# use the df data from @herrfz

In [150]: pd.factorize(df.b)
Out[150]: (array([0, 1, 0, 1, 2]), array(['yes', 'no', 'absent'], dtype=object))

In [152]: df['c'] = pd.factorize(df.b)[0]

In [153]: df
Out[153]: 
   a       b  c
0  1     yes  0
1  2      no  1
2  3     yes  0
3  4      no  1
4  5  absent  2
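Note that these codes differ from the Categorical ones in the previous answer: by default pd.factorize numbers values in order of first appearance, not sorted order. A short sketch, assuming the same example values, comparing the default with the sort=True option:

```python
import pandas as pd

# Same illustrative values as column b above.
values = pd.Series(["yes", "no", "yes", "no", "absent"])

# Default: codes follow the order in which values first appear.
codes, uniques = pd.factorize(values)

# sort=True: uniques are sorted first, so codes match the Categorical codes.
sorted_codes, sorted_uniques = pd.factorize(values, sort=True)

print(codes.tolist(), list(uniques))
print(sorted_codes.tolist(), list(sorted_uniques))
```

Use sort=True when the codes need to agree with those produced by astype('category') on the same data.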
