In pandas, how can I convert a column of a DataFrame to dtype object? Or better yet, into a factor? (For those who speak R: in Python, how do I do the equivalent of as.factor()?)
Also, what's the difference between pandas.Factor and pandas.Categorical?
3 Solutions
#1
48
You can use the astype method to cast a Series (one column):
df['col_name'] = df['col_name'].astype(object)
Or the entire DataFrame:
df = df.astype(object)
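To check that the cast took effect, you can inspect the dtypes afterwards. This is a minimal sketch on a small made-up DataFrame; the column names and values are hypothetical and only there for illustration:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'col_name': [1, 2, 3], 'other': [0.5, 1.5, 2.5]})

# Cast a single column to object; the other column keeps its dtype.
df['col_name'] = df['col_name'].astype(object)
col_dtype = df['col_name'].dtype

# Cast every column to object; this returns a new DataFrame.
df_obj = df.astype(object)
all_dtypes = df_obj.dtypes

print(col_dtype)
print(all_dtypes)
```

Note that astype returns a new object rather than modifying in place, which is why the result is assigned back.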
Update
Since version 0.15, you can use the category datatype in a Series/column:
df['col_name'] = df['col_name'].astype('category')
Note: pd.Factor has been deprecated and removed in favor of pd.Categorical.
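For readers coming from R, the category dtype is probably the closest thing to as.factor(). Here is a rough sketch of what you get after the conversion, using made-up data; the column names b_cat and b_ordered and the chosen level order are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({'b': ['yes', 'no', 'yes', 'no', 'absent']})

# Default conversion: the categories are inferred from the data and sorted.
df['b_cat'] = df['b'].astype('category')
categories = list(df['b_cat'].cat.categories)   # the "levels"
codes = list(df['b_cat'].cat.codes)             # integer code per row

# Explicit level order, similar in spirit to factor(x, levels=...) in R.
level_order = pd.CategoricalDtype(categories=['no', 'yes', 'absent'],
                                  ordered=True)
df['b_ordered'] = df['b'].astype(level_order)
ordered_codes = list(df['b_ordered'].cat.codes)

print(categories)
print(codes)
print(ordered_codes)
```

The .cat accessor exposes the categories and the integer codes, so you can keep the readable labels in the column and still get at the numeric encoding when you need it.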
#2
12
Factor and Categorical are the same, as far as I know. I think it was initially called Factor, and then renamed to Categorical. To convert to Categorical, you can use pandas.Categorical.from_array, something like this:
In [27]: df = pd.DataFrame({'a' : [1, 2, 3, 4, 5], 'b' : ['yes', 'no', 'yes', 'no', 'absent']})
In [28]: df
Out[28]:
   a       b
0  1     yes
1  2      no
2  3     yes
3  4      no
4  5  absent
In [29]: df['c'] = pd.Categorical.from_array(df.b).labels
In [30]: df
Out[30]:
   a       b  c
0  1     yes  2
1  2      no  1
2  3     yes  2
3  4      no  1
4  5  absent  0
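As far as I know, more recent pandas releases no longer provide Categorical.from_array or the .labels attribute. If the session above fails for you, a sketch of what I believe is the equivalent is to call the pd.Categorical constructor directly and read .codes instead:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': ['yes', 'no', 'yes', 'no', 'absent']})

# Build the Categorical from the column; categories are sorted by default.
cat = pd.Categorical(df['b'])

# .codes holds the integer position of each value within cat.categories.
df['c'] = cat.codes

print(list(cat.categories))
print(df)
```

With the categories sorted alphabetically (absent, no, yes), this should produce the same c column as in the output above.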
#3
8
There's also the pd.factorize function:
# use the df data from @herrfz
In [150]: pd.factorize(df.b)
Out[150]: (array([0, 1, 0, 1, 2]), array(['yes', 'no', 'absent'], dtype=object))
In [152]: df['c'] = pd.factorize(df.b)[0]
In [153]: df
Out[153]:
   a       b  c
0  1     yes  0
1  2      no  1
2  3     yes  0
3  4      no  1
4  5  absent  2
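Note that pd.factorize numbers the values in order of first appearance, which is why the codes here differ from the Categorical ones in the previous answer. A small sketch, assuming the same df, that shows the sort option and how to map the codes back to the original values:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': ['yes', 'no', 'yes', 'no', 'absent']})

# Default: codes follow the order in which values first appear.
codes, uniques = pd.factorize(df['b'])

# sort=True: the uniques are sorted first, so the codes follow sorted order.
sorted_codes, sorted_uniques = pd.factorize(df['b'], sort=True)

# Indexing the uniques with the codes recovers the original values.
restored = list(uniques.take(codes))

print(list(codes), list(uniques))
print(list(sorted_codes), list(sorted_uniques))
print(restored)
```

Depending on the pandas version, the second element of the returned tuple may be an Index rather than a plain array, so the sketch converts it to a list before printing.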