Let's say I have a DataFrame df1 with two columns: a with dtype bool and b with dtype int64. When I initialise a new DataFrame (df1_bis) from df1, columns a and b are automatically converted to object, even if I force the dtype of df1_bis:
In [1]: import pandas as pd
In [2]: df1 = pd.DataFrame({"a": [True], 'b': [0]})
In [3]: df1
Out[3]:
      a  b
0  True  0
In [4]: df1.dtypes
Out[4]:
a     bool
b    int64
dtype: object
In [5]: df1_bis = pd.DataFrame(df1.values, columns=df1.columns, dtype=df1.dtypes)
In [6]: df1_bis
Out[6]:
      a  b
0  True  0
In [7]: df1_bis.dtypes
Out[7]:
a    object
b    object
dtype: object
Is there something I'm doing wrong with the dtype argument of DataFrame?
2 answers
#1
3
NumPy is causing the problem. df1.values is a single NumPy array, and because a and b have different dtypes, that array has dtype object; pandas takes the column types from the array. If you convert it to a list first, you won't have the problem.
df1_bis = pd.DataFrame(df1.values.tolist(),
                       columns=df1.columns)
print(df1_bis)
print()
print(df1_bis.dtypes)

      a  b
0  True  0

a     bool
b    int64
dtype: object
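The explanation above can be checked directly. The following is a minimal sketch, assuming a reasonably recent pandas; the names values, from_array and from_list are only illustrative. It shows that the array returned by .values already has dtype object, and that going through a list lets pandas infer each column's type again.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [True], "b": [0]})

# bool and int64 have no common NumPy type here, so .values is an object array.
values = df1.values
print(values.dtype)

# Built from the object array: both columns stay object.
from_array = pd.DataFrame(values, columns=df1.columns)
print(from_array.dtypes)

# Built from a plain list of rows: pandas infers a dtype per column.
from_list = pd.DataFrame(values.tolist(), columns=df1.columns)
print(from_list.dtypes)
```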
#2
4
This works for me:
df1_bis = pd.DataFrame(df1, columns=df1.columns, index=df1.index)
# df1_bis = pd.DataFrame(df1)
print(df1_bis)
      a  b
0  True  0
print(df1_bis.dtypes)
a     bool
b    int64
dtype: object
But I think it is better to use copy:
df1_bis = df1.copy()
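As a small illustration of why copy is convenient here (a sketch only; the helper names dtypes_match and original_unchanged are made up for this example), the copy keeps each column's dtype and can be modified without touching df1:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [True], "b": [0]})

# copy() makes a deep copy by default, keeping per-column dtypes.
df1_bis = df1.copy()
dtypes_match = list(df1_bis.dtypes) == list(df1.dtypes)

# Changing the copy leaves the original DataFrame as it was.
df1_bis.loc[0, "b"] = 42
original_unchanged = int(df1.loc[0, "b"]) == 0

print(dtypes_match, original_unchanged)
```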
If you want to use dtype, you need to work with Series, because the dtype parameter of DataFrame applies to all columns:
df1_bis = pd.DataFrame({'a': pd.Series(df1.a.values, dtype=df1.a.dtypes),
                        'b': pd.Series(df1.b.values, dtype=df1.b.dtypes)},
                       index=df1.index)
print(df1_bis)
      a  b
0  True  0
print(df1_bis.dtypes)
a     bool
b    int64
dtype: object
# A single dtype passed to DataFrame is applied to every column
df = pd.DataFrame({"a": [1, 5], 'b': [0, 4]}, dtype=float)
print(df)
     a    b
0  1.0  0.0
1  5.0  4.0
print(df.dtypes)
a    float64
b    float64
dtype: object
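Neither answer mentions it, but if the DataFrame really has to be built from df1.values, another option is to cast the columns back afterwards. This is a sketch under that assumption: DataFrame.astype accepts a mapping of column name to dtype, so the original dtypes can be passed as a dict.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [True], "b": [0]})

# Built from the object array, so both columns start out as object.
df1_bis = pd.DataFrame(df1.values, columns=df1.columns, index=df1.index)

# Cast each column back to the dtype it has in df1.
df1_bis = df1_bis.astype(df1.dtypes.to_dict())
print(df1_bis.dtypes)
```

This only works when the object values can actually be converted to the target dtypes, which is the case here because they came from df1 in the first place.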