dtypes change when initialising a new DataFrame from another DataFrame

Let's say that I have a DataFrame df1 with 2 columns: a with dtype bool and b with dtype int64. When I initialise a new DataFrame (df1_bis) from df1.values, columns a and b are automatically converted to object dtype, even if I pass the dtypes of df1 explicitly:

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame({"a": [True], 'b': [0]})

In [3]: df1
Out[3]:
      a  b
0  True  0

In [4]: df1.dtypes
Out[4]:
a     bool
b    int64
dtype: object

In [5]: df1_bis = pd.DataFrame(df1.values, columns=df1.columns, dtype=df1.dtypes)

In [6]: df1_bis
Out[6]:
      a  b
0  True  0

In [7]: df1_bis.dtypes
Out[7]:
a    object
b    object
dtype: object

Is there something I'm doing wrong with the dtype argument of DataFrame?

2 solutions

#1

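The NumPy array is what causes the problem. df1.values returns a single array, and because one array cannot hold bool and int64 side by side, it has dtype object. pandas then takes the column types from that array. If you convert it to a list first, pandas infers a dtype for each column and the problem goes away.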

df1_bis = pd.DataFrame(df1.values.tolist(),
                       columns=df1.columns)

print(df1_bis)
print()
print(df1_bis.dtypes)

      a  b
0  True  0

a     bool
b    int64
dtype: object
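
To make the explanation above concrete, here is a minimal sketch that rebuilds the df1 from the question and inspects the intermediate array. The names arr, from_array and from_list are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [True], "b": [0]})

# A single 2-D array cannot hold bool and int64 side by side,
# so .values falls back to the common dtype: object.
arr = df1.values
print(arr.dtype)

# Built from that array, every column inherits object dtype.
from_array = pd.DataFrame(arr, columns=df1.columns)
print(from_array.dtypes)

# Built from a list of rows, pandas infers a dtype per column.
from_list = pd.DataFrame(arr.tolist(), columns=df1.columns)
print(from_list.dtypes)
```

The first print should show object, which is why the second frame has object columns while the third gets bool and an integer dtype back.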

#2

This works for me:

df1_bis = pd.DataFrame(df1, columns=df1.columns, index=df1.index)
# or simply:
# df1_bis = pd.DataFrame(df1)

print(df1_bis)
      a  b
0  True  0

print(df1_bis.dtypes)
a     bool
b    int64
dtype: object

But I think it is better to use copy:

df1_bis = df1.copy()
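
As a quick check of the copy approach, the sketch below rebuilds the df1 from the question, copies it, and then modifies the copy. The value 99 is arbitrary and only there to show that the original is left alone.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [True], "b": [0]})
df1_bis = df1.copy()  # copy() makes a deep copy by default

# The dtypes are carried over unchanged.
print(df1_bis.dtypes)

# Modifying the copy leaves the original untouched.
df1_bis.loc[0, "b"] = 99
print(df1.loc[0, "b"])
```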

If you want to use dtype, you need to work with Series, because the dtype parameter of DataFrame takes a single dtype that is applied to all columns:

df1_bis = pd.DataFrame({'a': pd.Series(df1.a.values, dtype=df1.a.dtypes),
                        'b': pd.Series(df1.b.values, dtype=df1.b.dtypes)},
                       index=df1.index)

print(df1_bis)
      a  b
0  True  0

print(df1_bis.dtypes)
a     bool
b    int64
dtype: object

# a single dtype passed to DataFrame is applied to every column
df = pd.DataFrame({"a": [1,5], 'b': [0,4]}, dtype=float)
print(df)
     a    b
0  1.0  0.0
1  5.0  4.0

print(df.dtypes)
a    float64
b    float64
dtype: object
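
If the frame really has to be rebuilt from df1.values, one further option (not taken from the answers above, so treat it as a sketch) is to cast the columns back afterwards with astype, which accepts a mapping of column name to dtype. Here the mapping is taken from the original df1.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [True], "b": [0]})

# Rebuild from the object array; all columns are object at this point.
df1_bis = pd.DataFrame(df1.values, columns=df1.columns, index=df1.index)

# Cast each column back using a {column name: dtype} mapping
# built from the original frame.
df1_bis = df1_bis.astype(df1.dtypes.to_dict())

print(df1_bis.dtypes)
```

This keeps the per-column dtypes in one place instead of building a Series for every column by hand, at the cost of an extra conversion pass.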
