Unable to change the data type in a data frame

I have a data frame df that looks like this:

        birth_year  person
    0       1980         0
    1       1981         1
    2       1982         2
    3       1983         3
    4       1984         4

The birth_year column looks like numbers, but when I check the data type with df['birth_year'].dtype, the result is dtype('O').

So I thought it might actually be a string, and tried to convert it to numbers with df['birth_year'].astype('int'), but got an error:

    UnicodeEncodeError: 'decimal' codec can't encode characters in position 
    0-3: invalid decimal Unicode string

After a little googling I came to understand (possibly wrongly) that there seem to be some invisible characters in it. When I access a value with df['birth_year'][0], I get 1980L rather than 1980.

So what exactly is the data type, and how can I convert it to integers? I read somewhere that a returned dtype of dtype('O') usually means the column holds strings, but that doesn't seem to be the case here.

1 Answer

#1

You can normally convert using df['birth_year'].astype(int), but it seems you have invalid values. Using df = df.convert_objects(convert_numeric=True) will coerce invalid values to NaN, which may or may not be what you want, as this changes the dtype to float64 rather than int64.
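Note that convert_objects was deprecated and later removed from pandas. In newer versions, pd.to_numeric with errors='coerce' plays the same role. The following is a minimal sketch, assuming a made-up data frame in which one birth_year value is not a plain number (the sample values are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical sample: an object column where one entry is not a valid number.
df = pd.DataFrame({
    "birth_year": ["1980", "1981", "19x2", "1983"],
    "person": [0, 1, 2, 3],
})

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
converted = pd.to_numeric(df["birth_year"], errors="coerce")

# Because NaN is a float, the resulting column is float64, not int64.
print(converted.dtype)
print(converted)
```

If every value parses cleanly, no NaN is introduced and the result keeps an integer dtype.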

It's best to look at the invalid string values to work out why they failed to convert.

So you could do df[df['birth_year'].convert_objects(convert_numeric=True).isnull()] to get the rows that have invalid 'birth_year' values.
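The same row selection can be sketched with pd.to_numeric for pandas versions that no longer ship convert_objects. The data below is invented for illustration. Printing repr of the offending values is one way to expose stray or invisible characters, and the final astype(int) assumes you decide to drop the bad rows rather than repair them:

```python
import pandas as pd

# Hypothetical sample with two entries that cannot be parsed as numbers.
df = pd.DataFrame({
    "birth_year": ["1980", "19x2", "1982", "unknown"],
    "person": [0, 1, 2, 3],
})

# Boolean mask: True where the value could not be converted to a number.
invalid = pd.to_numeric(df["birth_year"], errors="coerce").isna()

# Rows with invalid 'birth_year' values; repr() makes odd characters visible.
bad_rows = df[invalid]
print(bad_rows["birth_year"].map(repr))

# One possible follow-up: keep only the valid rows and convert them to integers.
clean = df[~invalid].copy()
clean["birth_year"] = clean["birth_year"].astype(int)
print(clean.dtypes)
```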
