How to convert the last few columns from string type to integer in pandas

Time: 2021-02-12 16:32:24

I have a DataFrame called df.

I want to convert the last 10 columns of this DataFrame from strings to integers. How can I do this the Pythonic way?

1 solution

#1

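I think the most concise way is to select the last 10 columns with slicing notation on the column labels (df.columns[-10:]) and call convert_objects(convert_numeric=True) on that subset. Note that convert_objects has since been deprecated and removed from pandas, so on current versions pd.to_numeric is the replacement. For example: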

In [23]:

import pandas as pd
df = pd.DataFrame({'a':['1','2','3','4','5']})

df = pd.concat([df]*11, axis=1)
df.columns = list('abcdefghijk')
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 11 columns):
a    5 non-null object
b    5 non-null object
c    5 non-null object
d    5 non-null object
e    5 non-null object
f    5 non-null object
g    5 non-null object
h    5 non-null object
i    5 non-null object
j    5 non-null object
k    5 non-null object
dtypes: object(11)
memory usage: 480.0+ bytes
In [21]:

converted = df[df.columns[-10:]].convert_objects(convert_numeric=True)
converted
Out[21]:
   b  c  d  e  f  g  h  i  j  k
0  1  1  1  1  1  1  1  1  1  1
1  2  2  2  2  2  2  2  2  2  2
2  3  3  3  3  3  3  3  3  3  3
3  4  4  4  4  4  4  4  4  4  4
4  5  5  5  5  5  5  5  5  5  5
In [22]:

converted.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 10 columns):
b    5 non-null int64
c    5 non-null int64
d    5 non-null int64
e    5 non-null int64
f    5 non-null int64
g    5 non-null int64
h    5 non-null int64
i    5 non-null int64
j    5 non-null int64
k    5 non-null int64
dtypes: int64(10)
memory usage: 440.0 bytes
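Because convert_objects no longer exists in current pandas releases, here is a minimal sketch of the same conversion step using pd.to_numeric, assuming the same 11-column frame of digit strings built above (rebuilt here so the block runs on its own). DataFrame.apply calls pd.to_numeric once per selected column:

```python
import pandas as pd

# Rebuild the example frame: 11 columns ('a'..'k') of digit strings
df = pd.DataFrame({'a': ['1', '2', '3', '4', '5']})
df = pd.concat([df] * 11, axis=1)
df.columns = list('abcdefghijk')

# Select the last 10 column labels by slicing, then convert
# each selected column from strings to numbers
last10 = df.columns[-10:]
converted = df[last10].apply(pd.to_numeric)
print(converted.dtypes)
```

Here pd.to_numeric raises an error if any cell is not a valid number, which is usually what you want when the data is expected to be clean.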

You can then either directly assign the result back:

In [31]:

df[df.columns[-10:]] = converted
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 11 columns):
a    5 non-null object
b    5 non-null int64
c    5 non-null int64
d    5 non-null int64
e    5 non-null int64
f    5 non-null int64
g    5 non-null int64
h    5 non-null int64
i    5 non-null int64
j    5 non-null int64
k    5 non-null int64
dtypes: int64(10), object(1)
memory usage: 480.0+ bytes

or do it in a one-liner:

In [33]:

df[df.columns[-10:]] = df[df.columns[-10:]].convert_objects(convert_numeric=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 11 columns):
a    5 non-null object
b    5 non-null int64
c    5 non-null int64
d    5 non-null int64
e    5 non-null int64
f    5 non-null int64
g    5 non-null int64
h    5 non-null int64
i    5 non-null int64
j    5 non-null int64
k    5 non-null int64
dtypes: int64(10), object(1)
memory usage: 480.0+ bytes
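The one-liner above assumes every string in those columns is a valid integer. If some cells hold text such as 'n/a' or empty strings, a sketch using pd.to_numeric with errors='coerce' turns them into missing values, and the nullable 'Int64' dtype keeps the columns integer-typed. The frame and column names below ('name', 'p', 'q') are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical frame: the last two columns contain values that are not valid integers
df = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'p': ['1', '2', 'n/a'],
    'q': ['4', '', '6'],
})

cols = df.columns[-2:]
# errors='coerce' turns unparseable strings into NaN instead of raising;
# casting to the nullable 'Int64' dtype keeps the columns integer despite the missing values
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce').astype('Int64')
print(df)
print(df.dtypes)
```

Without the final cast, the coerced columns would come back as float64 for this data, because NumPy integer columns cannot hold NaN. If the data is known to be clean, df.astype({col: 'int64' for col in cols}) is a stricter alternative that raises on any bad value.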
