I'm working with the following df:
我正在使用以下df:
c.sort_values('2005', ascending=False).head(3)
GeoName ComponentName IndustryId IndustryClassification Description 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
37926 Alabama Real GDP by state 9 213 Support activities for mining 99 98 117 117 115 87 96 95 103 102 (NA)
37951 Alabama Real GDP by state 34 42 Wholesale trade 9898 10613 10952 11034 11075 9722 9765 9703 9600 9884 10199
37932 Alabama Real GDP by state 15 327 Nonmetallic mineral products manufacturing 980 968 940 1084 861 724 714 701 589 641 (NA)
I want to force numeric on all of the years:
我想在所有年份强制数字:
c['2014'] = pd.to_numeric(c['2014'], errors='coerce')
is there an easy way to do this or do I have to type them all out?
是否有一种简单的方法可以做到这一点,还是我必须全部输入?
4 个解决方案
#1
28
UPDATE: you don't need to convert your values afterwards, you can do it on-the-fly when reading your CSV:
更新:之后您无需转换您的值,您可以在阅读CSV时即时执行此操作:
In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)
In [166]: df.dtypes
Out[166]:
GeoName object
ComponentName object
IndustryId int64
IndustryClassification object
Description object
2004 int64
2005 int64
2006 int64
2007 int64
2008 int64
2009 int64
2010 int64
2011 int64
2012 int64
2013 int64
2014 float64
dtype: object
If you need to convert multiple columns to numeric dtypes - use the following technique:
如果需要将多个列转换为数字dtypes - 请使用以下技术:
Sample source DF:
样本来源DF:
In [271]: df
Out[271]:
id a b c d e f
0 id_3 AAA 6 3 5 8 1
1 id_9 3 7 5 7 3 BBB
2 id_7 4 2 3 5 4 2
3 id_0 7 3 5 7 9 4
4 id_0 2 4 6 4 0 2
In [272]: df.dtypes
Out[272]:
id object
a object
b int64
c int64
d int64
e int64
f object
dtype: object
Converting selected columns to numeric dtypes:
将所选列转换为数字dtypes:
In [273]: cols = df.columns.drop('id')
In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
In [275]: df
Out[275]:
id a b c d e f
0 id_3 NaN 6 3 5 8 1.0
1 id_9 3.0 7 5 7 3 NaN
2 id_7 4.0 2 3 5 4 2.0
3 id_0 7.0 3 5 7 9 4.0
4 id_0 2.0 4 6 4 0 2.0
In [276]: df.dtypes
Out[276]:
id object
a float64
b int64
c int64
d int64
e int64
f float64
dtype: object
PS if you want to select all string
(object
) columns use the following simple trick:
PS如果要选择所有字符串(对象)列,请使用以下简单技巧:
cols = df.columns[df.dtypes.eq('object')]
#2
21
another way is using apply
, one liner:
另一种方法是使用apply,one liner:
cols = ['col1', 'col2', 'col3']
data[cols] = data[cols].apply(pd.to_numeric, errors='coerce', axis=1)
#3
8
You can use:
您可以使用:
print df.columns[5:]
Index([u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011',
u'2012', u'2013', u'2014'],
dtype='object')
for col in df.columns[5:]:
df[col] = pd.to_numeric(df[col], errors='coerce')
print df
GeoName ComponentName IndustryId IndustryClassification \
37926 Alabama Real GDP by state 9 213
37951 Alabama Real GDP by state 34 42
37932 Alabama Real GDP by state 15 327
Description 2004 2005 2006 2007 \
37926 Support activities for mining 99 98 117 117
37951 Wholesale trade 9898 10613 10952 11034
37932 Nonmetallic mineral products manufacturing 980 968 940 1084
2008 2009 2010 2011 2012 2013 2014
37926 115 87 96 95 103 102 NaN
37951 11075 9722 9765 9703 9600 9884 10199.0
37932 861 724 714 701 589 641 NaN
Another solution with filter
:
过滤器的另一个解决方
print df.filter(like='20')
2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
37926 99 98 117 117 115 87 96 95 103 102 (NA)
37951 9898 10613 10952 11034 11075 9722 9765 9703 9600 9884 10199
37932 980 968 940 1084 861 724 714 701 589 641 (NA)
for col in df.filter(like='20').columns:
df[col] = pd.to_numeric(df[col], errors='coerce')
print df
GeoName ComponentName IndustryId IndustryClassification \
37926 Alabama Real GDP by state 9 213
37951 Alabama Real GDP by state 34 42
37932 Alabama Real GDP by state 15 327
Description 2004 2005 2006 2007 \
37926 Support activities for mining 99 98 117 117
37951 Wholesale trade 9898 10613 10952 11034
37932 Nonmetallic mineral products manufacturing 980 968 940 1084
2008 2009 2010 2011 2012 2013 2014
37926 115 87 96 95 103 102 NaN
37951 11075 9722 9765 9703 9600 9884 10199.0
37932 861 724 714 701 589 641 NaN
#4
-1
If you are looking for a range of columns, you can try this:
如果您正在寻找一系列列,您可以尝试这样做:
df.iloc[7:] = df.iloc[7:].astype(float)
The examples above will convert type to be float, for all the columns begin with the 7th to the end. You of course can use different type or different range.
上面的示例将type转换为float,因为所有列都以7开头到结尾。你当然可以使用不同的类型或不同的范围。
I think this is useful when you have a big range of columns to convert and a lot of rows. It doesn't make you go over each row by yourself - I believe numpy do it more efficiently.
我认为当你有大量的列要转换和很多行时,这很有用。它不会让你自己超越每一行 - 我相信numpy会更有效地做到这一点。
This is useful only if you know that all the required columns contain numbers only - it will not change "bad values" (like string) to be NaN for you.
仅当您知道所有必需列仅包含数字时才有用 - 它不会将“错误值”(如字符串)更改为NaN。
#1
28
UPDATE: you don't need to convert your values afterwards, you can do it on-the-fly when reading your CSV:
更新:之后您无需转换您的值,您可以在阅读CSV时即时执行此操作:
In [165]: df=pd.read_csv(url, index_col=0, na_values=['(NA)']).fillna(0)
In [166]: df.dtypes
Out[166]:
GeoName object
ComponentName object
IndustryId int64
IndustryClassification object
Description object
2004 int64
2005 int64
2006 int64
2007 int64
2008 int64
2009 int64
2010 int64
2011 int64
2012 int64
2013 int64
2014 float64
dtype: object
If you need to convert multiple columns to numeric dtypes - use the following technique:
如果需要将多个列转换为数字dtypes - 请使用以下技术:
Sample source DF:
样本来源DF:
In [271]: df
Out[271]:
id a b c d e f
0 id_3 AAA 6 3 5 8 1
1 id_9 3 7 5 7 3 BBB
2 id_7 4 2 3 5 4 2
3 id_0 7 3 5 7 9 4
4 id_0 2 4 6 4 0 2
In [272]: df.dtypes
Out[272]:
id object
a object
b int64
c int64
d int64
e int64
f object
dtype: object
Converting selected columns to numeric dtypes:
将所选列转换为数字dtypes:
In [273]: cols = df.columns.drop('id')
In [274]: df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
In [275]: df
Out[275]:
id a b c d e f
0 id_3 NaN 6 3 5 8 1.0
1 id_9 3.0 7 5 7 3 NaN
2 id_7 4.0 2 3 5 4 2.0
3 id_0 7.0 3 5 7 9 4.0
4 id_0 2.0 4 6 4 0 2.0
In [276]: df.dtypes
Out[276]:
id object
a float64
b int64
c int64
d int64
e int64
f float64
dtype: object
PS if you want to select all string
(object
) columns use the following simple trick:
PS如果要选择所有字符串(对象)列,请使用以下简单技巧:
cols = df.columns[df.dtypes.eq('object')]
#2
21
another way is using apply
, one liner:
另一种方法是使用apply,one liner:
cols = ['col1', 'col2', 'col3']
data[cols] = data[cols].apply(pd.to_numeric, errors='coerce', axis=1)
#3
8
You can use:
您可以使用:
print df.columns[5:]
Index([u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011',
u'2012', u'2013', u'2014'],
dtype='object')
for col in df.columns[5:]:
df[col] = pd.to_numeric(df[col], errors='coerce')
print df
GeoName ComponentName IndustryId IndustryClassification \
37926 Alabama Real GDP by state 9 213
37951 Alabama Real GDP by state 34 42
37932 Alabama Real GDP by state 15 327
Description 2004 2005 2006 2007 \
37926 Support activities for mining 99 98 117 117
37951 Wholesale trade 9898 10613 10952 11034
37932 Nonmetallic mineral products manufacturing 980 968 940 1084
2008 2009 2010 2011 2012 2013 2014
37926 115 87 96 95 103 102 NaN
37951 11075 9722 9765 9703 9600 9884 10199.0
37932 861 724 714 701 589 641 NaN
Another solution with filter
:
过滤器的另一个解决方
print df.filter(like='20')
2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
37926 99 98 117 117 115 87 96 95 103 102 (NA)
37951 9898 10613 10952 11034 11075 9722 9765 9703 9600 9884 10199
37932 980 968 940 1084 861 724 714 701 589 641 (NA)
for col in df.filter(like='20').columns:
df[col] = pd.to_numeric(df[col], errors='coerce')
print df
GeoName ComponentName IndustryId IndustryClassification \
37926 Alabama Real GDP by state 9 213
37951 Alabama Real GDP by state 34 42
37932 Alabama Real GDP by state 15 327
Description 2004 2005 2006 2007 \
37926 Support activities for mining 99 98 117 117
37951 Wholesale trade 9898 10613 10952 11034
37932 Nonmetallic mineral products manufacturing 980 968 940 1084
2008 2009 2010 2011 2012 2013 2014
37926 115 87 96 95 103 102 NaN
37951 11075 9722 9765 9703 9600 9884 10199.0
37932 861 724 714 701 589 641 NaN
#4
-1
If you are looking for a range of columns, you can try this:
如果您正在寻找一系列列,您可以尝试这样做:
df.iloc[7:] = df.iloc[7:].astype(float)
The examples above will convert type to be float, for all the columns begin with the 7th to the end. You of course can use different type or different range.
上面的示例将type转换为float,因为所有列都以7开头到结尾。你当然可以使用不同的类型或不同的范围。
I think this is useful when you have a big range of columns to convert and a lot of rows. It doesn't make you go over each row by yourself - I believe numpy do it more efficiently.
我认为当你有大量的列要转换和很多行时,这很有用。它不会让你自己超越每一行 - 我相信numpy会更有效地做到这一点。
This is useful only if you know that all the required columns contain numbers only - it will not change "bad values" (like string) to be NaN for you.
仅当您知道所有必需列仅包含数字时才有用 - 它不会将“错误值”(如字符串)更改为NaN。