Drop non-numeric columns from a pandas DataFrame

Time: 2021-07-21 22:54:49

In my application I load text files that are structured as follows:

  • A first non-numeric column (ID)
  • A number of non-numeric columns (strings)
  • A number of numeric columns (floats)

The number of non-numeric columns is variable. Currently I load the data into a DataFrame like this:

source = pandas.read_table(inputfile, index_col=0)
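
To make the layout concrete, here is a minimal sketch that loads a small made-up file from memory. The column names (id, name, group, x, y) and the values are invented for illustration; only the read_table call with index_col=0 comes from the question.

```python
import io

import pandas

# Made-up tab-separated data: an ID column, two string columns, two float columns.
inputfile = io.StringIO(
    "id\tname\tgroup\tx\ty\n"
    "a1\tfoo\tg1\t1.5\t2.0\n"
    "a2\tbar\tg2\t3.5\t4.0\n"
)

source = pandas.read_table(inputfile, index_col=0)

# The string columns get a non-numeric dtype, the float columns get float64.
print(source.dtypes)
```

The dtypes printed here are the information that a dtype-based selection can work from.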

I would like to drop all non-numeric columns in one fell swoop, without knowing their names or indices, since this should be doable by reading their dtypes. Is this possible with pandas, or do I have to cook up something on my own?

3 answers

#1


28  

To avoid using a private method you can also use select_dtypes, where you can either include or exclude the dtypes you want.

I ran into it in this post, which covers the exact same problem.

Or in your case, specifically:
source.select_dtypes(['number']) or source.select_dtypes([np.number])
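
A minimal sketch of both directions, include and exclude, on a made-up frame. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

source = pd.DataFrame({
    "name": ["foo", "bar"],   # strings
    "count": [1, 2],          # integers
    "value": [0.5, 1.5],      # floats
})

# Keep only the numeric columns.
numeric = source.select_dtypes(include=[np.number])

# Or find the non-numeric columns and drop them by name.
non_numeric = source.select_dtypes(exclude=[np.number])
dropped = source.drop(columns=non_numeric.columns)

print(list(numeric.columns))
print(list(non_numeric.columns))
```

Both routes leave the same columns here. select_dtypes returns a new DataFrame, so source itself is unchanged.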

#2


31  

It's a private method, but it will do the trick: source._get_numeric_data()

In [2]: import pandas as pd

In [3]: source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1,2), (3,4)]})

In [4]: source
Out[4]:
     A  B       C
0  foo  1  (1, 2)
1  bar  2  (3, 4)

In [5]: source._get_numeric_data()
Out[5]:
   B
0  1
1  2
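
For comparison, a sketch that runs the public select_dtypes call next to the private method on the frame from the session above. Column C holds tuples, so it is stored with object dtype and is dropped along with A. Because _get_numeric_data is private, it is not part of the documented API and its behavior may differ between pandas versions.

```python
import pandas as pd

source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1, 2), (3, 4)]})

# Private helper used in the answer above.
private_result = source._get_numeric_data()

# Public alternative that selects columns by dtype.
public_result = source.select_dtypes(include='number')

print(list(private_result.columns))
print(list(public_result.columns))
```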

#3


0  

I also have another possible solution for dropping the columns with categorical values in two lines of code: the first line defines a list of the columns with categorical values, and the second line drops them. Here df is our DataFrame.

[Image: df before dropping]

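  # Assumption: df.categorical is not a pandas attribute, so the non-numeric columns are selected by dtype instead
  categorical_cols = df.select_dtypes(exclude='number').columns
  df = df.drop(categorical_cols, axis=1)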

[Image: df after dropping]
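
Since the before and after images are not available, here is a small worked sketch of the same two-step idea. The frame contents and the name non_numeric_cols are assumptions for illustration, and the columns to drop are found with select_dtypes rather than with anything from the original answer.

```python
import numpy as np
import pandas as pd

# Made-up frame: one string column, one categorical column, one float column.
df = pd.DataFrame({
    "city": ["Oslo", "Rome"],
    "grade": pd.Categorical(["a", "b"]),
    "score": [1.0, 2.5],
})
columns_before = list(df.columns)

# Step 1: collect the names of all non-numeric columns.
non_numeric_cols = df.select_dtypes(exclude=[np.number]).columns
# Step 2: drop them.
df = df.drop(non_numeric_cols, axis=1)

print(columns_before)
print(list(df.columns))
```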
