Get a list of pandas DataFrame column headers

Time: 2021-01-02 08:01:28

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would want to get a list like this:

>>> header_list
['y', 'gdp', 'cap']
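
For anyone who wants to try the answers on the same data, here is a minimal sketch that rebuilds the example frame. The values are copied from the printout in the question; passing columns explicitly is my own choice, to pin the column order.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the question's printout.
my_dataframe = pd.DataFrame(
    {
        "y":   [1, 2, 8, 3, 6, 4, 8, 9, 6, 10],
        "gdp": [2, 3, 7, 4, 7, 8, 2, 9, 6, 10],
        "cap": [5, 9, 2, 7, 7, 3, 8, 10, 4, 7],
    },
    columns=["y", "gdp", "cap"],  # explicit order, independent of dict ordering
)

print(my_dataframe.shape)
```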

14 solutions

#1


905  

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use:

list(my_dataframe)
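
A small sketch of both forms side by side. The two-row frame is a hypothetical stand-in for the question's data, since only the column labels matter here.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame; only the labels matter.
my_dataframe = pd.DataFrame(
    {"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]},
    columns=["y", "gdp", "cap"],
)

via_values = list(my_dataframe.columns.values)  # list built from the NumPy array of labels
via_iteration = list(my_dataframe)              # iterating a DataFrame yields its column labels

print(via_values)
print(via_iteration)
```

Both calls should give the same list of labels, in column order.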

#2


261  

There is a built-in method, which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns a NumPy array, and the array has a helper method, tolist(), that returns a list.
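
To make the three steps visible, here is a rough sketch that inspects each intermediate object. The integer column labels are an assumption of mine, chosen only because they show that tolist() hands back native Python values while list() over the array keeps NumPy scalars.

```python
import numpy as np
import pandas as pd

# Integer column labels (an assumption for illustration) make element types visible.
df = pd.DataFrame([[1, 2, 3]], columns=[10, 20, 30])

index_obj = df.columns                # pandas Index
array_obj = df.columns.values         # NumPy ndarray
as_list = df.columns.values.tolist()  # plain Python list

print(type(index_obj).__name__, type(array_obj).__name__, type(as_list).__name__)

# tolist() converts the elements to native Python ints,
# while list() over the array keeps NumPy scalar types.
print(type(as_list[0]).__name__)
print(type(list(array_obj)[0]).__name__)
```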

EDIT

For those who hate typing, this is probably the shortest method:

list(df)

#3


61  

Did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like the list(dataframe) though, so thanks EdChum!)
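
If you want to repeat the comparison outside IPython, something like the following should work with the standard timeit module. The 50-column frame and the repeat count are arbitrary choices of mine, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 3 rows, 50 columns named c0..c49.
df = pd.DataFrame(np.zeros((3, 50)), columns=[f"c{i}" for i in range(50)])

candidates = {
    "[column for column in df]": lambda: [column for column in df],
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "list(df)": lambda: list(df),
    "list(df.columns.values)": lambda: list(df.columns.values),
}

# All four expressions should produce the same list of labels.
results = {name: fn() for name, fn in candidates.items()}

runs = 2000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name:30s} {seconds / runs * 1e6:8.2f} µs per call")
```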

#4


35  

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.
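
Since the frame comes from user input, it may be worth knowing what this returns for hierarchical columns. A sketch, assuming a hypothetical two-level header: each label comes back as a tuple.

```python
import pandas as pd

# Hypothetical two-level column header, as might come from a spreadsheet.
columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
df = pd.DataFrame([[1, 2, 3]], columns=columns)

header_list = df.columns.tolist()  # one tuple per column
print(header_list)
```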

#5


27  

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']
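
One caveat with sorted on user-supplied frames: in Python 3 it raises TypeError when the labels mix strings and numbers. A possible workaround, sketched here with a hypothetical mixed-label frame, is to fall back to sorting by the string form of each label.

```python
import pandas as pd

# Hypothetical frame whose labels mix strings and integers.
df = pd.DataFrame([[1, 2, 3]], columns=["gdp", 2020, "cap"])

try:
    ordered = sorted(df)
except TypeError:
    # Python 3 cannot compare str with int, so sort by string form instead.
    ordered = sorted(df, key=str)

print(ordered)
```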

#6


17  

That's available as my_dataframe.columns.

#7


13  

Interestingly, df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), although I thought they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop

#8


9  

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

This produces an easy-to-read, alphabetically ordered list.

In a code repository

In code, I find it more explicit to do

df.columns

Because it tells others reading your code what you are doing.

#9


9  

A DataFrame follows the dict-like convention of iterating over the “keys” of the objects.

my_dataframe.keys()

Create a list of keys/columns, either with the object method to_list() or the Pythonic way with list():

my_dataframe.keys().to_list()
list(my_dataframe.keys())

Basic iteration on a DataFrame returns column labels

[column for column in my_dataframe]

Prefer asking for the column labels explicitly rather than converting the whole DataFrame with list(). Both forms iterate over the column labels only, so the cost depends on the number of columns rather than the number of rows, but keys() or columns states the intent more clearly.

import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000,10000))
list(xlarge) # iterates over the column labels, not the data
list(xlarge.keys()) # same labels, requested explicitly via keys()
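
The dict-like behaviour described in this answer can be checked with a small sketch. The two-row frame is a hypothetical stand-in; the point is that membership tests, iteration, and keys() all work on column labels.

```python
import pandas as pd

my_dataframe = pd.DataFrame(
    {"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]},
    columns=["y", "gdp", "cap"],
)

# Membership tests look at column labels, as with a dict.
has_gdp = "gdp" in my_dataframe
has_row_label = 0 in my_dataframe  # 0 is a row label, not a column label

# keys() returns the same labels as the columns attribute.
same_labels = my_dataframe.keys().equals(my_dataframe.columns)

labels_from_iteration = [column for column in my_dataframe]
labels_from_keys = list(my_dataframe.keys())

print(has_gdp, has_row_label, same_labels)
print(labels_from_iteration == labels_from_keys)
```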

#10


2  

I feel the question deserves additional explanation.

As @fixxxer noted, the answer depends on the pandas version you are using in your project, which you can check with pd.__version__.

If, like me, you are for some reason using a version of pandas older than 0.16.0 (on Debian jessie I use 0.14.1), then you need to use:

df.keys().tolist() because there is no df.columns method implemented yet.

The advantage of the keys method is that it also works in newer versions of pandas, so it's more universal.
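
A quick sketch of the version check followed by the keys() route. The one-row frame is hypothetical, and I have only reasoned about this against recent pandas releases, not 0.14.1.

```python
import pandas as pd

# __version__ is a plain string attribute, not something you call.
print(pd.__version__)

df = pd.DataFrame({"y": [1], "gdp": [2], "cap": [5]}, columns=["y", "gdp", "cap"])

# keys() returns the column Index; tolist() turns it into a plain list.
header_list = df.keys().tolist()
print(header_list)
```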

#11


2  

As answered by Simeon Visser, you could do

list(my_dataframe.columns.values) 

or

list(my_dataframe) # for less typing.

But I think the sweet spot is:

list(my_dataframe.columns)

It is explicit, at the same time not unnecessarily long.

#12


1  

n = []
for i in my_dataframe.columns:
    n.append(i)  # collect each column label
print(n)

#13


1  

This solution lists all the columns of your object my_dataframe:

print(list(my_dataframe))

#14


0  

You can use the index attributes:

import numpy as np
import pandas as pd
df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
                 index=['a', 'b', 'c'])
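
The answer stops after building the frame. One possible reading of "index attributes", and it is only my assumption, is that the labels of both axes can be read from df.index and df.columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randn(3), 'col2': np.random.randn(3)},
                  index=['a', 'b', 'c'])

row_labels = list(df.index)       # labels along the rows
column_labels = list(df.columns)  # labels along the columns

print(row_labels)
print(column_labels)
```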
