I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.
For example, if I'm given a DataFrame like this:
>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7
I would want to get a list like this:
>>> header_list
[y, gdp, cap]
14 answers
#1
905
You can get the values as a list by doing:
list(my_dataframe.columns.values)
Also you can simply use:
list(my_dataframe)
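A minimal sketch of both forms side by side. The DataFrame construction here is an assumption for illustration (only the first three rows of the question's data are reproduced), and the variable names from_values and from_frame are hypothetical:

```python
import pandas as pd

# Hypothetical reconstruction of the question's DataFrame (first three rows only).
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# Convert the underlying array of column labels to a list.
from_values = list(my_dataframe.columns.values)

# Iterating a DataFrame yields its column labels, so list() works directly.
from_frame = list(my_dataframe)

print(from_values)
print(from_frame)
```

Both calls should give the same labels in column order.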
#2
261
There is a built-in method, which is the most performant:
my_dataframe.columns.values.tolist()
.columns returns an Index, .columns.values returns an array, and the array has a helper method, tolist(), that returns a list.
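To make the chain of types concrete, here is a small sketch, assuming a toy DataFrame (the data and the variable names columns, values and as_list are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

# Toy DataFrame for illustration only.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

columns = df.columns                   # a pandas Index
values = df.columns.values             # a NumPy ndarray of the labels
as_list = df.columns.values.tolist()   # a plain Python list

print(type(columns).__name__, type(values).__name__, type(as_list).__name__)
```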
EDIT
For those who hate typing, this is probably the shortest method:
list(df)
#3
61
Did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:
In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop
In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop
In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop
In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop
(I still really like the list(dataframe) though, so thanks EdChum!)
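Outside IPython, a comparison like this could be reproduced with the standard-library timeit module. This is only a sketch under assumptions: the DataFrame is a toy one, the candidates and timings dictionaries are hypothetical names, and absolute numbers will differ by machine and pandas version:

```python
import timeit

import pandas as pd

# Toy DataFrame for illustration only.
df = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# The four approaches timed in the answer, wrapped as callables.
candidates = {
    "[column for column in df]": lambda: [column for column in df],
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "list(df)": lambda: list(df),
    "list(df.columns.values)": lambda: list(df.columns.values),
}

runs = 1000
timings = {}
for label, func in candidates.items():
    # Total seconds for `runs` calls of this approach.
    timings[label] = timeit.timeit(func, number=runs)

for label, seconds in timings.items():
    print(f"{label}: {seconds / runs * 1e6:.2f} µs per call")
```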
#4
35
It gets even simpler (as of pandas 0.16.0):
df.columns.tolist()
will give you the column names in a nice list.
#5
27
>>> list(my_dataframe)
['y', 'gdp', 'cap']
To list the columns of a dataframe while in debugger mode, use a list comprehension:
>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']
By the way, you can get a sorted list simply by using sorted:
>>> sorted(my_dataframe)
['cap', 'gdp', 'y']
#6
17
That's available as my_dataframe.columns.
#7
13
It's interesting, but df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), even though I thought they were the same:
In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop
In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop
#8
9
In the Notebook
For data exploration in the IPython notebook, my preferred way is this:
sorted(df)
This produces an easy-to-read, alphabetically ordered list.
In a code repository
In code, I find it more explicit to do:
df.columns
Because it tells others reading your code what you are doing.
#9
9
A DataFrame follows the dict-like convention of iterating over the “keys” of the object.
my_dataframe.keys()
Create a list of keys/columns, using either the object method to_list() or the Pythonic way:
my_dataframe.keys().to_list()
list(my_dataframe.keys())
Basic iteration on a DataFrame returns column labels:
[column for column in my_dataframe]
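The three variants above can be checked against each other with a short sketch. It assumes a toy DataFrame and a pandas version recent enough to provide Index.to_list(); the names via_method, via_builtin and via_iteration are hypothetical:

```python
import pandas as pd

# Toy DataFrame for illustration only.
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

keys = my_dataframe.keys()        # the column Index, same labels as .columns

via_method = keys.to_list()       # object method on the Index
via_builtin = list(keys)          # built-in list() over the Index
via_iteration = [column for column in my_dataframe]  # iterating the DataFrame itself

print(via_method, via_builtin, via_iteration)
```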
Do not convert a DataFrame into a list, just to get the column labels. Do not stop thinking while looking for convenient code samples.
import numpy as np
import pandas as pd
xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)         # compute time and memory consumption depend on DataFrame size - O(N)
list(xlarge.keys())  # constant time operation - O(1)
#10
2
I feel the question deserves additional explanation.
As @fixxxer noted, the answer depends on the pandas version you are using in your project, which you can get with pd.__version__.
If, like me, you are for some reason using a version of pandas older than 0.16.0 (on Debian jessie I use 0.14.1), then you need to use:
df.keys().tolist()
because there is no df.columns method implemented yet.
The advantage of this keys method is that it works even in newer versions of pandas, so it's more universal.
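A short sketch of the keys-based approach, assuming a toy DataFrame (the data and the name header_list are illustrative). It prints the installed version first so the result can be related to the version discussion above:

```python
import pandas as pd

# Toy DataFrame for illustration only.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# The installed pandas version, as a string.
print(pd.__version__)

# keys() returns the column Index; tolist() converts it to a Python list.
header_list = df.keys().tolist()
print(header_list)
```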
#11
2
As answered by Simeon Visser, you could do:
list(my_dataframe.columns.values)
or
list(my_dataframe) # for less typing.
But I think the sweet spot is:
list(my_dataframe.columns)
It is explicit and, at the same time, not unnecessarily long.
#12
1
n = []
for i in my_dataframe.columns:
    n.append(i)  # collect each column label
print(n)
#13
1
This solution lists all the columns of your object my_dataframe:
print(list(my_dataframe))
#14
0
You can use index attributes:
df = pd.DataFrame({'col1': np.random.randn(3), 'col2': np.random.randn(3)},
                  index=['a', 'b', 'c'])
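The answer stops before showing the attributes themselves. A possible completion, assuming "index attributes" refers to df.index and df.columns (that reading, and the names row_labels and column_labels, are assumptions rather than part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randn(3), 'col2': np.random.randn(3)},
                  index=['a', 'b', 'c'])

# Both axes are Index objects, and both can be converted to plain lists.
row_labels = df.index.tolist()
column_labels = df.columns.tolist()

print(row_labels)
print(column_labels)
```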