Python pandas: reordering columns in a DataFrame based on column name

Time: 2021-12-06 15:51:31

I have a DataFrame with over 200 columns (don't ask why). The issue is that, as they were generated, the columns are in this order:

['Q1.3','Q6.1','Q1.2','Q1.1',......]

I need to re-order the columns as follows:

['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]

Is there some way for me to do this within Python?

11 answers

#1


190  

df.reindex(columns=sorted(df.columns))

This assumes that sorting the column names will give the order you want. If your column names won't sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you'll need to sort differently, but that has nothing to do with pandas.

#2


200  

You can also do this more succinctly:

df.sort_index(axis=1)

Edit:

Make sure you assign the result:

df = df.sort_index(axis=1)

Or do it in place:

df.sort_index(axis=1, inplace=True)
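
If plain lexicographic order is not what you want, sort_index also accepts a key argument (this assumes pandas 1.1 or later). A minimal sketch, assuming every column name is a single letter followed by a number such as Q10.2, and that the part after the dot is a single digit:

```python
import pandas as pd

# Toy frame with the column names used elsewhere in this thread.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=["Q1.3", "Q6.1", "Q1.2", "Q9.1", "Q10.2"])

# Default behaviour: labels are compared as strings, so Q10.2 lands before Q6.1.
by_name = df.sort_index(axis=1)

# The key receives the whole column Index and must return one of the same length.
# Here the leading letter is dropped and the rest is compared as a float.
by_number = df.sort_index(axis=1, key=lambda idx: idx.str[1:].astype(float))

print(list(by_name.columns))
print(list(by_number.columns))
```

The key is applied to the whole Index at once rather than label by label, which is why it uses the .str accessor instead of a per-string lambda.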

#3


20  

You can just do:

df[sorted(df.columns)]

#4


16  

Tweet's key function can be combined with BrenBarn's answer above:

data.reindex(columns=sorted(data.columns, key=lambda x: float(x[1:])))

So for your example, say:

from numpy.random import randint
from pandas import DataFrame

vals = randint(low=16, high=80, size=25).reshape(5,5)
cols = ['Q1.3', 'Q6.1', 'Q1.2', 'Q9.1', 'Q10.2']
data = DataFrame(vals, columns = cols)

You get:

data

    Q1.3    Q6.1    Q1.2    Q9.1    Q10.2
0   73      29      63      51      72
1   61      29      32      68      57
2   36      49      76      18      37
3   63      61      51      30      31
4   36      66      71      24      77

Then do:

data = data.reindex(columns=sorted(data.columns, key=lambda x: float(x[1:])))

resulting in:

data

     Q1.2    Q1.3    Q6.1    Q9.1    Q10.2
0    63      73      29      51      72
1    32      61      29      68      57
2    76      36      49      18      37
3    51      63      61      30      31
4    71      36      66      24      77

#5


13  

Don't forget to add "inplace=True" to Wes' answer or set the result to a new DataFrame.

df.sort_index(axis=1, inplace=True)

#6


9  

If you need an arbitrary sequence instead of sorted sequence, you could do:

sequence = ['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]
your_dataframe = your_dataframe.reindex(columns=sequence)

I tested this in 2.7.10 and it worked for me.
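
One thing to watch with reindex: a name in the sequence that is not in the frame does not raise an error, it becomes a new column filled with NaN. A minimal sketch of that behaviour and of a stricter alternative; the frame and the "Q2.1" name are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Q1.3": [1], "Q6.1": [2], "Q1.2": [3]})
sequence = ["Q1.2", "Q1.3", "Q2.1", "Q6.1"]  # "Q2.1" is not a column of df

# reindex silently adds the unknown label as an all-NaN column.
reordered = df.reindex(columns=sequence)

# Stricter variant: report unknown names and keep only the existing ones.
missing = [c for c in sequence if c not in df.columns]
present = [c for c in sequence if c in df.columns]
strict = df[present]

print(reordered)
print("missing:", missing)
```

Whether a silent NaN column or an explicit check is preferable depends on how much you trust the sequence.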

#7


6  

For just a few columns, you can list them in the order you want:

# ['A', 'B', 'C'] <- this is the current column order
df = df[['C', 'B', 'A']]

This example shows sorting and slicing columns:

import pandas

d = {'col1':[1, 2, 3], 'col2':[4, 5, 6], 'col3':[7, 8, 9], 'col4':[17, 18, 19]}
df = pandas.DataFrame(d)

You get:

col1  col2  col3  col4
 1     4     7    17
 2     5     8    18
 3     6     9    19

Then do:

df = df[['col3', 'col2', 'col1']]

Resulting in:

col3  col2  col1
7     4     1
8     5     2
9     6     3     
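
Note that the selection above drops col4. If you would rather move a few columns to the front and keep everything else in its existing order, something like the following sketch should work (the front list is just an example):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6],
                   "col3": [7, 8, 9], "col4": [17, 18, 19]})

# Columns to pull to the front, in the order they should appear.
front = ["col3", "col1"]

# Everything not named in front keeps its original relative order.
rest = [c for c in df.columns if c not in front]
df = df[front + rest]

print(list(df.columns))
```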

#8


3  

The quickest method is:

df.sort_index(axis=1)

Be aware that this creates a new instance. Therefore you need to store the result in a new variable:

sortedDf = df.sort_index(axis=1)

#9


0  

The sort method and sorted function allow you to provide a custom function to extract the key used for comparison:

>>> ls = ['Q1.3', 'Q6.1', 'Q1.2']
>>> sorted(ls, key=lambda x: float(x[1:]))
['Q1.2', 'Q1.3', 'Q6.1']
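
One caveat with the float key: a name like Q1.10 is read as 1.1, so it sorts before Q1.2. If your sub-question numbers can go past 9, comparing the two numbers as integers avoids that. A minimal sketch, where question_key is a hypothetical helper that assumes every name contains its numbers as plain digit runs:

```python
import re

def question_key(name):
    """Turn a label such as 'Q10.2' into the integer tuple (10, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", name))

ls = ["Q1.10", "Q1.3", "Q10.2", "Q6.1", "Q1.2", "Q9.1"]

# Tuples compare element by element, so (1, 10) sorts after (1, 3).
ordered = sorted(ls, key=question_key)
print(ordered)
```

With a DataFrame, the same key can be passed along as df[sorted(df.columns, key=question_key)].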

#10


0  

One use-case is that you have named (some of) your columns with some prefix, and you want the columns sorted with those prefixes all together and in some particular order (not alphabetical).

For example, you might start all of your features with Ft_, labels with Lbl_, etc, and you want all unprefixed columns first, then all features, then the label. You can do this with the following function (I will note a possible efficiency problem using sum to reduce lists, but this isn't an issue unless you have a LOT of columns, which I do not):

import re
def sortedcols(df, groups = ['Ft_', 'Lbl_'] ):
    return df[ sum([list(filter(re.compile(r).search, list(df.columns).copy())) for r in (lambda l: ['^(?!(%s))' % '|'.join(l)] + ['^%s' % i  for i in l ] )(groups)   ], [])  ]
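
The one-liner is dense, so here is a more readable sketch of the same idea. It is not a drop-in copy: it treats the prefixes as literal strings rather than regular expressions, assumes all column names are strings, and sorted_cols is just an illustrative name.

```python
import pandas as pd

def sorted_cols(df, groups=("Ft_", "Lbl_")):
    """Return df with unprefixed columns first, then each prefix group in order."""
    prefixes = tuple(groups)
    # Columns matching none of the prefixes keep their original relative order.
    ordered = [c for c in df.columns if not c.startswith(prefixes)]
    # Then append each prefix group in the order the prefixes were given.
    for prefix in prefixes:
        ordered += [c for c in df.columns if c.startswith(prefix)]
    return df[ordered]

df = pd.DataFrame(columns=["Lbl_y", "Ft_a", "id", "Ft_b", "name"])
result = sorted_cols(df)
print(list(result.columns))
```

Using a list that is extended in a loop also sidesteps the sum-over-lists concern mentioned above.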

#11


-2  

print(df.sort_values(by='Frequency', ascending=False))

where by is the name of the column, if you want to sort the rows of the dataset by the values in that column. Note that this reorders rows, not columns.
