When working with Excel data, reordering columns and adding new columns are common tasks. Below we do both with pandas.
1. Reordering columns
>>> import pandas as pd
>>> df = pd.read_excel(r'D:/myExcel/1.xlsx')
>>> df
        A   B   C   D
0     bob  12  78  87
1  millor  15  92  21
>>> df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
# The simplest and most common approach: list the column names
# in the order you want, and pandas selects them from df
>>> df[['A', 'D', 'C', 'B']]
        A   D   C   B
0     bob  87  78  12
1  millor  21  92  15
# Repeating a column name also works
>>> df[['A', 'A', 'A', 'A']]
        A       A       A       A
0     bob     bob     bob     bob
1  millor  millor  millor  millor
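As a supplement to the list-selection approach above, here is a minimal sketch of two other ways to reorder columns: `reindex(columns=...)`, and moving a single column to the front without typing every other name. The data is rebuilt in memory so the Excel file is not needed, and the variable names `reordered`, `front`, and `moved` are chosen only for illustration.

```python
import pandas as pd

# Same data as the example above, built in memory instead of read from Excel
df = pd.DataFrame({'A': ['bob', 'millor'], 'B': [12, 15], 'C': [78, 92], 'D': [87, 21]})

# reindex(columns=...) returns a new frame with the columns in the requested order
reordered = df.reindex(columns=['A', 'D', 'C', 'B'])

# Move one column to the front and keep the others in their current order
front = 'D'
moved = df[[front] + [c for c in df.columns if c != front]]

print(list(reordered.columns))
print(list(moved.columns))
```

Both forms return a new DataFrame, so `df` itself keeps its original column order unless the result is assigned back to it.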
2. Adding one or more columns
(1) Direct assignment
>>> df['E'] = [1, 2]
>>> df
        A   B   C   D  E
0     bob  12  78  87  1
1  millor  15  92  21  2
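Direct assignment always appends the new column at the end. If the new column should land at a specific position, `DataFrame.insert` may be a better fit. The sketch below assumes the same two-row data as above and places `E` right after `A`.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bob', 'millor'], 'B': [12, 15], 'C': [78, 92], 'D': [87, 21]})

# insert(loc, column, value) modifies df in place and returns None;
# loc=1 puts the new column in the second position
df.insert(1, 'E', [1, 2])

print(list(df.columns))
```

This combines both topics of the article: the column is added and positioned in a single step.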
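(2) Use the assign method. It is well suited to deriving new columns from existing ones, either through basic arithmetic or by calling a function. Note that assign returns a new DataFrame and leaves df unchanged.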
>>> df
        A   B   C   D
0     bob  12  78  87
1  millor  15  92  21
# E is the new column name; its values are column B minus column C
>>> df.assign(E=df['B'] - df['C'])
        A   B   C   D   E
0     bob  12  78  87 -66
1  millor  15  92  21 -77
# Adding two columns at once also works
>>> df.assign(E=df['B'] - df['C'], F=df['B'] * df['C'])
        A   B   C   D   E     F
0     bob  12  78  87 -66   936
1  millor  15  92  21 -77  1380
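The examples above use arithmetic on existing columns. Since assign also accepts functions, here is a small sketch of that form: each callable receives the DataFrame and returns the values for the new column. The helper `row_total` is hypothetical and exists only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['bob', 'millor'], 'B': [12, 15], 'C': [78, 92], 'D': [87, 21]})

def row_total(frame):
    """Hypothetical helper: row-wise sum of the three numeric columns."""
    return frame['B'] + frame['C'] + frame['D']

# Each callable is invoked with the DataFrame that assign is operating on
result = df.assign(E=lambda d: d['B'] - d['C'], F=row_total)

print(result)
print('E' in df.columns)  # assign returned a new frame, so df has no column E
```

To keep the new columns, assign the result back, for example `df = df.assign(...)`.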
That covers how to reorder and add columns with pandas.
Addendum: renaming DataFrame columns and reordering columns in pandas
Renaming columns:
Call the API directly:
df.rename()
Here is how the method is defined in the pandas source:
def rename(self, *args, **kwargs):
    """
    Alter axes labels.

    Function / dict values must be unique (1-to-1). Labels not contained in
    a dict / Series will be left as-is. Extra labels listed don't throw an
    error.

    See the :ref:`user guide <basics.rename>` for more.

    Parameters
    ----------
    mapper, index, columns : dict-like or function, optional
        dict-like or functions transformations to apply to
        that axis' values. Use either ``mapper`` and ``axis`` to
        specify the axis to target with ``mapper``, or ``index`` and
        ``columns``.
    axis : int or str, optional
        Axis to target with ``mapper``. Can be either the axis name
        ('index', 'columns') or number (0, 1). The default is 'index'.
    copy : boolean, default True
        Also copy underlying data
    inplace : boolean, default False
        Whether to return a new DataFrame. If True then value of copy is
        ignored.
    level : int or level name, default None
        In case of a MultiIndex, only rename labels in the specified
        level.

    Returns
    -------
    renamed : DataFrame

    See Also
    --------
    pandas.DataFrame.rename_axis

    Examples
    --------
    ``DataFrame.rename`` supports two calling conventions

    * ``(index=index_mapper, columns=columns_mapper, ...)``
    * ``(mapper, axis={'index', 'columns'}, ...)``

    We *highly* recommend using keyword arguments to clarify your
    intent.

    >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    >>> df.rename(index=str, columns={"A": "a", "B": "c"})
       a  c
    0  1  4
    1  2  5
    2  3  6

    >>> df.rename(index=str, columns={"A": "a", "C": "c"})
       a  B
    0  1  4
    1  2  5
    2  3  6

    Using axis-style parameters

    >>> df.rename(str.lower, axis='columns')
       a  b
    0  1  4
    1  2  5
    2  3  6

    >>> df.rename({1: 2, 2: 4}, axis='index')
       A  B
    0  1  4
    2  2  5
    4  3  6
    """
    axes = validate_axis_style_args(self, args, kwargs, 'mapper', 'rename')
    kwargs.update(axes)
    # Pop these, since the values are in `kwargs` under different names
    kwargs.pop('axis', None)
    kwargs.pop('mapper', None)
    return super(DataFrame, self).rename(**kwargs)
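Note:
A single * (as in *args) collects extra positional arguments into a tuple. At a call site, * unpacks a list or tuple into individual positional arguments.
A double * (as in **kwargs) collects keyword arguments into a dict. At a call site, the value after ** must be a dict (mapping), which is unpacked into keyword arguments.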
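The note on `*` and `**` can be checked with a few lines of plain Python. This is only an illustrative sketch: the function `show` and the variables `positional` and `named` are made up for the example and are not part of pandas.

```python
def show(*args, **kwargs):
    """Return the positional and keyword arguments exactly as received."""
    return args, kwargs

positional = [1, 2]
named = {'columns': {'A': 'a'}, 'inplace': False}

# * unpacks the list into positional arguments,
# ** unpacks the dict into keyword arguments
args, kwargs = show(*positional, **named)

print(args)    # a tuple of the positional arguments
print(kwargs)  # a dict of the keyword arguments
```

This is the same mechanism that lets `rename` accept either `rename(mapper, axis=...)` or `rename(columns=...)` through one signature.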
Example:
>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
# Rename column A to a, B to b, and C to c
>>> a.rename(columns={'A': 'a', 'B': 'b', 'C': 'c'}, inplace=True)
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
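The example above renames in place with a dict. As the docstring mentions, `rename` also accepts a function, and without `inplace=True` it returns a new DataFrame. A short sketch of both behaviours follows; the names `lowered` and `partial` are chosen only for illustration.

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# A function is applied to every column label; the result is a new frame
lowered = a.rename(columns=str.lower)

# Keys that do not match any column (here 'Z') are ignored by default
partial = a.rename(columns={'A': 'x', 'Z': 'z'})

print(list(lowered.columns))
print(list(partial.columns))
print(list(a.columns))  # the original frame is unchanged
```

Returning a new frame makes it easy to chain `rename` with other calls, at the cost of having to assign the result.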
Reordering columns:
For example:
>>> import pandas
>>> dict_a = {'user_id': ['webbang', 'webbang', 'webbang'], 'book_id': ['3713327', '4074636', '26873486'], 'rating': ['4', '4', '4'],
...           'mark_date': ['2017-03-07', '2017-03-07', '2017-03-07']}
>>> df = pandas.DataFrame(dict_a)  # Create a DataFrame from the dict
# In the pandas version used here the columns come out sorted alphabetically, not in the
# dict's order ('user_id', 'book_id', 'rating', 'mark_date'). Newer versions (pandas 0.23+
# on Python 3.6+) keep the dict's insertion order instead.
>>> df
    book_id   mark_date rating  user_id
0   3713327  2017-03-07      4  webbang
1   4074636  2017-03-07      4  webbang
2  26873486  2017-03-07      4  webbang
Reorder by selecting the columns in the desired order:
>>> df = df[['user_id', 'book_id', 'rating', 'mark_date']]  # Reorder the columns to 'user_id', 'book_id', 'rating', 'mark_date'
>>> df
   user_id   book_id rating   mark_date
0  webbang   3713327      4  2017-03-07
1  webbang   4074636      4  2017-03-07
2  webbang  26873486      4  2017-03-07
And that's it.
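If the desired order is known when the DataFrame is created, it may be simpler to fix it at construction time rather than reorder afterwards. The sketch below passes the `columns` argument to the `DataFrame` constructor; the list name `wanted` is introduced only for this example.

```python
import pandas as pd

dict_a = {
    'user_id': ['webbang', 'webbang', 'webbang'],
    'book_id': ['3713327', '4074636', '26873486'],
    'rating': ['4', '4', '4'],
    'mark_date': ['2017-03-07', '2017-03-07', '2017-03-07'],
}

# columns= selects the dict keys and fixes their order at construction time
wanted = ['user_id', 'book_id', 'rating', 'mark_date']
df = pd.DataFrame(dict_a, columns=wanted)

print(list(df.columns))
```

Spelling out the order this way avoids depending on how a particular pandas version orders dict keys.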
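The above is based on my personal experience; I hope it serves as a useful reference. If anything is wrong or incomplete, corrections are welcome.
Original link: https://blog.csdn.net/weixin_45144170/article/details/105007711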