Given an existing DataFrame that is indexed:
>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 -0.131666 -0.315019 0.306728 -0.642224 -0.294562
1 0.769310 -1.277065 0.735549 -0.900214 -1.826320
2 -1.561325 -0.155571 0.544697 0.275880 -0.451564
3 0.612561 -0.540457 2.390871 -2.699741 0.534807
4 -1.504476 -2.113726 0.785208 -1.037256 -0.292959
5 0.467429 1.327839 -1.666649 1.144189 0.322896
6 -0.306556 1.668364 0.036508 0.596452 0.066755
7 -1.689779 1.469891 -0.068087 -1.113231 0.382235
8 0.028250 -2.145618 0.555973 -0.473131 -0.638056
9 0.633408 -0.791857 0.933033 1.485575 -0.021429
>>> df.set_index("a")
b c d e
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
-1.561325 -0.155571 0.544697 0.275880 -0.451564
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
How can I move the 3rd row to the first position?
That is, the expected result is:
b c d e
-1.561325 -0.155571 0.544697 0.275880 -0.451564
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
Now the original first row should become the second row.
4 Solutions
Reindexing is probably the simplest way to put the rows in any new order in a single step, but it produces a new DataFrame, which may be prohibitively large.
For example:
import pandas as pd
t = pd.read_csv('table.txt', sep=r'\s+')  # raw string keeps the regex backslash literal
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')
t2 = t.reindex([2,0,1,3]) # cannot do this in place
DG/VD TYPE State Access Consist Cache sCC Size Units Name
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Now the index can be set back to range(4) without reindexing:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
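The step that sets the index back to range(4) is not shown above. A minimal sketch of how it might look, assuming a small stand-in frame in place of table.txt (the data below is illustrative and keeps only two of the columns):

```python
import pandas as pd

# Stand-in for the table read from 'table.txt' (hypothetical, two columns only).
t = pd.DataFrame({"DG/VD": ["0/0", "1/1", "2/2", "3/3"],
                  "Name": ["one", "two", "three", "four"]})

# Reorder by label: row 2 first, then rows 0, 1 and 3. This returns a new DataFrame.
t2 = t.reindex([2, 0, 1, 3])

# Relabel the rows 0..3 in their new order; only the labels change here.
t2.index = range(len(t2))

# An alternative that returns a new object instead of assigning to the index.
t3 = t.reindex([2, 0, 1, 3]).reset_index(drop=True)
```

Both t2 and t3 should then hold the rows in the order three, one, two, four with labels 0 to 3.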
It can also be done with tuple swapping and row selection as the basic mechanism, without creating a new DataFrame. For example:
import pandas as pd
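t = pd.read_csv('table.txt', sep=r'\s+')
# .ix was removed in pandas 1.0; .iloc selects rows by position.
# .copy() makes sure each right-hand row is a snapshot taken before any assignment.
t.iloc[1], t.iloc[2] = t.iloc[2].copy(), t.iloc[1].copy()
t.iloc[0], t.iloc[1] = t.iloc[1].copy(), t.iloc[0].copy()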
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Another in-place method sets the DataFrame index to reflect the desired ordering (for example, the 3rd row gets index 0) and then sorts the DataFrame in place by that index. It is encapsulated in the following function, which assumes the rows are labeled with range(m) for some positive integer m and that the index is a simple one (no MultiIndex), as in the example provided in the question.
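def putfirst(n, df):
    # n is the 1-based position of the row to move to the top of df
    if not isinstance(n, int):
        print('error: 1st arg must be an int')
        return
    if n < 1:
        print('error: 1st arg must be an int > 0')
        return
    if n == 1:
        print('nothing to do when first arg == 1')
        return
    if n > len(df):
        print('error: n exceeds the number of rows in the DataFrame')
        return
    # rows before n are relabeled 1..n-1, row n gets label 0, the rest keep their labels
    df.index = list(range(1, n)) + [0] + list(range(n, df.index[-1] + 1))
    # sorting by the new labels moves row n to the top, in place
    df.sort_index(inplace=True)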
putfirst takes two arguments: n, the ordinal (1-based) position of the row to move to the first row position, so n = 3 moves the 3rd row; and df, the DataFrame containing the row to be relocated.
Here is a demo:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
df.set_index("a") # ineffective without assignment or inplace=True
b c d e
1.394072 -1.076742 -0.192466 -0.871188 0.420852
-1.211411 -0.258867 -0.581647 -1.260421 0.464575
-1.070241 0.804223 -0.156736 2.010390 -0.887104
-0.977936 -0.267217 0.483338 -0.400333 0.449880
0.399594 -0.151575 -2.557934 0.160807 0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745 0.044697 -0.897756 0.890874
-1.151185 -2.612303 1.141250 -0.867136 0.383583
-0.437030 0.347489 -1.230179 0.571078 0.060061
-0.225524 1.349726 1.350300 -0.386653 0.865990
a b c d e
0 1.394072 -1.076742 -0.192466 -0.871188 0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
2 -1.070241 0.804223 -0.156736 2.010390 -0.887104
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
putfirst(3, df)
df
a b c d e
0 -1.070241 0.804223 -0.156736 2.010390 -0.887104
1 1.394072 -1.076742 -0.192466 -0.871188 0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
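Because the demo above uses random data, the result is hard to check by eye. A deterministic sketch of the relabel-and-sort idea in Python 3, assuming a default 0..m-1 index; the small integer frame and the variable n are illustrative, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Column 'a' holds 0, 3, 6, 9, 12 so each row is easy to recognise.
df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=["a", "b", "c"])
n = 3  # 1-based position of the row to move to the top

# Give row n the label 0 and shift the labels of the rows before it up by one.
df.index = list(range(1, n)) + [0] + list(range(n, len(df)))

# Sorting by label brings the relabeled row to the top and modifies df itself.
df.sort_index(inplace=True)
```

After this, column 'a' should read 6, 0, 3, 9, 12 and the index is 0 to 4 again.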
To move the third row to the first position, you can build a list of row positions with the target row as its first element. I use a conditional list comprehension to supply the remaining positions and join the two lists.
Then, just use iloc to select the rows in that order.
np.random.seed(0)  # this seed reproduces the values shown below
df = pd.DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c'])
>>> df
a b c
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
2 0.950088 -0.151357 -0.103219
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]
>>> df.iloc[idx]
a b c
2 0.950088 -0.151357 -0.103219
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
If desired, you can also reset the index.
>>> df.iloc[idx].reset_index(drop=True)
a b c
0 0.950088 -0.151357 -0.103219
1 1.764052 0.400157 0.978738
2 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
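Alternatively, you can just reindex the DataFrame using idx. reindex selects by label, so this matches the iloc result only when the index is the default 0..n-1.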
>>> df.reindex(idx)
a b c
2 0.950088 -0.151357 -0.103219
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
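The question's frame is indexed by column 'a' rather than by 0..n-1, and in that case the position list can no longer be passed to reindex directly. A sketch of both routes under that assumption; the names by_position and by_label are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=["a", "b", "c"]).set_index("a")

target_row = 2  # 0-based position of the row to move
idx = [target_row] + [i for i in range(len(df)) if i != target_row]

# iloc selects by position, so it works whatever the index labels are.
by_position = df.iloc[idx]

# reindex selects by label, so the positions are first translated to labels.
by_label = df.reindex(df.index[idx])
```

Both results should hold the same rows in the same order, with the former third row on top.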
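This is not elegant, but it works so far (note that it swaps the first and third rows, so the original first row ends up third rather than second):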
>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 1.124763 -0.416770 1.347839 -0.944334 0.738686
1 -0.348112 0.786822 -1.161970 -1.645065 -0.075205
2 0.549966 0.357076 -0.880669 -0.187731 -0.221997
3 0.311057 -0.126432 -1.187644 2.151804 0.791835
4 -0.310849 0.753750 -1.087447 0.095884 1.449832
5 -0.272344 0.278788 -0.724369 -0.568442 0.164909
6 0.942927 -0.273203 0.203322 1.099572 -0.505160
7 0.526321 1.665012 0.915676 -1.174497 -2.270662
8 -0.959773 0.921732 1.396364 -1.383112 0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605 0.578198
>>> row = df.iloc[0].copy()  # .ix was removed in pandas 1.0; .iloc selects by position
>>> row
a 1.124763
b -0.416770
c 1.347839
d -0.944334
e 0.738686
Name: 0, dtype: float64
>>> df.iloc[0] = df.iloc[2]
>>> df.iloc[2] = row
>>> df
a b c d e
0 0.549966 0.357076 -0.880669 -0.187731 -0.221997
1 -0.348112 0.786822 -1.161970 -1.645065 -0.075205
2 1.124763 -0.416770 1.347839 -0.944334 0.738686
3 0.311057 -0.126432 -1.187644 2.151804 0.791835
4 -0.310849 0.753750 -1.087447 0.095884 1.449832
5 -0.272344 0.278788 -0.724369 -0.568442 0.164909
6 0.942927 -0.273203 0.203322 1.099572 -0.505160
7 0.526321 1.665012 0.915676 -1.174497 -2.270662
8 -0.959773 0.921732 1.396364 -1.383112 0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605 0.578198
>>> df.set_index('a')
b c d e
0.549966 0.357076 -0.880669 -0.187731 -0.221997
-0.348112 0.786822 -1.161970 -1.645065 -0.075205
1.124763 -0.416770 1.347839 -0.944334 0.738686
0.311057 -0.126432 -1.187644 2.151804 0.791835
-0.310849 0.753750 -1.087447 0.095884 1.449832
-0.272344 0.278788 -0.724369 -0.568442 0.164909
0.942927 -0.273203 0.203322 1.099572 -0.505160
0.526321 1.665012 0.915676 -1.174497 -2.270662
-0.959773 0.921732 1.396364 -1.383112 0.603030
-2.802902 -0.572469 -1.599550 -1.305605 0.578198
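The same swap can probably be written as one positional assignment, which avoids the temporary row variable. A sketch, assuming a small all-numeric frame (illustrative data, not the frame from this answer):

```python
import numpy as np
import pandas as pd

# Column 'a' holds 0, 3, 6, 9 so each row is easy to recognise.
df = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=["a", "b", "c"])

# Swap the rows at positions 0 and 2 in a single assignment.
# to_numpy() turns the right-hand side into plain values in the requested order,
# so they are written by position and row labels play no part.
df.iloc[[0, 2]] = df.iloc[[2, 0]].to_numpy()
```

As with the version above, this exchanges two rows and leaves the others where they were; it does not shift the rows in between.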
If that's what you want...
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
you can simply do the following:
df.reindex([2, 0, 1] + list(range(3, len(df))))  # list() is needed in Python 3
or you can do the following:
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
# this line rearranges the first 3 rows
df.reindex([2, 0, 1])
# this slices the data from the fourth row (position 3) onward
df.iloc[3:]
# concatenate both results together
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
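The concatenation above hard-codes the third row. A minimal sketch of the same idea for an arbitrary row, assuming n is a 0-based position; the name n, the name moved and the small integer frame are illustrative, not from the original answer:

```python
import numpy as np
import pandas as pd

# Column 'a' holds 0, 2, 4, ... so each row is easy to recognise.
df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["a", "b"])
n = 2  # 0-based position of the row to move to the top

# Take the target row, then everything before it, then everything after it.
moved = pd.concat([df.iloc[[n]], df.iloc[:n], df.iloc[n + 1:]])
```

Since only iloc is used, this does not depend on the index labels; append .reset_index(drop=True) if the labels should run from 0 again.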