I would like to merge two DataFrames while creating a multilevel column naming scheme that denotes which DataFrame each column came from. For example:
In [98]: A=pd.DataFrame(np.arange(9.).reshape(3,3),columns=list('abc'))
In [99]: A
Out[99]:
   a  b  c
0  0  1  2
1  3  4  5
2  6  7  8
In [100]: B=A.copy()
If I use pd.merge(), then I get:
In [104]: pd.merge(A,B,left_index=True,right_index=True)
Out[104]:
   a_x  b_x  c_x  a_y  b_y  c_y
0    0    1    2    0    1    2
1    3    4    5    3    4    5
2    6    7    8    6    7    8
That is what I expect from that statement. What I would like (but don't know how to get!) is:
In [104]: <<one or more statements>>
Out[104]:
   A        B
   a  b  c  a  b  c
0  0  1  2  0  1  2
1  3  4  5  3  4  5
2  6  7  8  6  7  8
Can this be done without changing the original pd.DataFrame calls? I am reading the data into the DataFrames from .csv files, and that might be my problem.
2 Solutions
#1
6
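In the first case (a dict), A and B may appear in arbitrary order (the columns within each frame are unaffected; only the order of A versus B can vary). The second case (a list with keys) should preserve the ordering.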
IMHO this is pandonic!
In [5]: pd.concat(dict(A=A, B=B), axis=1)
Out[5]:
   A        B
   a  b  c  a  b  c
0  0  1  2  0  1  2
1  3  4  5  3  4  5
2  6  7  8  6  7  8
In [6]: pd.concat([A, B], keys=['A', 'B'], axis=1)
Out[6]:
   A        B
   a  b  c  a  b  c
0  0  1  2  0  1  2
1  3  4  5  3  4  5
2  6  7  8  6  7  8
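Since the question mentions data read from .csv files, here is a minimal self-contained sketch of the keys= form applied to frames loaded with pd.read_csv. The CSV contents (csv_a, csv_b) and the name merged are made up for illustration; in-memory strings stand in for the real files.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the files mentioned in the question.
csv_a = "a,b,c\n0,1,2\n3,4,5\n6,7,8\n"
csv_b = "a,b,c\n0,1,2\n3,4,5\n6,7,8\n"

# The read calls stay exactly as they would be without the merge.
A = pd.read_csv(io.StringIO(csv_a))
B = pd.read_csv(io.StringIO(csv_b))

# keys= supplies the outer column level, in the order given.
merged = pd.concat([A, B], axis=1, keys=["A", "B"])

# The inputs keep their flat column index; only the result is multilevel.
print(A.columns.tolist())
print(merged.columns.tolist())

# Selecting an outer key returns that frame's columns.
print(merged["B"]["c"].tolist())
```

Because the outer level is attached at concat time, the code that builds A and B does not need to change.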
#2
5
Here's one way, which does change A and B:
In [10]: from itertools import cycle
In [11]: A.columns = pd.MultiIndex.from_tuples(zip(cycle('A'), A.columns))
In [12]: A
Out[12]:
   A
   a  b  c
0  0  1  2
1  3  4  5
2  6  7  8
In [13]: B.columns = pd.MultiIndex.from_tuples(zip(cycle('B'), B.columns))
In [14]: A.join(B)
Out[14]:
   A        B
   a  b  c  a  b  c
0  0  1  2  0  1  2
1  3  4  5  3  4  5
2  6  7  8  6  7  8
I actually think this would be a good alternative behaviour, rather than suffixes...
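A variation on the approach above that leaves A and B untouched might look like the sketch below. It swaps zip(cycle(...)) for pd.MultiIndex.from_product, because cycle('A') iterates over the characters of the string, so a multi-character label such as 'left' would be split into 'l', 'e', 'f', 't'. The helper name with_outer_level and the labels 'left' and 'right' are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(np.arange(9.).reshape(3, 3), columns=list("abc"))
B = A.copy()


def with_outer_level(df, label):
    """Return a copy of df whose columns gain `label` as an outer level."""
    out = df.copy()
    # from_product pairs the single label with every existing column name.
    out.columns = pd.MultiIndex.from_product([[label], df.columns])
    return out


# Join on the index; the relabelled copies have no overlapping columns.
joined = with_outer_level(A, "left").join(with_outer_level(B, "right"))

print(joined.columns.tolist())
print(A.columns.tolist())  # A itself still has flat columns
```

The copy costs some memory, so for large frames the pd.concat form with keys is probably the simpler choice.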