I have the following dataframes in Python:
dataframe 1
             1  2  3  4  5
dog   dog    0  1  1  0  1
      fox    1  0  0  0  0
      jumps  0  0  0  1  0
      over   1  0  1  0  1
      the    0  1  0  0  0
fox   dog    0  0  1  1  1
      fox    0  0  0  0  0
      jumps  0  0  1  0  1
      over   0  1  0  0  0
      the    0  0  0  1  1
jumps dog    0  0  0  0  0
      fox    0  1  0  1  1
      jumps  0  0  0  0  1
      over   1  0  1  0  0
      the    0  0  0  0  0
over  dog    0  0  1  0  0
      fox    0  1  0  1  1
      jumps  0  0  0  0  0
      over   0  1  0  1  0
      the    1  0  1  0  0
the   dog    0  0  1  0  0
      fox    0  0  0  0  1
      jumps  0  1  0  0  0
      over   0  0  1  1  0
      the    0  1  1  0  1
dataframe 2
           1  2  4  5
dog dog    1  0  0  0
    fox    0  1  0  1
    jumps  0  1  1  0
    the    0  0  0  0
    horse  1  0  1  0
fox dog    0  0  0  0
    fox    0  1  0  1
    over   0  0  0  0
    the    0  1  0  1
    cat    0  0  1  0
As you can see, dataframe 2 shares MultiIndex entries with dataframe 1 but also contains additional entries, such as horse and cat. Dataframe 2 also lacks some of dataframe 1's columns: column 3 is missing.
I want to subtract dataframe 2 from dataframe 1 so that only the data common to both is subtracted, the rest is ignored, and the resulting dataframe has the shape of dataframe 2.
Does anyone know whether pandas provides a built-in way of doing this, or do I need to write a function myself? If so, can you point me in the right direction? Any suggestions are highly appreciated. Thank you.
NOTE: This question is similar to another question I posted here, except that I don't want to compare the dataframes; I want to subtract one from the other.
4 Solutions
#1
4
I believe you simply want something like:
In [23]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna()
Out[23]:
             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0
Pandas already aligns on the index automatically, which is part of its magic; you just have to fill and drop NaNs intelligently.
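To make the alignment behaviour concrete, here is a minimal sketch on two hypothetical toy frames (much smaller than the ones in the question, and not the asker's actual data). It assumes string column labels, as the drop('3', axis=1) call above suggests. It shows why column 3 has to be dropped before dropna(): otherwise every row would contain a NaN.

```python
import pandas as pd

# Hypothetical toy frames standing in for df1 and df2 from the question.
idx1 = pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "fox"), ("fox", "dog")])
idx2 = pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "horse")])
df1 = pd.DataFrame({"1": [0, 1, 0], "2": [1, 0, 0], "3": [1, 0, 1]}, index=idx1)
df2 = pd.DataFrame({"1": [1, 1], "2": [0, 0]}, index=idx2)

# Subtraction aligns on the row MultiIndex and on the column labels.
# Any label present in only one frame yields NaN in the result.
diff = df2 - df1
print(diff)

# Drop the column df2 lacks, fill df2-only rows from df2,
# then drop the rows that exist only in df1 (still all-NaN).
filled = (df2 - df1.drop(columns="3")).fillna(df2).dropna()
print(filled)
```

In this sketch diff has four rows and three columns (the union of both frames), while filled is back to the two rows and two columns of df2.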
Edit
Whoops, you actually want df1 - df2, but with the shape of df2. That is a little trickier, because fillna(df1) would then prevent us from dropping the right rows. However, you can just multiply by -1!
In [25]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna() * -1
Out[25]:
             1    2    4    5
dog dog   -1.0  1.0 -0.0  1.0
    fox    1.0 -1.0 -0.0 -1.0
    horse -1.0 -0.0 -1.0 -0.0
    jumps -0.0 -1.0 -0.0 -0.0
    the   -0.0  1.0 -0.0 -0.0
fox cat   -0.0 -0.0 -1.0 -0.0
    dog   -0.0 -0.0  1.0  1.0
    fox   -0.0 -1.0 -0.0 -1.0
    over  -0.0  1.0 -0.0 -0.0
    the   -0.0 -1.0  1.0 -0.0
Or, if those negative zeros bother you:
In [31]: (-df2 + df1.drop('3', axis=1)).fillna(-df2).dropna()
Out[31]:
             1    2    4    5
dog dog   -1.0  1.0  0.0  1.0
    fox    1.0 -1.0  0.0 -1.0
    horse -1.0  0.0 -1.0  0.0
    jumps  0.0 -1.0  0.0  0.0
    the    0.0  1.0  0.0  0.0
fox cat    0.0  0.0 -1.0  0.0
    dog    0.0  0.0  1.0  1.0
    fox    0.0 -1.0  0.0 -1.0
    over   0.0  1.0  0.0  0.0
    the    0.0 -1.0  1.0  0.0
#2
4
IIUC:
In [24]: r = d1.sub(d2, axis=0)
In [25]: r.loc[r.index.intersection(d2.index)]
Out[25]:
             1    2    3    4    5
dog dog   -1.0  1.0  NaN  0.0  1.0
    fox    1.0 -1.0  NaN  0.0 -1.0
    horse  NaN  NaN  NaN  NaN  NaN
    jumps  0.0 -1.0  NaN  0.0  0.0
    the    0.0  1.0  NaN  0.0  0.0
fox cat    NaN  NaN  NaN  NaN  NaN
    dog    0.0  0.0  NaN  1.0  1.0
    fox    0.0 -1.0  NaN  0.0 -1.0
    over   0.0  1.0  NaN  0.0  0.0
    the    0.0 -1.0  NaN  1.0  0.0
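The result above still carries column 3 and all-NaN rows for horse and cat. One possible way to trim it to exactly the rows and columns of the second frame is to reindex after subtracting. This is only a sketch on hypothetical toy data (the names d1 and d2 mirror the answer, the values are made up), and what to put in the remaining NaN cells depends on what "ignore the rest" should mean.

```python
import pandas as pd

# Hypothetical toy frames standing in for d1 and d2.
idx1 = pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "fox"), ("fox", "dog")])
idx2 = pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "horse")])
d1 = pd.DataFrame({"1": [0, 1, 0], "2": [1, 0, 0], "3": [1, 0, 1]}, index=idx1)
d2 = pd.DataFrame({"1": [1, 1], "2": [0, 0]}, index=idx2)

# Subtract with automatic alignment, then keep exactly d2's rows and columns.
r = d1.sub(d2).reindex(index=d2.index, columns=d2.columns)
print(r)

# Rows that exist only in d2 (here: dog/horse) remain NaN.
# One option, mirroring answer #1, is to fill them with -d2.
r_filled = r.fillna(-d2)
print(r_filled)
```

Here r has the same shape as d2, and only the cells with no counterpart in d1 are NaN before the final fill.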
#3
3
Let us do something like:
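# Row labels of df2 (renamed from `id` to avoid shadowing the built-in)
ids = df2.index.values.tolist()
# Keep only the df1 rows whose labels also occur in df2
dd = df1.loc[list(set(df1.index.values.tolist()) & set(ids))]
# Subtract, fill df2-only rows from df2, then drop columns that still hold NaN
(df2 - dd).combine_first(df2).dropna(axis=1)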
             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0
#4
2
Use pd.DataFrame.align with the parameter 'inner' to reduce both dataframes to only the common indices. Then pass the results to pd.DataFrame.sub:
pd.DataFrame.sub(*df1.align(df2, 'inner'))
           1  2  4  5
dog dog   -1  1  0  1
    fox    1 -1  0 -1
    jumps  0 -1  0  0
    the    0  1  0  0
fox dog    0  0  1  1
    fox    0 -1  0 -1
    over   0  1  0  0
    the    0 -1  1  0
Written in two lines:
a, b = df1.align(df2, 'inner')
a - b
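As a small illustration of what align returns, the sketch below uses hypothetical toy frames (not the asker's data). With join='inner', both returned frames share the intersection of the row labels and of the column labels, so rows that exist only in df2 (horse in this toy example) are dropped rather than kept.

```python
import pandas as pd

# Hypothetical toy frames standing in for df1 and df2.
idx1 = pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "fox"), ("fox", "dog")])
idx2 = pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "fox"), ("dog", "horse")])
df1 = pd.DataFrame({"1": [0, 1, 0], "2": [1, 0, 0], "3": [1, 0, 1]}, index=idx1)
df2 = pd.DataFrame({"1": [1, 0, 1], "2": [0, 1, 0]}, index=idx2)

# align returns both frames reindexed to the same labels;
# join="inner" keeps only labels present in both frames, on both axes.
a, b = df1.align(df2, join="inner")
result = a - b
print(result)
```

Because nothing is introduced that needs filling, the result contains no NaN. The trade-off is that it has the shape of the overlap, not the shape of df2.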