I have two dataframes: df1
and df2
. If the Display Name
from df2
is in df1
's Display Name
column, I want to assign df1
s Type
, Format
, Behavior
, Datatype
values to df2
s values.
我有两个数据帧:df1和df2。如果df2的显示名称在df1的显示名称列中,我想将df1s类型,格式,行为,数据类型值分配给df2s值。
I've tried merge
every way I could think of. I think loc
is my best hope but I can't seem to get the assignment syntax right. Also, I'm looking for a succinct answer - preferably a one-liner.
我尝试过各种我能想到的方式。我认为loc是我最大的希望,但我似乎无法正确获得赋值语法。此外,我正在寻找一个简洁的答案 - 最好是一个单行。
Something like this:
像这样的东西:
df2.loc[df2['Display Name'].isin(df1['Display Name']), /
['Type', 'Format', 'Behavior', 'Datatype']] = ???
My Code:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(
{'Behavior': ['Attribute', 'Attribute', 'Attribute', 'Attribute', 'Attribute',
'Attribute', 'Attribute', 'Metric', 'Metric', 'Metric', 'Metric',
'Metric', 'Metric', 'Metric', 'Metric'],
'Datatype': ['object', 'object', 'object', 'object', 'object', 'object',
'object', 'int64', 'int64', 'int64', 'int64', 'float64',
'float64', 'float64', 'float64'],
'Display Name': ['Campaign', 'Campaign ID', 'Campaign ID', 'Campaign state',
'Campaign state', 'Currency', 'Currency', 'Impressions',
'Impressions', 'Clicks', 'Clicks', 'CTR', 'CTR', 'Avg. CPC',
'Avg. CPC'],
'Format': ['{}', '{}', '{}', '{}', '{}', '{}', '{}', '{:,.0f}', '{:,.0f}',
'{:,.0f}', '{:,.0f}', '{:.2f}%', '{:.2f}%', '${:,.2f}', '${:,.2f}'],
'Type': ['String', 'String', 'String', 'String', 'String', 'String', 'String',
'Integer', 'Integer', 'Integer', 'Integer', 'Percent', 'Percent',
'Currency', 'Currency']},
columns=['Display Name', 'Type', 'Format', 'Behavior', 'Datatype'])
df2 = pd.DataFrame(
{ 'Behavior': [ 'Attribute', 'Metric', 'Metric', 'Metric', 'Attribute',
'Metric', 'Metric', 'Attribute', 'Metric', 'Metric', 'Metric'],
'Datatype': [ 'object', 'float64', 'float64', 'float64', 'object', 'int64',
'int64', 'object', 'float64', 'float64', 'float64'],
'Display Name': [ 'Match type', 'Destination URL', 'Final URL',
'Mobile final URL', 'Labels', 'Impressions', 'Clicks',
'CTR', 'Avg. CPC', 'Cost', 'Avg. position'],
'Format': [ '{}', '{:.2f}', '{:.2f}', '{:.2f}', '{}', '{:,.0f}', '{:,.0f}',
'{}', '{:.2f}', '{:.2f}', '{:.2f}'],
'Type': [ 'String', 'Float', 'Float', 'Float', 'String', 'Integer',
'Integer', 'String', 'Float', 'Float', 'Float']},
columns=['Display Name', 'Type', 'Format', 'Behavior', 'Datatype'])
df2_vals_in_df1 = df2.loc[df2['Display Name'].isin(df1['Display Name']), df2.columns[:]]
df1_vals_in_df2 = df1.loc[df1['Display Name'].isin(df2['Display Name']), df1.columns[:]]
What it looks like:
它看起来像什么:
>>> df1
Display Name Type Format Behavior Datatype
0 Campaign String {} Attribute object
1 Campaign ID String {} Attribute object
2 Campaign ID String {} Attribute object
3 Campaign state String {} Attribute object
4 Campaign state String {} Attribute object
5 Currency String {} Attribute object
6 Currency String {} Attribute object
7 Impressions Integer {:,.0f} Metric int64
8 Impressions Integer {:,.0f} Metric int64
9 Clicks Integer {:,.0f} Metric int64
10 Clicks Integer {:,.0f} Metric int64
11 CTR Percent {:.2f}% Metric float64
12 CTR Percent {:.2f}% Metric float64
13 Avg. CPC Currency ${:,.2f} Metric float64
14 Avg. CPC Currency ${:,.2f} Metric float64
>>> df2
Display Name Type Format Behavior Datatype
0 Match type String {} Attribute object
1 Destination URL Float {:.2f} Metric float64
2 Final URL Float {:.2f} Metric float64
3 Mobile final URL Float {:.2f} Metric float64
4 Labels String {} Attribute object
5 Impressions Integer {:,.0f} Metric int64
6 Clicks Integer {:,.0f} Metric int64
7 CTR String {} Attribute object
8 Avg. CPC Float {:.2f} Metric float64
9 Cost Float {:.2f} Metric float64
10 Avg. position Float {:.2f} Metric float64
>>> df2_vals_in_df1
Display Name Type Format Behavior Datatype
5 Impressions Integer {:,.0f} Metric int64
6 Clicks Integer {:,.0f} Metric int64
7 CTR String {} Attribute object
8 Avg. CPC Float {:.2f} Metric float64
>>> df1_vals_in_df2
Display Name Type Format Behavior Datatype
7 Impressions Integer {:,.0f} Metric int64
8 Impressions Integer {:,.0f} Metric int64
9 Clicks Integer {:,.0f} Metric int64
10 Clicks Integer {:,.0f} Metric int64
11 CTR Percent {:.2f}% Metric float64
12 CTR Percent {:.2f}% Metric float64
13 Avg. CPC Currency ${:,.2f} Metric float64
14 Avg. CPC Currency ${:,.2f} Metric float64
Note how df1_vals_in_df2
Display Name
might have the same name multiple times. Their Type
, Format
, Behavior
, Datatype
values will always be the same values in both rows.
请注意df1_vals_in_df2显示名称可能多次具有相同的名称。它们的类型,格式,行为,数据类型值在两行中始终是相同的值。
Expected Output of df2
:
df2的预期输出:
>>> df2
Display Name Type Format Behavior Datatype
0 Match type String {} Attribute object
1 Destination URL Float {:.2f} Metric float64
2 Final URL Float {:.2f} Metric float64
3 Mobile final URL Float {:.2f} Metric float64
4 Labels String {} Attribute object
5 Impressions Integer {:,.0f} Metric int64 <-- same
6 Clicks Integer {:,.0f} Metric int64 <-- same
7 CTR Percent {:.2f}% Metric float64 <-- changed
8 Avg. CPC Currency ${:,.2f} Metric float64 <-- changed
9 Cost Float {:.2f} Metric float64
10 Avg. position Float {:.2f} Metric float64
Takeaway #1: rows 5, 6 are the same because they are the same in both df1
and df2
.
外卖#1:第5,6行是相同的,因为它们在df1和df2中都是相同的。
Takeaway #2: row 7, changed from String, {}, Attribute, object
to Percent, {:.2f}%, Metric, float64
- the row values from df1
because Display Name
from df2
was found in Display Name
in df1
.
外卖#2:第7行,从String,{},Attribute,object更改为Percent,{:。2f}%,Metric,float64 - 来自df1的行值,因为df2中的显示名称在df1的显示名称中找到。
Takeaway #3: row 8, changed for the same reasons noted in takeaway #2.
外卖#3:第8行,因外卖#2中注明的相同原因而改变。
Tried:
Q1:Python Pandas: Merge or Filter DataFrame by Another. Is there a Better Way?
Q1:Python Pandas:由另一个合并或过滤DataFrame。有没有更好的办法?
Doesn't address this question because I'm not trying to create a new dataframe; I'm trying to replace values in an existing dataframe from another.
没有解决这个问题,因为我不想创建一个新的数据帧;我正在尝试将现有数据框中的值替换为另一个。
Q2:Replace column values based on another dataframe python pandas - better way?
Q2:根据另一个数据帧python pandas替换列值 - 更好的方法?
Doesn't address this question because that example contains one df with correct values whereas my situation is a df with correct and incorrect values.
不解决这个问题,因为该示例包含一个具有正确值的df,而我的情况是具有正确和不正确值的df。
Apologies that this is such a long question. I just wanted to provide enough context.
抱歉这是一个很长的问题。我只是想提供足够的背景。
2 个解决方案
#1
2
I think that combine_first
will be an elegant solution, as per JohnE, provided you set Display Name
as an index. This brings me to another point. I think that your task is well-defined only if 'Display Name' corresponds to exactly one set of attributes within each table. Assuming that, you can drop duplicates, set index and use .update
like so:
我认为combine_first将是一个优雅的解决方案,根据JohnE,您可以将显示名称设置为索引。这让我想到另一点。我认为只有当“显示名称”对应于每个表中的一组属性时,您的任务才能明确定义。假设你可以删除重复项,设置索引并使用.update,如下所示:
df1 = df1.drop_duplicates()
df1 = df1.set_index('Display Name')
df2 = df2.set_index('Display Name')
df2_c = df2.copy()
df2.update(df1)
df1.update(df2_c)
del df2_c
You can reset the dimensions of df1
with an auxiliary index if you like.
如果您愿意,可以使用辅助索引重置df1的尺寸。
#2
0
Not ideal but I was able to recreate my expected output. Problem is, I wanted to avoid making df3
and want to do the replacement all within df2
so this isn't ideal.
不理想,但我能够重新创建我的预期输出。问题是,我想避免制作df3并想在df2内进行全部更换,所以这并不理想。
df2
before:
Display Name Type Format Behavior Datatype
0 Match type String {} Attribute object
1 Destination URL Float {:.2f} Metric float64
2 Final URL Float {:.2f} Metric float64
3 Mobile final URL Float {:.2f} Metric float64
4 Labels String {} Attribute object
5 Impressions Integer {:,.0f} Metric int64
6 Clicks Integer {:,.0f} Metric int64
7 CTR String {} Attribute object
8 Avg. CPC Float {:.2f} Metric float64
9 Cost Float {:.2f} Metric float64
10 Avg. position Float {:.2f} Metric float64
df3
after:
df3 = df2.combine_first(df1).drop_duplicates('Display Name', keep='last')
df3 = df3.set_index(df3['Display Name'].map(dict(zip(df2['Display Name'], df2.index)))).sort_index().reset_index(drop=True)
Display Name Type Format Behavior Datatype
0 Match type String {} Attribute object
1 Destination URL Float {:.2f} Metric float64
2 Final URL Float {:.2f} Metric float64
3 Mobile final URL Float {:.2f} Metric float64
4 Labels String {} Attribute object
5 Impressions Integer {:,.0f} Metric int64
6 Clicks Integer {:,.0f} Metric int64
7 CTR Percent {:.2f}% Metric float64
8 Avg. CPC Currency ${:,.2f} Metric float64
9 Cost Float {:.2f} Metric float64
10 Avg. position Float {:.2f} Metric float64
df2
before and after comparison:
比较前后的df2:
Display Name Type Format Behavior Datatype
0 True True True True True
1 True True True True True
2 True True True True True
3 True True True True True
4 True True True True True
5 True True True True True
6 True True True True True
7 True False False False False
8 True False False True True
9 True True True True True
10 True True True True True
#1
2
I think that combine_first
will be an elegant solution, as per JohnE, provided you set Display Name
as an index. This brings me to another point. I think that your task is well-defined only if 'Display Name' corresponds to exactly one set of attributes within each table. Assuming that, you can drop duplicates, set index and use .update
like so:
我认为combine_first将是一个优雅的解决方案,根据JohnE,您可以将显示名称设置为索引。这让我想到另一点。我认为只有当“显示名称”对应于每个表中的一组属性时,您的任务才能明确定义。假设你可以删除重复项,设置索引并使用.update,如下所示:
df1 = df1.drop_duplicates()
df1 = df1.set_index('Display Name')
df2 = df2.set_index('Display Name')
df2_c = df2.copy()
df2.update(df1)
df1.update(df2_c)
del df2_c
You can reset the dimensions of df1
with an auxiliary index if you like.
如果您愿意,可以使用辅助索引重置df1的尺寸。
#2
0
Not ideal but I was able to recreate my expected output. Problem is, I wanted to avoid making df3
and want to do the replacement all within df2
so this isn't ideal.
不理想,但我能够重新创建我的预期输出。问题是,我想避免制作df3并想在df2内进行全部更换,所以这并不理想。
df2
before:
Display Name Type Format Behavior Datatype
0 Match type String {} Attribute object
1 Destination URL Float {:.2f} Metric float64
2 Final URL Float {:.2f} Metric float64
3 Mobile final URL Float {:.2f} Metric float64
4 Labels String {} Attribute object
5 Impressions Integer {:,.0f} Metric int64
6 Clicks Integer {:,.0f} Metric int64
7 CTR String {} Attribute object
8 Avg. CPC Float {:.2f} Metric float64
9 Cost Float {:.2f} Metric float64
10 Avg. position Float {:.2f} Metric float64
df3
after:
df3 = df2.combine_first(df1).drop_duplicates('Display Name', keep='last')
df3 = df3.set_index(df3['Display Name'].map(dict(zip(df2['Display Name'], df2.index)))).sort_index().reset_index(drop=True)
Display Name Type Format Behavior Datatype
0 Match type String {} Attribute object
1 Destination URL Float {:.2f} Metric float64
2 Final URL Float {:.2f} Metric float64
3 Mobile final URL Float {:.2f} Metric float64
4 Labels String {} Attribute object
5 Impressions Integer {:,.0f} Metric int64
6 Clicks Integer {:,.0f} Metric int64
7 CTR Percent {:.2f}% Metric float64
8 Avg. CPC Currency ${:,.2f} Metric float64
9 Cost Float {:.2f} Metric float64
10 Avg. position Float {:.2f} Metric float64
df2
before and after comparison:
比较前后的df2:
Display Name Type Format Behavior Datatype
0 True True True True True
1 True True True True True
2 True True True True True
3 True True True True True
4 True True True True True
5 True True True True True
6 True True True True True
7 True False False False False
8 True False False True True
9 True True True True True
10 True True True True True