Pandas: function using multiple columns

Time: 2021-03-13 22:34:36

My dataframe df1:

date, country, category, score, value
2017-01-01, US, 123, 555, 232.02
2017-01-01, US, 223, 10, 22.02

I have a lookup dataframe df2:

category, factor_score_0_100, factor_score_101_500, factor_score_501_1000
123, 2.0, 3.0, 4.0
223, 5.4, 4.3, 3.2

Based on the category and score of a row in df1, I need to get the factor_score from df2. If the score in df1 for a particular category is between 0 and 100, I need to return factor_score_0_100 for that category and so on.

So far I've been able to convert df2 into a dictionary of the form

category: [factor_score_0_100, factor_score_101_500, factor_score_501_1000]

And I was attempting to write a function and then apply it via a lambda, but I'm not sure how to use 2 columns as an input.

How can I proceed here? TIA
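
For reference, a function can take two columns as input through a row-wise apply. This is only a minimal sketch: it assumes column names without leading spaces, and the `factors` dictionary and `pick_factor` helper are hypothetical names for illustration, not from the original post. Scores outside 0 to 1000 are not validated here.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-01"],
    "country": ["US", "US"],
    "category": [123, 223],
    "score": [555, 10],
    "value": [232.02, 22.02],
})

# Hypothetical dictionary in the form described above:
# category -> [factor_score_0_100, factor_score_101_500, factor_score_501_1000]
factors = {
    123: [2.0, 3.0, 4.0],
    223: [5.4, 4.3, 3.2],
}

def pick_factor(category, score):
    """Return the factor for this category whose score band contains score."""
    bands = factors[category]
    if score <= 100:
        return bands[0]
    if score <= 500:
        return bands[1]
    return bands[2]  # assumption: anything above 500 falls in the top band

# axis=1 hands each row to the lambda, so both columns are available at once
df1["factor"] = df1.apply(
    lambda row: pick_factor(row["category"], row["score"]), axis=1
)
print(df1)
```

A row-wise apply is easy to read but runs a Python call per row, so it may be slow on large frames.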

1 Answer

#1

A slightly hacky way to get this, using IntervalIndex + lookup:

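import pandas as pd

df2 = df2.set_index('category')
# Split e.g. 'factor_score_0_100' so that levels 2 and 3 hold the band bounds
df2.columns = df2.columns.str.split('_', expand=True)
idx = pd.IntervalIndex.from_arrays(df2.columns.get_level_values(2).astype(int), df2.columns.get_level_values(3).astype(int), closed='both')
df2.columns = idx

# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0
df2.lookup(df1[' category'], df1[' score'])
Out[171]: array([4. , 5.4])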

Then assign the result back to df1:

df1['NEW']=df2.lookup(df1[' category'],df1[' score'])
df1
Out[173]: 
         date  country   category   score   value  NEW
0  2017-01-01       US        123     555  232.02  4.0
1  2017-01-01       US        223      10   22.02  5.4
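
Since `DataFrame.lookup` is no longer available in pandas 2.0 and later, the same idea can be sketched with `get_indexer` plus NumPy indexing. This is not the answer's original code, just one possible equivalent; it assumes column names without leading spaces, and the names `lookup`, `bands`, `row_pos` and `col_pos` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-01"],
    "country": ["US", "US"],
    "category": [123, 223],
    "score": [555, 10],
    "value": [232.02, 22.02],
})
df2 = pd.DataFrame({
    "category": [123, 223],
    "factor_score_0_100": [2.0, 5.4],
    "factor_score_101_500": [3.0, 4.3],
    "factor_score_501_1000": [4.0, 3.2],
})

lookup = df2.set_index("category")

# Build one closed interval per column from the bounds in the column names
parts = lookup.columns.str.split("_", expand=True)
bands = pd.IntervalIndex.from_arrays(
    parts.get_level_values(2).astype(int),
    parts.get_level_values(3).astype(int),
    closed="both",
)

# Positional row (category) and column (score band) for every row of df1;
# get_indexer returns -1 where nothing matches
row_pos = lookup.index.get_indexer(df1["category"].to_numpy())
col_pos = bands.get_indexer(df1["score"].to_numpy())
if (row_pos < 0).any() or (col_pos < 0).any():
    raise ValueError("unknown category or score outside every band")

# Pair the positions up element-wise to pick one factor per row
df1["NEW"] = lookup.to_numpy()[row_pos, col_pos]
print(df1)
```

The explicit check for -1 matters because NumPy would otherwise treat -1 as "last row/column" and silently return a wrong factor.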
