Assigning to slices of pandas DataFrames

Time: 2022-08-26 18:42:35

I am trying to work out an effective date for any given date. The dataframe has a BMonthEnd column holding the last business day of the month, taking holidays into account (calculated by code not shown here).

As a first step, the partial dataframe shown below has EffectiveDate set equal to Date:

            Date        BMonthEnd   EffectiveDate
2014-08-24  2014-08-24  2014-08-29  2014-08-24
2014-08-25  2014-08-25  2014-08-29  2014-08-25
2014-08-26  2014-08-26  2014-08-29  2014-08-26
2014-08-27  2014-08-27  2014-08-29  2014-08-27
2014-08-28  2014-08-28  2014-08-29  2014-08-28
2014-08-29  2014-08-29  2014-08-29  2014-08-29
2014-08-30  2014-08-30  2014-08-29  2014-08-30
2014-08-31  2014-08-31  2014-08-29  2014-08-31

I now try to select the rows that need to be changed with:

df[~(df.Date<df.BMonthEnd)].EffectiveDate  # giving the expected slice
# but 
df[~(df.Date<df.BMonthEnd)].EffectiveDate = 1
# gives a warning:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice
from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
self[name] = value

Following the warning, I tried the alternative method:

df.loc[~(df.Date<df.BMonthEnd)].EffectiveDate = 1

This gives the same warning, and the assignment is not reflected in the original dataframe. (The 1 used in the assignment is just a placeholder for another function.) I understand that I am effectively assigning to a copy, so the original dataframe does not change as intended.

How, then, do I achieve my goal of using the selection syntax to assign? I really do not want to have to iterate over the dataframe.

1 Solution

#1


Figured it out. Selecting the Series out of the DataFrame effectively allows me to assign to it and to the original dataframe. This lets me use the slicing syntax to apply logic that influences the results:

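# not all methods, classes shown
import pandas as pd

def effective_date(dr):
    # h is a helper object that is not shown here; dr is presumably a range of dates
    df = pd.DataFrame(dr, index=dr, columns=['Date'])
    df['BMonthEnd'] = df.Date.apply(h.last_business_day)
    df['MonthEnd'] = df.Date.apply(h.month_end)
    df['EffectiveDate'] = df.Date
    # earlier attempt using chained indexing, replaced by the .loc call below:
    # df.EffectiveDate[~(df.Date<df.BMonthEnd)] = df.MonthEnd
    # select rows and column in one .loc call so the assignment targets df itself
    df.loc[~(df.Date<df.BMonthEnd),'EffectiveDate'] = df.MonthEnd
    return df.EffectiveDate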
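
The function above relies on helpers that are not shown, so it cannot be run as posted. Below is a minimal self-contained sketch of the same idea. The functions last_business_day and month_end are hypothetical stand-ins built from pandas offsets and, unlike the original helpers, they do not take holidays into account.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

# Hypothetical stand-ins for the helpers not shown in the answer.
def month_end(d):
    # Calendar month end of the month containing d.
    return MonthEnd().rollforward(d)

def last_business_day(d):
    # Last weekday of the month containing d (holidays are ignored here).
    return BDay().rollback(month_end(d))

def effective_date(dr):
    df = pd.DataFrame({'Date': dr}, index=dr)
    df['BMonthEnd'] = df['Date'].apply(last_business_day)
    df['MonthEnd'] = df['Date'].apply(month_end)
    df['EffectiveDate'] = df['Date']
    # Rows on or after the last business day move to the calendar month end.
    mask = ~(df['Date'] < df['BMonthEnd'])
    df.loc[mask, 'EffectiveDate'] = df['MonthEnd']
    return df['EffectiveDate']

dr = pd.date_range('2014-08-27', '2014-08-31')
result = effective_date(dr)
print(result)
```

With the sample dates from the question, 2014-08-27 and 2014-08-28 keep their own date, while 2014-08-29 through 2014-08-31 are moved to 2014-08-31.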

Have updated it with Jeff's suggestion. I now see why chained indexing can get you into trouble. I have done a few timeits and they seem to be faster, but when assigning to the dataframe .loc is the better option.
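
To illustrate the chained-indexing problem, here is a small sketch on made-up data (the three dates and the replacement value are assumptions for the example only). It compares the chained form from the question with a single .loc call and records whether the original dataframe changed.

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2014-08-28', '2014-08-29', '2014-08-30']),
    'BMonthEnd': pd.to_datetime(['2014-08-29'] * 3),
})
df['EffectiveDate'] = df['Date']
mask = ~(df['Date'] < df['BMonthEnd'])
new_value = pd.Timestamp('2014-08-31')

# Chained form: df[mask] builds a new DataFrame, so the assignment
# lands on that temporary object and df itself is left untouched.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the chained-assignment warning
    df[mask].EffectiveDate = new_value
chained_changed = not df['EffectiveDate'].equals(df['Date'])

# Single .loc call: rows and column are selected together on df itself.
df.loc[mask, 'EffectiveDate'] = new_value
loc_changed = not df['EffectiveDate'].equals(df['Date'])

print(chained_changed, loc_changed)
```

The commented-out form in the answer, df.EffectiveDate[mask] = ..., is also chained indexing. It happened to modify the dataframe in older pandas versions, but as far as I know it no longer does once Copy-on-Write is enabled, which is one more reason to prefer .loc.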
