Pandas: merge monthly data from one DataFrame with daily data in another

I have a CSV file (say A.csv) whose index runs from 1980-01-01 to 2018-02-28 in one-day steps, with one data column (say, the daily stock price).

I have another CSV file (say B.csv) whose index runs from 1980-01 to 2018-02 in one-month steps, with one data column (say, the monthly trade balance).

In such a case, how do I merge B.csv into A.csv while keeping the daily index? That is, I want the daily index plus one column for the daily stock price and another column for the monthly trade balance. I need to expand the monthly trade balance to a daily one by repeating each month's value for every day in that month.

1 solution

#1

You can do this with pandas.

One way to do this is to convert both date columns to datetime objects and then use pd.Series.map to look up, for each daily row, the matching value in the monthly table.
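The lookup relies on `pd.Series.map` accepting another Series: each value is matched against that Series' index, and values with no match come back as `NaN`. A minimal sketch with made-up dates; the sample monthly table in the answer below also has no 1980-02 row, so any February days would map to `NaN` in the same way:

```python
import pandas as pd

# Lookup table: the index holds the keys, the values hold the results
lookup = pd.Series([200, 300], index=pd.to_datetime(["1980-01-01", "1980-03-01"]))

# Keys to translate; 1980-02-01 has no entry in the lookup table
keys = pd.Series(pd.to_datetime(["1980-01-01", "1980-02-01", "1980-03-01"]))

# Series.map with a Series argument matches each key against lookup.index
mapped = keys.map(lookup)
print(mapped.tolist())  # February has no match, so it comes back as NaN
```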

Since your monthly data has no day component, we normalise every date to the first day of its month before mapping.
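The normalisation can be done element-wise or vectorised. This sketch (with arbitrary example dates) compares the `replace(day=1)` call used below with a `to_period('M')` round trip, which gives the same timestamps without calling a Python function for every row; whether that is noticeably faster on the full 1980 to 2018 series is an assumption worth measuring:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["1980-01-15", "1980-02-29", "2018-02-28"]))

# Element-wise, as in the answer: replace the day component with 1
via_replace = dates.map(lambda x: x.replace(day=1))

# Vectorised alternative: convert to monthly periods, then back to timestamps;
# to_timestamp() returns the start of each period by default
via_period = dates.dt.to_period("M").dt.to_timestamp()

print(via_period.tolist())
```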

import pandas as pd

# first read the 2 tables into DataFrames (small sample tables are built below instead)
# df_daily = pd.read_csv('A.csv')    # daily stock price
# df_monthly = pd.read_csv('B.csv')  # monthly trade balance

df_daily = pd.DataFrame({'Date': ['1980-01-01', '1980-01-02', '1980-01-03'],
                         'Value': [1, 2, 3]})

df_monthly = pd.DataFrame({'Month': ['1979-12', '1980-01', '1980-03'],
                           'Value': [100, 200, 300]})

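# convert to datetime objects; monthly strings such as '1980-01' get '-01'
# appended so that they parse as the first day of the month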
df_daily['Date'] = pd.to_datetime(df_daily['Date'])
df_monthly['Month'] = pd.to_datetime(df_monthly['Month']+'-01')

# perform mapping after normalising to first day of month
df_daily['MonthValue'] = df_daily['Date'].map(lambda x: x.replace(day=1))\
                                         .map(df_monthly.set_index('Month')['Value'])

#         Date  Value  MonthValue
# 0 1980-01-01      1         200
# 1 1980-01-02      2         200
# 2 1980-01-03      3         200
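The question keeps the dates in the index of each file rather than in a column. A minimal sketch of the same idea for that layout, assuming the first CSV column is the date; the column names `Date`, `Price`, `Month` and `TradeBalance` and the few sample rows are hypothetical, and `io.StringIO` only stands in for the real A.csv and B.csv. Instead of `map`, it converts each day to its monthly period and reindexes the monthly table onto those periods:

```python
import io

import pandas as pd

# Hypothetical stand-ins for A.csv (daily) and B.csv (monthly); column names are assumptions
a_csv = io.StringIO(
    "Date,Price\n"
    "1980-01-30,10.0\n"
    "1980-01-31,10.5\n"
    "1980-02-01,11.0\n"
    "1980-02-02,11.2\n"
)
b_csv = io.StringIO(
    "Month,TradeBalance\n"
    "1980-01,-500\n"
    "1980-02,-450\n"
)

# Daily prices with a DatetimeIndex
daily = pd.read_csv(a_csv, index_col="Date", parse_dates=True)

# Monthly trade balance with a monthly PeriodIndex (e.g. Period('1980-01', 'M'))
monthly = pd.read_csv(b_csv, index_col="Month")
monthly.index = pd.to_datetime(monthly.index, format="%Y-%m").to_period("M")

# Month that each day belongs to; repeated for every day of that month
month_of_day = daily.index.to_period("M")

# Pull the monthly value for each day's month and assign it row by row
daily["TradeBalance"] = monthly["TradeBalance"].reindex(month_of_day).to_numpy()
print(daily)
```

`reindex` repeats the matching monthly value for every day in that month and yields `NaN` for any month missing from B.csv. `.to_numpy()` assigns the values by position, since the reindexed Series is labelled by period rather than by date.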
