I have a lot of Excel spreadsheets that I load with pandas; I process the data and write everything out to a single Excel spreadsheet that serves as my "database".
The database has to follow a pattern in the date index, e.g. 2017-01-01 (yyyy-mm-dd), 2017-01-02, 2017-01-03, ... 2017-12-31, and so on.
But the spreadsheets I receive as input do not follow any rule for the dates. My processing handles that and correctly matches the input spreadsheet's index against the output database's index, creating a new file with df.to_excel(r'database\databaseFinal.xlsx'). My problem is adding new values to the existing database while still keeping the index in the required pattern.
For example:
DATABASE.xlsx:
date        Name1  Name2
2017-01-01   23.2   18.4
2017-01-02   21.5   27.7
2017-01-03    0      0
2017-01-04    0      0
Input spreadsheet used to update the database:
date        Name1
2017-01-04   32.5
After processing and merging the data:
date        Name1_x  Name2  Name1_y
2017-01-01     23.2   18.4        0
2017-01-02     21.5   27.7        0
2017-01-03        0      0        0
2017-01-04        0      0     32.5
What I want:
date        Name1  Name2
2017-01-01   23.2   18.4
2017-01-02   21.5   27.7
2017-01-03      0      0
2017-01-04   32.5      0
For this problem, the output must be an Excel file. I know there must be an easy and efficient way of dealing with this, and I don't want my work so far to have been in vain.
2 Answers
#1
import pandas as pd

# Build the dataframe
df = pd.DataFrame([['2017-01-01', 23.2, 18.4],
                   ['2017-01-02', 21.5, 27.7],
                   ['2017-01-03', 0.0, 0.0],
                   ['2017-01-04', 0.0, 0.0]],
                  columns=["date", "Name1", "Name2"])
df = df.set_index("date")

# Change the value (use .loc with row and column labels together;
# chained indexing like df.loc[...][...] = ... may not write back)
df.loc["2017-01-04", "Name1"] = 32.5
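Rather than setting each cell by hand, the new values from the input spreadsheet can be pulled in with `DataFrame.update`, which aligns on the index and overwrites only matching cells. A minimal sketch, assuming the update spreadsheet has already been loaded into a frame called `new_df` (the name is hypothetical) with the same date index:

```python
import pandas as pd

# Existing "database" frame, indexed by date
df = pd.DataFrame(
    {"Name1": [23.2, 21.5, 0.0, 0.0], "Name2": [18.4, 27.7, 0.0, 0.0]},
    index=["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"],
)
df.index.name = "date"

# Hypothetical frame built from the update spreadsheet
new_df = pd.DataFrame({"Name1": [32.5]}, index=["2017-01-04"])
new_df.index.name = "date"

# update() aligns on the index and overwrites only the cells
# that new_df actually provides; everything else is untouched
df.update(new_df)
```

Because `update` modifies `df` in place, the existing index pattern is preserved and the result can be written straight back out with `df.to_excel(...)`.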
#2
Instead of using merge, you can simply append and fill the NaN values with zero.
df1
         date  Name1  Name2
0  2017-01-01   23.2   18.4
1  2017-01-02   21.5   27.7
2  2017-01-03    0.0    0.0
3  2017-01-04    0.0    0.0
df2
         date  Name1
0  2017-01-04   32.5
df1.append(df2).fillna(0)
   Name1  Name2        date
0   23.2   18.4  2017-01-01
1   21.5   27.7  2017-01-02
2    0.0    0.0  2017-01-03
3    0.0    0.0  2017-01-04
0   32.5    0.0  2017-01-04
If you always want to keep the value from the second dataframe, you can use drop_duplicates with date as the subset:
df1.append(df2).fillna(0).drop_duplicates(subset=['date'], keep='last')
   Name1  Name2        date
0   23.2   18.4  2017-01-01
1   21.5   27.7  2017-01-02
2    0.0    0.0  2017-01-03
0   32.5    0.0  2017-01-04
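Note that `DataFrame.append` was removed in pandas 2.0; `pd.concat` is the modern equivalent and produces the same result. A sketch of the same append-fill-deduplicate pipeline using `pd.concat`:

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"],
    "Name1": [23.2, 21.5, 0.0, 0.0],
    "Name2": [18.4, 27.7, 0.0, 0.0],
})
df2 = pd.DataFrame({"date": ["2017-01-04"], "Name1": [32.5]})

# pd.concat replaces the removed DataFrame.append;
# ignore_index avoids the duplicated row labels seen above
result = (
    pd.concat([df1, df2], ignore_index=True)
    .fillna(0)                                          # Name2 is NaN for df2's row
    .drop_duplicates(subset=["date"], keep="last")      # keep df2's value for 2017-01-04
)
```

The result can then be written out with `result.to_excel(...)` to produce the final database file.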