Pandas database: adding new data

Time: 2021-12-07 07:10:39

I have a lot of Excel spreadsheets. I load them with pandas, process the data, and write everything to a single Excel spreadsheet that serves as my "database".

The database has to follow a pattern in the date index, e.g. 2017-01-01 (yyyy-mm-dd), 2017-01-02, 2017-01-03, ..., 2017-12-31, and so on.

But the spreadsheets that are my inputs do not follow any rule for the dates. My processing deals with that and correctly matches the input spreadsheet indexes to the output database indexes, creating a new file: df.to_excel('database\databaseFinal.xlsx'). My problem is adding new values to the existing database while still processing the indexes so that they respect the pattern.

For example:

DATABASE.xlsx:

    date         Name1  Name2
    2017-01-01   23.2   18.4
    2017-01-02   21.5   27.7
    2017-01-03   0      0
    2017-01-04   0      0

Spreadsheet input to update the database:

    date         Name1  
    2017-01-04   32.5

Process the data... After merging the data:

    date         Name1_x  Name2  Name1_y
    2017-01-01   23.2     18.4   0
    2017-01-02   21.5     27.7   0
    2017-01-03   0        0      0
    2017-01-04   0        0      32.5

What I want:

    date         Name1  Name2  
    2017-01-01   23.2   18.4  
    2017-01-02   21.5   27.7   
    2017-01-03   0      0      
    2017-01-04   32.5   0     

For this problem the output must be an Excel file. I know there must be an easy and efficient way of dealing with this, but I don't want my work to have been in vain.

2 solutions

#1


import pandas as pd

# Make the dataframe, using the dates as the index
df = pd.DataFrame(
    [['2017-01-01', 23.2, 18.4],
     ['2017-01-02', 21.5, 27.7],
     ['2017-01-03', 0.0, 0.0],
     ['2017-01-04', 0.0, 0.0]],
    columns=["date", "Name1", "Name2"],
).set_index("date")

# Change the value with a single .loc call (row label, column label);
# chained indexing such as df.loc[row][col] = ... may assign to a copy
df.loc["2017-01-04", "Name1"] = 32.5
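If the new values arrive as a whole sheet rather than a single cell, the same idea can probably be applied in bulk with DataFrame.update. This is only a sketch, assuming both frames are indexed by date; the names database and new_data are made up for illustration, and the values are copied from the question.

```python
import pandas as pd

# Existing database, indexed by date (values from the question)
database = pd.DataFrame(
    {"Name1": [23.2, 21.5, 0.0, 0.0], "Name2": [18.4, 27.7, 0.0, 0.0]},
    index=pd.Index(
        ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"], name="date"
    ),
)

# New input sheet (hypothetical name), also indexed by date
new_data = pd.DataFrame(
    {"Name1": [32.5]}, index=pd.Index(["2017-01-04"], name="date")
)

# update() works in place and aligns on index and column labels;
# only the non-NaN cells of new_data overwrite cells in database
database.update(new_data)

print(database)
```

Note that update only touches labels that already exist in database, so dates missing from the database are ignored and Name2 is left alone. If the input can contain dates the database does not have yet, new_data.combine_first(database) may be a better fit.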

#2


Instead of using merge, you can simply append the new rows and fill the NaN values with zero.

df1
         date  Name1  Name2
0  2017-01-01   23.2   18.4
1  2017-01-02   21.5   27.7
2  2017-01-03    0.0    0.0
3  2017-01-04    0.0    0.0
df2
         date  Name1
0  2017-01-04   32.5

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
pd.concat([df1, df2]).fillna(0)
         date  Name1  Name2
0  2017-01-01   23.2   18.4
1  2017-01-02   21.5   27.7
2  2017-01-03    0.0    0.0
3  2017-01-04    0.0    0.0
0  2017-01-04   32.5    0.0

If you always want to keep the value from the second dataframe, you can use drop_duplicates with date as the subset and keep='last':

pd.concat([df1, df2]).fillna(0).drop_duplicates(subset=['date'], keep='last')
         date  Name1  Name2
0  2017-01-01   23.2   18.4
1  2017-01-02   21.5   27.7
2  2017-01-03    0.0    0.0
0  2017-01-04   32.5    0.0
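Since the question also asks for the date index to keep its daily pattern and for the result to end up in Excel, the steps above could be followed by a reindex over a full date range. This is a rough sketch under assumptions: the extra 2017-01-06 row in df2 is invented purely to show a gap being filled, and the output file name is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"],
    "Name1": [23.2, 21.5, 0.0, 0.0],
    "Name2": [18.4, 27.7, 0.0, 0.0],
})
# Hypothetical input: one overlapping date plus one date past the
# end of the database (the 2017-01-06 row is illustrative only)
df2 = pd.DataFrame({"date": ["2017-01-04", "2017-01-06"], "Name1": [32.5, 11.0]})

# Stack both frames, fill missing cells with 0, keep the newest row per date
combined = (
    pd.concat([df1, df2])
    .fillna(0)
    .drop_duplicates(subset=["date"], keep="last")
)

# Use real datetimes as a sorted index
combined["date"] = pd.to_datetime(combined["date"])
combined = combined.set_index("date").sort_index()

# Rebuild a gap-free daily index; dates with no data get 0
full_range = pd.date_range(combined.index.min(), combined.index.max(), freq="D")
combined = combined.reindex(full_range, fill_value=0.0)
combined.index.name = "date"

print(combined)

# Writing the result needs an Excel engine such as openpyxl installed:
# combined.to_excel("databaseFinal.xlsx")  # hypothetical file name
```

One thing to watch with this approach: keep='last' keeps the whole row from the second dataframe, so a column that the input does not provide (Name2 here) ends up as 0 for that date, even if the database already held a value there.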
