SQL values to update a pandas dataframe

Time: 2021-04-29 00:01:48

I am doing a lot of SQL-to-pandas work and have run into the following challenge.

I have a dataframe that looks like this:

UserID, AccountNo, AccountName
123,    12345,     'Some name'
...

For each account number, I would like to add a column called total revenue, which is fetched from a MySQL database, so I am thinking of something like:

for accountno in df['AccountNo']:
    df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)

And I need to expand the dataframe so that it looks like this:

UserID, AccountNo, AccountName, TotalRevenue
123,    12345,     'Some name', df1
...

The code I have so far (which is not working; it raises a getitem error):

sets3 = []
i=0
for accountno in df5['kna1_kunnr']:
    df1 = pd.read_sql(('select sum(VBRK_NETWR) as sum from sapdata2016.orders where VBAK_BSARK="ZEDI" and VBRK_KUNAG = %s;') % accountno, conn)
    df2 = pd.DataFrame([(df5['userid'][i], df5['kna1_kunnr'][i], accountno, df5['kna1_name1'][i], df1['sum'][0])], columns=['User ID', 'AccountNo', 'tjeck', 'AccountName', 'Revenue'])
    sets3.append(df2)
    i += 1

df6 = pd.concat(sets3)

This idea/code is not pretty, and I wonder if there is a nicer way to do it. Any ideas?

1 solution

#1


Consider exporting the pandas data to MySQL as a temporary table, then running an SQL query that joins it to an aggregate query for TotalRevenue, and finally reading the result set back into a pandas dataframe. This approach avoids looping and issues one query instead of one per account.

import pandas as pd
from sqlalchemy import create_engine
...

# SQL ALCHEMY CONNECTION (PREFERRED OVER RAW CONNECTION)
engine = create_engine('mysql://user:pwd@localhost/database')
# engine = create_engine("mysql+pymysql://user:pwd@hostname:port/database") # load pymysql

# df1 is the existing dataframe (UserID, AccountNo, AccountName)
df1.to_sql("mypandastemptable", con=engine, if_exists='replace')

sql = """SELECT t.UserID, t.AccountNo, t.AccountName, agg.TotalRevenue
         FROM mypandastemptable t
         LEFT JOIN 
            (SELECT VBRK_KUNAG AS AccountNo,
                    SUM(VBRK_NETWR) AS TotalRevenue
             FROM sapdata2016.orders 
             WHERE VBAK_BSARK='ZEDI'
             GROUP BY VBRK_KUNAG) agg
         ON t.AccountNo = agg.AccountNo
"""

newdf = pd.read_sql(sql, con=engine)
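
To try the temp-table approach without a MySQL server, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. The sample rows, the account names, and the unqualified `orders` table are made up for illustration; with MySQL you would pass the SQLAlchemy engine instead of the sqlite3 connection.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL server (assumption for this demo).
conn = sqlite3.connect(":memory:")

# Hypothetical order rows using the column names from the question.
conn.execute(
    "CREATE TABLE orders (VBRK_KUNAG INTEGER, VBAK_BSARK TEXT, VBRK_NETWR REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (12345, "ZEDI", 100.0),
        (12345, "ZEDI", 50.0),
        (12345, "OTHER", 999.0),  # filtered out by the WHERE clause
        (67890, "ZEDI", 20.0),
    ],
)
conn.commit()

# The existing pandas data; account 11111 has no orders at all.
df1 = pd.DataFrame(
    {
        "UserID": [123, 456, 789],
        "AccountNo": [12345, 67890, 11111],
        "AccountName": ["Some name", "Other name", "No orders"],
    }
)

# Export the dataframe, then let the database do the aggregation and the join.
df1.to_sql("mypandastemptable", con=conn, if_exists="replace", index=False)

sql = """SELECT t.UserID, t.AccountNo, t.AccountName, agg.TotalRevenue
         FROM mypandastemptable t
         LEFT JOIN
            (SELECT VBRK_KUNAG AS AccountNo,
                    SUM(VBRK_NETWR) AS TotalRevenue
             FROM orders
             WHERE VBAK_BSARK = 'ZEDI'
             GROUP BY VBRK_KUNAG) agg
         ON t.AccountNo = agg.AccountNo
         ORDER BY t.UserID
"""

newdf = pd.read_sql(sql, con=conn)
print(newdf)
```

Because of the LEFT JOIN, every row of the exported dataframe is kept, and accounts with no matching orders come back with a missing TotalRevenue rather than being dropped.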

Alternatively, do the join in pandas: read the grouped aggregate query into a second dataframe and merge it with the existing dataframe:

sql = """SELECT VBRK_KUNAG AS AccountNo,
                SUM(VBRK_NETWR) AS TotalRevenue
         FROM sapdata2016.orders 
         WHERE VBAK_BSARK='ZEDI'
         GROUP BY VBRK_KUNAG 
"""

df2 = pd.read_sql(sql, con=engine)

newdf = df1.merge(df2, on='AccountNo', how='left')
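
As a rough illustration of what the left merge produces, the sketch below replaces the query result with a hand-built dataframe. The values are invented, and filling missing revenue with 0 is an optional extra step, not something the question asks for.

```python
import pandas as pd

# The existing dataframe; account 11111 has no orders.
df1 = pd.DataFrame(
    {
        "UserID": [123, 456, 789],
        "AccountNo": [12345, 67890, 11111],
        "AccountName": ["Some name", "Other name", "No orders"],
    }
)

# Stand-in for the grouped aggregate query result (made-up values).
df2 = pd.DataFrame({"AccountNo": [12345, 67890], "TotalRevenue": [150.0, 20.0]})

# how='left' keeps every row of df1; unmatched accounts get NaN.
newdf = df1.merge(df2, on="AccountNo", how="left")

# Optional: treat accounts without orders as zero revenue.
newdf["TotalRevenue"] = newdf["TotalRevenue"].fillna(0)
print(newdf)
```

The merge key should have the same dtype on both sides. If the account numbers are stored as strings in the database but as integers in the dataframe (or the other way round), cast one side before merging.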
