如何使用Python将批处理插入到Oracle数据库中?

时间:2022-09-27 20:37:46

I have some monthly weather data that I want to insert into an Oracle database table but I want to insert the corresponding records in a batch in order to be more efficient. Can anyone advise as to how I'd go about doing this in Python?

我有一些每月的天气数据,我想将它们插入到Oracle数据库表中,但是我想将相应的记录插入到一个批处理中,以提高效率。有人能建议我如何用Python来做这件事吗?

For example let's say my table has four fields: a station ID, a date, and two value fields. The records are uniquely identified by the station ID and date fields (composite key). The values I'll have to insert for each station will be kept in a list with X number of full years worth of data, so for example if there are two years of values then the value lists will contain 24 values.

例如,假设我的表有四个字段:一个站ID、一个日期和两个值字段。记录由站点ID和日期字段(组合键)惟一标识。我必须为每个站点插入的值将保存在一个列表中,其中包含X个完整年份的数据,例如,如果有两年的值,那么值列表将包含24个值。

I assume that below is the way I'd do this if I wanted to insert the records one at a time:

我假设下面的方法是这样的如果我想一次插入一个记录:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12
for i in range(number_of_years):
    for j in range(12):
        # make a date for the first day of the month
        date_value = datetime.date(start_year + i, j + 1, 1)
        index = (i * 12) + j
        sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, %s, %s, %s)', (station_id, date_value, temps[index], precips[index]))
        cursor.execute(sql_insert)
connection.commit()

Is there a way to do what I'm doing above but in a way that performs a batch insert in order to increase efficiency? BTW my experience is with Java/JDBC/Hibernate so if someone can give an explanation/example which compares to the Java approach then it'd be especially helpful.

是否有一种方法可以实现我上面所做的,但是这种方法可以执行批量插入以提高效率?顺便说一句,我的经验是使用Java/JDBC/Hibernate,所以如果有人能给出一个与Java方法相比的解释/例子,那将会特别有帮助。

EDIT: Perhaps I need to use cursor.executemany() as described here?

编辑:也许我需要使用这里描述的cursor.executemany() ?

Thanks in advance for any suggestions, comments, etc.

感谢您的建议、意见等。

5 个解决方案

#1


16  

Here's what I've come up with which appears to work well (but please comment if there's a way to improve this):

以下是我所提出的似乎很有效的方法(但请评论一下是否有方法可以改善这一点):

# build rows for each date and add to a list of rows we'll use to insert as a batch 
rows = [] 
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = '192.1.2.3' 
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = cx_Oracle.Cursor(connection)
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()

#2


7  

Use Cursor.prepare() and Cursor.executemany().

使用Cursor.prepare()和Cursor.executemany()。

From the cx_Oracle documentation:

从cx_Oracle文档:

Cursor.prepare(statement[, tag])

游标。准备(语句[标记])

This can be used before a call to execute() to define the statement that will be executed. When this is done, the prepare phase will not be performed when the call to execute() is made with None or the same string object as the statement. [...]

可以在调用execute()来定义将要执行的语句之前使用它。完成此操作后,当对execute()的调用使用None或与语句相同的string对象时,不会执行prepare阶段。[…]

Cursor.executemany(statement, parameters)

游标。executemany(声明,参数)

Prepare a statement for execution against a database and then execute it against all parameter mappings or sequences found in the sequence parameters. The statement is managed in the same way as the execute() method manages it.

准备对数据库执行的语句,然后对序列参数中找到的所有参数映射或序列执行语句。语句的管理方式与execute()方法的管理方式相同。

Thus, using the above two functions, your code becomes:

因此,使用上述两个函数,您的代码变成:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() should be of the form:
# [{'1': value_a1, '2': value_a2}, {'1': value_b1, '2': value_b2}]
dict_sequence = [{'1': date_values[i], '2': temps[i], '3': precips[i]} for i in range(1, len(temps))]

sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, :1, :2, :3)', station_id)
cursor.prepare(sql_insert)
cursor.executemany(None, dict_sequence)
connection.commit()

Also see Oracle's Mastering Oracle+Python series of articles.

还可以参阅Oracle的精通Oracle+Python系列文章。

#3


3  

As one of the comments says, consider using INSERT ALL. Supposedly it'll be significantly faster than using executemany().

正如其中一条评论所说,考虑使用INSERT ALL。据说它比使用executemany()要快得多。

For example:

例如:

INSERT ALL
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual;

http://www.techonthenet.com/oracle/questions/insert_rows.php

http://www.techonthenet.com/oracle/questions/insert_rows.php

#4


2  

I would create a large SQL insert statement using union:

我将使用union创建一个大型SQL insert语句:

insert into mytable(col1, col2, col3)
select a, b, c from dual union
select d, e, f from dual union
select g, h, i from dual

You can build the string in python and give it to oracle as one statement to execute.

您可以在python中构建字符串,并将其作为一条语句交给oracle执行。

#5


1  

fyi my test result:

通知你我的测试结果:

I insert into 5000 rows. 3 columns per row.

插入到5000行。3列/行。

  1. run insert 5000 times, it costs 1.24 minutes.
  2. 运行插入5000次,花费1.24分钟。
  3. run with executemany, it costs 0.125 seconds.
  4. 使用executemany运行,花费0.125秒。
  5. run with a insert all code: it costs 4.08 minutes.
  6. 使用插入所有代码运行:花费4.08分钟。

python code, which setup the sql like insert all into t(a,b,c) select :1, :2, :3 from dual union all select :4, :5: :6 from daul...

python代码,它设置sql如插入所有到t(a,b,c)中选择:1,:2,:3从双联盟中选择:4,:5:6从daul…

The python code to setup this long sql, it cost 0.145329 seconds.

python代码设置这么长的sql,花费0.145329秒。

I test my code on a very old sun machine. cpu: 1415 MH.

我在一台非常旧的sun机器上测试代码。cpu:1415 MH。

in the third case, I checked the database side, the wait event is "SQL*Net more data from client". which means the server is waiting for more data from client.

在第三种情况下,我检查了数据库端,等待事件是“SQL*Net more data from client”。这意味着服务器正在等待来自客户端的更多数据。

The result of the third method is unbelievable for me without the test.

第三种方法的结果对我来说是难以置信的。

so the short suggestion from me is just to use executemany.

所以我的简短建议就是使用executemany。

#1


16  

Here's what I've come up with which appears to work well (but please comment if there's a way to improve this):

以下是我所提出的似乎很有效的方法(但请评论一下是否有方法可以改善这一点):

# build rows for each date and add to a list of rows we'll use to insert as a batch 
rows = [] 
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = '192.1.2.3' 
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = cx_Oracle.Cursor(connection)
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()

#2


7  

Use Cursor.prepare() and Cursor.executemany().

使用Cursor.prepare()和Cursor.executemany()。

From the cx_Oracle documentation:

从cx_Oracle文档:

Cursor.prepare(statement[, tag])

游标。准备(语句[标记])

This can be used before a call to execute() to define the statement that will be executed. When this is done, the prepare phase will not be performed when the call to execute() is made with None or the same string object as the statement. [...]

可以在调用execute()来定义将要执行的语句之前使用它。完成此操作后,当对execute()的调用使用None或与语句相同的string对象时,不会执行prepare阶段。[…]

Cursor.executemany(statement, parameters)

游标。executemany(声明,参数)

Prepare a statement for execution against a database and then execute it against all parameter mappings or sequences found in the sequence parameters. The statement is managed in the same way as the execute() method manages it.

准备对数据库执行的语句,然后对序列参数中找到的所有参数映射或序列执行语句。语句的管理方式与execute()方法的管理方式相同。

Thus, using the above two functions, your code becomes:

因此,使用上述两个函数,您的代码变成:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() should be of the form:
# [{'1': value_a1, '2': value_a2}, {'1': value_b1, '2': value_b2}]
dict_sequence = [{'1': date_values[i], '2': temps[i], '3': precips[i]} for i in range(1, len(temps))]

sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, :1, :2, :3)', station_id)
cursor.prepare(sql_insert)
cursor.executemany(None, dict_sequence)
connection.commit()

Also see Oracle's Mastering Oracle+Python series of articles.

还可以参阅Oracle的精通Oracle+Python系列文章。

#3


3  

As one of the comments says, consider using INSERT ALL. Supposedly it'll be significantly faster than using executemany().

正如其中一条评论所说,考虑使用INSERT ALL。据说它比使用executemany()要快得多。

For example:

例如:

INSERT ALL
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual;

http://www.techonthenet.com/oracle/questions/insert_rows.php

http://www.techonthenet.com/oracle/questions/insert_rows.php

#4


2  

I would create a large SQL insert statement using union:

我将使用union创建一个大型SQL insert语句:

insert into mytable(col1, col2, col3)
select a, b, c from dual union
select d, e, f from dual union
select g, h, i from dual

You can build the string in python and give it to oracle as one statement to execute.

您可以在python中构建字符串,并将其作为一条语句交给oracle执行。

#5


1  

fyi my test result:

通知你我的测试结果:

I insert into 5000 rows. 3 columns per row.

插入到5000行。3列/行。

  1. run insert 5000 times, it costs 1.24 minutes.
  2. 运行插入5000次,花费1.24分钟。
  3. run with executemany, it costs 0.125 seconds.
  4. 使用executemany运行,花费0.125秒。
  5. run with a insert all code: it costs 4.08 minutes.
  6. 使用插入所有代码运行:花费4.08分钟。

python code, which setup the sql like insert all into t(a,b,c) select :1, :2, :3 from dual union all select :4, :5: :6 from daul...

python代码,它设置sql如插入所有到t(a,b,c)中选择:1,:2,:3从双联盟中选择:4,:5:6从daul…

The python code to setup this long sql, it cost 0.145329 seconds.

python代码设置这么长的sql,花费0.145329秒。

I test my code on a very old sun machine. cpu: 1415 MH.

我在一台非常旧的sun机器上测试代码。cpu:1415 MH。

in the third case, I checked the database side, the wait event is "SQL*Net more data from client". which means the server is waiting for more data from client.

在第三种情况下,我检查了数据库端,等待事件是“SQL*Net more data from client”。这意味着服务器正在等待来自客户端的更多数据。

The result of the third method is unbelievable for me without the test.

第三种方法的结果对我来说是难以置信的。

so the short suggestion from me is just to use executemany.

所以我的简短建议就是使用executemany。