python sqlite3: how often do I have to commit?

Time: 2020-12-15 22:59:26

I have a for loop that is making many changes to a database with a sqlite manager class I wrote, but I am unsure about how often I have to commit...

for i in list:
    c.execute('UPDATE table SET x = y WHERE foo = bar')
    conn.commit()
    c.execute('UPDATE table SET x = z + y WHERE foo = bar')
    conn.commit()

Basically my question is whether I have to call commit twice there, or if I can just call it once after I have made both changes?

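One commit after both statements is enough: sqlite3 keeps the implicit transaction open until you commit. A minimal sketch of the batched variant (the schema and values are made up so the placeholder SQL from the question becomes runnable):

```python
import sqlite3

# Hypothetical schema, invented only to make the question's SQL executable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (foo TEXT, x INTEGER, y INTEGER, z INTEGER)")
conn.execute("INSERT INTO mytable VALUES ('bar', 0, 10, 5)")
conn.commit()

c = conn.cursor()
c.execute("UPDATE mytable SET x = y WHERE foo = 'bar'")      # x -> 10
c.execute("UPDATE mytable SET x = z + y WHERE foo = 'bar'")  # x -> 15
conn.commit()  # a single commit covers both changes

print(conn.execute("SELECT x FROM mytable").fetchone())  # (15,)
```

Whether one commit at the end is actually the right choice is what the answer below works through.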
1 solution

#1


Whether you call conn.commit() once at the end of the procedure or after every single database change depends on several factors.

What concurrent readers see

This is what everybody thinks of first: when a change to the database is committed, it becomes visible to other connections. Until it is committed, it remains visible only to the connection in which the change was made. Because of the limited concurrency features of sqlite, the database can only be read while a transaction is open.

You can see what happens by running the following script and examining its output:

import os
import sqlite3

_DBPATH = "./q6996603.sqlite"

def fresh_db():
    if os.path.isfile(_DBPATH):
        os.remove(_DBPATH)
    with sqlite3.connect(_DBPATH) as conn:
        cur = conn.cursor().executescript("""
            CREATE TABLE "mytable" (
                "id" INTEGER PRIMARY KEY AUTOINCREMENT, -- rowid
                "data" INTEGER
            );
            """)
    print("created %s" % _DBPATH)

# functions are syntactic sugar only and use global conn, cur, rowid

def select():
    sql = 'select * from "mytable"'
    rows = cur.execute(sql).fetchall()
    print("   same connection sees", rows)
    # simulate another script accessing the database concurrently
    with sqlite3.connect(_DBPATH) as conn2:
        rows = conn2.cursor().execute(sql).fetchall()
    print("   other connection sees", rows)

def count():
    print("counting up")
    cur.execute('update "mytable" set data = data + 1 where "id" = ?', (rowid,))

def commit():
    print("commit")
    conn.commit()

# now the script
fresh_db()
with sqlite3.connect(_DBPATH) as conn:
    print("--- prepare test case")
    sql = 'insert into "mytable"(data) values(17)'
    print(sql)
    cur = conn.cursor().execute(sql)
    rowid = cur.lastrowid
    print("rowid =", rowid)
    commit()
    select()
    print("--- two consecutive w/o commit")
    count()
    select()
    count()
    select()
    commit()
    select()
    print("--- two consecutive with commit")
    count()
    select()
    commit()
    select()
    count()
    select()
    commit()
    select()

Output:

$ python try.py 
created ./q6996603.sqlite
--- prepare test case
insert into "mytable"(data) values(17)
rowid = 1
commit
   same connection sees [(1, 17)]
   other connection sees [(1, 17)]
--- two consecutive w/o commit
counting up
   same connection sees [(1, 18)]
   other connection sees [(1, 17)]
counting up
   same connection sees [(1, 19)]
   other connection sees [(1, 17)]
commit
   same connection sees [(1, 19)]
   other connection sees [(1, 19)]
--- two consecutive with commit
counting up
   same connection sees [(1, 20)]
   other connection sees [(1, 19)]
commit
   same connection sees [(1, 20)]
   other connection sees [(1, 20)]
counting up
   same connection sees [(1, 21)]
   other connection sees [(1, 20)]
commit
   same connection sees [(1, 21)]
   other connection sees [(1, 21)]
$

So it depends on whether you can live with the situation that a concurrent reader, be it in the same script or in another program, will be off by two at times.

When a large number of changes is to be made, two other aspects come into play:

Performance

The performance of database changes depends dramatically on how you do them. This is already noted in the SQLite FAQ:

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. [...]

It is absolutely helpful to understand the details here, so do not hesitate to follow the link and dive in. Also see this awesome analysis. It is written in C, but the results would be similar if one did the same in Python.

Note: While both resources refer to INSERT, the situation is very much the same for UPDATE, for the same reasons.

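The gap the FAQ describes is easy to reproduce. The sketch below (all names are made up) inserts the same rows once inside a single transaction and once with a commit per statement; it uses a throwaway on-disk database because each commit must wait for the journal to reach the disk, a cost an in-memory database would hide:

```python
import os
import sqlite3
import tempfile
import time

# Throwaway on-disk database so commits pay their real sync cost.
fd, path = tempfile.mkstemp(suffix=".sqlite")
os.close(fd)

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (n INTEGER)")
conn.commit()

rows = [(i,) for i in range(500)]

start = time.perf_counter()
conn.executemany("INSERT INTO t VALUES (?)", rows)
conn.commit()                       # one transaction for all 500 inserts
batched = time.perf_counter() - start

start = time.perf_counter()
for row in rows:
    conn.execute("INSERT INTO t VALUES (?)", row)
    conn.commit()                   # one transaction per insert
per_statement = time.perf_counter() - start

print("batched:       %.3f s" % batched)
print("per statement: %.3f s" % per_statement)

conn.close()
os.remove(path)
```

The exact numbers depend on the machine and the disk, but the per-statement variant is typically slower by one to two orders of magnitude.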
Exclusively locking the database

As already mentioned above, an open (uncommitted) transaction will block changes from concurrent connections. So it makes sense to bundle many changes to the database into a single transaction by executing them all and then committing the whole bunch jointly.

Unfortunately, computing the changes may sometimes take a while. When concurrent access is an issue, you will not want to lock your database for that long. Because it can become rather tricky to collect pending UPDATE and INSERT statements somehow, this usually leaves you with a tradeoff between performance and exclusive locking.

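A common middle ground in that tradeoff is to commit in chunks: large enough to keep most of the batching benefit, small enough that the write lock is released regularly and concurrent readers are never behind by more than one chunk. A sketch, with an arbitrary chunk size as the tuning knob:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.commit()

CHUNK = 100  # arbitrary: bigger chunks are faster but hold the lock longer

for i in range(1000):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
    if (i + 1) % CHUNK == 0:
        conn.commit()   # release the write lock every CHUNK rows
conn.commit()           # pick up any leftover rows in a final partial chunk

print(conn.execute("SELECT count(*) FROM t").fetchone()[0])  # 1000
```

Picking CHUNK is purely empirical: measure with your own workload and raise it until the lock hold time becomes a problem for your readers.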