I am retrieving data from a remote web application and I store it in a local sqlite DB. From time to time I perform a commit() call on the connection, which results in committing about 50 inserts to 8 tables. There are about 1000000 records in most tables. I don't use explicit BEGIN and END commands.
I measured the commit() call time and got 100-150 ms. But during the commit, my PC freezes for ~5-15 seconds. The INSERTs themselves are naive (i.e. one execute() call per insert) but are performed fast enough (their rate is limited by the speed of record retrieval anyway, which is fairly low). I'm using Arch Linux x64 on a PC with an AMD FX 6200 CPU, 8 GB RAM and a SATA HDD, with Python 3.4.1 and sqlite 3.8.4.3.
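For reference, the insert pattern looks roughly like this (a minimal sketch; the table and column names are hypothetical stand-ins for my actual schema):

```python
import sqlite3

conn = sqlite3.connect("local.db")
conn.execute("CREATE TABLE IF NOT EXISTS records (source_id INTEGER, payload TEXT)")

def store_record(record):
    # One naive execute() call per retrieved record.
    conn.execute(
        "INSERT INTO records (source_id, payload) VALUES (?, ?)",
        (record["id"], record["payload"]),
    )

for i in range(50):
    store_record({"id": i, "payload": "data"})

# No explicit BEGIN/END: the sqlite3 module opens a transaction implicitly
# before the first INSERT, and commit() closes it.
conn.commit()
```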
Does anyone have an idea why this could happen? I guess it has something to do with HDD caching. If so, is there something I could optimize?
UPD: switched to WAL and synchronous=1, no improvements.
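(For reference, in the Python sqlite3 module that change amounts to something like the following; the exact statements I ran may have differed slightly:)

```python
import sqlite3

conn = sqlite3.connect("local.db")
# Write-ahead logging instead of the default rollback journal.
conn.execute("PRAGMA journal_mode=WAL")
# synchronous=1 is NORMAL, i.e. fewer fsyncs than the default FULL.
conn.execute("PRAGMA synchronous=1")
```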
UPD2: I have seriously underestimated the number of INSERTs per commit. I measured it using sqlite3.Connection's total_changes property, and it appears there are 30000-60000 changes per commit. Is it possible to optimize the inserts, or maybe it is about time to switch to Postgres?
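For illustration, batching the accumulated rows with executemany() inside a single transaction would look roughly like this (a sketch with a hypothetical table name, not my actual code):

```python
import sqlite3

conn = sqlite3.connect("local.db")
conn.execute("CREATE TABLE IF NOT EXISTS records (source_id INTEGER, payload TEXT)")

# Rows accumulated since the last commit (illustrative data).
rows = [(i, "data") for i in range(30000)]

# A single executemany() replaces tens of thousands of individual execute()
# calls; everything still lands in one transaction, closed by commit().
conn.executemany("INSERT INTO records (source_id, payload) VALUES (?, ?)", rows)
conn.commit()
```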
1 Solution
#1
If the call itself is quick enough, as you say, it certainly sounds like an IO problem. You could use a tool such as iotop to check this. If possible, I would suggest that you divide the inserts into smaller, more frequent commits instead of large chunks. If that is not possible, you should consider investing in an SSD instead of a traditional hard disk, since it usually has much faster write speeds.
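A rough sketch of what "smaller and more frequent" commits could look like (the batch size and table name are purely illustrative):

```python
import sqlite3

BATCH_SIZE = 1000  # illustrative; tune so each commit's write burst stays small

conn = sqlite3.connect("local.db")
conn.execute("CREATE TABLE IF NOT EXISTS records (source_id INTEGER, payload TEXT)")

pending = 0
for i in range(30000):  # stand-in for the record-retrieval loop
    conn.execute("INSERT INTO records (source_id, payload) VALUES (?, ?)", (i, "data"))
    pending += 1
    if pending >= BATCH_SIZE:
        conn.commit()   # many small commits instead of one huge one
        pending = 0
conn.commit()           # flush the final partial batch
```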
There could well be system parameters worth investigating. You should at least make sure you mount your disk with the noatime and nodiratime flags. You could also try data=writeback as a mount parameter. See the following for more details:
https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
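For illustration only, those mount flags would go into /etc/fstab along these lines (device, mount point and filesystem are placeholders; data=writeback applies to ext3/ext4 and trades some data-integrity guarantees for speed):

```
# example /etc/fstab entry; /dev/sdXN and /data are placeholders
/dev/sdXN  /data  ext4  defaults,noatime,nodiratime,data=writeback  0  2
```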