A few days ago, I ran into an unexpected performance problem with a pretty standard Django setup. For an upcoming feature, we have to regenerate a table hourly; it contains about 100k rows of data, 9 MB on disk plus 10 MB of indexes according to pgAdmin.
The problem is that inserting the rows, by whatever method, takes ages: up to 3 minutes of 100% disk-busy time. That's not something you want on a production site. It doesn't matter whether the inserts are wrapped in a transaction or issued via plain INSERT, multi-row INSERT, COPY FROM, or even INSERT INTO t1 SELECT * FROM t2.
After establishing that this isn't Django's fault, I went down a trial-and-error route, and hey, the problem disappeared after dropping all the foreign keys! Instead of 3 minutes, the INSERT INTO ... SELECT took less than a second to execute, which isn't too surprising for a table <= 20 MB on disk. What is weird is that PostgreSQL manages to slow the inserts down 180x just because of 3 foreign keys.
Oh, and the disk activity was pure writing, as everything is cached in RAM; only writes go to the disks. It looks like PostgreSQL is working very hard to touch every row in the referenced tables, since 3 MB/s * 180 s is far more data than the 20 MB this new table takes on disk. There was no WAL overhead in the 180-second case, as I was testing directly in psql; in Django, add roughly 50% overhead for WAL logging. I tried @commit_on_success, same slowness; I had even implemented multi-row INSERT and COPY FROM with psycopg2. That's another weird thing: how can 10 MB worth of inserts generate more than ten 16 MB log segments?
Table layout: id serial primary key, a bunch of int32 columns, and 3 foreign keys to:
- small table, 198 rows, 16 kB on disk
- large table, 1.2M rows, 59 MB data + 89 MB index on disk
- large table, 2.2M rows, 198 MB data + 210 MB index
So, am I doomed to either drop the foreign keys manually or use the table in a very un-Django way, saving the three bla_id columns directly and skipping models.ForeignKey? I'd love to hear about some magical antidote / pg setting to fix this.
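For what it's worth, the crude version of the "drop the foreign keys manually" route would look something like this; the table, column and constraint names below are made up for illustration, not our real schema:

    BEGIN;
    -- drop the FK before the hourly reload (repeat for the other two FKs)
    ALTER TABLE hourly_table DROP CONSTRAINT hourly_table_small_id_fkey;

    TRUNCATE hourly_table;
    INSERT INTO hourly_table SELECT * FROM staging_table;

    -- put the FK back; PostgreSQL re-validates all rows once, in bulk
    ALTER TABLE hourly_table
        ADD CONSTRAINT hourly_table_small_id_fkey
        FOREIGN KEY (small_id) REFERENCES small_table (id);
    COMMIT;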
4 Answers
#1
2
100,000 FK checks should take about 2-5 seconds if they don't have to wait for IO reads. That is much slower than inserting into the table itself, but much faster than the time you got.
Check that all your foreign keys are INDEXED:
(I'm talking about an index on the referenced column, not the referencing column, got it?)
If products.category_id REFERENCES category(id), and there is no index on category.id, every time it needs to check a FK it will have to scan the table.
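A quick way to see which indexes actually exist, as a sketch using the example table names from this answer rather than the asker's schema:

    -- list every index on the two example tables
    SELECT tablename, indexname, indexdef
    FROM pg_indexes
    WHERE tablename IN ('category', 'products');

    -- whichever column turns out to be missing an index can then get one:
    -- CREATE INDEX some_index_name ON some_table (some_column);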
To find which one it is, do your insert with 1 FK, then with 2 FKs... you'll find which one is responsible.
And yes, if you truncate the table, it's faster to also drop all constraints and indexes and rebuild them after bulk insertion.
#2
0
This seems like normal behavior to me. When bulk inserting into a database, if the table has indexes, foreign keys or triggers, they have to be maintained and checked row by row. So typically you want to drop them, perform the inserts (using COPY if possible), and then recreate the indexes, FKs and triggers.
This page on the docs has more details about autocommit, maintenance_work_mem and checkpoint_segments that you can tune: http://www.postgresql.org/docs/8.4/interactive/populate.html
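As a rough illustration of the kind of knobs that page covers (the value here is a placeholder, not a recommendation for this particular workload):

    -- per-session: more memory for rebuilding indexes and validating FKs
    SET maintenance_work_mem = '256MB';

    -- checkpoint_segments, by contrast, lives in postgresql.conf and needs
    -- a server reload, so it cannot simply be SET per session like above.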
#3
0
Maybe you have a trigger on your table that you don't know about or have forgotten, which fires on every row inserted/deleted. Can you connect to the database using "psql"? If so, analyze the output of "\d+ table_name" for all your tables.
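If scanning \d+ output for many tables gets tedious, a catalog query can do the same check in one go (a sketch; 'my_table' is a placeholder):

    -- list user-defined triggers on a single table
    SELECT trigger_name, event_manipulation
    FROM information_schema.triggers
    WHERE event_object_table = 'my_table';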
You can also dump your database, do the import, and dump the database again. Compare the dumps to check whether the contents of any other table have changed.
#4
0
I had forgotten that EXPLAIN ANALYZE INSERT INTO bleh ... will show you the timing of all insert triggers.
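Something along these lines, with placeholder table and column names; wrapping it in a transaction and rolling back keeps the test rows out of the table:

    BEGIN;
    EXPLAIN ANALYZE
        INSERT INTO bleh (a, b, c)
        SELECT a, b, c FROM bleh_staging;
    ROLLBACK;
    -- the output includes per-trigger timing lines, and the FK checks
    -- show up there as constraint triggers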