I have to update my Doctrine entities to match records inside a (potentially very large) XML file. I also have to update the ManyToMany associations according to the data in the XML. This is what I do inside a loop:
- get data from the XML
- get the entity from the DB (if it does not exist, create a new one)
- set the new entity properties
- get the current entity associations (the getter returns an ArrayCollection object)
- clear all associations (by calling ArrayCollection::clear())
- set the new associations (by calling ArrayCollection::add() in a sub-loop)
- persist the entity with the EntityManager
After the loop I call EntityManager::flush(). A sketch of the whole loop follows below.
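For illustration, here is a minimal sketch of that loop. It assumes a hypothetical Product entity with a code field, a Tag association exposed via getTags(), and a pre-built $tagsByName lookup map; all of these names are placeholders, not my real code:

// assumption: $em is the EntityManager, $xml is a SimpleXMLElement,
// and $tagsByName maps tag names to already-loaded Tag entities
$repository = $em->getRepository(Product::class);

foreach ($xml->record as $record) {
    // get the entity from the DB (if it does not exist, create a new one)
    $product = $repository->findOneBy(['code' => (string) $record->code]);
    if ($product === null) {
        $product = new Product();
        $product->setCode((string) $record->code);
    }

    // set the new entity properties
    $product->setName((string) $record->name);

    // clear all current associations, then add the new ones in a sub-loop
    $tags = $product->getTags(); // returns an ArrayCollection
    $tags->clear();
    foreach ($record->tags->tag as $tagName) {
        $tags->add($tagsByName[(string) $tagName]);
    }

    $em->persist($product);
}

$em->flush();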
The problem is that flushing generates a large number of queries instead of updating/inserting/deleting multiple rows at once. For every entity the following queries are executed:
- SELECT to get the entity from the DB
- UPDATE to update the entity properties (this is actually skipped for now, as no properties have changed ... yet)
- DELETE to clear the previous associations
- INSERT to insert the new associations
So in total, for 305 records in the XML I get 915 queries (I guess it could go up to 1220 queries if all entities changed), which makes the import very slow.
I could take advantage of the IdentityMap and pre-fetch the entities before the loop (see the sketch below), but there are still the UPDATE/DELETE/INSERT queries.
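A minimal sketch of such a pre-fetch, assuming the same hypothetical Product entity and code field as above (findBy() with an array value generates a single WHERE ... IN (...) query):

// collect all identifiers from the XML first
$codes = [];
foreach ($xml->record as $record) {
    $codes[] = (string) $record->code;
}

// one SELECT ... WHERE code IN (...) fills the IdentityMap
$products = $em->getRepository(Product::class)->findBy(['code' => $codes]);

// index by code so the import loop can look entities up without new SELECTs
$productsByCode = [];
foreach ($products as $product) {
    $productsByCode[$product->getCode()] = $product;
}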
- Is there a way to let the flush method better optimize the queries (use multi-inserts, WHERE IN instead of multiple DELETE queries, etc.)?
- Is this normal behaviour of the flush method, or am I doing something wrong?
- Perhaps there is a problem in the way I update the associations of an entity. Is there a better way to do this (instead of the "get/clear/add" method)?
- I am aware that Doctrine is not intended for mass batch processing, but I think using it for XML imports is the best way to avoid DB inconsistencies that could appear with a non-ORM approach. Is that right?
- If the approach above is wrong, how should I solve the problem?
2 Answers
#1
30
You're doing it right -- it's just slow, because the added abstraction of the ORM means you can't make the sorts of optimizations you'd like.
That said, the EntityManager does get slow on transactions that large. If you don't absolutely need them all in one big transaction, you can probably get more performant code by flush()ing and then clear()ing the EM every 20-200 iterations of your loop.
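A minimal sketch of that batching, reusing the loop from the question (note that clear() detaches all managed entities, so anything pre-fetched before the loop has to be re-fetched after each clear):

$batchSize = 100;
$i = 0;

foreach ($xml->record as $record) {
    // ... find/create the entity, set properties, rebuild associations ...
    $em->persist($product);

    if ((++$i % $batchSize) === 0) {
        $em->flush();  // execute the queued INSERT/UPDATE/DELETE queries
        $em->clear();  // detach everything so the UnitOfWork stays small
    }
}

$em->flush();  // flush the final, partial batch
$em->clear();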
If that doesn't get you enough performance, the only alternative that I can think of is to revert to custom code that runs custom SQL directly against your DBMS.
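As a sketch of what that custom SQL could look like for the association rows, assuming a recent Doctrine DBAL where Connection::executeStatement() exists; the join table and column names here are hypothetical:

$conn = $em->getConnection();

// delete the old association rows for all imported products in one query
$conn->executeStatement(
    'DELETE FROM product_tag WHERE product_id IN (?)',
    [$productIds],
    [\Doctrine\DBAL\Connection::PARAM_INT_ARRAY]
);

// rebuild them with a single multi-row INSERT
$placeholders = implode(', ', array_fill(0, count($pairs), '(?, ?)'));
$params = [];
foreach ($pairs as [$productId, $tagId]) {
    $params[] = $productId;
    $params[] = $tagId;
}
$conn->executeStatement(
    "INSERT INTO product_tag (product_id, tag_id) VALUES $placeholders",
    $params
);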
I know this isn't a great answer, but at least I can tell you that you're not crazy.
------ edit ------
From the official Doctrine2 article on Batch processing:
Some people seem to be wondering why Doctrine does not use multi-inserts (insert into (...) values (...), (...), (...), ...).
First of all, this syntax is only supported on mysql and newer postgresql versions. Secondly, there is no easy way to get hold of all the generated identifiers in such a multi-insert when using AUTO_INCREMENT or SERIAL and an ORM needs the identifiers for identity management of the objects. Lastly, insert performance is rarely the bottleneck of an ORM. Normal inserts are more than fast enough for most situations and if you really want to do fast bulk inserts, then a multi-insert is not the best way anyway, i.e. Postgres COPY or Mysql LOAD DATA INFILE are several orders of magnitude faster.
These are the reasons why it is not worth the effort to implement an abstraction that performs multi-inserts on mysql and postgresql in an ORM.
Also, there is a significant difference in performance between a remote and a local database, as the overhead of sending each query to a remote server is quite large. The overhead is much lower with a local database, thanks to transactions and DB optimizations (e.g. 70 s dropped to 300 ms in the case of the example in the question).
#2
3
Not sure if this directly answers the question posed by the original poster but hopefully this will help others with Doctrine speed issues when flushing.
With regard to flush speed, make sure your Xdebug profiler is not on:
[php.ini]
; PROFILING
;xdebug.profiler_enable = 1
;xdebug.profiler_output_name = "cachegrind.out.%t.%s.%p"
;xdebug.profiler_output_dir = "C:\xampp\tmp"
As an example of how much this affected a Doctrine flush operation in my case, it was 55 seconds for 3000 records whereas with the profiler turned off it was 5 seconds!
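As a quick way to verify at runtime that the profiler is really off, a small sketch using standard PHP functions (extension_loaded() and ini_get()):

// print whether Xdebug is loaded and whether its profiler is enabled
var_dump(extension_loaded('xdebug'));          // false => no Xdebug overhead at all
var_dump(ini_get('xdebug.profiler_enable'));   // "0"/false => profiler off (Xdebug 2)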