After realizing that an application suffers from the N+1 problem because of its ORM, I would like more information about the improvements that can be made, along with statistics comparing timings before the improvements (with the N+1 problem) and after.
So what is the time difference before and after such improvements?
Can anyone point me to a paper that analyzes the problem and reports statistics on it?
2 Answers
#1
4
You really don't need statistical data for this, just math. N+1 (or better 1+N) stands for
- 1 query to get a record, and
- N queries to get all records associated with it
The bigger N is, the bigger the performance hit, particularly if your queries are sent across the network to a remote database. That's why N+1 problems keep cropping up in production: they're usually insignificant in development mode with little data in the DB, but as your data grows in production to thousands or millions of rows, your queries will slowly choke your server.
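The "just math" argument can be written as a rough cost model. The symbols are assumptions introduced here for illustration, not something from a benchmark: t_rt is the fixed per-query overhead (network round trip, parsing), t_0 is the execution time of the first query, t_i is the execution time of the i-th follow-up query, and t_join is the execution time of a single combined query.

```latex
T_{1+N} \approx (1+N)\,t_{rt} + t_{0} + \sum_{i=1}^{N} t_{i},
\qquad
T_{\mathrm{join}} \approx t_{rt} + t_{\mathrm{join}}
```

As a purely illustrative figure, with t_rt = 1 ms and N = 1000 the round trips alone cost about (1 + 1000) x 1 ms, roughly one second, against about 1 ms of round-trip overhead for the single query. The execution terms still depend on your data, indexes and hardware.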
You can instead use
- a single query (via a join) or
- 2 queries (one for the primary record, one for all associated records)
The first approach returns more data than strictly needed (the primary record's columns are repeated in every row), but that's usually a good tradeoff to make. The second approach can get a bit cumbersome for large data sets, since all the foreign keys are passed in as a single list (typically an IN clause), but again, it's usually a tradeoff worth making.
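The three strategies can be compared by counting queries. The following is a minimal sketch using Python's built-in sqlite3 module; the `posts`/`comments` tables, the `CountingConnection` wrapper and the function names are all hypothetical and exist only for this illustration. It is not how ActiveRecord or any particular ORM implements loading.

```python
import sqlite3


def build_db(n_posts=5, comments_per_post=3):
    """Create an in-memory database with hypothetical posts/comments tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
        """
    )
    for p in range(1, n_posts + 1):
        conn.execute("INSERT INTO posts (id, title) VALUES (?, ?)", (p, f"post {p}"))
        for c in range(comments_per_post):
            conn.execute(
                "INSERT INTO comments (post_id, body) VALUES (?, ?)",
                (p, f"comment {c}"),
            )
    return conn


class CountingConnection:
    """Hypothetical helper: counts every query sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_n_plus_one(db):
    # 1 query for the posts, then 1 query per post: 1 + N in total.
    posts = db.query("SELECT id, title FROM posts ORDER BY id")
    result = {}
    for post_id, _title in posts:
        rows = db.query(
            "SELECT body FROM comments WHERE post_id = ? ORDER BY id", (post_id,)
        )
        result[post_id] = [body for (body,) in rows]
    return result


def load_with_join(db):
    # A single query; the post id is repeated on every comment row.
    rows = db.query(
        "SELECT p.id, c.body FROM posts p "
        "LEFT JOIN comments c ON c.post_id = p.id ORDER BY p.id, c.id"
    )
    result = {}
    for post_id, body in rows:
        result.setdefault(post_id, [])
        if body is not None:  # posts without comments yield a NULL body
            result[post_id].append(body)
    return result


def load_with_two_queries(db):
    # 1 query for the posts, 1 query for all their comments via an IN list.
    posts = db.query("SELECT id, title FROM posts ORDER BY id")
    ids = [post_id for post_id, _title in posts]
    result = {post_id: [] for post_id in ids}
    if ids:
        placeholders = ",".join("?" for _ in ids)
        rows = db.query(
            f"SELECT post_id, body FROM comments "
            f"WHERE post_id IN ({placeholders}) ORDER BY id",
            ids,
        )
        for post_id, body in rows:
            result[post_id].append(body)
    return result


if __name__ == "__main__":
    conn = build_db()
    for loader in (load_n_plus_one, load_with_join, load_with_two_queries):
        db = CountingConnection(conn)
        loader(db)
        print(loader.__name__, "queries:", db.queries)
```

All three loaders return the same data; only the number of queries differs. Note that the IN list grows with the number of primary records, and databases cap the number of bound parameters per statement, which is the "cumbersome for large data sets" part of the tradeoff.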
The actual numbers depend on too many variables for statistics to be meaningful: number of records, DB version, hardware, etc.
Since you tagged this question with rails, ActiveRecord does a good job avoiding N+1 queries if you know how to use it. Check out the explanation of eager loading.
#2
0
The time difference would depend on how many additional selects were performed because of the N+1 problem. Here's a quote from an answer given to another question regarding N+1:
Quote Start
SELECT * FROM Cars;
/* for each car */
SELECT * FROM Wheel WHERE CarId = ?
In other words, you have one select for the Cars, and then N additional selects, where N is the total number of cars.
Quote End
In the example above, the time difference would depend on how many car records were in the database and how long it took to query the 'Wheel' table each time the code/ORM fetched a new record. If you only had 2 car records, the difference after removing the N+1 problem would be negligible, but if you had a million car records, it would have a significant effect.
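If you want numbers for your own setup, measuring is straightforward. Below is a rough sketch that times the quoted Cars/Wheel pattern against a single JOIN using Python's sqlite3 module. The `Cars`, `Wheel` and `CarId` names come from the quote; the `Name` and `Position` columns, the index and the helper functions are assumptions added for the example. An in-memory SQLite database has no network latency, so this likely understates the gap you would see against a remote database.

```python
import sqlite3
import time


def build_cars_db(n_cars, wheels_per_car=4):
    """Create an in-memory Cars/Wheel database (Name and Position are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Cars (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Wheel (Id INTEGER PRIMARY KEY, CarId INTEGER, Position TEXT);
        CREATE INDEX idx_wheel_car ON Wheel (CarId);
        """
    )
    conn.executemany(
        "INSERT INTO Cars (Id, Name) VALUES (?, ?)",
        ((i, f"car {i}") for i in range(1, n_cars + 1)),
    )
    conn.executemany(
        "INSERT INTO Wheel (CarId, Position) VALUES (?, ?)",
        (
            (i, f"wheel {w}")
            for i in range(1, n_cars + 1)
            for w in range(wheels_per_car)
        ),
    )
    conn.commit()
    return conn


def fetch_n_plus_one(conn):
    # One select for the cars, then one select per car (the quoted pattern).
    cars = conn.execute("SELECT * FROM Cars").fetchall()
    wheel_count = 0
    for car in cars:
        wheels = conn.execute(
            "SELECT * FROM Wheel WHERE CarId = ?", (car[0],)
        ).fetchall()
        wheel_count += len(wheels)
    return wheel_count


def fetch_joined(conn):
    # One select that returns every car/wheel pair.
    rows = conn.execute(
        "SELECT Cars.Id, Wheel.Id FROM Cars JOIN Wheel ON Wheel.CarId = Cars.Id"
    ).fetchall()
    return len(rows)


def timed(fn, conn):
    """Return (result, elapsed seconds) for a single call of fn(conn)."""
    start = time.perf_counter()
    result = fn(conn)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for n_cars in (2, 1000, 20000):
        conn = build_cars_db(n_cars)
        _, t_n_plus_one = timed(fetch_n_plus_one, conn)
        _, t_joined = timed(fetch_joined, conn)
        print(f"{n_cars} cars: 1+N {t_n_plus_one:.4f}s, join {t_joined:.4f}s")
```

The printed timings will vary from machine to machine, which is the point both answers make: the ratio depends on N and on the per-query cost in your environment.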