I inherited an e-commerce software project written in PHP. As I was inspecting the codebase, I found a lot of SQL statements all over the code. There are a lot of classes like Product, Category, User, Customer, etc. and every class has a lot of database queries.
我继承了一个用PHP编写的电子商务软件项目。在检查代码基时,我发现代码中到处都是SQL语句。有很多类,比如产品、类别、用户、客户等等,每个类都有很多数据库查询。
I didn't know how to treat this situation and decided to count the total queries of a single page visit. I encapsulated the MySQL query function and increased a counter.
我不知道如何处理这种情况,于是决定计算一次页面访问的总查询数。我封装了MySQL查询函数并增加了一个计数器。
I was a little shocked at the result. To visit the index-page alone, there were 1633 (!) MySQL select queries executed. Listing the products of a category triggered almost 2000 queries.
我对结果有点震惊。单是访问索引页,就有1633 (!)MySQL select查询执行。列出一个类别的产品引发了近2000个查询。
I piped the queries into a text file to analyze them. Over 90% are single select statements of maybe one or two values. Now what should I do to clean up this mess? What is your advice? I enabled caching at the MySQL server. Loading the page takes about 490ms.
我将查询导入一个文本文件进行分析。超过90%是一个或两个值的单选择语句。现在我该怎么做才能把这些乱七八糟的东西清理干净呢?你的建议是什么?我在MySQL服务器上启用了缓存。加载页面大约需要490ms。
Additional detail
For example, there is a class called Product. Inside this class there are 8 single small SQL select statements.
例如,有一个类叫做Product。在这个类中有8个单独的小SQL select语句。
When you now open the category listing to display the products, the original programmer used one select statement to get get a list of the needed products and then created a product object for each of them.
当您现在打开类别列表以显示产品时,最初的程序员使用一个select语句获取所需产品的列表,然后为每个产品创建一个产品对象。
Let's say this result gives us 20 products:
假设这个结果给了我们20个产品:
select id from products where price <= 10;
then he iterates through the results and creates a product object for every entry:
然后他遍历结果,为每个条目创建一个产品对象:
$qresult = query("select id from products where price <= 10");
$products = array();
foreach ($qresult as $prod) {
$products[] = new Product($prod['id']);
}
This alone generates 20 * 8 SQL queries just for the products. And the same method is used for other classes too (User, Customer, Category and so on).
仅这一项就为产品生成了20 * 8的SQL查询。同样的方法也适用于其他类(用户、客户、类别等)。
Some Time Ago
前一段时间
Now, after some weeks/months have passed, I wanted to share my solutions I did so far.
现在,几个星期/几个月过去了,我想分享我到目前为止的解决方案。
I could cut down the queries to < ~50 per page visit and reduced the page loading time to under 400ms.
我可以将每个页面访问的查询减少到< ~50,并将页面加载时间减少到400ms以下。
I did it very easily. I tried to identified the hotspots and build a table cache class. Each visit this static class loads the whole table content to memory and each table request from now on will served out of the memory from the static class. Well very dirty and not that nice but it works, is faster, reduces the total queries and spares the server hardware.
我做得很轻松。我尝试识别热点并构建一个表缓存类。每次访问这个静态类时,都会将整个表内容加载到内存中,从现在开始,每个表请求都会从静态类的内存中提供。非常脏,不是很好,但它工作起来更快,减少了总查询,并节省了服务器硬件。
I guess we also will throw hardware to this problem als long as the user count increases in that ratio as it did by now.
我想我们也会向这个问题扔硬件只要用户数量在这个比率中增加,就像现在一样。
If we come to the point to replace the application by another, we will definitely head for a database-query-nice solution
如果我们想要用另一种方法来替换应用程序,我们一定会去寻找一个数据库查询的解决方案。
thanks all for your advices
谢谢大家的建议
4 个解决方案
#1
1
Well, for starters the first thing you need to identify lies in the php realm, and is how many of those calls are executed more than once. That alone will cut down your numbers.
首先,首先要识别的是php领域,其中有多少调用是多次执行的。单这一点就能减少你的数量。
Other than that you can do two things.
除此之外,你还可以做两件事。
- If you have several queries that use the same parameters (say productId, and you bring the product table, the product category table and so forth, you can always join those queries and bring a single result with all you need (or make views of the most common queries joined and work with that)
- 如果有多个查询,使用相同的参数(productId说,你把产品表,产品类别表等等,你可以加入这些查询,将与你所需要的一个结果(或最常见的查询视图的加入并使用)
- If you have a settings table, the best thing to do is load all the table into memory and read all values from there, so you don't have to go to the DB anytime you need a setting
- 如果有一个settings表,最好的做法是将所有的表加载到内存中,然后从其中读取所有的值,这样在需要设置的时候就不需要去DB了
#2
1
Ha, this one made me smile - I have the same problem with a legacy app I work on, together with the same level of query inefficiency! In the short term, we just throw hardware at the problem, but at least new work going forward is better written. Sounds like a custom ORM without the necessary features, or one that isn't used in an optimal way (historically, we have that too!).
哈,这个让我笑了——我有同样的问题,我在一个遗留应用程序工作,和同样的水平的查询效率!在短期内,我们只是把硬件扔到这个问题上,但至少未来的新工作写得更好。听起来像是一个没有必要功能的定制ORM,或者是一个没有以最佳方式使用的ORM(历史上,我们也有这样的功能!)
Try refactoring this:
尝试重构:
$products = array(); foreach ($products as $prod) { $products[] = new Products($prod['id']); }
Into this:
到这个:
// Assuming products is an array of arrays, with each inner array // containing all the values that make up an array $products = array(); foreach ($products as $prod) { $products[] = Products::convertToObject($prod['id']); }
That'll save you N queries per loop right there. If you ensure that each ORM class (Products
etc) inherits from something like BaseORM
then the common code to do this can be written just once.
这样就可以在每个循环中保存N个查询。如果您确保每个ORM类(产品等)都继承了类似BaseORM之类的东西,那么可以只编写一次这样的通用代码。
If you find that you don't have much in the way of joins, then branch the current code and have a play with replacing your home-grown ORM with something like Propel or Doctrine. Be aware that there's a learning curve to using ORM systems you don't know, but they payoff in the end generally better than writing the whole thing yourself.
如果您发现您没有太多的连接方式,那么分支当前代码,并有一个游戏,用类似于推进或原则的东西来代替您自己的ORM。要知道使用ORM系统是有学习曲线的,你不知道,但是它们最终的回报通常比你自己写整个东西要好。
#3
1
I think you're right to be concerned. 2,000 queries seems a lot.
我认为你的担心是对的。2000个查询似乎很多。
You’ve already identified two good reasons for refactoring, the amount of queries and the page response time – both measureable and suitable for refactoring.
您已经确定了重构的两个好理由:查询的数量和页面响应时间——这两个理由都是可以度量的,并且适合重构。
Although it is true MySQL has a query cache, and it will usually not query the underlying data each time if it can fulfil the query from the cache there could still be a network cost of talking to MySQL.
虽然MySQL确实有一个查询缓存,并且通常不会每次都查询底层数据,但如果它能够完成来自缓存的查询,那么与MySQL的通信仍然会产生网络成本。
Have you thought about saving values to memory or using a session variable?
您是否考虑过将值保存到内存中或使用会话变量?
<?php
session_start();
// store session data
$_SESSION['sharedvalue']= "result of common MySQL query"
?>
< html>
< body>
< ?php
//retrieve session data - now each time you do this it isn't asking MySQL again
echo "Common Value=". $_SESSION['sharedvalue'];
?>
< /body>
< /html>
Another solution would be have your own cache and query common values from that, changing or expiry them for refreshing old data. This would really be dependent on the size of your application and user base.
另一种解决方案是拥有自己的缓存和查询公共值,更改或终止它们以刷新旧数据。这实际上取决于应用程序和用户基数的大小。
#4
0
There is probably one (or more) loops (foreach or while, etc) that is calling mysql_query for each iteration.
可能有一个(或多个)循环(foreach或while等)在为每次迭代调用mysql_query。
Once you find that loop you can decide how to optimise it - for example, you could load the records all at once into an array and then the loop works on the array rather than calling the database each time.
一旦找到了循环,您就可以决定如何优化它——例如,您可以一次将所有记录加载到一个数组中,然后循环在数组上工作,而不是每次都调用数据库。
#1
1
Well, for starters the first thing you need to identify lies in the php realm, and is how many of those calls are executed more than once. That alone will cut down your numbers.
首先,首先要识别的是php领域,其中有多少调用是多次执行的。单这一点就能减少你的数量。
Other than that you can do two things.
除此之外,你还可以做两件事。
- If you have several queries that use the same parameters (say productId, and you bring the product table, the product category table and so forth, you can always join those queries and bring a single result with all you need (or make views of the most common queries joined and work with that)
- 如果有多个查询,使用相同的参数(productId说,你把产品表,产品类别表等等,你可以加入这些查询,将与你所需要的一个结果(或最常见的查询视图的加入并使用)
- If you have a settings table, the best thing to do is load all the table into memory and read all values from there, so you don't have to go to the DB anytime you need a setting
- 如果有一个settings表,最好的做法是将所有的表加载到内存中,然后从其中读取所有的值,这样在需要设置的时候就不需要去DB了
#2
1
Ha, this one made me smile - I have the same problem with a legacy app I work on, together with the same level of query inefficiency! In the short term, we just throw hardware at the problem, but at least new work going forward is better written. Sounds like a custom ORM without the necessary features, or one that isn't used in an optimal way (historically, we have that too!).
哈,这个让我笑了——我有同样的问题,我在一个遗留应用程序工作,和同样的水平的查询效率!在短期内,我们只是把硬件扔到这个问题上,但至少未来的新工作写得更好。听起来像是一个没有必要功能的定制ORM,或者是一个没有以最佳方式使用的ORM(历史上,我们也有这样的功能!)
Try refactoring this:
尝试重构:
$products = array(); foreach ($products as $prod) { $products[] = new Products($prod['id']); }
Into this:
到这个:
// Assuming products is an array of arrays, with each inner array // containing all the values that make up an array $products = array(); foreach ($products as $prod) { $products[] = Products::convertToObject($prod['id']); }
That'll save you N queries per loop right there. If you ensure that each ORM class (Products
etc) inherits from something like BaseORM
then the common code to do this can be written just once.
这样就可以在每个循环中保存N个查询。如果您确保每个ORM类(产品等)都继承了类似BaseORM之类的东西,那么可以只编写一次这样的通用代码。
If you find that you don't have much in the way of joins, then branch the current code and have a play with replacing your home-grown ORM with something like Propel or Doctrine. Be aware that there's a learning curve to using ORM systems you don't know, but they payoff in the end generally better than writing the whole thing yourself.
如果您发现您没有太多的连接方式,那么分支当前代码,并有一个游戏,用类似于推进或原则的东西来代替您自己的ORM。要知道使用ORM系统是有学习曲线的,你不知道,但是它们最终的回报通常比你自己写整个东西要好。
#3
1
I think you're right to be concerned. 2,000 queries seems a lot.
我认为你的担心是对的。2000个查询似乎很多。
You’ve already identified two good reasons for refactoring, the amount of queries and the page response time – both measureable and suitable for refactoring.
您已经确定了重构的两个好理由:查询的数量和页面响应时间——这两个理由都是可以度量的,并且适合重构。
Although it is true MySQL has a query cache, and it will usually not query the underlying data each time if it can fulfil the query from the cache there could still be a network cost of talking to MySQL.
虽然MySQL确实有一个查询缓存,并且通常不会每次都查询底层数据,但如果它能够完成来自缓存的查询,那么与MySQL的通信仍然会产生网络成本。
Have you thought about saving values to memory or using a session variable?
您是否考虑过将值保存到内存中或使用会话变量?
<?php
session_start();
// store session data
$_SESSION['sharedvalue']= "result of common MySQL query"
?>
< html>
< body>
< ?php
//retrieve session data - now each time you do this it isn't asking MySQL again
echo "Common Value=". $_SESSION['sharedvalue'];
?>
< /body>
< /html>
Another solution would be have your own cache and query common values from that, changing or expiry them for refreshing old data. This would really be dependent on the size of your application and user base.
另一种解决方案是拥有自己的缓存和查询公共值,更改或终止它们以刷新旧数据。这实际上取决于应用程序和用户基数的大小。
#4
0
There is probably one (or more) loops (foreach or while, etc) that is calling mysql_query for each iteration.
可能有一个(或多个)循环(foreach或while等)在为每次迭代调用mysql_query。
Once you find that loop you can decide how to optimise it - for example, you could load the records all at once into an array and then the loop works on the array rather than calling the database each time.
一旦找到了循环,您就可以决定如何优化它——例如,您可以一次将所有记录加载到一个数组中,然后循环在数组上工作,而不是每次都调用数据库。