I have some queries that are taking too long (300ms) now that the DB has grown to a few million records. Luckily for me, the queries don't need to look at most of this data; the latest 100,000 records will be sufficient, so my plan is to maintain a separate table with the most recent 100,000 records and run the queries against that. If anyone has suggestions for a better way of doing this, that would be great. My real question is: what are the options if the queries did need to run against the historic data? What is the next step? Things I've thought of:
- Upgrade hardware
- Use an in-memory database
- Cache the objects manually in your own data structure
Are these things correct and are there any other options? Do some DB providers have more functionality than others to deal with these problems, e.g. specifying a particular table/index to be entirely in memory?
Sorry, I should've mentioned this, I'm using mysql.
I forgot to mention indexing in the above. To be quite honest, indexing has been my only source of improvement thus far. To identify bottlenecks I've been using maatkit on the queries to show whether or not indexes are being utilised.
I understand I'm now getting away from what the question was intended for, so maybe I should ask another one. My problem is that EXPLAIN is saying the query takes 10ms rather than the 300ms which jprofiler is reporting. If anyone has any suggestions I'd really appreciate it. The query is:
select bv.*
from BerthVisit bv
  inner join BerthVisitChainLinks on bv.berthVisitID = BerthVisitChainLinks.berthVisitID
  inner join BerthVisitChain on BerthVisitChainLinks.berthVisitChainID = BerthVisitChain.berthVisitChainID
  inner join BerthJourneyChains on BerthVisitChain.berthVisitChainID = BerthJourneyChains.berthVisitChainID
  inner join BerthJourney on BerthJourneyChains.berthJourneyID = BerthJourney.berthJourneyID
  inner join TDObjectBerthJourneyMap on BerthJourney.berthJourneyID = TDObjectBerthJourneyMap.berthJourneyID
  inner join TDObject on TDObjectBerthJourneyMap.tdObjectID = TDObject.tdObjectID
where
  BerthJourney.journeyType = 'A' and
  bv.berthID = 251860 and
  TDObject.headcode = '2L32' and
  bv.depTime is null and
  bv.arrTime > '2011-07-28 16:00:00'
and the output from EXPLAIN is:
+----+-------------+-------------------------+-------------+---------------------------------------------+-------------------------+---------+------------------------------------------------+------+-------------------------------------------------------+
| id | select_type | table                   | type        | possible_keys                               | key                     | key_len | ref                                            | rows | Extra                                                 |
+----+-------------+-------------------------+-------------+---------------------------------------------+-------------------------+---------+------------------------------------------------+------+-------------------------------------------------------+
|  1 | SIMPLE      | bv                      | index_merge | PRIMARY,idx_berthID,idx_arrTime,idx_depTime | idx_berthID,idx_depTime | 9,9     | NULL                                           |  117 | Using intersect(idx_berthID,idx_depTime); Using where |
|  1 | SIMPLE      | BerthVisitChainLinks    | ref         | idx_berthVisitChainID,idx_berthVisitID      | idx_berthVisitID        | 8       | Network.bv.berthVisitID                        |    1 | Using where                                           |
|  1 | SIMPLE      | BerthVisitChain         | eq_ref      | PRIMARY                                     | PRIMARY                 | 8       | Network.BerthVisitChainLinks.berthVisitChainID |    1 | Using where; Using index                              |
|  1 | SIMPLE      | BerthJourneyChains      | ref         | idx_berthJourneyID,idx_berthVisitChainID    | idx_berthVisitChainID   | 8       | Network.BerthVisitChain.berthVisitChainID      |    1 | Using where                                           |
|  1 | SIMPLE      | BerthJourney            | eq_ref      | PRIMARY,idx_journeyType                     | PRIMARY                 | 8       | Network.BerthJourneyChains.berthJourneyID      |    1 | Using where                                           |
|  1 | SIMPLE      | TDObjectBerthJourneyMap | ref         | idx_tdObjectID,idx_berthJourneyID           | idx_berthJourneyID      | 8       | Network.BerthJourney.berthJourneyID            |    1 | Using where                                           |
|  1 | SIMPLE      | TDObject                | eq_ref      | PRIMARY,idx_headcode                        | PRIMARY                 | 8       | Network.TDObjectBerthJourneyMap.tdObjectID     |    1 | Using where                                           |
+----+-------------+-------------------------+-------------+---------------------------------------------+-------------------------+---------+------------------------------------------------+------+-------------------------------------------------------+
7 rows in set (0.01 sec)
7 Answers
#1
1
Considering a design change like this is not a good sign - I bet you still have plenty of performance to squeeze out using EXPLAIN, adjusting db variables and improving the indexes and queries. But you're probably past the point where "trying stuff" works very well. It's an opportunity to learn how to interpret the analyses and logs, and use what you learn for specific improvements to indexes and queries.
If your suggestion were a good one, you should be able to tell us why already. And note that this is a popular pessimization; see "What is the most ridiculous pessimization you've seen?"
#2
3
- Make sure all your indexes are optimized. Use EXPLAIN on the query to see if it is using your indexes efficiently.
- If you are doing some heavy joins then start thinking about doing this calculation in java.
- Think about using other DBs, such as NoSQL. You may be able to do some preprocessing and put data in Memcache to help you a little.
#3
1
Well, if you have optimised the database and queries, I'd say that rather than chop up the data, the next step is to look at:
a) the mysql configuration and make sure that it is making the most of the hardware
b) look at the hardware. You don't say what hardware you are using. You may find that replication is an option in your case if you can buy two or three servers to divide up the reads from the database (writes have to go to a central server, but reads can be served from any number of slaves).
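A minimal way to start on (a) is to compare how large the hot tables are with how much memory MySQL has been told it may use for caching. The question does not say which storage engine is in use, so the sketch below checks the settings for both InnoDB and MyISAM; BerthVisit is simply the first table from the query in the question.

```sql
-- Shows the storage engine plus Data_length and Index_length (in bytes).
SHOW TABLE STATUS LIKE 'BerthVisit';

-- InnoDB caches data and index pages in the buffer pool.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- MyISAM caches only index blocks, in the key buffer.
SHOW VARIABLES LIKE 'key_buffer_size';
```

If Data_length plus Index_length for the frequently queried tables is much larger than the relevant cache, reads are more likely to go to disk, and raising that setting within the RAM actually available is a common first adjustment.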
#4
1
Instead of creating a separate table for the latest results, think about table partitioning. MySQL has had this feature built in since version 5.1.
Just to make it clear: I am not saying this is THE solution for your issues, just one thing you can try.
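As a rough sketch of what partitioning might look like here, the statements below split BerthVisit by month on arrTime. Several things are assumptions, not facts from the question: that berthVisitID is currently the whole primary key, that arrTime is a DATETIME that is never NULL, and that no foreign keys involve this table (MySQL does not allow them on partitioned tables). The partition names and boundaries are made up for illustration.

```sql
-- MySQL requires every unique key, including the primary key, to contain
-- the partitioning column, so the primary key has to be widened first.
ALTER TABLE BerthVisit
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (berthVisitID, arrTime);

-- Hypothetical monthly partitions. The first one also holds all older rows.
ALTER TABLE BerthVisit
  PARTITION BY RANGE (TO_DAYS(arrTime)) (
    PARTITION p2011_06 VALUES LESS THAN (TO_DAYS('2011-07-01')),
    PARTITION p2011_07 VALUES LESS THAN (TO_DAYS('2011-08-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
  );

-- On MySQL 5.1 to 5.6, EXPLAIN PARTITIONS lists the partitions a query reads.
EXPLAIN PARTITIONS
SELECT bv.*
FROM BerthVisit bv
WHERE bv.arrTime > '2011-07-28 16:00:00';
```

Partitioning only pays off when the WHERE clause filters on the partitioning column, as the arrTime condition in the question's query does; queries without such a filter still touch every partition.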
#5
0
I would start by trying to optimize the tables/indexes/queries before taking any of the measures you listed. Have you dug into the poorly performing queries to the point where you are absolutely convinced you have reached the limit of your RDBMS's capabilities?
Edit: if you are indeed properly optimized, but still have problems, consider creating a Materialized View for the exact data you need. That may or may not be a good idea based on more factors than you have provided, but I would put it at the top of the list of things to consider.
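Note that MySQL has no built-in materialized views, so in practice this means a regular table that is rebuilt on a schedule. A minimal sketch follows, assuming that a higher berthVisitID means a more recent row (the question does not confirm this); the table names RecentBerthVisit, RecentBerthVisit_new and RecentBerthVisit_old are hypothetical.

```sql
-- One-time setup: an empty copy of the table definition, indexes included.
CREATE TABLE IF NOT EXISTS RecentBerthVisit LIKE BerthVisit;

-- Periodic refresh: build the new snapshot off to the side...
CREATE TABLE RecentBerthVisit_new LIKE BerthVisit;

INSERT INTO RecentBerthVisit_new
SELECT *
FROM BerthVisit
ORDER BY berthVisitID DESC   -- assumption: higher ID means more recent
LIMIT 100000;

-- ...then swap it in with a single atomic rename and drop the old copy.
RENAME TABLE RecentBerthVisit     TO RecentBerthVisit_old,
             RecentBerthVisit_new TO RecentBerthVisit;
DROP TABLE RecentBerthVisit_old;
```

Building the copy separately and renaming it avoids a window in which readers would see an empty or half-filled table. The cost is that the data is only as fresh as the last refresh.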
#6
0
Searching in the last 100,000 records should be terribly fast; you definitely have problems with the indexes. Use EXPLAIN and fix it.
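One candidate fix, going by the EXPLAIN output in the question: the first row shows an index_merge that intersects idx_berthID and idx_depTime, which suggests no single index covers the filters on BerthVisit. A composite index is a common thing to try in that situation. This is a sketch, not a guaranteed improvement, and the index name idx_berth_dep_arr is made up.

```sql
-- Equality-style filters first (berthID = ..., depTime IS NULL),
-- the range filter (arrTime > ...) last.
ALTER TABLE BerthVisit
  ADD INDEX idx_berth_dep_arr (berthID, depTime, arrTime);

-- Re-check the plan for the BerthVisit part of the query afterwards.
EXPLAIN
SELECT bv.*
FROM BerthVisit bv
WHERE bv.berthID = 251860
  AND bv.depTime IS NULL
  AND bv.arrTime > '2011-07-28 16:00:00';
```

If the optimizer picks the new index, the type column for bv should change from index_merge to a single-index access such as range or ref.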
#7
0
I understand I'm now getting away from what the question was intended for so maybe I should make another one. My problem is that EXPLAIN is saying the query takes 10ms rather than 300ms which jprofiler is reporting.
Then your problem (and solution) must be in java, right?
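One caveat worth checking first: the "0.01 sec" printed after EXPLAIN is how long MySQL took to produce the plan, not how long the query itself takes to run. A rough way to time the real statement on the server, assuming a MySQL version that still supports session profiling (added in 5.0.37, deprecated in later releases), is:

```sql
SET profiling = 1;

-- The query from the question. SQL_NO_CACHE keeps the query cache
-- from hiding the real cost on repeated runs.
SELECT SQL_NO_CACHE bv.*
FROM BerthVisit bv
  INNER JOIN BerthVisitChainLinks ON bv.berthVisitID = BerthVisitChainLinks.berthVisitID
  INNER JOIN BerthVisitChain ON BerthVisitChainLinks.berthVisitChainID = BerthVisitChain.berthVisitChainID
  INNER JOIN BerthJourneyChains ON BerthVisitChain.berthVisitChainID = BerthJourneyChains.berthVisitChainID
  INNER JOIN BerthJourney ON BerthJourneyChains.berthJourneyID = BerthJourney.berthJourneyID
  INNER JOIN TDObjectBerthJourneyMap ON BerthJourney.berthJourneyID = TDObjectBerthJourneyMap.berthJourneyID
  INNER JOIN TDObject ON TDObjectBerthJourneyMap.tdObjectID = TDObject.tdObjectID
WHERE BerthJourney.journeyType = 'A'
  AND bv.berthID = 251860
  AND TDObject.headcode = '2L32'
  AND bv.depTime IS NULL
  AND bv.arrTime > '2011-07-28 16:00:00';

-- Lists recent statements with their server-side duration in seconds.
SHOW PROFILES;
```

If the duration reported here is close to 10ms, the remaining time is being spent outside the server (driver, network, or object mapping in the Java layer). If it is close to 300ms, the query itself is the bottleneck.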