How to improve this query based on the EXPLAIN results

Time: 2021-03-17 04:18:58

I have the following query:

SELECT DISTINCT f1.match_static_id,
                f2.comments_no,
                f2.maxtimestamp,
                users.username,
                users.id,
                matches_of_comments.localteam_name,
                matches_of_comments.visitorteam_name,
                matches_of_comments.localteam_goals,       
                matches_of_comments.visitorteam_goals,
                matches_of_comments.match_status,
                new_iddaa.iddaa_code
FROM comments AS f1
INNER JOIN (
             SELECT match_static_id,
                    MAX( TIMESTAMP ) maxtimestamp,
                    COUNT( match_static_id ) AS comments_no
             FROM comments
             GROUP BY match_static_id
          ) AS f2 ON f1.match_static_id = f2.match_static_id 
                  AND f1.timestamp = f2.maxtimestamp
INNER JOIN users ON users.id = f1.user_id
INNER JOIN matches_of_comments ON matches_of_comments.match_id = f2.match_static_id
LEFT JOIN new_iddaa ON new_iddaa.match_id = matches_of_comments.match_id
WHERE matches_of_comments.flag =1
ORDER BY f2.maxtimestamp DESC

This is the EXPLAIN plan for that query:

+----+-------------+---------------------+--------+-----------------------------------+-----------+---------+------------------------------------------+-------+------------------------------------------------+
| id | select_type |        table        |  type  |           possible_keys           |    key    | key_len |                   ref                    | rows  |                     extra                      |
+----+-------------+---------------------+--------+-----------------------------------+-----------+---------+------------------------------------------+-------+------------------------------------------------+
|  1 | PRIMARY     | <derived2>          | ALL    | NULL                              | NULL      | NULL    | NULL                                     |   542 | Using temporary; Using filesort                |
|  1 | PRIMARY     | f1                  | ref    | timestamp,match_static_id,user_id | timestamp | 4       | f2.maxtimestamp                          |     1 | Using where                                    |
|  1 | PRIMARY     | users               | eq_ref | PRIMARY                           | PRIMARY   | 4       | skormix_db1.f1.user_id                   |     1 |                                                |
|  1 | PRIMARY     | matches_of_comments | ALL    | match_id                          | NULL      | NULL    | NULL                                     | 20873 | Range checked for each record (index map: 0x8) |
|  1 | PRIMARY     | new_iddaa           | ref    | match_id                          | match_id  | 4       | skormix_db1.matches_of_comments.match_id |     1 |                                                |
|  2 | DERIVED     | comments            | ALL    | NULL                              | NULL      | NULL    | NULL                                     |   933 | Using temporary; Using filesort                |
+----+-------------+---------------------+--------+-----------------------------------+-----------+---------+------------------------------------------+-------+------------------------------------------------+

I use this query to get a match's information if that match has at least one comment: the names of the teams, the iddaa code, the number of comments, the timestamp of the last comment, and the author of the last comment.
I have a big database that is expected to grow larger in the next few months, and I am very new to MySQL queries. I want to be sure I am writing optimized queries from the beginning, so I want to know how to read this EXPLAIN output to make the query better and faster.

I see that there are a lot of places in the plan that do not use the indexes even though I built them. I also see DERIVED in the table column, and I do not know how to make this query faster or how to get rid of the filesort, since I cannot create indexes on derived queries.

I have written down the structure of the tables used in the query, with their indexes (keys), and I hope to get some hints or simple answers to my questions. Thanks in advance.

The comments (f1) table structure is:

CREATE TABLE `comments` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `comments` text COLLATE utf8_unicode_ci NOT NULL,
 `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `date` date NOT NULL,
 `time` time NOT NULL,
 `match_static_id` int(25) NOT NULL,
 `ip` varchar(255) CHARACTER SET latin1 NOT NULL,
 `comments_yes_or_no` int(25) NOT NULL,
 `user_id` int(25) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `timestamp` (`timestamp`),
 KEY `match_static_id` (`match_static_id`),
 KEY `user_id` (`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=935 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The users table structure is:

CREATE TABLE `users` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `username` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `password` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `gender` int(25) NOT NULL,
 `first_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `last_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `avatar` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `alert` int(25) NOT NULL,
 `daily_tahmin` int(25) NOT NULL,
 `monthly_tahmin` int(25) NOT NULL,
 `admin` int(25) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=995 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The matches_of_comments table structure is:

CREATE TABLE `matches_of_comments` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `en_tournament_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `tournament_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `country_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_status` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_time` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_date` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `static_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `fix_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `match_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `localteam_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `localteam_goals` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `localteam_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `visitorteam_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `visitorteam_goals` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `visitorteam_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `ht_score` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
 `flag` int(25) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `match_status` (`match_status`),
 KEY `match_date` (`match_date`),
 KEY `match_id` (`match_id`),
 KEY `localteam_id` (`localteam_id`),
 KEY `visitorteam_id` (`visitorteam_id`),
 KEY `flag` (`flag`)
) ENGINE=MyISAM AUTO_INCREMENT=237790 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The new_iddaa table structure is:

CREATE TABLE `new_iddaa` (
 `id` int(25) NOT NULL AUTO_INCREMENT,
 `match_id` int(25) NOT NULL,
 `iddaa_code` int(25) NOT NULL,
 `tv_channel` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
 `skormix_tahmin` varchar(255) CHARACTER SET utf8 NOT NULL,
 PRIMARY KEY (`id`),
 KEY `match_id` (`match_id`)
) ENGINE=MyISAM AUTO_INCREMENT=8191 DEFAULT CHARSET=latin1

5 Answers

#1

Starting with the more pressing issues, before discussing options.

The first immediate problem is:

SELECT DISTINCT …

A SELECT DISTINCT is slow. Very, very slow: it basically compares every field of every row in the returned set. There is naturally room for optimization when there is an ID in there that is guaranteed to be unique per row, but your query doesn't look like it offers any such possibility: at best a tuple from matches_of_comments and new_iddaa.

To work around this, break the query into two or more parts, and only fetch what is actually needed for what you're doing. That seems to be ordering matches_of_comments by their latest comment date, and then fetching extra cosmetic data from users and new_iddaa.

The next one is the biggest problem, IMHO:

INNER JOIN (
         SELECT match_static_id,
                MAX( TIMESTAMP ) maxtimestamp,
                COUNT( match_static_id ) AS comments_no
         FROM comments
         GROUP BY match_static_id
      ) AS f2 ON f1.match_static_id = f2.match_static_id 
              AND f1.timestamp = f2.maxtimestamp

You're joining an aggregate with a table on a (match_static_id, timestamp) tuple that has no index on it, and fetching a huge set at that. You have a guaranteed merge join down the road, which is not what you want.
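
If you do keep this join on the aggregate, a composite index covering both join columns would at least let MySQL resolve it with index lookups rather than a scan. A minimal sketch, with an arbitrary index name:

ALTER TABLE comments
  ADD INDEX match_static_id_timestamp (match_static_id, `timestamp`);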

The last eye-popping issue is:

ORDER BY f2.maxtimestamp DESC

First off, you have no limit there. This means you're going to build, sort, and return an enormous set. Surely you're paginating this data, so do so in the query by adding a LIMIT clause.

Once you do add a limit, you need to consider what is adding extra rows and how they should be ordered. Based on your schema, I imagine new_iddaa does. Are you paginating things in such a way that the latter information needs to be part of that query and the number of rows it returns? I imagine not, since you're evidently not interested in how those rows are sorted.

After scanning your schema, this additional one pops out:

`match_id` varchar(255)

The columns that reference this are integers, right? So it should be an integer as well, to avoid the overhead of casting varchars to ints or vice-versa, and to allow the use of indexes in either case.
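
A hedged sketch of that conversion, assuming every existing match_id value is numeric (try it on a copy of the table first):

ALTER TABLE matches_of_comments
  MODIFY match_id INT NOT NULL;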

While not relevant to this particular query, the following fields also need attention and a proper type:

`tournament_id` varchar(255)
`match_time` varchar(255)
`match_date` varchar(255)
`static_id` varchar(255)
`fix_id` varchar(255)
`localteam_id` varchar(255)
`visitorteam_id` varchar(255)

On to improving the query…

As I read it, you're ordering matches_of_comments by latest comment. You also want the number of comments, so we start by doing that. Assuming you're paginating the first 10 of many, the query goes like this:

SELECT match_static_id,
       MAX( TIMESTAMP ) maxtimestamp,
       COUNT( match_static_id ) AS comments_no
FROM comments
GROUP BY match_static_id
ORDER BY maxtimestamp DESC
LIMIT 10 OFFSET 0

That's all.

It gives you 10 IDs, or more if you increase the limit. Loop through them in your app and build an IN (…) clause that will allow you to fetch each individual bit of data from the other tables as needed; you can do this with one or several queries, it matters little. The point is to avoid joining on that aggregate, so that indexes are available for the follow-up queries.
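
A hedged sketch of such a follow-up fetch, assuming the first query returned the hypothetical IDs 12, 34, and 56:

SELECT match_id, localteam_name, visitorteam_name,
       localteam_goals, visitorteam_goals, match_status
FROM matches_of_comments
WHERE flag = 1
  AND match_id IN (12, 34, 56); -- IDs collected from the paginated query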


You could improve things even more dramatically by removing the above-mentioned query entirely.

To do so, add three fields to matches_of_comments, namely last_comment_timestamp, last_comment_user_id, and num_comments. Maintain them using triggers (a sketch follows the query below), and add an index on (flag, last_comment_timestamp). This will allow you to run the following efficient query instead:

SELECT matches_of_comments.static_id,
       matches_of_comments.num_comments,
       matches_of_comments.last_comment_timestamp,
       matches_of_comments.last_comment_user_id,
       matches_of_comments.localteam_name,
       matches_of_comments.visitorteam_name,
       matches_of_comments.localteam_goals,       
       matches_of_comments.visitorteam_goals,
       matches_of_comments.match_status
FROM matches_of_comments
WHERE matches_of_comments.flag = 1
ORDER BY matches_of_comments.last_comment_timestamp DESC
LIMIT 10 OFFSET 0
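
A hedged sketch of the new columns, the index, and an insert-path trigger, using the names suggested above; deletes or edits of comments would need similar triggers, and the join works best once match_id is converted to INT as discussed earlier:

ALTER TABLE matches_of_comments
  ADD COLUMN last_comment_timestamp TIMESTAMP NULL,
  ADD COLUMN last_comment_user_id INT NULL,
  ADD COLUMN num_comments INT NOT NULL DEFAULT 0,
  ADD INDEX flag_last_comment (flag, last_comment_timestamp);

DELIMITER $$
CREATE TRIGGER comments_after_insert
AFTER INSERT ON comments
FOR EACH ROW
BEGIN
  -- Keep the denormalized match summary current on every new comment.
  UPDATE matches_of_comments
     SET num_comments           = num_comments + 1,
         last_comment_timestamp = NEW.`timestamp`,
         last_comment_user_id   = NEW.user_id
   WHERE match_id = NEW.match_static_id;
END$$
DELIMITER ;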

You then only need to select the needed data from users and new_iddaa, using separate queries with an IN (…) clause as already discussed.

#2

The main problem with this query is the number of joins. I would suggest:

  1. Take the nested query f2 out and insert its output into a table or temporary table, ideally with an index; see the sketch after this list. (With an index it will change the full table scan into an index lookup, speeding things up. Chances are high that such a table will be reusable.)

  2. Alternatively to 1, or at the same time, swap f1 and f2. You need to eliminate rows as early as possible, so first get the list of matches and timestamps you need, and join only the necessary data. This query takes the huge comments table and filters out unnecessary rows; it is easier to take the 10% you need than to throw away 90% for the same result.

  3. Remove DISTINCT, as it does nothing here: all joins are on equalities. If there are any duplicates, get the data from all tables and find the difference, then add the necessary JOIN conditions to pick the rows you need.

  4. Consider these PRIMARY KEYs: comments: PRIMARY KEY (match_static_id); matches_of_comments: PRIMARY KEY (match_id); new_iddaa: PRIMARY KEY (match_id).

     Reason: a primary key should be placed on an often-used column with meaning. Having it on a technical AUTO_INCREMENT column does not add value to the table.

  5. You may consider adding flag as the first column of the PRIMARY KEY. It will add overhead when flag is changed, but it will speed up this query, as all flag = 1 records will be in a single range and stored sequentially on disk. Hope this helps.
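
A minimal sketch of the pre-computed table from point 1; the table and index names are invented for illustration (MySQL allows key definitions in CREATE TABLE … SELECT):

CREATE TEMPORARY TABLE latest_comments
  (PRIMARY KEY (match_static_id), KEY (maxtimestamp))
AS
SELECT match_static_id,
       MAX(`timestamp`) AS maxtimestamp,
       COUNT(*)         AS comments_no
FROM comments
GROUP BY match_static_id;

-- The main query can then join latest_comments instead of the derived f2.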

#3

What if you turned this piece:

     SELECT match_static_id,
            MAX( TIMESTAMP ) maxtimestamp,
            COUNT( match_static_id ) AS comments_no
     FROM comments
     GROUP BY match_static_id

into a table that you populated with a trigger? When a new record is saved in comments, it triggers an update in a comments_counter table or something like that. This should gain you some performance, as you wouldn't need the MAX, COUNT, and GROUP BY operations.
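
A hedged sketch of that idea, with hypothetical table and trigger names:

CREATE TABLE comments_counter (
  match_static_id INT NOT NULL,
  maxtimestamp    TIMESTAMP NOT NULL,
  comments_no     INT NOT NULL DEFAULT 0,
  PRIMARY KEY (match_static_id)
) ENGINE=MyISAM;

DELIMITER $$
CREATE TRIGGER comments_counter_after_insert
AFTER INSERT ON comments
FOR EACH ROW
BEGIN
  -- Create the counter row on the first comment, bump it afterwards.
  INSERT INTO comments_counter (match_static_id, maxtimestamp, comments_no)
  VALUES (NEW.match_static_id, NEW.`timestamp`, 1)
  ON DUPLICATE KEY UPDATE
    comments_no  = comments_no + 1,
    maxtimestamp = NEW.`timestamp`;
END$$
DELIMITER ;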

Also, matches_of_comments.match_id and f2.match_static_id have different data types: the first is a varchar(255) and the second an int(25). Making both of them int(25) should help overall performance.

And last, I'd make user_id in comments a foreign key to users.id too.

#4

DISTINCT will always have a performance impact, so check which part of your query is producing multiple rows (the LEFT OUTER JOIN, for example); the objective is to remove the DISTINCT. Also make your columns smaller, taking just the space required, to reduce index size.

Verify whether the ORDER BY on the date from the subquery is using an index; if not, use the equivalent date column from the table.

#5

I have used analytic (window) functions, which require MySQL 8.0 or later. But as there is no data for me to test, I am not very sure whether this is optimal.

SELECT *
FROM
(
    SELECT DISTINCT f1.match_static_id,
                    users.username,
                    users.id,
                    matches_of_comments.localteam_name,
                    matches_of_comments.visitorteam_name,
                    matches_of_comments.localteam_goals,
                    matches_of_comments.visitorteam_goals,
                    matches_of_comments.match_status,
                    new_iddaa.iddaa_code,
                    -- latest comment timestamp per match
                    MAX(f1.`timestamp`) OVER (PARTITION BY f1.match_static_id) AS maxtimestamp,
                    -- number of joined rows per match
                    COUNT(*) OVER (PARTITION BY f1.match_static_id) AS comments_no,
                    f1.`timestamp` AS comment_timestamp
    FROM comments AS f1
    INNER JOIN users ON users.id = f1.user_id
    INNER JOIN matches_of_comments
            ON matches_of_comments.match_id = f1.match_static_id
           AND matches_of_comments.flag = 1
    LEFT JOIN new_iddaa
            ON new_iddaa.match_id = matches_of_comments.match_id
) A
WHERE comment_timestamp = maxtimestamp
ORDER BY maxtimestamp DESC
