优化跨层次数据的MySQL查询。

时间:2022-03-30 02:55:05

I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y) (of cardinality ~100 x ~1000) and all edges are strictly increasing in x.

我有一个相当稳定的有序的有向图~100k个顶点和大小~1k条边。它是二维的,因为它的顶点可以被一对整数(x, y)(基数~100 x ~1000)识别,并且所有的边在x中严格递增。

There is furthermore a dictionary of ~1k (key, val) pairs associated with each vertex.

此外,还有一个与每个顶点相关联的~1k (key, val)对字典。

I am currently storing the graph in a MySQL database across three (InnoDB) tables: a table of vertices (which I don't think is relevant to my question, so I have omitted to include both it and the foreign key constraints that refer to it in my extracts below); a table which holds the dictionaries; and a 'closure table' of connected vertices as described so eloquently by Bill Karwin.

我目前正在通过三个表(InnoDB)将这个图存储在MySQL数据库中:一个顶点表(我认为这个表与我的问题无关,因此我省略了它和在我下面的摘录中引用它的外键约束);保存字典的表;还有一个连接顶点的“闭表”,比尔·卡尔文对它进行了生动的描述。

The table of vertex dictionaries is defined as follows:

顶点字典表定义如下:

CREATE TABLE `VertexDictionary` (
  `x`   smallint(6) unsigned NOT NULL,
  `y`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  `val` smallint(1) DEFAULT NULL,
  PRIMARY KEY (`x`, `y`  , `key`),
  KEY  `dict` (`x`, `key`, `val`)
);

and the closure table of connected vertices as:

连通顶点的闭表为:

CREATE TABLE `ConnectedVertices` (
  `tail_x` smallint(6) unsigned NOT NULL,
  `tail_y` smallint(6) unsigned NOT NULL,
  `head_x` smallint(6) unsigned NOT NULL,
  `head_y` smallint(6) unsigned NOT NULL,
  PRIMARY KEY   (`tail_x`, `tail_y`, `head_x`),
  KEY `reverse` (`head_x`, `head_y`, `tail_x`),
  KEY `fx` (`tail_x`, `head_x`),
  KEY `rx` (`head_x`, `tail_x`)
);

There is also a dictionary of (x, key) pairs such that for each such pair, all vertices identified with that x have within their dictionaries a value for that key. This dictionary is stored in a fourth table:

还有一个(x, key)对的字典,对于每一个这样的对,与x标识的所有顶点在字典中都有这个键的值。本词典保存在第四个表中:

CREATE TABLE `SpecialKeys` (
  `x`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  PRIMARY KEY (`x`),
  KEY `xkey`  (`x`, `key`)
);

I often wish to extract the set of keys used in the dictionaries of all vertices having a particular x=X, together with the associated value of any SpecialKeys connected to the left:

我经常希望提取所有具有特定x= x的顶点字典中使用的键集,以及连接到左边的任何特殊键的关联值:

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
  `v`.`x` = X
;

for which the EXPLAIN output is:

解释输出为:

id   select_type   table   type     possible_keys           key       key_len   ref                                rows   Extra
 1   SIMPLE        k       index    PRIMARY,xkey            xkey          154   NULL                                 40   Using index; Using temporary
 1   SIMPLE        c       ref      PRIMARY,reverse,fx,rx   PRIMARY         2   db.k.x                                1   Using where
 1   SIMPLE        v       ref      PRIMARY,dict            PRIMARY         4   const,db.c.head_y                   136   Using index
 1   SIMPLE        u       eq_ref   PRIMARY,dict            PRIMARY       156   db.c.tail_x,db.c.tail_y,db.k.key      1   Using where

But this query takes ~10s to complete. Been banging my head against a brick wall trying to improve matters, but to no avail.

但是这个查询需要大约10秒才能完成。我的头撞在一堵砖墙上,试图改善现状,但无济于事。

Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!

查询是否可以改进,或者是否应该考虑不同的数据结构?非常感谢你的想法!


UPDATE

更新

I'm still getting nowhere with this, although I did rebuild the tables and found the EXPLAIN output to be slightly different (as now shown above, the number of rows fetched from v had increased from 1 to 136!); the query is still taking ~10s to execute.

尽管我确实重新构建了表并发现EXPLAIN输出略有不同(如上所示,从v获取的行数从1增加到136!)查询仍然需要大约10秒才能执行。

I really don't understand what's going on here. Queries to obtain all (x, y, SpecialValue) and all (x, y, key) tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes over fifty times longer than their combined time... how can I improve the time taken to perform that join?

我真的不明白这是怎么回事。获取所有(x、y、SpecialValue)和所有(x、y、key)元组的查询都非常快(分别为~30ms和~150ms),但实际上加入这两个元组的时间要比它们组合的时间长50倍……我怎样才能提高执行该连接所花费的时间?

Output of SHOW VARIABLES LIKE '%innodb%'; below:

输出显示变量,如'%innodb%';下图:

Variable_name                    Value
------------------------------------------------------------
have_innodb                      YES
ignore_builtin_innodb            ON
innodb_adaptive_flushing         ON
innodb_adaptive_hash_index       ON
innodb_additional_mem_pool_size  2097152
innodb_autoextend_increment      8
innodb_autoinc_lock_mode         1
innodb_buffer_pool_size          1179648000
innodb_change_buffering          inserts
innodb_checksums                 ON
innodb_commit_concurrency        0
innodb_concurrency_tickets       500
innodb_data_file_path            ibdata1:10M:autoextend
innodb_data_home_dir             /rdsdbdata/db/innodb
innodb_doublewrite               ON
innodb_fast_shutdown             1
innodb_file_format               Antelope
innodb_file_format_check         Barracuda
innodb_file_per_table            ON
innodb_flush_log_at_trx_commit   1
innodb_flush_method              O_DIRECT
innodb_force_recovery            0
innodb_io_capacity               200
innodb_lock_wait_timeout         50
innodb_locks_unsafe_for_binlog   OFF
innodb_log_buffer_size           8388608
innodb_log_file_size             134217728
innodb_log_files_in_group        2
innodb_log_group_home_dir        /rdsdbdata/log/innodb
innodb_max_dirty_pages_pct       75
innodb_max_purge_lag             0
innodb_mirrored_log_groups       1
innodb_old_blocks_pct            37
innodb_old_blocks_time           0
innodb_open_files                300
innodb_read_ahead_threshold      56
innodb_read_io_threads           4
innodb_replication_delay         0
innodb_rollback_on_timeout       OFF
innodb_spin_wait_delay           6
innodb_stats_method              nulls_equal
innodb_stats_on_metadata         ON
innodb_stats_sample_pages        8
innodb_strict_mode               OFF
innodb_support_xa                ON
innodb_sync_spin_loops           30
innodb_table_locks               ON
innodb_thread_concurrency        0
innodb_thread_sleep_delay        10000
innodb_use_sys_malloc            ON
innodb_version                   1.0.16
innodb_write_io_threads          4

6 个解决方案

#1


2  

Without spending time testing it, you provided an incomplete example? you should definitely try reordering of joined tables. Explain output provides some info, let's say ordering by key_len should be heuristically fastest. First table to be filtered on should be listed as last in case the optimizer is not able to figure that out, I believe.

没有花时间测试它,您提供了一个不完整的示例吗?您应该尝试重新排序已连接的表。Explain output提供了一些信息,假设key_len的排序应该是启发式最快的。第一个要过滤的表应该列在最后,以防优化器无法解决这个问题,我相信。

So, let's say 'c, v, k, u' order is the best.

假设c v k u的顺序是最好的。

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `VertexDictionary`  AS `v`
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
           AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;

'rows' would suggest 'c/u, k, v' order, but that depends on data:

“row”表示“c/u、k、v”顺序,但这取决于数据:

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `VertexDictionary`  AS `v`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
 WHERE
  `v`.`x` = X
;

Hope this helps.

希望这个有帮助。

UPDATE (avoiding the varchar join):

更新(避免varchar连接):

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
AND
  `v`.`x` = X
;

#2


0  

Others might disagree, but I've had and regularly offer STRAIGHT_JOIN for queries... Once you KNOW the data and the relationships. Being that your WHERE clause is against the "V" table alias and it's "x" value, you are good with the index. Move THAT to the front position, then join from that.

其他人可能不同意,但是我曾经并且经常为查询提供STRAIGHT_JOIN…一旦你知道了数据和关系。因为WHERE子句是针对“V”表别名的,而它是“x”值,所以您可以使用索引。移动到前面的位置,然后加入。

SELECT STRAIGHT_JOIN DISTINCT
      v.`key`,
      u.`val`
   FROM
      VertexDictionary AS v 

         JOIN ConnectedVertices AS c
            ON v.x = c.head_x
            AND v.y = c.head_y

            JOIN VertexDictionary AS u 
               ON c.tail_x = u.x 
               AND c.tail_y = u.y

               JOIN SpecialKeys AS k
                  ON u.x = k.x
                  AND u.key = k.key
   WHERE
      v.x = {some value}      

Curious to know how this realignment works for you

很想知道这个重组对你有什么作用

#3


0  

Try rebuilding the query in stages; or at least give us some more points to identify where the bottlenecks are. Some combinations of the following queries should give you reasonable performance, if it is possible with out modifying the schema or data set.

尝试分阶段重新构建查询;或者至少给我们更多的点来确定瓶颈所在。如果可以通过修改模式或数据集,以下查询的某些组合应该会给您合理的性能。

What is the number of rows and exec times for the following queries for getting the list of suitable tail verticies (ie, that have a SpecialKey)

以下查询的行数和执行次数是多少?

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
WHERE
    EXISTS (
        SELECT
            1
        FROM
            SpecialKeys sk
        WHERE
            vd.x = sk.x
        AND
            vd.key = sk.key
    )

or

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
JOIN
    SpecialKeys sk
ON
    vd.x = sk.x
AND
    vd.key = sk.key

or

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
WHERE
(vd.x, vd.key) IN (SELECT x, key FROM SpecialKeys)
-- also could try vd.key IN (SELECT sk.key FROM SpecialKeys sk WHERE sk.x = vd.x)

I'm hoping that one of these returns either small result set, or are at least quick to produce results. if low cardinality & large results apply distinct.

我希望这些回报中的一个要么是小的结果集,要么至少能迅速产生结果。如果低基数和大结果应用不同。

pick the best one from the previous two queries, and add to the next step: joining these suitable 'tails' to 'suitable heads'

从前面两个查询中选出最好的一个,并添加到下一个步骤:将这些合适的“尾”连接到“合适的头部”

SELECT -- DISTINCT
    cv.head_y as y,
    tv.val
FROM
(
    -- ADD SUB QUERY HERE also try nesting the subquery like: (select tail_x, tail_y, val from ([SUBQUERY]) as sq)

) as tv -- tail verticies
JOIN
    ConnectedVerticies cv
ON
    cv.tail_x = tv.tail_x
AND
    cv.tail_y = tv.tail_y
WHERE
    cv.head_x = X -- lets reduce the result set here.

Again, I'm hoping that one of these returns either small result set, or are at least quick to produce results. if low cardinality & large results apply distinct.

同样,我希望这些回报中的一个要么是小的结果集,要么至少能迅速产生结果。如果低基数和大结果应用不同。

If it's falling over at this point, well there's not much hope of it getting faster to apply the last phase, and best to try a different approach.

如果它在这一点上掉下来了,那么没有多少希望它能更快地应用最后一个阶段,最好尝试一种不同的方法。

As head x is known from the earlier query, we now just need to join on head_y and X to get v.key

正如前面的查询所知道的,我们现在只需要加入head_y和x来获得v.key。

SELECT DISTINCT
    inner_query.val,
    head.key
FROM
(
 -- previous nested subquery behemoth here, again, try a few things that might work.

) as inner_query
JOIN
    VertexDictionary as head
ON
    head.x = X
AND
    head.y = inner_query.y

Another approach, is to get a list of head.key, tail_x, and tail_y from

另一种方法是获取一个人头列表。键,tail_x和tail_y

SELECT -- DISTINCT
    cv.tail_x as x,
    cv.tail_y as y,
    vd.key
FROM
    VertexDictionary vd
JOIN
    ConnectedVerticies cv
ON
    cv.head_x = vd.x
AND
    cv.head_y = vd.y
WHERE
    vd.head_x = X

How long does this take to execute, with & without distinct? how many results (w & w/o distinct)?

执行这个过程需要多长时间?有多少结果(w和w/o不同)?

If it's fast and/or small, try using it as a subquery and joining to either another subquery potentiall of SpecialKeys & VertexDictionary if that's small (ie one of the first three queries if they worked well).

如果是快速和/或小的查询,可以尝试将其用作子查询,如果是小的,可以连接到SpecialKeys & VertexDictionary的另一个子查询potential(即前三个查询中的一个,如果它们运行良好)。

#4


0  

I suspect your problem is everything with the syntax

我怀疑你的问题在于语法

(k.x, k.key) = (u.x, u.key)

(k。x,k.key)=(u。x,u.key)

Can you rewrite as?

你能重写是吗?

k.x = y.x and k.key = u.key

k。x = y。x和k。关键= u.key

When you have a calculation on the left hand side of a clause, the dbms cannot optimize. By setting the comparison as a straight comparison, you may improve your performance.

当在子句的左边进行计算时,dbms不能进行优化。通过将比较设置为直接比较,您可以提高性能。

e.g.

如。

year(my_date) = '2012'

(my_date)= ' 2012 '

is slower than

是低于

'2012' = year(my_date)

‘2012’=年(my_date)

I'm not sure if mysql treats the comparison as a column comparison or as a calculation.

我不确定mysql是否将比较视为列比较还是计算。

Please try modifying your query to do column value comparisons.

请尝试修改查询以进行列值比较。


Second optimization

第二次优化

Also - you are cross joining 4 tables. Multiplication is not additive - is it exponential. Are you sure this is what you intend? You may be better served getting starting with the smallest result set and then join only that result set to the next set.

另外,你是交叉连接4个表。乘法不是加法,而是指数运算。你确定这就是你想要的吗?您最好从最小的结果集开始,然后只将该结果集连接到下一个集合。

select a.c1
from (
select t1.c1
from t1
join t2 on t1.c1 = t2.c1
) a
join t3 on t3.c1 = a.c1

etc...

等等……


third optimization

第三个优化

if option 2 helps, you may want to create indexed views and work from those instead of directly from the tables.

如果选项2有帮助,您可能希望创建索引视图,并从这些视图中工作,而不是直接从表中工作。


fourth optimization

第四次优化

don't use mysql. unless you have a team of dbas constantly monitoring for performance and tweaks, you will run into bad times with mysql. mysql is fine and fast with simple things, but starts sucking very badly if you do anything moderately complex. 4 years ago, i migrated from mysql to sql server express and my 10 minute queries took <2 sec with the same tables, indices, and queries...

不要使用mysql。除非您有一个dba团队不断监视性能和调整,否则您将在使用mysql时遇到困难。mysql在简单的事情上表现得很好,速度也很快,但是如果你做一些稍微复杂的事情,它就会变得非常糟糕。4年前,我从mysql迁移到sql server express,我的10分钟查询花费了<2秒,使用相同的表、索引和查询……

if you want open source, postgres is much smarter than mysql as well

如果你想要开源,postgres也比mysql聪明得多


Create a view that incorporates the first 3 tables that is indexed on the v.key, u.val fields. Then run the query off of the 4th table and the view. Make sure the indices are built on the view before running.

创建一个视图,该视图包含了在v上建立索引的前3个表。键,u。val字段。然后从第4表和视图上运行查询。在运行之前,确保索引构建在视图上。

#5


0  

DISTINCT is often a bad friend. Try to replace it with a GROUP BY. Like this :

与众不同往往是不好的朋友。尝试用GROUP BY替换它。是这样的:

SELECT sub.key, sub.val
FROM (
    SELECT 
      v.key,
      u.val
    FROM
      ConnectedVertices AS c
      JOIN VertexDictionary  AS u ON (u.x, u.y  ) = (c.tail_x, c.tail_y)
      JOIN VertexDictionary  AS v ON (v.x, v.y  ) = (c.head_x, c.head_y)
      JOIN SpecialKeys       AS k ON (k.x, k.key) = (u.x, u.key)
    WHERE (v.x = @X)
) AS sub
GROUP BY sub.key, sub.val

UPDATE:

更新:

Then try the following query which forces the indexes to use:

然后尝试以下查询,强制使用以下索引:

SELECT DISTINCT
  v.key,
  u.val
FROM
  ConnectedVertices AS c USE INDEX (fx,rx)
  JOIN VertexDictionary  AS u USE INDEX (primary) ON (u.x, u.y  ) = (c.tail_x, c.tail_y) 
  JOIN VertexDictionary  AS v USE INDEX (primary) ON (v.x, v.y  ) = (c.head_x, c.head_y)
  JOIN SpecialKeys       AS k USE INDEX (primary) ON (k.x, k.key) = (u.x, u.key)
WHERE (v.x = @X)

If it still not better, try this one :

如果还不够好,试试这个:

SELECT DISTINCT
  v.key,
  u.val
FROM
       ConnectedVertices AS c
  JOIN VertexDictionary  AS u ON (u.x=c.tail_x) AND (u.y=c.tail_y)
  JOIN VertexDictionary  AS v ON (v.x=@X) AND (v.y=c.head_y)
  JOIN SpecialKeys       AS k ON (k.x=u.x) AND (k.key=u.key)
WHERE
  v.x = @X

#6


0  

i don't think that forcing uses of specifique indexes is a good think. the Mysql optimiser has often good estimations.

我不认为强制使用专门化索引是一个好主意。Mysql optimiser通常有很好的估计。

do you have an index on v.x ?

你有关于v的索引吗?x ?

#1


2  

Without spending time testing it, you provided an incomplete example? you should definitely try reordering of joined tables. Explain output provides some info, let's say ordering by key_len should be heuristically fastest. First table to be filtered on should be listed as last in case the optimizer is not able to figure that out, I believe.

没有花时间测试它,您提供了一个不完整的示例吗?您应该尝试重新排序已连接的表。Explain output提供了一些信息,假设key_len的排序应该是启发式最快的。第一个要过滤的表应该列在最后,以防优化器无法解决这个问题,我相信。

So, let's say 'c, v, k, u' order is the best.

假设c v k u的顺序是最好的。

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `VertexDictionary`  AS `v`
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
           AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;

'rows' would suggest 'c/u, k, v' order, but that depends on data:

“row”表示“c/u、k、v”顺序,但这取决于数据:

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `VertexDictionary`  AS `v`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
 WHERE
  `v`.`x` = X
;

Hope this helps.

希望这个有帮助。

UPDATE (avoiding the varchar join):

更新(避免varchar连接):

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
AND
  `v`.`x` = X
;

#2


0  

Others might disagree, but I've had and regularly offer STRAIGHT_JOIN for queries... Once you KNOW the data and the relationships. Being that your WHERE clause is against the "V" table alias and it's "x" value, you are good with the index. Move THAT to the front position, then join from that.

其他人可能不同意,但是我曾经并且经常为查询提供STRAIGHT_JOIN…一旦你知道了数据和关系。因为WHERE子句是针对“V”表别名的,而它是“x”值,所以您可以使用索引。移动到前面的位置,然后加入。

SELECT STRAIGHT_JOIN DISTINCT
      v.`key`,
      u.`val`
   FROM
      VertexDictionary AS v 

         JOIN ConnectedVertices AS c
            ON v.x = c.head_x
            AND v.y = c.head_y

            JOIN VertexDictionary AS u 
               ON c.tail_x = u.x 
               AND c.tail_y = u.y

               JOIN SpecialKeys AS k
                  ON u.x = k.x
                  AND u.key = k.key
   WHERE
      v.x = {some value}      

Curious to know how this realignment works for you

很想知道这个重组对你有什么作用

#3


0  

Try rebuilding the query in stages; or at least give us some more points to identify where the bottlenecks are. Some combinations of the following queries should give you reasonable performance, if it is possible with out modifying the schema or data set.

尝试分阶段重新构建查询;或者至少给我们更多的点来确定瓶颈所在。如果可以通过修改模式或数据集,以下查询的某些组合应该会给您合理的性能。

What is the number of rows and exec times for the following queries for getting the list of suitable tail verticies (ie, that have a SpecialKey)

以下查询的行数和执行次数是多少?

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
WHERE
    EXISTS (
        SELECT
            1
        FROM
            SpecialKeys sk
        WHERE
            vd.x = sk.x
        AND
            vd.key = sk.key
    )

or

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
JOIN
    SpecialKeys sk
ON
    vd.x = sk.x
AND
    vd.key = sk.key

or

SELECT -- DISTINCT
    vd.x as tail_x, vd.y as tail_y, vd.val
FROM
    VertexDictionary vd
WHERE
(vd.x, vd.key) IN (SELECT x, key FROM SpecialKeys)
-- also could try vd.key IN (SELECT sk.key FROM SpecialKeys sk WHERE sk.x = vd.x)

I'm hoping that one of these returns either small result set, or are at least quick to produce results. if low cardinality & large results apply distinct.

我希望这些回报中的一个要么是小的结果集,要么至少能迅速产生结果。如果低基数和大结果应用不同。

pick the best one from the previous two queries, and add to the next step: joining these suitable 'tails' to 'suitable heads'

从前面两个查询中选出最好的一个,并添加到下一个步骤:将这些合适的“尾”连接到“合适的头部”

SELECT -- DISTINCT
    cv.head_y as y,
    tv.val
FROM
(
    -- ADD SUB QUERY HERE also try nesting the subquery like: (select tail_x, tail_y, val from ([SUBQUERY]) as sq)

) as tv -- tail verticies
JOIN
    ConnectedVerticies cv
ON
    cv.tail_x = tv.tail_x
AND
    cv.tail_y = tv.tail_y
WHERE
    cv.head_x = X -- lets reduce the result set here.

Again, I'm hoping that one of these returns either small result set, or are at least quick to produce results. if low cardinality & large results apply distinct.

同样,我希望这些回报中的一个要么是小的结果集,要么至少能迅速产生结果。如果低基数和大结果应用不同。

If it's falling over at this point, well there's not much hope of it getting faster to apply the last phase, and best to try a different approach.

如果它在这一点上掉下来了,那么没有多少希望它能更快地应用最后一个阶段,最好尝试一种不同的方法。

As head x is known from the earlier query, we now just need to join on head_y and X to get v.key

正如前面的查询所知道的,我们现在只需要加入head_y和x来获得v.key。

SELECT DISTINCT
    inner_query.val,
    head.key
FROM
(
 -- previous nested subquery behemoth here, again, try a few things that might work.

) as inner_query
JOIN
    VertexDictionary as head
ON
    head.x = X
AND
    head.y = inner_query.y

Another approach, is to get a list of head.key, tail_x, and tail_y from

另一种方法是获取一个人头列表。键,tail_x和tail_y

SELECT -- DISTINCT
    cv.tail_x as x,
    cv.tail_y as y,
    vd.key
FROM
    VertexDictionary vd
JOIN
    ConnectedVerticies cv
ON
    cv.head_x = vd.x
AND
    cv.head_y = vd.y
WHERE
    vd.head_x = X

How long does this take to execute, with & without distinct? how many results (w & w/o distinct)?

执行这个过程需要多长时间?有多少结果(w和w/o不同)?

If it's fast and/or small, try using it as a subquery and joining to either another subquery potentiall of SpecialKeys & VertexDictionary if that's small (ie one of the first three queries if they worked well).

如果是快速和/或小的查询,可以尝试将其用作子查询,如果是小的,可以连接到SpecialKeys & VertexDictionary的另一个子查询potential(即前三个查询中的一个,如果它们运行良好)。

#4


0  

I suspect your problem is everything with the syntax

我怀疑你的问题在于语法

(k.x, k.key) = (u.x, u.key)

(k。x,k.key)=(u。x,u.key)

Can you rewrite as?

你能重写是吗?

k.x = y.x and k.key = u.key

k。x = y。x和k。关键= u.key

When you have a calculation on the left hand side of a clause, the dbms cannot optimize. By setting the comparison as a straight comparison, you may improve your performance.

当在子句的左边进行计算时,dbms不能进行优化。通过将比较设置为直接比较,您可以提高性能。

e.g.

如。

year(my_date) = '2012'

(my_date)= ' 2012 '

is slower than

是低于

'2012' = year(my_date)

‘2012’=年(my_date)

I'm not sure if mysql treats the comparison as a column comparison or as a calculation.

我不确定mysql是否将比较视为列比较还是计算。

Please try modifying your query to do column value comparisons.

请尝试修改查询以进行列值比较。


Second optimization

第二次优化

Also - you are cross joining 4 tables. Multiplication is not additive - is it exponential. Are you sure this is what you intend? You may be better served getting starting with the smallest result set and then join only that result set to the next set.

另外,你是交叉连接4个表。乘法不是加法,而是指数运算。你确定这就是你想要的吗?您最好从最小的结果集开始,然后只将该结果集连接到下一个集合。

select a.c1
from (
select t1.c1
from t1
join t2 on t1.c1 = t2.c1
) a
join t3 on t3.c1 = a.c1

etc...

等等……


third optimization

第三个优化

if option 2 helps, you may want to create indexed views and work from those instead of directly from the tables.

如果选项2有帮助,您可能希望创建索引视图,并从这些视图中工作,而不是直接从表中工作。


fourth optimization

第四次优化

don't use mysql. unless you have a team of dbas constantly monitoring for performance and tweaks, you will run into bad times with mysql. mysql is fine and fast with simple things, but starts sucking very badly if you do anything moderately complex. 4 years ago, i migrated from mysql to sql server express and my 10 minute queries took <2 sec with the same tables, indices, and queries...

不要使用mysql。除非您有一个dba团队不断监视性能和调整,否则您将在使用mysql时遇到困难。mysql在简单的事情上表现得很好,速度也很快,但是如果你做一些稍微复杂的事情,它就会变得非常糟糕。4年前,我从mysql迁移到sql server express,我的10分钟查询花费了<2秒,使用相同的表、索引和查询……

if you want open source, postgres is much smarter than mysql as well

如果你想要开源,postgres也比mysql聪明得多


Create a view that incorporates the first 3 tables that is indexed on the v.key, u.val fields. Then run the query off of the 4th table and the view. Make sure the indices are built on the view before running.

创建一个视图,该视图包含了在v上建立索引的前3个表。键,u。val字段。然后从第4表和视图上运行查询。在运行之前,确保索引构建在视图上。

#5


0  

DISTINCT is often a bad friend. Try to replace it with a GROUP BY. Like this :

与众不同往往是不好的朋友。尝试用GROUP BY替换它。是这样的:

SELECT sub.key, sub.val
FROM (
    SELECT 
      v.key,
      u.val
    FROM
      ConnectedVertices AS c
      JOIN VertexDictionary  AS u ON (u.x, u.y  ) = (c.tail_x, c.tail_y)
      JOIN VertexDictionary  AS v ON (v.x, v.y  ) = (c.head_x, c.head_y)
      JOIN SpecialKeys       AS k ON (k.x, k.key) = (u.x, u.key)
    WHERE (v.x = @X)
) AS sub
GROUP BY sub.key, sub.val

UPDATE:

更新:

Then try the following query which forces the indexes to use:

然后尝试以下查询,强制使用以下索引:

SELECT DISTINCT
  v.key,
  u.val
FROM
  ConnectedVertices AS c USE INDEX (fx,rx)
  JOIN VertexDictionary  AS u USE INDEX (primary) ON (u.x, u.y  ) = (c.tail_x, c.tail_y) 
  JOIN VertexDictionary  AS v USE INDEX (primary) ON (v.x, v.y  ) = (c.head_x, c.head_y)
  JOIN SpecialKeys       AS k USE INDEX (primary) ON (k.x, k.key) = (u.x, u.key)
WHERE (v.x = @X)

If it still not better, try this one :

如果还不够好,试试这个:

SELECT DISTINCT
  v.key,
  u.val
FROM
       ConnectedVertices AS c
  JOIN VertexDictionary  AS u ON (u.x=c.tail_x) AND (u.y=c.tail_y)
  JOIN VertexDictionary  AS v ON (v.x=@X) AND (v.y=c.head_y)
  JOIN SpecialKeys       AS k ON (k.x=u.x) AND (k.key=u.key)
WHERE
  v.x = @X

#6


0  

i don't think that forcing uses of specifique indexes is a good think. the Mysql optimiser has often good estimations.

我不认为强制使用专门化索引是一个好主意。Mysql optimiser通常有很好的估计。

do you have an index on v.x ?

你有关于v的索引吗?x ?