为什么并行执行查询时MySQL的性能会下降?

时间:2022-01-15 22:45:36

The Question: Why MySQL performance goes down for queries joining nearly empty tables when executed in parallel?

问题是:为什么并行执行时,连接几乎空表的查询的MySQL性能会下降?

Below is more detailed explanation of the issue I'm facing. I have two tables in MySQL

下面是对我面临的问题的更详细的解释。我在MySQL中有两个表

CREATE TABLE first (
    num int(10) NOT NULL,
    UNIQUE KEY key_num (num)
) ENGINE=InnoDB

CREATE TABLE second (
    num int(10) NOT NULL,
    num2 int(10) NOT NULL,
    UNIQUE KEY key_num (num, num2)
) ENGINE=InnoDB

The first one contains about a thousand records. The second one is empty or contains a very few records. It also contains double index which somehow relates to the issue: the problem goes away for single index. Now I'm trying to make a lot of identical queries to those tables in parallel. Each query looks like this:

第一个包含了大约一千条记录。第二个是空的或包含很少的记录。它还包含与问题相关的双索引:单个索引的问题就不存在了。现在我试着对这些表进行很多相同的查询。每个查询如下:

SELECT first.num
FROM first
LEFT JOIN second AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN second AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN second AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN second AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN second AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN second AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
  AND second_2.num IS NULL
  AND second_3.num IS NULL
  AND second_4.num IS NULL
  AND second_5.num IS NULL
  AND second_6.num IS NULL

The issue I'm getting is that instead of having a nearly linear raise in performance on 8 core machine I actually have a drop. Namely having one process, the typical number of requests per second I have is about 200. Having two processes instead of expected increase up to 300 - 400 queries per second I actually have a drop down to 150. For 10 processes I have only 70 queries per seconds. The Perl code I'm using for testing is shown below:

我得到的问题是,8核机器的性能不是直线上升而是下降。也就是说,有一个进程,每秒的请求数通常是200个。有两个进程而不是预期的增加到每秒300 - 400个查询,我实际上有一个下降到150。对于10个进程,每秒钟只有70个查询。我用于测试的Perl代码如下所示:

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use Parallel::Benchmark;
use SQL::Abstract;
use SQL::Abstract::Plugin::InsertMulti;

my $children_dbh;

foreach my $second_table_row_count (0, 1, 1000) {
    print '#' x 80, "\nsecond_table_row_count = $second_table_row_count\n";
    create_and_fill_tables(1000, $second_table_row_count);
    foreach my $concurrency (1, 2, 3, 4, 6, 8, 10, 20) {
        my $bm = Parallel::Benchmark->new(
            'benchmark' => sub {
                _run_sql();
                return 1;
            },
            'concurrency' => $concurrency,
            'time' => 3,
        );
        my $result = $bm->run();
    }
}

sub create_and_fill_tables {
    my ($first_table_row_count, $second_table_row_count) = @_;
    my $dbh = dbi_connect();
    {
        $dbh->do(q{DROP TABLE IF EXISTS first});
        $dbh->do(q{
            CREATE TABLE first (
                num int(10) NOT NULL,
                UNIQUE KEY key_num (num)
            ) ENGINE=InnoDB
        });
        if ($first_table_row_count) {
            my ($stmt, @bind) = SQL::Abstract->new()->insert_multi(
                'first',
                ['num'],
                [map {[$_]} 1 .. $first_table_row_count],
            );
            $dbh->do($stmt, undef, @bind);
        }
    }
    {
        $dbh->do(q{DROP TABLE IF EXISTS second});
        $dbh->do(q{
            CREATE TABLE second (
                num int(10) NOT NULL,
                num2 int(10) NOT NULL,
                UNIQUE KEY key_num (num, num2)
            ) ENGINE=InnoDB
        });
        if ($second_table_row_count) {
            my ($stmt, @bind) = SQL::Abstract->new()->insert_multi(
                'second',
                ['num'],
                [map {[$_]} 1 .. $second_table_row_count],
            );
            $dbh->do($stmt, undef, @bind);
        }
    }
}

sub _run_sql {
    $children_dbh ||= dbi_connect();
    $children_dbh->selectall_arrayref(q{
        SELECT first.num
        FROM first
        LEFT JOIN second AS second_1 ON second_1.num = -1
        LEFT JOIN second AS second_2 ON second_2.num = -2
        LEFT JOIN second AS second_3 ON second_3.num = -3
        LEFT JOIN second AS second_4 ON second_4.num = -4
        LEFT JOIN second AS second_5 ON second_5.num = -5
        LEFT JOIN second AS second_6 ON second_6.num = -6
        WHERE second_1.num IS NULL
          AND second_2.num IS NULL
          AND second_3.num IS NULL
          AND second_4.num IS NULL
          AND second_5.num IS NULL
          AND second_6.num IS NULL
    });
}

sub dbi_connect {
    return DBI->connect(
        'dbi:mysql:'
            . 'database=tmp'
            . ';host=localhost'
            . ';port=3306',
        'root',
        '',
    );
}

And for compare queries like this executed in concurrent with increasing performance:

对于与提高性能同时执行的类似查询进行比较:

SELECT first.num
FROM first
LEFT JOIN second AS second_1 ON second_1.num = 1 # existent key
LEFT JOIN second AS second_2 ON second_2.num = 2 # existent key
LEFT JOIN second AS second_3 ON second_3.num = 3 # existent key
LEFT JOIN second AS second_4 ON second_4.num = 4 # existent key
LEFT JOIN second AS second_5 ON second_5.num = 5 # existent key
LEFT JOIN second AS second_6 ON second_6.num = 6 # existent key
WHERE second_1.num IS NOT NULL
  AND second_2.num IS NOT NULL
  AND second_3.num IS NOT NULL
  AND second_4.num IS NOT NULL
  AND second_5.num IS NOT NULL
  AND second_6.num IS NOT NULL

Testing results, cpu and disk usage measurements are here:

测试结果、cpu和磁盘使用情况测量如下:

* table `first` have 1000 rows
* table `second` have 6 rows: `[1,1],[2,2],..[6,6]`

For query:
    SELECT first.num
    FROM first
    LEFT JOIN second AS second_1 ON second_1.num = -1 # non-existent key
    LEFT JOIN second AS second_2 ON second_2.num = -2 # non-existent key
    LEFT JOIN second AS second_3 ON second_3.num = -3 # non-existent key
    LEFT JOIN second AS second_4 ON second_4.num = -4 # non-existent key
    LEFT JOIN second AS second_5 ON second_5.num = -5 # non-existent key
    LEFT JOIN second AS second_6 ON second_6.num = -6 # non-existent key
    WHERE second_1.num IS NULL
      AND second_2.num IS NULL
      AND second_3.num IS NULL
      AND second_4.num IS NULL
      AND second_5.num IS NULL
      AND second_6.num IS NULL

Results:
    concurrency: 1,     speed: 162.910 / sec
    concurrency: 2,     speed: 137.818 / sec
    concurrency: 3,     speed: 130.728 / sec
    concurrency: 4,     speed: 107.387 / sec
    concurrency: 6,     speed: 90.513 / sec
    concurrency: 8,     speed: 80.445 / sec
    concurrency: 10,    speed: 80.381 / sec
    concurrency: 20,    speed: 84.069 / sec

System usage after for last 60 minutes of running query in 6 processes:
    $ iostat -cdkx 60

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              74.82    0.00    0.08    0.00    0.08   25.02

    Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda1              0.00     0.00    0.00    0.12     0.00     0.80    13.71     0.00    1.43   1.43   0.02
    sdf10             0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf4              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   30.00  15.00   0.05
    sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf8              0.00     0.00    0.00    0.37     0.00     1.24     6.77     0.00    5.00   3.18   0.12
    sdf6              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf9              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00    0.00   0.00   0.00
    sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf3              0.00     0.00    0.00    0.08     0.00     1.33    32.00     0.00    4.00   4.00   0.03
    sdf2              0.00     0.00    0.00    0.17     0.00     1.37    16.50     0.00    3.00   3.00   0.05
    sdf15             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf14             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf1              0.00     0.00    0.00    0.05     0.00     0.40    16.00     0.00    0.00   0.00   0.00
    sdf13             0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf5              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   50.00  25.00   0.08
    sdm2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdm1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf12             0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf11             0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf7              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    md0               0.00     0.00    0.00    0.97     0.00    13.95    28.86     0.00    0.00   0.00   0.00

################################################################################

For query:
    SELECT first.num
    FROM first
    LEFT JOIN second AS second_1 ON second_1.num = 1 # existent key
    LEFT JOIN second AS second_2 ON second_2.num = 2 # existent key
    LEFT JOIN second AS second_3 ON second_3.num = 3 # existent key
    LEFT JOIN second AS second_4 ON second_4.num = 4 # existent key
    LEFT JOIN second AS second_5 ON second_5.num = 5 # existent key
    LEFT JOIN second AS second_6 ON second_6.num = 6 # existent key
    WHERE second_1.num IS NOT NULL
      AND second_2.num IS NOT NULL
      AND second_3.num IS NOT NULL
      AND second_4.num IS NOT NULL
      AND second_5.num IS NOT NULL
      AND second_6.num IS NOT NULL

Results:
    concurrency: 1,     speed: 875.973 / sec
    concurrency: 2,     speed: 944.986 / sec
    concurrency: 3,     speed: 1256.072 / sec
    concurrency: 4,     speed: 1401.657 / sec
    concurrency: 6,     speed: 1354.351 / sec
    concurrency: 8,     speed: 1110.100 / sec
    concurrency: 10,    speed: 1145.251 / sec
    concurrency: 20,    speed: 1142.514 / sec

System usage after for last 60 minutes of running query in 6 processes:
    $ iostat -cdkx 60

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              74.40    0.00    0.53    0.00    0.06   25.01

    Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
    sda1              0.00     0.00    0.00    0.02     0.00     0.13    16.00     0.00    0.00   0.00   0.00
    sdf10             0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf4              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf8              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf6              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00    0.00   0.00   0.00
    sdf9              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf3              0.00     0.00    0.00    0.13     0.00     2.67    40.00     0.00    3.75   2.50   0.03
    sdf2              0.00     0.00    0.00    0.23     0.00     2.72    23.29     0.00    2.14   1.43   0.03
    sdf15             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf14             0.00     0.00    0.00    0.98     0.00     0.54     1.10     0.00    2.71   2.71   0.27
    sdf1              0.00     0.00    0.00    0.08     0.00     1.47    35.20     0.00    8.00   6.00   0.05
    sdf13             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf5              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    sdm2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdm1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
    sdf11             0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00    0.00   0.00   0.00
    sdf7              0.00     0.00    0.00    0.03     0.00     1.07    64.00     0.00   10.00   5.00   0.02
    md0               0.00     0.00    0.00    1.70     0.00    15.92    18.74     0.00    0.00   0.00   0.00

################################################################################

And this server has lots of free memory. Example of top:
    top - 19:02:59 up  4:23,  4 users,  load average: 4.43, 3.03, 2.01
    Tasks: 218 total,   1 running, 217 sleeping,   0 stopped,   0 zombie
    Cpu(s): 72.8%us,  0.7%sy,  0.0%ni, 26.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.1%st
    Mem:  71701416k total, 22183980k used, 49517436k free,      284k buffers
    Swap:        0k total,        0k used,        0k free,  1282768k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     2506 mysql     20   0 51.7g  17g 5920 S  590 25.8 213:15.12 mysqld
     9348 topadver  20   0 72256  11m 1428 S    2  0.0   0:01.45 perl
     9349 topadver  20   0 72256  11m 1428 S    2  0.0   0:01.44 perl
     9350 topadver  20   0 72256  11m 1428 S    2  0.0   0:01.45 perl
     9351 topadver  20   0 72256  11m 1428 S    1  0.0   0:01.44 perl
     9352 topadver  20   0 72256  11m 1428 S    1  0.0   0:01.44 perl
     9353 topadver  20   0 72256  11m 1428 S    1  0.0   0:01.44 perl
     9346 topadver  20   0 19340 1504 1064 R    0  0.0   0:01.89 top

Does anyone have an idea why performance decreased for query with non-existent keys?

有人知道为什么用不存在的键进行查询会降低性能吗?

2 个解决方案

#1


8  

Well written question, that shows some research.

写得很好,这显示了一些研究。

Out of curiosity, I tried MySQL 5.6 to see what the tooling there has to say about these queries.

出于好奇,我尝试了MySQL 5.6,看看那里的工具对这些查询有什么看法。

First, note that the queries are different:

首先,注意查询是不同的:

  • changing the value from "1" to "-1" for the existent / non existent key case is one thing
  • 将现有/不存在的键的值从“1”更改为“-1”是一回事
  • changing "second_1.num IS NOT NULL" to "second_1.num IS NULL" in the WHERE clause is another.
  • 改变“second_1。num不是空的“到”second_1。num在WHERE子句中是NULL。

Using EXPLAIN gives different plans:

使用EXPLAIN会给出不同的计划:

EXPLAIN SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
;
id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  first   index   NULL    key_num 4       NULL    1000    Using index
1       SIMPLE  second_1        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_2        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_3        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_4        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_5        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_6        ref     key_num key_num 4       const   1       Using where; Not exists; Using index

as opposed to

而不是

EXPLAIN SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = 1 # existent key
LEFT JOIN `second` AS second_2 ON second_2.num = 2 # existent key
LEFT JOIN `second` AS second_3 ON second_3.num = 3 # existent key
LEFT JOIN `second` AS second_4 ON second_4.num = 4 # existent key
LEFT JOIN `second` AS second_5 ON second_5.num = 5 # existent key
LEFT JOIN `second` AS second_6 ON second_6.num = 6 # existent key
WHERE second_1.num IS NOT NULL
AND second_2.num IS NOT NULL
AND second_3.num IS NOT NULL
AND second_4.num IS NOT NULL
AND second_5.num IS NOT NULL
AND second_6.num IS NOT NULL
;
id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  second_1        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_2        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_3        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_4        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_5        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_6        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  first   index   NULL    key_num 4       NULL    1000    Using index; Using join buffer (Block Nested Loop)

Using the JSON format, we have:

使用JSON格式,我们有:

EXPLAIN FORMAT=JSON SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "first",
          "access_type": "index",
          "key": "key_num",
          "key_length": "4",
          "rows": 1000,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_1",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_1), isnull(`test`.`second_1`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_2",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_2), isnull(`test`.`second_2`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_3",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_3), isnull(`test`.`second_3`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_4",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_4), isnull(`test`.`second_4`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_5",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_5), isnull(`test`.`second_5`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_6",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_6), isnull(`test`.`second_6`.`num`), true)"
        }
      }
    ]
  }
}

as opposed to

而不是

EXPLAIN FORMAT=JSON SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = 1 # existent key
LEFT JOIN `second` AS second_2 ON second_2.num = 2 # existent key
LEFT JOIN `second` AS second_3 ON second_3.num = 3 # existent key
LEFT JOIN `second` AS second_4 ON second_4.num = 4 # existent key
LEFT JOIN `second` AS second_5 ON second_5.num = 5 # existent key
LEFT JOIN `second` AS second_6 ON second_6.num = 6 # existent key
WHERE second_1.num IS NOT NULL
AND second_2.num IS NOT NULL
AND second_3.num IS NOT NULL
AND second_4.num IS NOT NULL
AND second_5.num IS NOT NULL
AND second_6.num IS NOT NULL
;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "second_1",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_2",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_3",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_4",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_5",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_6",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "first",
          "access_type": "index",
          "key": "key_num",
          "key_length": "4",
          "rows": 1000,
          "filtered": 100,
          "using_index": true,
          "using_join_buffer": "Block Nested Loop"
        }
      }
    ]
  }
}

Looking at the table io instrumented by the performance schema at runtime, we have:

看看运行时由性能模式检测的表io,我们有:

truncate table performance_schema.objects_summary_global_by_type;
select * from performance_schema.objects_summary_global_by_type
where OBJECT_NAME in ("first", "second");
OBJECT_TYPE OBJECT_SCHEMA   OBJECT_NAME COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE   test    first   0   0   0   0   0
TABLE   test    second  0   0   0   0   0
SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
;
(...)
select * from performance_schema.objects_summary_global_by_type
where OBJECT_NAME in ("first", "second");
OBJECT_TYPE OBJECT_SCHEMA   OBJECT_NAME COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE   test    first   1003    5705014442  1026171 5687889 87356557
TABLE   test    second  6012    271786533972    537266  45207298    1123939292

as opposed to:

而不是:

select * from performance_schema.objects_summary_global_by_type
where OBJECT_NAME in ("first", "second");
OBJECT_TYPE OBJECT_SCHEMA   OBJECT_NAME COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE   test    first   1003    5211074603  969338  5195454 61066176
TABLE   test    second  24  458656783   510085  19110361    66229860

The query that scales does almost no table IO in table second. The query that does not scale does 6K table IO in table second, or 6 times the size of table first.

扩展的查询在表秒中几乎不执行表IO。不缩放的查询在表2中是6K表IO,或者是表1的6倍。

This is because the query plans are different, in turn because the queries are different (IS NOT NULL versus IS NULL).

这是因为查询计划是不同的,因为查询是不同的(不是NULL,而不是NULL)。

I think that answers the performance related question.

我认为这回答了性能相关的问题。

Note that both queries returned 1000 rows in my tests, which may not be what you want. Before tuning a query to make it faster, make sure it works as expected.

注意,这两个查询在我的测试中都返回了1000行,这可能不是您想要的。在优化查询以使其更快之前,请确保它按照预期工作。

#2


1  

I'll suggest trying an approach where each fork uses its own connection (it looks to me that right now $children_dbh, which holds a DB connection, is a shared variable). Or, even better, implement so-called connection pool, from which each client process will take a connection at a time when it's required, and will 'give it back' when it's no longer needed.

我建议尝试一种方法,让每个fork都使用自己的连接(在我看来,现在拥有DB连接的$children_dbh是一个共享变量)。或者,更好的做法是实现所谓的连接池,每个客户端进程在需要的时候都会从这个连接中获取一个连接,在不再需要的时候会“归还”它。

Check this answer for further details: the thread where it was given is about Java, but it's actually about some universal principles of MySQL organization. And this answer might be useful as well.

检查这个答案以获得更多的细节:给出它的线程是关于Java的,但是它实际上是关于MySQL组织的一些通用原则。这个答案可能也很有用。

P.S. Somewhat similar situation (I think) is described here, and there's a detailed explanation of how to organize a connection pool.

这里描述了类似的情况(我认为),并详细说明了如何组织连接池。

#1


8  

Well written question, that shows some research.

写得很好,这显示了一些研究。

Out of curiosity, I tried MySQL 5.6 to see what the tooling there has to say about these queries.

出于好奇,我尝试了MySQL 5.6,看看那里的工具对这些查询有什么看法。

First, note that the queries are different:

首先,注意查询是不同的:

  • changing the value from "1" to "-1" for the existent / non existent key case is one thing
  • 将现有/不存在的键的值从“1”更改为“-1”是一回事
  • changing "second_1.num IS NOT NULL" to "second_1.num IS NULL" in the WHERE clause is another.
  • 改变“second_1。num不是空的“到”second_1。num在WHERE子句中是NULL。

Using EXPLAIN gives different plans:

使用EXPLAIN会给出不同的计划:

EXPLAIN SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
;
id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  first   index   NULL    key_num 4       NULL    1000    Using index
1       SIMPLE  second_1        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_2        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_3        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_4        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_5        ref     key_num key_num 4       const   1       Using where; Not exists; Using index
1       SIMPLE  second_6        ref     key_num key_num 4       const   1       Using where; Not exists; Using index

as opposed to

而不是

EXPLAIN SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = 1 # existent key
LEFT JOIN `second` AS second_2 ON second_2.num = 2 # existent key
LEFT JOIN `second` AS second_3 ON second_3.num = 3 # existent key
LEFT JOIN `second` AS second_4 ON second_4.num = 4 # existent key
LEFT JOIN `second` AS second_5 ON second_5.num = 5 # existent key
LEFT JOIN `second` AS second_6 ON second_6.num = 6 # existent key
WHERE second_1.num IS NOT NULL
AND second_2.num IS NOT NULL
AND second_3.num IS NOT NULL
AND second_4.num IS NOT NULL
AND second_5.num IS NOT NULL
AND second_6.num IS NOT NULL
;
id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  second_1        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_2        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_3        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_4        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_5        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  second_6        ref     key_num key_num 4       const   1       Using index
1       SIMPLE  first   index   NULL    key_num 4       NULL    1000    Using index; Using join buffer (Block Nested Loop)

Using the JSON format, we have:

使用JSON格式,我们有:

EXPLAIN FORMAT=JSON SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "first",
          "access_type": "index",
          "key": "key_num",
          "key_length": "4",
          "rows": 1000,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_1",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_1), isnull(`test`.`second_1`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_2",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_2), isnull(`test`.`second_2`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_3",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_3), isnull(`test`.`second_3`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_4",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_4), isnull(`test`.`second_4`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_5",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_5), isnull(`test`.`second_5`.`num`), true)"
        }
      },
      {
        "table": {
          "table_name": "second_6",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "not_exists": true,
          "using_index": true,
          "attached_condition": "<if>(found_match(second_6), isnull(`test`.`second_6`.`num`), true)"
        }
      }
    ]
  }
}

as opposed to

而不是

EXPLAIN FORMAT=JSON SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = 1 # existent key
LEFT JOIN `second` AS second_2 ON second_2.num = 2 # existent key
LEFT JOIN `second` AS second_3 ON second_3.num = 3 # existent key
LEFT JOIN `second` AS second_4 ON second_4.num = 4 # existent key
LEFT JOIN `second` AS second_5 ON second_5.num = 5 # existent key
LEFT JOIN `second` AS second_6 ON second_6.num = 6 # existent key
WHERE second_1.num IS NOT NULL
AND second_2.num IS NOT NULL
AND second_3.num IS NOT NULL
AND second_4.num IS NOT NULL
AND second_5.num IS NOT NULL
AND second_6.num IS NOT NULL
;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "second_1",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_2",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_3",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_4",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_5",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "second_6",
          "access_type": "ref",
          "possible_keys": [
            "key_num"
          ],
          "key": "key_num",
          "key_length": "4",
          "ref": [
            "const"
          ],
          "rows": 1,
          "filtered": 100,
          "using_index": true
        }
      },
      {
        "table": {
          "table_name": "first",
          "access_type": "index",
          "key": "key_num",
          "key_length": "4",
          "rows": 1000,
          "filtered": 100,
          "using_index": true,
          "using_join_buffer": "Block Nested Loop"
        }
      }
    ]
  }
}

Looking at the table io instrumented by the performance schema at runtime, we have:

看看运行时由性能模式检测的表io,我们有:

truncate table performance_schema.objects_summary_global_by_type;
select * from performance_schema.objects_summary_global_by_type
where OBJECT_NAME in ("first", "second");
OBJECT_TYPE OBJECT_SCHEMA   OBJECT_NAME COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE   test    first   0   0   0   0   0
TABLE   test    second  0   0   0   0   0
SELECT `first`.num
FROM `first`
LEFT JOIN `second` AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN `second` AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN `second` AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN `second` AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN `second` AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN `second` AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
;
(...)
select * from performance_schema.objects_summary_global_by_type
where OBJECT_NAME in ("first", "second");
OBJECT_TYPE OBJECT_SCHEMA   OBJECT_NAME COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE   test    first   1003    5705014442  1026171 5687889 87356557
TABLE   test    second  6012    271786533972    537266  45207298    1123939292

as opposed to:

而不是:

select * from performance_schema.objects_summary_global_by_type
where OBJECT_NAME in ("first", "second");
OBJECT_TYPE OBJECT_SCHEMA   OBJECT_NAME COUNT_STAR  SUM_TIMER_WAIT  MIN_TIMER_WAIT  AVG_TIMER_WAIT  MAX_TIMER_WAIT
TABLE   test    first   1003    5211074603  969338  5195454 61066176
TABLE   test    second  24  458656783   510085  19110361    66229860

The query that scales does almost no table IO in table second. The query that does not scale does 6K table IO in table second, or 6 times the size of table first.

扩展的查询在表秒中几乎不执行表IO。不缩放的查询在表2中是6K表IO,或者是表1的6倍。

This is because the query plans are different, in turn because the queries are different (IS NOT NULL versus IS NULL).

这是因为查询计划是不同的,因为查询是不同的(不是NULL,而不是NULL)。

I think that answers the performance related question.

我认为这回答了性能相关的问题。

Note that both queries returned 1000 rows in my tests, which may not be what you want. Before tuning a query to make it faster, make sure it works as expected.

注意,这两个查询在我的测试中都返回了1000行,这可能不是您想要的。在优化查询以使其更快之前,请确保它按照预期工作。

#2


1  

I'll suggest trying an approach where each fork uses its own connection (it looks to me that right now $children_dbh, which holds a DB connection, is a shared variable). Or, even better, implement so-called connection pool, from which each client process will take a connection at a time when it's required, and will 'give it back' when it's no longer needed.

我建议尝试一种方法,让每个fork都使用自己的连接(在我看来,现在拥有DB连接的$children_dbh是一个共享变量)。或者,更好的做法是实现所谓的连接池,每个客户端进程在需要的时候都会从这个连接中获取一个连接,在不再需要的时候会“归还”它。

Check this answer for further details: the thread where it was given is about Java, but it's actually about some universal principles of MySQL organization. And this answer might be useful as well.

检查这个答案以获得更多的细节:给出它的线程是关于Java的,但是它实际上是关于MySQL组织的一些通用原则。这个答案可能也很有用。

P.S. Somewhat similar situation (I think) is described here, and there's a detailed explanation of how to organize a connection pool.

这里描述了类似的情况(我认为),并详细说明了如何组织连接池。