Simple subquery slows down on a huge table

Time: 2022-05-06 01:08:57

I have a MySQL database with 2 tables, one (sd_clients) with about 24k entries:

CREATE TABLE `sd_clients` (
  `ms_id` varchar(10) NOT NULL,
  `ms_share_id` varchar(10) NOT NULL,
  `short_name` varchar(25) DEFAULT NULL,
  `standard_name` varchar(75) DEFAULT NULL,
  `legal_name` varchar(150) DEFAULT NULL,
  `country` varchar(4) DEFAULT NULL,
  `status` tinyint(1) DEFAULT NULL COMMENT '1=Paid Client | 2=Non-Paid Client',
  `user_id` int(11) DEFAULT NULL,
  `summary` text,
  `sector` int(4) DEFAULT NULL,
  `sub_sector` int(4) DEFAULT NULL,
  `business_country` char(3) DEFAULT NULL,
  `created_at` date DEFAULT NULL,
  `is_paid` int(1) NOT NULL DEFAULT '0' COMMENT '0 = Non-Paid Client | 1=Paid Client',
  `description_en` text,
  `description_zh-hans` text,
  `description_zh-hant` text,
  `highlights_en` text,
  `highlights_zh-hans` text,
  `highlights_zh-hant` text,
  `logo` varchar(255) DEFAULT NULL,
  `summary_subsection_title_en` varchar(500) DEFAULT NULL,
  `summary_subsection_title_zh-hans` varchar(500) DEFAULT NULL,
  `summary_subsection_title_zh-hant` varchar(500) DEFAULT NULL,
  `summary_subsection_text_en` text,
  `summary_subsection_text_zh-hans` text,
  `summary_subsection_text_zh-hant` text,
  `summary_short_en` varchar(2000) DEFAULT NULL,
  `summary_short_zh-hans` varchar(2000) DEFAULT NULL,
  `summary_short_zh-hant` varchar(2000) DEFAULT NULL,
  `other_information_en` text,
  `other_information_zh-hans` text,
  `other_information_zh-hant` text,
  `change_percentage` decimal(10,3) DEFAULT NULL,
  `id_sector` bigint(3) DEFAULT NULL,
  `id_subsector` bigint(3) DEFAULT NULL,
  `background_info_en` text,
  `background_info_zh-hans` text,
  `background_info_zh-hant` text,
  `share_id_displayed` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`ms_id`) KEY_BLOCK_SIZE=1024,
  UNIQUE KEY `ms_id` (`ms_id`) KEY_BLOCK_SIZE=1024,
  KEY `share_id_displayed` (`share_id_displayed`) KEY_BLOCK_SIZE=1024
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
SET FOREIGN_KEY_CHECKS=1;

And another called sd_clients_daily_stocks, with about 50 million entries:

CREATE TABLE `sd_clients_daily_stocks` (
  `ms_id` varchar(10) NOT NULL,
  `ms_share_id` varchar(10) DEFAULT NULL,
  `created_at` date DEFAULT NULL,
  `symbol` varchar(32) DEFAULT NULL,
  `exchange_id` char(5) DEFAULT NULL,
  `volume` bigint(18) DEFAULT NULL,
  `day_low` decimal(19,6) DEFAULT NULL,
  `day_high` decimal(19,6) DEFAULT NULL,
  `market_cap` bigint(18) DEFAULT NULL,
  `open_price` decimal(19,6) DEFAULT NULL,
  `close_price` decimal(19,6) DEFAULT NULL,
  `enterprise_value` bigint(18) DEFAULT NULL,
  `currency_id` char(3) DEFAULT NULL,
  `valoren` varchar(20) DEFAULT NULL,
  `cusip` char(9) DEFAULT NULL,
  `isin` varchar(12) DEFAULT NULL,
  `sedol` varchar(7) DEFAULT NULL,
  `ipo_date` date DEFAULT NULL,
  `is_depositary_receipt` tinyint(1) DEFAULT NULL,
  `depositary_receipt_ratio` decimal(9,4) DEFAULT NULL,
  `security_type` char(10) DEFAULT NULL,
  `share_class_description` varchar(1000) DEFAULT NULL,
  `share_class_status` char(1) DEFAULT NULL,
  `is_primary_share` tinyint(1) DEFAULT NULL,
  `is_dividend_reinvest` tinyint(1) DEFAULT NULL,
  `is_direct_invest` tinyint(1) DEFAULT NULL,
  `investment_id` char(10) DEFAULT NULL,
  `ipo_offer_price` decimal(19,6) DEFAULT NULL,
  `delisting_date` date DEFAULT NULL,
  `delisting_reason` varchar(100) DEFAULT NULL,
  `mic` char(10) DEFAULT NULL,
  `common_share_sub_type` varchar(32) DEFAULT NULL,
  `ipo_offer_price_range` varchar(32) DEFAULT NULL,
  `exchange_sub_market_global_id` char(10) DEFAULT NULL,
  `conversion_ratio` decimal(19,9) DEFAULT NULL,
  KEY `ms_id` (`ms_id`) USING HASH,
  KEY `ms_share_id` (`ms_share_id`) USING HASH,
  KEY `symbol` (`symbol`),
  KEY `exchange_id` (`exchange_id`),
  KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SET FOREIGN_KEY_CHECKS=1;

I'm trying to run a fairly simple query:

SELECT DISTINCT
	sd_clients.ms_id, 
	sd_clients.standard_name, 
	sd_clients.is_paid, 
	sd_clients.logo,
	sd_clients.change_percentage,
	(
		SELECT 
			CONCAT(
					`exchange_id`, '|--|', 
					`symbol`, '|--|', 
					`close_price`, '|--|', 
					`day_low`, '|--|', 
					`day_high`
			) as items
		FROM sd_clients_daily_stocks 
		WHERE ms_share_id = sd_clients.share_id_displayed 
		ORDER BY created_at DESC 
		LIMIT 1
	) as company_data
FROM sd_clients 
GROUP BY ms_id 
ORDER BY sd_clients.standard_name ASC
LIMIT 10

But for some reason it takes far too long (over 1 minute) to return any results. Any idea why?

BTW, it works just fine if I remove the subquery, but I need it because the rest of the data is in another table. Also, I know I could get the results without the subquery first, but I have other queries where the subquery must be there.

I also noticed that it gets blazing fast if I use a string instead of "sd_clients.share_id_displayed" on the subquery.

2 Answers

#1


1  

You should try an index on sd_clients_daily_stocks(ms_share_id, created_at).

You can add the additional columns from the select if you want a covering index.
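
A minimal sketch of what those two indexes could look like in MySQL. The index names (idx_share_created, idx_share_created_cover) are made up for illustration, and only one of the two is needed. Building either one on a table of about 50 million rows will take a while, so it is worth trying on a copy first.

```sql
-- Composite index: the equality column (ms_share_id) first, then the
-- ORDER BY column (created_at), so the newest row per share can be read
-- from the end of the index range without sorting.
-- NOTE: the index name is hypothetical.
ALTER TABLE sd_clients_daily_stocks
  ADD INDEX idx_share_created (ms_share_id, created_at);

-- Optional covering variant: also carries every column the subquery's
-- CONCAT reads, so the lookup may be answered from the index alone.
-- NOTE: the index name is hypothetical; the trade-off is a larger index.
ALTER TABLE sd_clients_daily_stocks
  ADD INDEX idx_share_created_cover
    (ms_share_id, created_at, exchange_id, symbol,
     close_price, day_low, day_high);
```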

#2


0  

You would probably be better off joining on a non-correlated subquery based on your current subquery, using this kind of technique to find the most recent row for each ms_share_id in the new subquery.

Edit: I was thinking more something like this:

SELECT DISTINCT sd_clients.ms_id, sd_clients.standard_name, sd_clients.is_paid, sd_clients.logo, sd_clients.change_percentage
    , scdsB.items
FROM sd_clients 
INNER JOIN ( 
    SELECT scdsA.ms_share_id
        , CONCAT(
            scdsA.`exchange_id`, '|--|', scdsA.`symbol`, '|--|', 
            scdsA.`close_price`, '|--|', scdsA.`day_low`, '|--|', scdsA.`day_high`
        ) as items
    FROM sd_clients_daily_stocks AS scdsA
    INNER JOIN (
        SELECT ms_share_id, MAX(created_at) AS created_at
        FROM sd_clients_daily_stocks
        GROUP BY ms_share_id 
    ) AS lasts
    ON scdsA.ms_share_id = lasts.ms_share_id
    AND scdsA.created_at = lasts.created_at
) scdsB 
ON sd_clients.share_id_displayed = scdsB.ms_share_id 
GROUP BY sd_clients.ms_id 
ORDER BY sd_clients.standard_name ASC 
LIMIT 10;

... But even this likely won't bring the time down much further, if at all, from the 12 seconds. At that point you are better off looking into indexes that could help. For example, the lasts grouping subquery, which finds the most recent (max) created_at value for each ms_share_id value, would benefit from an index on sd_clients_daily_stocks (ms_share_id, created_at), as would the JOIN it is used in.
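
One way to check whether such an index is actually being picked up is to run EXPLAIN on the grouping subquery by itself. This is only a sketch: the plan depends on the MySQL version and on table statistics, so the output is something to inspect rather than a guaranteed result.

```sql
-- Show the plan for the "latest date per share" subquery on its own.
-- With a composite index on (ms_share_id, created_at) in place, the
-- "key" column should name that index, and "Extra" may report
-- "Using index for group-by" if the optimizer chooses a loose index scan.
EXPLAIN
SELECT ms_share_id, MAX(created_at) AS created_at
FROM sd_clients_daily_stocks
GROUP BY ms_share_id;
```

If the key column is NULL, or the rows estimate is close to the full table size, the query is still scanning the table and the index is not helping.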
