I am working on a social network tracking application. The joins perform well with proper indexing, but when I add an ORDER BY clause the query takes more than 100 times longer to execute. I use the following query to get the twitter users without an ORDER BY clause.
SELECT DISTINCT `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
LIMIT 20
Showing rows 0 - 19 (20 total, Query took 0.0714 sec)
But when I add an ORDER BY clause (on an indexed column):
SELECT DISTINCT `tracked_twitter`.id
FROM tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
ORDER BY tracked_twitter.followers_count DESC
LIMIT 20
Showing rows 0 - 19 (20 total, Query took 13.4636 sec)
When I use the ORDER BY clause on that table alone, it does not take much time:
SELECT * FROM `tracked_twitter` WHERE 1 order by `followers_count` desc limit 20
Showing rows 0 - 19 (20 total, Query took 0.0711 sec) [followers_count: 68236387 - 10525612]
The table creation query is as follows:
CREATE TABLE IF NOT EXISTS `tracked_twitter` (
`id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`handle` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`location` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`description` text COLLATE utf8_unicode_ci,
`profile_image` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`followers_count` int(11) NOT NULL,
`is_influencer` tinyint(1) NOT NULL DEFAULT '0',
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`gender` enum('Male','Female','Other') COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `followers_count` (`followers_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
So the joins do not slow the query down, and the ORDER BY works well when I run it on its own table. How can I improve performance?
UPDATE 1
@GordonLinoff's method solves the problem if I only need the result set from the parent table. What if I want to know the number of tweets per person (the count of twitter_content rows that match the tracked_twitter table)? How can I modify it? And if I want to apply math functions to the tweet content, how do I do that?
SELECT `tracked_twitter` . * , COUNT( * ) AS twitterContentCount, retweet_count + favourite_count + reply_count AS engagement
FROM `tracked_twitter`
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
WHERE `is_influencer` != '1'
AND `tracker_twitter_content`.`tracker_id` = '88'
AND `tracked_twitter_id` != '0'
GROUP BY `tracked_twitter`.`id`
ORDER BY twitterContentCount DESC
LIMIT 20
OFFSET 0
3 Solutions
#1
3
Try getting rid of the DISTINCT. That is a performance killer. I'm not sure why your first query works quickly; perhaps MySQL is smart enough to optimize it away.
I would try:
SELECT tt.id
FROM tracked_twitter tt
WHERE EXISTS (SELECT 1
              FROM twitter_content tc INNER JOIN
                   tracker_twitter_content ttc
                   ON tc.id = ttc.twitter_content_id
              WHERE ttc.tracker_id = 88 AND
                    tt.id = tc.tracked_twitter_id
             )
ORDER BY tt.followers_count DESC;
For this version, you want indexes on tracked_twitter(followers_count, id), twitter_content(tracked_twitter_id, id), and tracker_twitter_content(twitter_content_id, tracker_id).
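As a rough sketch, the three indexes named above could be created like this. The index names (idx_*) are made up for illustration, and the column lists assume the other two tables have the columns used in the queries; their full definitions are not shown in the question. Since tracked_twitter is InnoDB and already has a key on followers_count, the first index may add little, because InnoDB secondary indexes already carry the primary key.

```sql
-- Hypothetical index names; columns are the ones listed in the answer.
CREATE INDEX idx_tt_followers_id
    ON tracked_twitter (followers_count, id);

CREATE INDEX idx_tc_tracked_id
    ON twitter_content (tracked_twitter_id, id);

CREATE INDEX idx_ttc_content_tracker
    ON tracker_twitter_content (twitter_content_id, tracker_id);

-- Inspect the plan afterwards. The LIMIT 20 is taken from the question's
-- query; it is an assumption that the EXISTS version should use it too.
EXPLAIN
SELECT tt.id
FROM tracked_twitter tt
WHERE EXISTS (SELECT 1
              FROM twitter_content tc
              INNER JOIN tracker_twitter_content ttc
                      ON tc.id = ttc.twitter_content_id
              WHERE ttc.tracker_id = 88
                AND tt.id = tc.tracked_twitter_id)
ORDER BY tt.followers_count DESC
LIMIT 20;
```

If the EXPLAIN output still shows "Using temporary; Using filesort" for tracked_twitter, the optimizer is probably not walking the followers_count index.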
#2
1
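Put the parent table in a derived table (in parentheses) with the ORDER BY and LIMIT applied first. Note that this only checks the 20 most-followed users, so it can return fewer than 20 rows: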
SELECT DISTINCT `tracked_twitter`.id FROM
(SELECT id,followers_count FROM tracked_twitter ORDER BY followers_count DESC
LIMIT 20) AS tracked_twitter
INNER JOIN `twitter_content` ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN `tracker_twitter_content` ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
AND `tracker_twitter_content`.`tracker_id` = '88'
ORDER BY tracked_twitter.followers_count DESC
#3
1
The main problem is that, even though you have relatively few rows, you use varchar(255) COLLATE utf8_unicode_ci as the primary key (instead of an integer) and hence as the foreign key in other tables. I suspect the same problem applies to twitter_content.id. This causes a lot of long string comparisons and reserves a lot of extra memory for temporary tables.
Concerning the query itself, yes, it should be a query that walks along the followers_count index and checks the condition for the related tables. This could be done as Gordon Linoff suggested, or by using index hints.
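For the index hint route, one possible form is sketched below. It reuses the EXISTS query from answer #1 and forces the existing followers_count key from the CREATE TABLE statement; the LIMIT 20 is carried over from the question and is an assumption here. Whether the hint actually helps depends on the MySQL version and the data, so treat it as something to test rather than a guaranteed fix.

```sql
-- Sketch only: ask MySQL to read tracked_twitter through the existing
-- followers_count index, so rows arrive already sorted and the scan can
-- stop once 20 matching users are found.
SELECT tt.id
FROM tracked_twitter AS tt FORCE INDEX (followers_count)
WHERE EXISTS (SELECT 1
              FROM twitter_content AS tc
              INNER JOIN tracker_twitter_content AS ttc
                      ON tc.id = ttc.twitter_content_id
              WHERE ttc.tracker_id = 88
                AND tc.tracked_twitter_id = tt.id)
ORDER BY tt.followers_count DESC
LIMIT 20;
```

Running the same statement with EXPLAIN in front shows whether the followers_count key is used for tracked_twitter.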