I'd like to optimize my queries, so I looked into mysql-slow.log.
Most of my slow queries contain ORDER BY RAND(). I cannot find a real solution to this problem. There is a possible solution at MySQLPerformanceBlog, but I don't think it is enough. On poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.
Is there any solution for this issue?
A dummy example:
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
ORDER BY
RAND()
LIMIT 1
8 solutions
#1
62
Try this:
SELECT *
FROM (
SELECT @cnt := COUNT(*) + 1,
@lim := 10
FROM t_random
) vars
STRAIGHT_JOIN
(
SELECT r.*,
@lim := @lim - 1
FROM t_random r
WHERE (@cnt := @cnt - 1)
AND RAND(20090301) < @lim / @cnt
) i
This is especially efficient on MyISAM (since the COUNT(*) is instant), but even in InnoDB it's 10 times more efficient than ORDER BY RAND().
The main idea here is that we don't sort. Instead, we keep two variables and calculate the running probability that a row is selected at the current step.
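To make the running-probability idea concrete outside SQL, here is a minimal Python sketch of the same one-pass selection. The function name and the in-memory list are assumptions for illustration only; `remaining_rows` plays the role of @cnt and `remaining_needed` the role of @lim.

```python
import random


def sample_without_sorting(rows, k, rng=random):
    """Pick up to k rows in a single pass, without sorting.

    Each row is kept with probability remaining_needed / remaining_rows,
    which mirrors the @lim / @cnt test in the query above.
    """
    remaining_rows = len(rows)   # counterpart of @cnt
    remaining_needed = k         # counterpart of @lim
    picked = []
    for row in rows:
        # rng.random() is in [0, 1), so the test always passes once
        # remaining_needed >= remaining_rows.
        if rng.random() < remaining_needed / remaining_rows:
            picked.append(row)
            remaining_needed -= 1
        remaining_rows -= 1
    return picked


if __name__ == "__main__":
    print(sample_without_sorting(list(range(1000)), 10))
```

Because the probability reaches 1 once the rows still needed equal the rows still left, the pass returns exactly k rows whenever the input has at least k rows.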
See this article in my blog for more detail:
- Selecting random rows
Update:
If you need to select just a single random record, try this:
SELECT aco.*
FROM (
SELECT minid + FLOOR((maxid - minid) * RAND()) AS randid
FROM (
SELECT MAX(ac_id) AS maxid, MIN(ac_id) AS minid
FROM accomodation
) q
) q2
JOIN accomodation aco
ON aco.ac_id =
COALESCE
(
(
SELECT accomodation.ac_id
FROM accomodation
WHERE ac_id > randid
AND ac_status != 'draft'
AND ac_images != 'b:0;'
AND NOT EXISTS
(
SELECT NULL
FROM accomodation_category
WHERE acat_id = ac_category
AND acat_slug = 'vendeglatohely'
)
ORDER BY
ac_id
LIMIT 1
),
(
SELECT accomodation.ac_id
FROM accomodation
WHERE ac_status != 'draft'
AND ac_images != 'b:0;'
AND NOT EXISTS
(
SELECT NULL
FROM accomodation_category
WHERE acat_id = ac_category
AND acat_slug = 'vendeglatohely'
)
ORDER BY
ac_id
LIMIT 1
)
)
This assumes your ac_id values are distributed more or less evenly.
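As a rough illustration of the "random id, then nearest matching row, else wrap around" logic, here is a Python sketch that uses an in-memory SQLite table as a stand-in for MySQL. The sample data, the reduced filter (ac_status only), the inclusive `>=` comparison and the function name are assumptions, and it issues separate statements instead of one combined query.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accomodation (ac_id INTEGER PRIMARY KEY, ac_name TEXT, ac_status TEXT)"
)
conn.executemany(
    "INSERT INTO accomodation VALUES (?, ?, ?)",
    [(1, "a", "live"), (2, "b", "draft"), (5, "c", "live"), (9, "d", "draft")],
)


def pick_random_row(conn, rng=random):
    """Pick a random id between MIN and MAX, then take the first matching row at or after it."""
    lo, hi = conn.execute("SELECT MIN(ac_id), MAX(ac_id) FROM accomodation").fetchone()
    if lo is None:
        return None  # empty table
    rand_id = rng.randint(lo, hi)
    row = conn.execute(
        "SELECT ac_id, ac_name FROM accomodation "
        "WHERE ac_id >= ? AND ac_status != 'draft' ORDER BY ac_id LIMIT 1",
        (rand_id,),
    ).fetchone()
    if row is None:
        # No matching row at or after rand_id: fall back to the first matching row.
        row = conn.execute(
            "SELECT ac_id, ac_name FROM accomodation "
            "WHERE ac_status != 'draft' ORDER BY ac_id LIMIT 1"
        ).fetchone()
    return row


if __name__ == "__main__":
    print(pick_random_row(conn))
```

With the sample ids 1, 2, 5, 9, a row that follows a gap or a filtered-out row is picked more often than its neighbours, which is why the even-distribution assumption matters.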
#2
13
It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it's still pretty random.
However, you should be able to do it in one query using this (for selecting a single value):
SELECT [fields]
FROM [table]
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM [table])) AS rid) AS r
ON id >= r.rid
ORDER BY id
LIMIT 1
Other solutions:
- Add a permanent float field called rnd to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random".
- Grab the entire list of IDs and cache them in a text file. Read the file and pick a random ID from it.
- Cache the results of the query as HTML and keep it for a few hours.
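The first option in the list above (a stored random float) could look roughly like this. It is only a sketch: SQLite stands in for MySQL, and the table name `items`, the index name, the added ORDER BY rnd LIMIT 1 and the wrap-around fallback are all assumptions rather than part of the original suggestion.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, rnd REAL)")
fill_rng = random.Random(42)
conn.executemany(
    "INSERT INTO items (id, rnd) VALUES (?, ?)",
    [(i, fill_rng.random()) for i in range(1, 101)],
)
# An index on rnd lets the lookup below stop at the first qualifying entry.
conn.execute("CREATE INDEX idx_items_rnd ON items (rnd)")


def pick_by_rnd(conn, threshold):
    """Return the row whose rnd is the smallest value above threshold."""
    row = conn.execute(
        "SELECT id, rnd FROM items WHERE rnd > ? ORDER BY rnd LIMIT 1",
        (threshold,),
    ).fetchone()
    if row is None:
        # threshold was above every stored value: wrap around to the smallest rnd.
        row = conn.execute("SELECT id, rnd FROM items ORDER BY rnd LIMIT 1").fetchone()
    return row


if __name__ == "__main__":
    print(pick_by_rnd(conn, random.random()))
```

The stored values need refreshing from time to time; otherwise the same threshold always maps to the same row.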
#3
1
Here's how I'd do it:
-- FLOOR keeps the offset within 0 .. COUNT(*) - 1
SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*)
FROM accomodation a
JOIN accomodation_category c
ON (a.ac_category = c.acat_id)
WHERE a.ac_status != 'draft'
AND c.acat_slug != 'vendeglatohely'
AND a.ac_images != 'b:0;')));
SET @sql := CONCAT('
SELECT a.ac_id,
a.ac_status,
a.ac_name,
a.ac_status,
a.ac_images
FROM accomodation a
JOIN accomodation_category c
ON (a.ac_category = c.acat_id)
WHERE a.ac_status != ''draft''
AND c.acat_slug != ''vendeglatohely''
AND a.ac_images != ''b:0;''
LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
#4
0
This gives you a single subquery that uses the index to get a random id; the outer query then fetches the matching row from your joined tables.
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation
JOIN accomodation_category
ON accomodation.ac_category = accomodation_category.acat_id
-- the random id is picked in a derived table (MySQL rejects LIMIT inside an IN subquery)
JOIN (
SELECT ac_id FROM accomodation ORDER BY RAND() LIMIT 1
) AS pick
ON pick.ac_id = accomodation.ac_id
WHERE accomodation.ac_status != 'draft'
AND accomodation_category.acat_slug != 'vendeglatohely'
AND accomodation.ac_images != 'b:0;'
#5
0
The solution for your dummy-example would be:
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation
JOIN
accomodation_category
ON accomodation.ac_category = accomodation_category.acat_id
JOIN
(
SELECT CEIL(RAND()*(SELECT MAX(ac_id) FROM accomodation)) AS ac_id
) AS Choices
ON accomodation.ac_id >= Choices.ac_id
WHERE accomodation.ac_status != 'draft'
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
ORDER BY accomodation.ac_id
LIMIT 1
To read more about alternatives to ORDER BY RAND(), you should read this article.
#6
0
I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate the said solution in all queries, especially for complicated queries involving many subqueries on multiple large tables.
So I am using a less optimized solution. Fundamentally it works the same way as Quassnoi's solution.
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND rand() <= $size * $factor / [accomodation_table_row_count]
LIMIT $size
$size * $factor / [accomodation_table_row_count] works out the probability of picking a random row. rand() generates a random number, and the row is selected if rand() is less than or equal to that probability. This effectively performs a random selection that cuts down the table size. Since there is a chance it will return fewer rows than the defined limit, we need to increase the probability to make sure we select enough rows. Hence we multiply $size by a $factor (I usually set $factor = 2, which works in most cases). Finally, we apply LIMIT $size.
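The same filter-then-limit logic, sketched in Python over an in-memory list to show how $size, $factor and the row count interact. The function name and parameters are illustrative assumptions, not part of the original query.

```python
import random


def oversample_then_limit(rows, size, factor=2, rng=random):
    """Keep each row with probability size * factor / len(rows), stopping after size rows."""
    if not rows:
        return []
    probability = size * factor / len(rows)
    kept = []
    for row in rows:
        if rng.random() <= probability:
            kept.append(row)
            if len(kept) == size:  # counterpart of LIMIT $size
                break
    return kept


if __name__ == "__main__":
    print(oversample_then_limit(list(range(10000)), 10))
```

With factor = 2 roughly twice as many rows pass the filter as are needed, so the limit is usually reached before the scan ends; this means rows late in the scan are picked less often than rows near the start.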
The problem now is working out accomodation_table_row_count. If we know the table size, we could hard-code it. This would run the fastest, but obviously it is not ideal. If you are using MyISAM, getting the table count is very efficient. Since I am using InnoDB, I just do a simple count plus selection. In your case, it would look like this:
SELECT accomodation.ac_id,
accomodation.ac_status,
accomodation.ac_name,
accomodation.ac_status,
accomodation.ac_images
FROM accomodation, accomodation_category
WHERE accomodation.ac_status != 'draft'
AND accomodation.ac_category = accomodation_category.acat_id
AND accomodation_category.acat_slug != 'vendeglatohely'
AND ac_images != 'b:0;'
AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
LIMIT $size
The tricky part is working out the right probability. As you can see, the subquery only calculates a rough temp table size (in fact, too rough!): (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category))
But you can refine this logic to give a closer approximation of the table size. Note that it is better to over-select than to under-select rows: if the probability is set too low, you risk not selecting enough rows.
This solution runs slower than Quassnoi's solution since we need to recalculate the table size. However, I find this code a lot more manageable. It is a trade-off between accuracy and performance on one side and coding complexity on the other. Having said that, on large tables this is still far faster than ORDER BY RAND().
Note: If the query logic permits, perform the random selection as early as possible before any join operations.
#7
0
(Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)
Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
Case: AUTO_INCREMENT with gaps, 1 row returned
Case: Extra FLOAT column for randomizing
Case: UUID or MD5 column
Those 5 cases can be made very efficient for large tables. See my blog for the details.
#8
-1
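// Assumes $pdo is an open PDO connection.
// $maxId is the number of rows, or close to it.
function getRandomRow(PDO $pdo, int $maxId, int $maxTries = 100): ?array {
    // Bounded retry loop instead of unbounded recursion.
    for ($i = 0; $i < $maxTries; $i++) {
        $res = getRowById($pdo, random_int(1, $maxId));
        if ($res !== null) {
            return $res;
        }
    }
    return null; // gave up: too many gaps, or an empty table
}
// rowid is a key on table
function getRowById(PDO $pdo, int $rowid): ?array {
    $stmt = $pdo->prepare('SELECT * FROM `table` WHERE rowid = ?');
    $stmt->execute([$rowid]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}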