Basically I'm trying to pull a random poll question that a user has not yet responded to from a database. This query takes about 10-20 seconds to execute, which is obviously no good! The responses table is about 30K rows and the database also has about 300 questions.
SELECT questions.id
FROM questions
LEFT JOIN responses ON ( questions.id = responses.questionID
AND responses.username = 'someuser' )
WHERE
responses.username IS NULL
ORDER BY RAND() ASC
LIMIT 1
The PK for the questions and responses tables is 'id', if that matters.
Any advice would be greatly appreciated.
5 Answers
#1
11
You most likely need an index on
responses.questionID
responses.username
Without an index, searching through 30k rows will always be slow.
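A minimal sketch of that index, using Python's built-in sqlite3 module as a stand-in for the asker's database (the CREATE INDEX statement itself is standard SQL that MySQL also accepts). The index name idx_responses_user_question, the column types, and the sample rows are assumptions made up for illustration; a single composite index on both columns is one reasonable reading of the suggestion, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        questionID INTEGER NOT NULL,
        username TEXT NOT NULL
    );
    -- Hypothetical index name; one composite index covers both join columns.
    CREATE INDEX idx_responses_user_question
        ON responses (username, questionID);
""")
conn.executemany("INSERT INTO questions (id) VALUES (?)",
                 [(i,) for i in range(1, 6)])
conn.executemany(
    "INSERT INTO responses (questionID, username) VALUES (?, ?)",
    [(1, "someuser"), (2, "someuser"), (1, "otheruser")],
)

# The asker's join, with a stable ORDER BY so the result is easy to inspect.
unanswered = [row[0] for row in conn.execute("""
    SELECT questions.id
    FROM questions
    LEFT JOIN responses
           ON questions.id = responses.questionID
          AND responses.username = 'someuser'
    WHERE responses.username IS NULL
    ORDER BY questions.id
""")]
print(unanswered)  # [3, 4, 5]
```

On the real database, running EXPLAIN on the query before and after creating the index should show whether the optimizer actually uses it.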
#2
4
Here's a different approach to the query which might be faster:
SELECT q.id
FROM questions q
WHERE q.id NOT IN (
SELECT r.questionID
FROM responses r
WHERE r.username = 'someuser'
)
Make sure there is an index on r.username and that should be pretty quick.
The above will return all the unanswered questions. To choose a random one, you could go with the inefficient (but easy) ORDER BY RAND() LIMIT 1, or use the method suggested by Tom Leys.
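Since there are only about 300 questions, a third option is to fetch the unanswered ids with the NOT IN query and pick one in application code. This is a rough sketch under assumptions, not something from the answer itself: it uses Python's sqlite3 module as a stand-in database, and the function name pick_unanswered and the sample rows are hypothetical. Note that questionID is declared NOT NULL here; NOT IN returns no rows at all if the subquery yields a NULL.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        questionID INTEGER NOT NULL,
        username TEXT NOT NULL
    );
    CREATE INDEX idx_responses_username ON responses (username);
""")
conn.executemany("INSERT INTO questions (id) VALUES (?)",
                 [(i,) for i in range(1, 6)])
conn.executemany(
    "INSERT INTO responses (questionID, username) VALUES (?, ?)",
    [(1, "someuser"), (2, "someuser")]
    + [(i, "finished_user") for i in range(1, 6)],
)


def pick_unanswered(conn, username, rng=random):
    """Return a random unanswered question id, or None if none are left."""
    rows = conn.execute(
        """
        SELECT q.id
        FROM questions q
        WHERE q.id NOT IN (
            SELECT r.questionID
            FROM responses r
            WHERE r.username = ?
        )
        """,
        (username,),
    ).fetchall()
    if not rows:
        return None
    # The random choice happens in Python, so the database never sorts.
    return rng.choice(rows)[0]


print(pick_unanswered(conn, "someuser"))       # one of 3, 4, 5
print(pick_unanswered(conn, "finished_user"))  # None
```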
#3
3
The problem is probably not the join; it's almost certainly the sorting of 30k rows by ORDER BY RAND().
#4
3
See: Do not order by rand
He suggests the following (replace the quotes table in this example with your own query):
SELECT COUNT(*) AS cnt FROM quotes
-- generate random number between 0 and cnt-1 in your programming language and run
-- the query:
SELECT quote FROM quotes LIMIT $generated_number, 1
Of course you could probably make the first statement a subselect inside the second.
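Applied to the poll question case, the count-then-offset idea might look like the sketch below. It is only an illustration under assumptions: it runs against Python's sqlite3 module rather than the asker's database, it reuses the NOT IN filter from answer #2, and the names UNANSWERED and random_unanswered plus the sample rows are invented for the example. It keeps the two statements separate and generates the offset in application code, as the comment in the SQL above describes.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        questionID INTEGER NOT NULL,
        username TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO questions (id) VALUES (?)",
                 [(i,) for i in range(1, 6)])
conn.executemany(
    "INSERT INTO responses (questionID, username) VALUES (?, ?)",
    [(1, "someuser"), (2, "someuser")]
    + [(i, "finished_user") for i in range(1, 6)],
)

# Shared FROM/WHERE fragment so both statements filter identically.
UNANSWERED = """
    FROM questions q
    WHERE q.id NOT IN (
        SELECT r.questionID FROM responses r WHERE r.username = ?
    )
"""


def random_unanswered(conn, username, rng=random):
    """Count the candidates, then fetch exactly one row at a random offset."""
    (cnt,) = conn.execute("SELECT COUNT(*) " + UNANSWERED,
                          (username,)).fetchone()
    if cnt == 0:
        return None
    offset = rng.randrange(cnt)  # random number between 0 and cnt-1
    row = conn.execute(
        "SELECT q.id " + UNANSWERED + " ORDER BY q.id LIMIT 1 OFFSET ?",
        (username, offset),
    ).fetchone()
    return row[0]


print(random_unanswered(conn, "someuser"))       # one of 3, 4, 5
print(random_unanswered(conn, "finished_user"))  # None
```

The ORDER BY q.id is there only to make the offset refer to a stable row ordering; it sorts by the primary key rather than by a random value.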
#5
0
Is OP even sure the original query returns the correct result set?
I assume the "AND responses.username = 'someuser'" clause was added to the join specification with the intention that the join will then generate NULL right-side columns only for the ids that someuser has not answered.
My question: won't that join generate NULL right-side columns for every questions.id that has not been answered by all users? A left join works such that, "If any row from the target table does not match the join expression, then NULL values are generated for all column references to the target table in the SELECT column list."
In any case, nickf's suggestion looks good to me.
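The concern about the join can be checked with a small experiment. The sketch below is an assumption-laden illustration, not the asker's data: it uses Python's sqlite3 module and three made-up questions, where question 1 was answered by someuser, question 2 only by otheruser, and question 3 by nobody. It runs the original query next to a variant that leaves the username condition out of the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY);
    CREATE TABLE responses (
        id INTEGER PRIMARY KEY,
        questionID INTEGER NOT NULL,
        username TEXT NOT NULL
    );
    INSERT INTO questions (id) VALUES (1), (2), (3);
    INSERT INTO responses (questionID, username)
    VALUES (1, 'someuser'), (2, 'otheruser');
""")

# Original form: the username test is part of the join condition, so only
# someuser's rows can match a question.
with_user_in_on = [row[0] for row in conn.execute("""
    SELECT questions.id
    FROM questions
    LEFT JOIN responses
           ON questions.id = responses.questionID
          AND responses.username = 'someuser'
    WHERE responses.username IS NULL
    ORDER BY questions.id
""")]

# Variant without the username test: any user's response counts as a match.
without_user_in_on = [row[0] for row in conn.execute("""
    SELECT questions.id
    FROM questions
    LEFT JOIN responses ON questions.id = responses.questionID
    WHERE responses.username IS NULL
    ORDER BY questions.id
""")]

print(with_user_in_on)     # [2, 3]
print(without_user_in_on)  # [3]
```

With this data, the original query returns the questions someuser has not answered, including question 2, which another user did answer. Only the variant without the username condition is limited to questions nobody has answered.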