I'm breaking my head over how to do this one in SQL. I have a table:
| User_id | Question_ID | Answer_ID |
| 1 | 1 | 1 |
| 1 | 2 | 10 |
| 2 | 1 | 2 |
| 2 | 2 | 11 |
| 3 | 1 | 1 |
| 3 | 2 | 10 |
| 4 | 1 | 1 |
| 4 | 2 | 10 |
It holds user answers to a particular question. A question might have multiple answers. A User cannot answer the same question twice. (Hence, there's only one Answer_ID per {User_id, Question_ID})
I'm trying to write this query: for a particular question and answer id (belonging to that same question), I want to find the most common answer given to each OTHER question by the users who gave that answer.
For example, for the above table:
For Question_ID = 1:
  Answer_ID = 1 -> (Question 2, Answer_ID 10)
  Answer_ID = 2 -> (Question 2, Answer_ID 11)
Is it possible to do this in one query? Should it be done in one query? Or should I just use a stored procedure or Java for this?
3 solutions
#1
4
Though @rick-james is right, I am not sure it is easy to get started when you do not know how queries like this are usually written for MySQL.
- You need a query that finds the most common answers to each question:
SELECT question_id, answer_id, COUNT(*) AS cnt
FROM user_answers
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
This would return a table where for each question_id we output counts in descending order.
| question_id | answer_id | cnt |
| 1 | 1 | 3 |
| 1 | 2 | 1 |
| 2 | 10 | 3 |
| 2 | 11 | 1 |
- Now we need to solve a so-called greatest-n-per-group task. The problem is that in MySQL, for the sake of performance, tasks like this are usually solved not in pure SQL but with hacks that rely on knowing how queries are processed internally.
In this case we know that we can define a variable and, while iterating over the prepared table, remember the previous row. That lets us distinguish the first row in a group from the others.
SELECT question_id, answer_id, cnt,
       IF(question_id = @q_id, NULL, @q_id := question_id) AS v
FROM (
    SELECT question_id, answer_id, COUNT(*) AS cnt
    FROM user_answers
    GROUP BY 1, 2
    ORDER BY 1, 3 DESC
) cnts
JOIN (SELECT @q_id := -1) AS init;
Make sure that you have initialised the variable (and respect its data type on initialisation, otherwise it may be unexpectedly cast later). Here is the result:
| question_id | answer_id | cnt | v |
| 1 | 1 | 3 | 1 |
| 1 | 2 | 1 | (null) |
| 2 | 10 | 3 | 2 |
| 2 | 11 | 1 | (null) |
- Now we just need to filter out the rows with NULL in the last column. Since that column is not actually needed, we can move the same expression into the WHERE clause. The cnt column is not needed either, so we can drop it as well:
SELECT question_id, answer_id
FROM (
    SELECT question_id, answer_id
    FROM user_answers
    GROUP BY 1, 2
    ORDER BY 1, COUNT(*) DESC
) cnts
JOIN (SELECT @q_id := -1) AS init
WHERE IF(question_id = @q_id, NULL, @q_id := question_id) IS NOT NULL;
- The last thing worth mentioning: for the query to be efficient you need the right indexes. This query requires an index starting with the (question_id, answer_id) columns. Since you need a UNIQUE index anyway, it makes sense to define it in this order: (question_id, answer_id, user_id). Note that this index by itself does not enforce "one answer per user per question"; that rule would need its own UNIQUE key on (user_id, question_id).
CREATE TABLE user_answers (
    user_id INTEGER,
    question_id INTEGER,
    answer_id INTEGER,
    UNIQUE INDEX (question_id, answer_id, user_id)
) ENGINE=InnoDB;
Here is an sqlfiddle to play with: http://sqlfiddle.com/#!9/bd12ad/20.
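For comparison, on MySQL 8.0 or later the greatest-n-per-group step can also be written without user variables, using the ROW_NUMBER() window function (assigning user variables inside expressions is deprecated there). The following is only a minimal sketch of that alternative, not the method described above. To keep it self-contained it runs the SQL against an in-memory SQLite database through Python's sqlite3 module, which assumes SQLite 3.25 or newer for window functions. The helper names make_db and most_common_answers are made up for illustration, and the rows are the sample data from the question.

```python
import sqlite3

# Sample rows from the question: (user_id, question_id, answer_id).
ROWS = [
    (1, 1, 1), (1, 2, 10),
    (2, 1, 2), (2, 2, 11),
    (3, 1, 1), (3, 2, 10),
    (4, 1, 1), (4, 2, 10),
]

# Count answers per (question, answer), rank them inside each question,
# and keep only the top-ranked row. Ties are broken by the lower answer_id
# (an assumption; the original answer does not define tie-breaking).
TOP_ANSWER_SQL = """
SELECT question_id, answer_id
FROM (
    SELECT question_id, answer_id,
           ROW_NUMBER() OVER (
               PARTITION BY question_id
               ORDER BY cnt DESC, answer_id
           ) AS rn
    FROM (
        SELECT question_id, answer_id, COUNT(*) AS cnt
        FROM user_answers
        GROUP BY question_id, answer_id
    ) AS cnts
) AS ranked
WHERE rn = 1
ORDER BY question_id
"""


def make_db():
    """Create an in-memory stand-in for the user_answers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_answers ("
        "user_id INTEGER, question_id INTEGER, answer_id INTEGER)"
    )
    conn.executemany("INSERT INTO user_answers VALUES (?, ?, ?)", ROWS)
    return conn


def most_common_answers(conn):
    """Return one (question_id, answer_id) row per question."""
    return conn.execute(TOP_ANSWER_SQL).fetchall()


if __name__ == "__main__":
    print(most_common_answers(make_db()))  # [(1, 1), (2, 10)]
```

The SQL text itself uses only standard syntax, so it should carry over to MySQL 8.0 unchanged, but that has not been checked against the fiddle above.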
#2
4
Do you want a fish? Or do you want to learn how to fish?
Your question seems to have multiple steps.
- Fetch info about "questions by users with the given answer". Devise this SELECT and imagine that the results form a new table.
- Apply the "OTHER" restriction. This is probably a minor AND ... != ... added to SELECT #1.
- Now find the "most common answer". This probably involves ORDER BY COUNT(*) DESC LIMIT 1. It is likely to use a derived table:
SELECT ...
FROM ( select#2 )
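The three steps above might be put together roughly as follows. This is a hedged sketch, not the answerer's own query: it assumes the table is called user_answers (the name used in answer #1), runs on an in-memory SQLite database via Python's sqlite3 module so that it can be tried without a MySQL server, and uses made-up names (make_db, most_common_other_answer, the :q and :a parameters).

```python
import sqlite3

# Sample rows from the question: (user_id, question_id, answer_id).
ROWS = [
    (1, 1, 1), (1, 2, 10),
    (2, 1, 2), (2, 2, 11),
    (3, 1, 1), (3, 2, 10),
    (4, 1, 1), (4, 2, 10),
]

# Step 1: the inner IN (...) picks users who gave answer :a to question :q.
# Step 2: "question_id != :q" keeps only their answers to OTHER questions.
# Step 3: the outer query counts those answers and keeps the most common.
# The extra ORDER BY columns only make ties deterministic (an assumption).
OTHER_ANSWER_SQL = """
SELECT question_id, answer_id, COUNT(*) AS cnt
FROM (
    SELECT ua.question_id, ua.answer_id
    FROM user_answers AS ua
    WHERE ua.user_id IN (
              SELECT user_id
              FROM user_answers
              WHERE question_id = :q AND answer_id = :a
          )
      AND ua.question_id != :q
) AS others
GROUP BY question_id, answer_id
ORDER BY cnt DESC, question_id, answer_id
LIMIT 1
"""


def make_db():
    """Create an in-memory stand-in for the user_answers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_answers ("
        "user_id INTEGER, question_id INTEGER, answer_id INTEGER)"
    )
    conn.executemany("INSERT INTO user_answers VALUES (?, ?, ?)", ROWS)
    return conn


def most_common_other_answer(conn, question_id, answer_id):
    """Return (question_id, answer_id, cnt), or None if nobody matches."""
    params = {"q": question_id, "a": answer_id}
    return conn.execute(OTHER_ANSWER_SQL, params).fetchone()


if __name__ == "__main__":
    conn = make_db()
    print(most_common_other_answer(conn, 1, 1))  # (2, 10, 3)
    print(most_common_other_answer(conn, 1, 2))  # (2, 11, 1)
```

With LIMIT 1 this returns a single row across all other questions. If there are several other questions and the most common answer is wanted for each of them, the outer query would need a greatest-n-per-group step instead of LIMIT 1.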
#3
1
Your question has multiple conditions. First you have to get the questions, together with the user who asked each one, from the question table:
select question_id,user_id from question
Then insert the answer to the asked question and make some checks in your Java code (for example: has the user answered the same question as the user who is asking it, and has the user answered this question more than once).
-- :asking_user_id and :answering_user_id are bind-parameter placeholders
select question_id, user_id from question where user_id = :asking_user_id; -- gets all questions to show in the UI
select answer_id, user_id from answer where user_id = :answering_user_id; -- checks the answers given by that particular user