Let's say I have 1000 users for my app. I ask them 100 yes/no questions and record their answers in a separate table.
Now, I want to find people who have given the same answers to at least 20 questions.
What kind of algorithm should I follow in order to do this? What are the relevant keywords for googling?
P.S. I work in a WAMP environment.
1 Solution
#1
4
Join your answers table to itself, selecting answers which share the same question_id and answer but have a different user_id. Group the rows by both user_ids and use a HAVING clause to exclude pairs with fewer than 20 matching answers.
Example where you are looking for users similar to your user with user_id "1":
SELECT DISTINCT a2.user_id FROM answers a
INNER JOIN answers a2
ON a.question_id = a2.question_id
AND a.answer = a2.answer
AND a.user_id != a2.user_id
WHERE a.user_id = 1
GROUP BY a.user_id, a2.user_id
HAVING COUNT(*) >= 20;
Technically you don't need to group by a.user_id in this case, but I've left it there in case you want to modify the WHERE clause to return results for more than one a.user_id.
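To try the query without a MySQL server, here is a minimal sketch that runs the same self-join against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: the column types, the toy data, the helper names (build_db, similar_to, similar_pairs), and a threshold of 3 standing in for 20. The second helper shows one way the WHERE clause could be dropped to compare every user with every other user; it uses a.user_id < a2.user_id instead of != so that each pair is reported once.

```python
import sqlite3

# Toy data (hypothetical): 4 users, 5 yes/no questions, answers stored as 1/0.
ANSWERS = {
    1: [1, 1, 0, 0, 1],
    2: [1, 1, 0, 1, 0],  # agrees with user 1 on questions 1-3
    3: [0, 0, 1, 1, 0],  # agrees with user 1 on nothing
    4: [1, 1, 0, 0, 1],  # identical to user 1
}
MIN_MATCHES = 3  # stands in for the 20 used in the answer


def build_db():
    """Create an in-memory answers table (assumed schema) and load the toy data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE answers (user_id INTEGER, question_id INTEGER, answer INTEGER)"
    )
    rows = [
        (user_id, question_id, answer)
        for user_id, answers in ANSWERS.items()
        for question_id, answer in enumerate(answers, start=1)
    ]
    conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", rows)
    return conn


def similar_to(conn, user_id, min_matches):
    """Users sharing at least min_matches answers with the given user."""
    sql = """
        SELECT a2.user_id
        FROM answers a
        INNER JOIN answers a2
            ON a.question_id = a2.question_id
           AND a.answer = a2.answer
           AND a.user_id != a2.user_id
        WHERE a.user_id = ?
        GROUP BY a.user_id, a2.user_id
        HAVING COUNT(*) >= ?
        ORDER BY a2.user_id
    """
    return [row[0] for row in conn.execute(sql, (user_id, min_matches))]


def similar_pairs(conn, min_matches):
    """Every pair of users with at least min_matches shared answers, listed once."""
    sql = """
        SELECT a.user_id, a2.user_id, COUNT(*) AS matches
        FROM answers a
        INNER JOIN answers a2
            ON a.question_id = a2.question_id
           AND a.answer = a2.answer
           AND a.user_id < a2.user_id
        GROUP BY a.user_id, a2.user_id
        HAVING COUNT(*) >= ?
        ORDER BY a.user_id, a2.user_id
    """
    return conn.execute(sql, (min_matches,)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(similar_to(conn, 1, MIN_MATCHES))   # [2, 4]
    print(similar_pairs(conn, MIN_MATCHES))   # [(1, 2, 3), (1, 4, 5), (2, 4, 3)]
```

With 1000 users and 100 questions the table holds 100,000 rows, so on a real database an index covering question_id and answer would probably be worth testing before running the all-pairs version.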