确定值列表是否完全满足一对多关系的最有效方法(MySQL)

时间:2022-10-03 23:01:57

I have a one-to-many relationship of rooms and their occupants:

我和房间之间有一对多的关系:

Room | User
1    | 1
1    | 2
1    | 4
2    | 1
2    | 2
2    | 3
2    | 5
3    | 1
3    | 3

Given a list of users, e.g. 1, 3, what is the most efficient way to determining which room is completely/perfectly filled by them? So in this case, it should return room 3 because, although they are both in room 2, room 2 has other occupants as well, which is not a "perfect" fit.

给定用户列表,例如1,3,确定哪个房间被他们完全/完全填满的最有效方法是什么?因此,在这种情况下,它应该返回3号房间,因为虽然它们都在2号房间,但是2号房间也有其他居住者,这不是“完美”的契合。

I can think of several solutions to this, but am not sure about the efficiency. For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".

我可以想到几个解决方案,但我不确定效率。例如,我可以按用户(按升序排序)分组连接组,这将给我逗号分隔的字符串,如“1,2,4”,“1,2,3,5”和“1, 3" 。然后我可以命令我的输入列表升序并寻找与“1,3”的完美匹配。

Or I can do a count of the total number of users in a room AND containing both users 1 and 3. I will then select the room which has the count of users equal to two.

或者我可以计算一个房间中的用户总数并包含用户1和3.然后,我将选择具有等于2的用户数的房间。

Note I want to most efficient way, or at least a way that scales up to millions of users and rooms. Each room will have around 25 users. Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?

注意我想要最有效的方式,或者至少可以扩展到数百万用户和房间。每个房间将有大约25个用户。我想要考虑的另一件事是如何将此列表传递给数据库。我应该通过连接AND userid = 1 AND userid = 3 AND userid = 5来构建查询,依此类推?或者有没有办法将值作为数组传递给存储过程?

Any help would be appreciated.

任何帮助,将不胜感激。

2 个解决方案

#1


2  

For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".

例如,我可以按用户(按升序排序)分组连接组,这将给我逗号分隔的字符串,如“1,2,4”,“1,2,3,5”和“1, 3" 。然后我可以命令我的输入列表升序并寻找与“1,3”的完美匹配。

First, a word of advice, to improve your level of function as a developer. Stop thinking of the data, and of the solution, in terms of CSVs. It limits you to thinking in spreadsheet terms, and prevents you from thinking in Relational Data terms. You do not need to construct strings, and then match strings, when the data is in the database, you can match it there.

首先,提出建议,以提高您作为开发人员的功能水平。在CSV方面,不要再考虑数据和解决方案了。它限制您在电子表格术语中思考,并阻止您考虑关系数据术语。您不需要构造字符串,然后匹配字符串,当数据在数据库中时,您可以在那里匹配它。

Solution

Now then, in Relational data terms, what exactly do you want ? You want the rooms where the count of users that match your argument user list is highest. Is that correct ? If so, the code is simple.

现在,在关系数据术语中,您究竟想要什么?您希望与您的参数用户列表匹配的用户数最多的房间。那是对的吗 ?如果是这样,代码很简单。

You haven't given the tables. I will assume room, user, room_user, with deadly ids on the first two, and a composite key on the third. I can give you the SQL solution, you will have to work out how to do it in the non-SQL.

你还没有给出表格。我会假设房间,用户,room_user,前两个有致命的ID,第三个有复合键。我可以给你SQL解决方案,你将不得不弄清楚如何在非SQL中做到这一点。

Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?

我想要考虑的另一件事是如何将此列表传递给数据库。我应该通过连接AND userid = 1 AND userid = 3 AND userid = 5来构建查询,依此类推?或者有没有办法将值作为数组传递给存储过程?

  1. To pass the list to the stored proc, because it needs a single calling parm, the length of which is variable, you have to create a CSV list of users. Let's call that parm @user_list. (Note, that is not contemplating the data, that is passing a list to a proc in a single parm, because you can't pass an unknown number of identified users to a proc otherwise.)

    要将列表传递给存储过程,因为它需要一个调用parm,其长度是可变的,您必须创建一个CSV用户列表。我们称之为parm @user_list。 (注意,这不是考虑数据,即将列表传递给单个parm中的proc,因为否则无法将未知数量的已识别用户传递给proc。)

  2. Since you constructed the @user_list on the client, you may as well compute @user_count (the number of members in the list) while you are at it, on the client, and pass that to the proc.

    由于您在客户端上构建了@user_list,因此您可以在客户端计算@user_count(列表中的成员数),并将其传递给proc。

Something like:

就像是:

CREATE PROC room_user_match_sp (
    @user_list    CHAR(255),
    @user_count   INT
    ...
    )
AS
    -- validate parms, etc
    ...
SELECT  room_id,
        match_count,
        match_count / @user_count * 100 AS match_pct
    FROM  (
        SELECT  room_id,
                COUNT(user_id) AS match_count -- no of users matched
            FROM room_user
            WHERE user_id IN ( @user_list )
            GROUP BY room_id                  -- get one row per room
            ) AS match_room                   -- has any matched users
    WHERE match_count = MAX( match_count )    -- remove this while testing

It is not clear, if you want full matches only. In that case, use:

目前尚不清楚,如果您只想要完整匹配。在这种情况下,使用:

    WHERE match_count = @user_count

Expectation

You have asked for a proc-based solution, so I have given that. Yes, it is the fastest. But keep in mind that for this kind of requirement and solution, you could construct the SQL string on the client, and execute it on the "server" in the usual manner, without using a proc. The proc is faster here only because the code is compiled and that step is removed, as opposed to that step being performed every time the client calls the "server" with the SQL string.

您已经要求基于proc的解决方案,所以我已经给出了。是的,这是最快的。但请记住,对于这种需求和解决方案,您可以在客户端上构造SQL字符串,并以通常的方式在“服务器”上执行它,而不使用proc。这里的proc更快,因为编译代码并删除了该步骤,而不是每次客户端使用SQL字符串调用“服务器”时执行的步骤。

The point I am making here is, with the data in a reasonably Relational form, you can obtain the result you are seeking using a single SELECT statement, you don't have to mess around with work tables or temp tables or intermediate steps, which requires a proc. Here, the proc is not required, you are implementing a proc for performance reasons.

我在这里提出的观点是,使用合理的关系形式的数据,您可以使用单个SELECT语句获得您正在寻找的结果,您不必乱用工作表或临时表或中间步骤,需要一个过程。这里,proc不是必需的,因为性能原因你正在实现proc。

I make this point because it is clear from your question that your expectation of the solution is "gee, I can't get the result directly, I have work with the data first, I am ready and willing to do that". Such intermediate work steps are required only when the data is not Relational.

我提出这一点是因为从你的问题中可以清楚地看出,你对解决方案的期望是“哎呀,我不能直接得到结果,我先处理数据,我已做好准备并愿意这样做”。仅当数据不是关系数据时才需要这样的中间工作步骤。

#2


0  

Maybe not the most efficient SQL, but something like:

也许不是最有效的SQL,但是类似于:

SELECT x.room_id,
       SUM(x.occupants) AS occupants,
       SUM(x.selectees) AS selectees,
       SUM(x.selectees) / SUM(x.occupants) as percentage
  FROM ( SELECT room_id, 
                COUNT(user_id) AS occupants,
                NULL AS selectees
           FROM Rooms 
          GROUP BY room_id
         UNION
         SELECT room_id, 
                NULL AS occupants,
                COUNT(user_id) AS selectees
           FROM Rooms 
          WHERE user_id IN (1,3) 
          GROUP BY room_id
        ) x
 GROUP BY x.room_id
 ORDER BY percentage DESC

will give you a list of rooms ordered by the "best fit" percentage

会给你一个按“最合适”百分比订购的房间清单

ie. it works out a percentage of fulfilment based on the number of people in the room, and the number of people from your set who are in the room

即。它根据房间里的人数和房间里的人数来计算一定比例的履行情况

#1


2  

For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".

例如,我可以按用户(按升序排序)分组连接组,这将给我逗号分隔的字符串,如“1,2,4”,“1,2,3,5”和“1, 3" 。然后我可以命令我的输入列表升序并寻找与“1,3”的完美匹配。

First, a word of advice, to improve your level of function as a developer. Stop thinking of the data, and of the solution, in terms of CSVs. It limits you to thinking in spreadsheet terms, and prevents you from thinking in Relational Data terms. You do not need to construct strings, and then match strings, when the data is in the database, you can match it there.

首先,提出建议,以提高您作为开发人员的功能水平。在CSV方面,不要再考虑数据和解决方案了。它限制您在电子表格术语中思考,并阻止您考虑关系数据术语。您不需要构造字符串,然后匹配字符串,当数据在数据库中时,您可以在那里匹配它。

Solution

Now then, in Relational data terms, what exactly do you want ? You want the rooms where the count of users that match your argument user list is highest. Is that correct ? If so, the code is simple.

现在,在关系数据术语中,您究竟想要什么?您希望与您的参数用户列表匹配的用户数最多的房间。那是对的吗 ?如果是这样,代码很简单。

You haven't given the tables. I will assume room, user, room_user, with deadly ids on the first two, and a composite key on the third. I can give you the SQL solution, you will have to work out how to do it in the non-SQL.

你还没有给出表格。我会假设房间,用户,room_user,前两个有致命的ID,第三个有复合键。我可以给你SQL解决方案,你将不得不弄清楚如何在非SQL中做到这一点。

Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?

我想要考虑的另一件事是如何将此列表传递给数据库。我应该通过连接AND userid = 1 AND userid = 3 AND userid = 5来构建查询,依此类推?或者有没有办法将值作为数组传递给存储过程?

  1. To pass the list to the stored proc, because it needs a single calling parm, the length of which is variable, you have to create a CSV list of users. Let's call that parm @user_list. (Note, that is not contemplating the data, that is passing a list to a proc in a single parm, because you can't pass an unknown number of identified users to a proc otherwise.)

    要将列表传递给存储过程,因为它需要一个调用parm,其长度是可变的,您必须创建一个CSV用户列表。我们称之为parm @user_list。 (注意,这不是考虑数据,即将列表传递给单个parm中的proc,因为否则无法将未知数量的已识别用户传递给proc。)

  2. Since you constructed the @user_list on the client, you may as well compute @user_count (the number of members in the list) while you are at it, on the client, and pass that to the proc.

    由于您在客户端上构建了@user_list,因此您可以在客户端计算@user_count(列表中的成员数),并将其传递给proc。

Something like:

就像是:

CREATE PROC room_user_match_sp (
    @user_list    CHAR(255),
    @user_count   INT
    ...
    )
AS
    -- validate parms, etc
    ...
SELECT  room_id,
        match_count,
        match_count / @user_count * 100 AS match_pct
    FROM  (
        SELECT  room_id,
                COUNT(user_id) AS match_count -- no of users matched
            FROM room_user
            WHERE user_id IN ( @user_list )
            GROUP BY room_id                  -- get one row per room
            ) AS match_room                   -- has any matched users
    WHERE match_count = MAX( match_count )    -- remove this while testing

It is not clear, if you want full matches only. In that case, use:

目前尚不清楚,如果您只想要完整匹配。在这种情况下,使用:

    WHERE match_count = @user_count

Expectation

You have asked for a proc-based solution, so I have given that. Yes, it is the fastest. But keep in mind that for this kind of requirement and solution, you could construct the SQL string on the client, and execute it on the "server" in the usual manner, without using a proc. The proc is faster here only because the code is compiled and that step is removed, as opposed to that step being performed every time the client calls the "server" with the SQL string.

您已经要求基于proc的解决方案,所以我已经给出了。是的,这是最快的。但请记住,对于这种需求和解决方案,您可以在客户端上构造SQL字符串,并以通常的方式在“服务器”上执行它,而不使用proc。这里的proc更快,因为编译代码并删除了该步骤,而不是每次客户端使用SQL字符串调用“服务器”时执行的步骤。

The point I am making here is, with the data in a reasonably Relational form, you can obtain the result you are seeking using a single SELECT statement, you don't have to mess around with work tables or temp tables or intermediate steps, which requires a proc. Here, the proc is not required, you are implementing a proc for performance reasons.

我在这里提出的观点是,使用合理的关系形式的数据,您可以使用单个SELECT语句获得您正在寻找的结果,您不必乱用工作表或临时表或中间步骤,需要一个过程。这里,proc不是必需的,因为性能原因你正在实现proc。

I make this point because it is clear from your question that your expectation of the solution is "gee, I can't get the result directly, I have work with the data first, I am ready and willing to do that". Such intermediate work steps are required only when the data is not Relational.

我提出这一点是因为从你的问题中可以清楚地看出,你对解决方案的期望是“哎呀,我不能直接得到结果,我先处理数据,我已做好准备并愿意这样做”。仅当数据不是关系数据时才需要这样的中间工作步骤。

#2


0  

Maybe not the most efficient SQL, but something like:

也许不是最有效的SQL,但是类似于:

SELECT x.room_id,
       SUM(x.occupants) AS occupants,
       SUM(x.selectees) AS selectees,
       SUM(x.selectees) / SUM(x.occupants) as percentage
  FROM ( SELECT room_id, 
                COUNT(user_id) AS occupants,
                NULL AS selectees
           FROM Rooms 
          GROUP BY room_id
         UNION
         SELECT room_id, 
                NULL AS occupants,
                COUNT(user_id) AS selectees
           FROM Rooms 
          WHERE user_id IN (1,3) 
          GROUP BY room_id
        ) x
 GROUP BY x.room_id
 ORDER BY percentage DESC

will give you a list of rooms ordered by the "best fit" percentage

会给你一个按“最合适”百分比订购的房间清单

ie. it works out a percentage of fulfilment based on the number of people in the room, and the number of people from your set who are in the room

即。它根据房间里的人数和房间里的人数来计算一定比例的履行情况