How to select unique pairs of rows from a table at random?

Date: 2021-08-02 09:29:39

I have two tables like these:

CREATE TABLE people (
    id INT NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE pairs (
    person_a_id INT,
    person_b_id INT,
    FOREIGN KEY (person_a_id) REFERENCES people(id),
    FOREIGN KEY (person_b_id) REFERENCES people(id)
);

I want to select pairs of people at random from the people table, and after selecting a pair I add it to the pairs table. person_a_id always refers to the person with the lower id of the pair (since the order within the pair is not relevant).

The thing is that I never want to select the same pair twice, so I need to check the pairs table before I return my randomly selected pair.

Is it possible to do this using just a single SQL query in a reasonably efficient and elegant manner?

(I'm doing this using the Java Persistence API, but hopefully I'll be able to translate any answers into JPA code)

2 answers

#1


4  

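-- people / pairs are the tables from the question
select a.id, b.id
from people a
inner join people b on a.id < b.id
where not exists (
    select *
    from pairs c
    where c.person_a_id = a.id
      and c.person_b_id = b.id)
-- rand() alone gives every remaining pair the same chance;
-- a.id * rand() would favour pairs with a low a.id
order by rand()
limit 1;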

Limit 1 returns just one pair if you are "drawing lots" one at a time. Otherwise, up the limit to however many pairs you need.
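
A minimal sketch of the draw-then-record cycle from the question, under a few assumptions: SQLite (through Python's `sqlite3` module) stands in for MySQL, so `random()` replaces `rand()`; the helper name `draw_pair` and the five sample people are made up for illustration; and the query orders by `random()` alone so that every unused pair is equally likely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INT NOT NULL, PRIMARY KEY (id));
    CREATE TABLE pairs (
        person_a_id INT,
        person_b_id INT,
        FOREIGN KEY (person_a_id) REFERENCES people(id),
        FOREIGN KEY (person_b_id) REFERENCES people(id)
    );
""")
# Hypothetical sample data: people with ids 1..5
conn.executemany("INSERT INTO people (id) VALUES (?)", [(i,) for i in range(1, 6)])


def draw_pair(conn):
    """Pick one unused pair at random and record it; return None when none are left."""
    row = conn.execute("""
        SELECT a.id, b.id
        FROM people a
        INNER JOIN people b ON a.id < b.id
        WHERE NOT EXISTS (
            SELECT 1 FROM pairs c
            WHERE c.person_a_id = a.id AND c.person_b_id = b.id)
        ORDER BY random()
        LIMIT 1
    """).fetchone()
    if row is not None:
        # a.id < b.id in the join, so the lower id always lands in person_a_id
        conn.execute(
            "INSERT INTO pairs (person_a_id, person_b_id) VALUES (?, ?)", row)
    return row


# Keep drawing until every possible pair has been used
drawn = []
while True:
    pair = draw_pair(conn)
    if pair is None:
        break
    drawn.append(pair)
```

With five people the loop stops once all 10 possible pairs have been drawn. The sketch does not cover concurrent writers; in that case the select and the insert would presumably need to share a transaction, or the pairs table would need a unique constraint.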

The above query assumes that you can get

1 - 2
2 - 7

and that the pairing 2 - 7 is valid since it doesn't exist yet, even though 2 features again. If you want each person to appear in at most one pair ever, then:

-- people / pairs are the tables from the question
select a.id, b.id
from people a
inner join people b on a.id < b.id
where not exists (
    select *
    from pairs c
    where c.person_a_id in (a.id, b.id))
  and not exists (
    select *
    from pairs c
    where c.person_b_id in (a.id, b.id))
-- rand() alone keeps the draw unbiased; a.id * rand() would favour low ids
order by rand()
limit 1;

If multiple pairs are to be generated in one single query, AND the destination table is still empty, you could use this single query. Take note that LIMIT 6 returns only 3 pairs.

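-- least/greatest put the lower id in column a, as the question requires
select least(min(a), min(b)) a, greatest(min(a), min(b)) b
from
(
    -- odd-numbered rows fill column a, even-numbered rows fill column b
    select
      case when mod(@p,2) = 1 then id end a,
      case when mod(@p,2) = 0 then id end b,
      @p:=@p+1 grp
    from (
        -- shuffle the people and keep 6 of them (3 pairs)
        select id
        from (select @p:=1) p, people
        order by rand()
        limit 6
    ) x
) y
group by floor(grp/2);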
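
The idea behind this query (shuffle the ids, keep the first six, pair off neighbours) can also be sketched outside SQL. The following is only an illustration in Python: the function name `pair_up` and the sample ids are hypothetical, and each pair is ordered so that the lower id comes first, as the question requires.

```python
import random


def pair_up(ids, n_pairs, rng=random):
    """Shuffle ids and pair consecutive entries; each id appears in at most one pair."""
    # Roughly what ORDER BY rand() LIMIT 2*n_pairs does; unlike LIMIT, sample()
    # raises ValueError when fewer than 2*n_pairs ids are available.
    chosen = rng.sample(list(ids), 2 * n_pairs)
    return [
        (min(chosen[i], chosen[i + 1]), max(chosen[i], chosen[i + 1]))
        for i in range(0, len(chosen), 2)
    ]


# Three pairs drawn from ten hypothetical people
pairs = pair_up(range(1, 11), 3)
```

As with the SQL version, this only suits the case where the pairs table is still empty, because nothing is checked against previously stored pairs.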

#2


1  

This cannot be accomplished in a single-query set-based approach because your set will not have knowledge of what pairs are inserted into the pairs table.

Instead, you should loop

WHILE EXISTS(SELECT * FROM people
    WHERE id NOT IN (SELECT person_a_id FROM pairs)
    AND id NOT IN (SELECT person_b_id FROM pairs))

This will loop while there are unmatched people. Inside the loop, pick two random numbers from 1 to the COUNT(*) of that query, which is the number of unmatched people. If you get the same number twice, roll again. (If you're worried about this, draw one number from each half of the set, but then you lose some randomness, depending on your sort criteria.)

Pair those people.

Wash, rinse, repeat. Your only "redo" will be when you generate the same random number twice. That gets more likely as fewer people remain: with n unmatched people the chance is 1/n, so at worst 50% when only two are left.
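
The loop described above can be sketched in Python as follows. It is an in-memory illustration, not the database loop the answer has in mind: the function name `pair_everyone` is hypothetical, a list stands in for the set of unmatched people, and the loop stops once fewer than two people remain, so with an odd head count one person stays unmatched.

```python
import random


def pair_everyone(ids, rng=random):
    """Repeatedly pick two distinct unmatched people at random and pair them."""
    unmatched = list(ids)
    pairs = []
    rerolls = 0
    while len(unmatched) >= 2:
        i = rng.randrange(len(unmatched))
        j = rng.randrange(len(unmatched))
        if i == j:
            # Same number twice: roll again
            rerolls += 1
            continue
        a, b = unmatched[i], unmatched[j]
        pairs.append((min(a, b), max(a, b)))  # lower id goes first
        # Both people are now matched, so drop them from the pool
        unmatched = [p for p in unmatched if p not in (a, b)]
    return pairs, unmatched, rerolls


# Seven hypothetical people: three pairs and one person left over
pairs, leftover, rerolls = pair_everyone(range(1, 8))
```

Sampling two distinct positions in one call (for example with `random.sample`) would avoid the re-roll altogether; the explicit retry is kept here only to mirror the answer's description.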
