How do I determine whether a group of data exists in a table, given the data that should appear in that group's rows?

Time: 2022-03-04 23:05:17

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.

GroupId  Value
-------  -----
      1      a
      1      b
      1      c
      2      a
      2      b
      3      a
      3      b
      3      c
      3      d

In this example, there are three groups of data, each with similar but varying values.

How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.

I can write a stored procedure that performs the following steps:

  • select distinct GroupId from Groups -- and store locally
  • for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
  • return the GroupId if both set-difference operations produced empty sets
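
The steps above can be sketched as follows. This is only an illustration, written in Python against an in-memory SQLite database rather than as a SQL Server stored procedure. The table name GroupValues, the temporary table Wanted, and the function name find_groups_by_set_difference are assumptions made for the example, not names from the question.

```python
import sqlite3

# Assumed schema and sample data, mirroring the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (GroupId INTEGER, Value TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)


def find_groups_by_set_difference(conn, wanted):
    """Return the GroupIds whose set of values equals `wanted`."""
    # Load the input values into a temporary table so EXCEPT can use them.
    conn.execute("DROP TABLE IF EXISTS temp.Wanted")
    conn.execute("CREATE TEMP TABLE Wanted (Value TEXT)")
    conn.executemany("INSERT INTO Wanted VALUES (?)", [(v,) for v in set(wanted)])

    # Step 1: collect the distinct group ids locally.
    group_ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT GroupId FROM GroupValues ORDER BY GroupId")]

    matches = []
    for gid in group_ids:
        # Step 2a: input values that the group does not contain.
        missing = conn.execute(
            "SELECT Value FROM Wanted EXCEPT "
            "SELECT Value FROM GroupValues WHERE GroupId = ?", (gid,)).fetchall()
        # Step 2b: group values that are not part of the input.
        extra = conn.execute(
            "SELECT Value FROM GroupValues WHERE GroupId = ? EXCEPT "
            "SELECT Value FROM Wanted", (gid,)).fetchall()
        # Step 3: both differences empty means the two sets are equal.
        if not missing and not extra:
            matches.append(gid)
    return matches
```

With the sample data, find_groups_by_set_difference(conn, ["a", "b", "c"]) returns [1], the input ["b", "a"] returns [2], and ["a", "b", "c", "e"] returns an empty list. The sketch issues two queries per group, which is the overhead the question is hoping to avoid.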

This seems a bit excessive, and I am hoping to leverage some other commands in SQL to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains exactly the input values for the query?

2 Solutions

#1


4  

This is a set-within-sets query. I like to solve it using group by and having:

select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
       sum(case when value = 'b' then 1 else 0 end) > 0 and
       sum(case when value = 'c' then 1 else 0 end) > 0 and
       sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;

The first three conditions in the having clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible and can handle various inclusion and exclusion conditions on the values you are looking for.
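
As an illustration of that flexibility, the sketch below changes the conditions to find groups that contain both 'a' and 'b' but do not contain 'd'. It runs the SQL through Python's built-in sqlite3 module against an in-memory copy of the sample data; the schema and the variable name contains_a_b_not_d are assumptions made for the example.

```python
import sqlite3

# Assumed schema and sample data, mirroring the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)

# Inclusion conditions use "> 0"; the exclusion condition uses "= 0".
rows = conn.execute("""
    select groupid
    from GroupValues gv
    group by groupid
    having sum(case when value = 'a' then 1 else 0 end) > 0 and
           sum(case when value = 'b' then 1 else 0 end) > 0 and
           sum(case when value = 'd' then 1 else 0 end) = 0
    order by groupid
""").fetchall()

contains_a_b_not_d = [row[0] for row in rows]
```

On the sample data this selects groups 1 and 2; group 3 is rejected because it contains 'd'.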

EDIT:

If you want to pass in a list, you can use:

with thelist as (
      select 'a' as value union all
      select 'b' union all
      select 'c'
     )
select groupid
from GroupValues gv left outer join
     thelist
     on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
       count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);

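Here the having clause makes two checks. The first compares the number of distinct values in the group with the size of the list; the second makes sure that every distinct value in the group matched an entry in the list. Together they mean the group holds exactly the listed values (assuming the list itself contains no duplicates).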
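
To try the list-based query with different inputs, one option is to build the list CTE from bound parameters. The sketch below does this with Python's built-in sqlite3 module and an in-memory copy of the sample data; the function name find_exact_groups and the schema are assumptions for illustration, and the SQL is otherwise the query shown above with an added order by.

```python
import sqlite3

# Assumed schema and sample data, mirroring the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)


def find_exact_groups(conn, wanted):
    """Return the groupids whose distinct values are exactly `wanted`."""
    wanted = sorted(set(wanted))  # de-duplicate so count(*) is the set size
    if not wanted:
        return []
    # One "select ?" per list entry, combined with "union all".
    list_sql = " union all ".join(["select ? as value"] * len(wanted))
    sql = f"""
        with thelist as ({list_sql})
        select groupid
        from GroupValues gv left outer join
             thelist
             on gv.value = thelist.value
        group by groupid
        having count(distinct gv.value) = (select count(*) from thelist) and
               count(distinct (case when gv.value = thelist.value then gv.value end))
                   = count(distinct gv.value)
        order by groupid
    """
    return [row[0] for row in conn.execute(sql, wanted)]
```

For the three inputs in the question, find_exact_groups(conn, ["a", "b", "c"]) returns [1], ["b", "a"] returns [2], and ["a", "b", "c", "e"] returns an empty list.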

EDIT: The query originally failed to compile because a table alias was missing. It has been updated with the correct table alias.

#2


1  

This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.

If      Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create  Table #GroupValues (GroupID Int, Val Varchar(10));
Insert  #GroupValues (GroupID, Val)
Values  (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');

If      Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create  Table #FindValues (Val Varchar(10));
Insert  #FindValues (Val)
Values  ('a'),('b'),('c');

Select  Distinct gv.GroupID
From   (Select  Distinct GroupID 
        From    #GroupValues) gv
Where   Not Exists (Select  1
                    From    #FindValues fv2
                    Where   Not Exists (Select  1
                                        From    #GroupValues gv2
                                        Where   gv.GroupID = gv2.GroupID
                                        And     fv2.Val = gv2.Val))
And     Not Exists (Select  1
                    From    #GroupValues gv3
                    Where   gv3.GroupID = gv.GroupID
                    And     Not Exists (Select  1
                                        From    #FindValues fv3
                                        Where   gv3.Val = fv3.Val))
