加入多个/一组条目

时间:2022-11-20 19:16:13

I would like to combine the entries from the two tables below to match obj to class via attr.

我想结合下面两个表中的条目,通过attr将obj与class匹配。

I have many objects with different names, dates, and other information in a single table (not shown). Each of these objects can fall into classes based on their attributes. The mapping of objects obj to each of their attributes attr is in Table A. The mapping of classes class to attributes attr is in Table B.

我在一个表(未显示)中有许多具有不同名称,日期和其他信息的对象。这些对象中的每一个都可以根据其属性归入类中。对象obj到它们的每个属性attr的映射在表A中。类类到属性attr的映射在表B中。

I would like to map each object obj to its matching class class to give the desired output shown in Table C.

我想将每个对象obj映射到其匹配的类类,以提供表C中所示的所需输出。

In some ways, this seems like a join operation on groups. Is there a way to do this in standard sql and/or with Google BigQuery?

在某些方面,这似乎是对群组的连接操作。有没有办法在标准sql和/或Google BigQuery中执行此操作?

An important point -- both attr of a class and of an obj are not ordered.

重要的一点 - 一个类和一个obj的attr都没有被排序。

Table A:

-------------------
|  obj  |  attr   | 
-------------------
|  obj1 |     I   |
|  obj1 |     II  |  
|  obj2 |     I   |
|  obj2 |     II  |
|  obj3 |     I   |
|  obj3 |     II  |  
|  obj3 |     II  |
|  obj4 |     III |
|  obj4 |     I   |
-------------------

Table B:

-------------------
| attr  |  class  | 
-------------------
|  I    |    A    |
|  II   |    A    |  
|  I    |    B    |
|  III  |    B    |
-------------------

Desired output (Table C):

期望的输出(表C):

-----------------------
|  obj     |  class   | 
-----------------------
|  obj1    |    A     |
|  obj2    |    A     |
|  obj4    |    B     |
-----------------------

(Note that obj3 doesn't fall into class A because it has an additional II attr.)

(注意,obj3不属于A类,因为它有一个额外的II attr。)

1 个解决方案

#1


2  

Below is for BigQuery Standard SQL

以下是BigQuery Standard SQL

#standardSQL
SELECT obj, class FROM (
  SELECT obj, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableA`
  GROUP BY obj
) JOIN (
  SELECT class, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableB`
  GROUP BY class
) USING(attr)  

You can test / play with it using dummy data from your question as below

您可以使用您问题中的虚拟数据进行测试/播放,如下所示

#standardSQL
WITH `project.dataset.TableA` AS (
  SELECT 'obj1' obj, 'I' attr UNION ALL
  SELECT 'obj1', 'II' UNION ALL
  SELECT 'obj2', 'I' UNION ALL
  SELECT 'obj2', 'II' UNION ALL
  SELECT 'obj3', 'I' UNION ALL
  SELECT 'obj3', 'II' UNION ALL
  SELECT 'obj3', 'II' UNION ALL
  SELECT 'obj4', 'III' UNION ALL
  SELECT 'obj4', 'I' 
), `project.dataset.TableB` AS (
  SELECT 'I' attr, 'A' class UNION ALL
  SELECT 'II', 'A' UNION ALL
  SELECT 'I', 'B' UNION ALL
  SELECT 'III', 'B' 
)
SELECT obj, class FROM (
  SELECT obj, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableA`
  GROUP BY obj
) JOIN (
  SELECT class, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableB`
  GROUP BY class
) USING(attr)   

Output is:

obj     class    
----    -----
obj1    A    
obj2    A    
obj4    B    

#1


2  

Below is for BigQuery Standard SQL

以下是BigQuery Standard SQL

#standardSQL
SELECT obj, class FROM (
  SELECT obj, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableA`
  GROUP BY obj
) JOIN (
  SELECT class, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableB`
  GROUP BY class
) USING(attr)  

You can test / play with it using dummy data from your question as below

您可以使用您问题中的虚拟数据进行测试/播放,如下所示

#standardSQL
WITH `project.dataset.TableA` AS (
  SELECT 'obj1' obj, 'I' attr UNION ALL
  SELECT 'obj1', 'II' UNION ALL
  SELECT 'obj2', 'I' UNION ALL
  SELECT 'obj2', 'II' UNION ALL
  SELECT 'obj3', 'I' UNION ALL
  SELECT 'obj3', 'II' UNION ALL
  SELECT 'obj3', 'II' UNION ALL
  SELECT 'obj4', 'III' UNION ALL
  SELECT 'obj4', 'I' 
), `project.dataset.TableB` AS (
  SELECT 'I' attr, 'A' class UNION ALL
  SELECT 'II', 'A' UNION ALL
  SELECT 'I', 'B' UNION ALL
  SELECT 'III', 'B' 
)
SELECT obj, class FROM (
  SELECT obj, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableA`
  GROUP BY obj
) JOIN (
  SELECT class, STRING_AGG(attr ORDER BY attr) attr
  FROM `project.dataset.TableB`
  GROUP BY class
) USING(attr)   

Output is:

obj     class    
----    -----
obj1    A    
obj2    A    
obj4    B