DB2 table join query

Date: 2021-07-13 05:36:07

Today, while working through some MapReduce exercises, I came across this problem:

Input file:

CHILD      PARENT
---------- ----------
tom        lucy
tom        jack
jone       lucy
jone       jack
lucy       mary
lucy       ben
jack       alice
jack       jesse
terry      alice
terry      jesse
philip     terry
philip     alma
mark       terry
mark       alma

Required output:

GRANDCHILD GRANDPARENT
---------- -----------
jone       mary
jone       ben
jone       alice
jone       jesse
mark       alice
mark       jesse
philip     alice
philip     jesse
tom        mary
tom        ben
tom        alice
tom        jesse
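
Since the problem comes from a MapReduce exercise, here is a minimal sketch of one common way to solve it, simulated in plain Python rather than written against the Hadoop API (all function names are hypothetical). The mapper emits every record twice, once keyed on the parent and once keyed on the child. After the shuffle, each key is one middle-generation person, and the reducer sees both that person's children and that person's parents, so their cross product gives the grandchild/grandparent pairs.

```python
from collections import defaultdict

# The 14 (child, parent) records from the input file above.
RECORDS = [
    ("tom", "lucy"), ("tom", "jack"), ("jone", "lucy"), ("jone", "jack"),
    ("lucy", "mary"), ("lucy", "ben"), ("jack", "alice"), ("jack", "jesse"),
    ("terry", "alice"), ("terry", "jesse"), ("philip", "terry"),
    ("philip", "alma"), ("mark", "terry"), ("mark", "alma"),
]


def map_phase(records):
    """Emit each record twice, keyed on the person who links two generations."""
    for child, parent in records:
        # Keyed by the parent: `child` is one of this person's children.
        yield parent, ("child", child)
        # Keyed by the child: `parent` is one of this person's parents.
        yield child, ("parent", parent)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Pair every child of `key` with every parent of `key` (key itself is not needed)."""
    children = [name for tag, name in values if tag == "child"]
    parents = [name for tag, name in values if tag == "parent"]
    for grandchild in children:
        for grandparent in parents:
            yield grandchild, grandparent


def grandchild_grandparent(records):
    result = []
    for key, values in shuffle(map_phase(records)).items():
        result.extend(reduce_phase(key, values))
    return sorted(result)


if __name__ == "__main__":
    print("GRANDCHILD GRANDPARENT")
    for grandchild, grandparent in grandchild_grandparent(RECORDS):
        print(f"{grandchild:<10} {grandparent}")
```

People such as tom (no children) or mary (no parents) produce a group with only one kind of tag, so the cross product for them is empty and they drop out naturally.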

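It occurred to me that if this data were stored in a DB2 table, the same result should be achievable with a table join. So I created a table named parent: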

[db2inst1@win ~]$ db2 "select * from parent"

CHILD      PARENT
---------- ----------
tom        lucy
tom        jack
jone       lucy
jone       jack
lucy       mary
lucy       ben
jack       alice
jack       jesse
terry      alice
terry      jesse
philip     terry
philip     alma
mark       terry
mark       alma

  14 record(s) selected.
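
The post does not show how the table was created. The following is a minimal sketch of one way to create and load it, using Python's built-in sqlite3 module as a stand-in so it can be run without a DB2 instance. The column type VARCHAR(10) is an assumption based on the 10-character column width in the CLP output; the CREATE TABLE statement itself should also be accepted by DB2.

```python
import sqlite3

# The 14 rows shown above, as (CHILD, PARENT) pairs.
ROWS = [
    ("tom", "lucy"), ("tom", "jack"), ("jone", "lucy"), ("jone", "jack"),
    ("lucy", "mary"), ("lucy", "ben"), ("jack", "alice"), ("jack", "jesse"),
    ("terry", "alice"), ("terry", "jesse"), ("philip", "terry"),
    ("philip", "alma"), ("mark", "terry"), ("mark", "alma"),
]

# Assumed column types; only the two column names come from the original table.
DDL = "CREATE TABLE parent (child VARCHAR(10), parent VARCHAR(10))"


def load_parent_table(conn):
    """Create the PARENT table and insert the sample rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute(DDL)
        conn.executemany("INSERT INTO parent (child, parent) VALUES (?, ?)", ROWS)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_parent_table(conn)
    rows = conn.execute("SELECT child, parent FROM parent").fetchall()
    print("CHILD      PARENT")
    print("---------- ----------")
    for child, parent in rows:
        print(f"{child:<10} {parent}")
    print(f"\n  {len(rows)} record(s) selected.")
    conn.close()
```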

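To get this result, the table has to be joined with itself (a self-join), matching each row's PARENT to another row's CHILD. A hash join is not strictly required: it is only one of the join methods (along with nested-loop and merge joins) that the DB2 optimizer may choose to execute such a join. Here is my SQL implementation: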

[db2inst1@win ~]$ db2 "select u.child GRANDCHILD, b.parent GRANDPARENT from (select * from parent where parent in (select child from parent)) as u ,(select * from parent where child in (select parent from parent)) as b where u.parent=b.child order by u.child"
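
At its core, this query is an equality self-join on u.parent = b.child. To illustrate how a hash join could evaluate that predicate, here is a simplified in-memory Python sketch (the function name is hypothetical, and this is not how DB2 implements HSJOIN internally): build a hash table on one input's join key, then probe it with every row of the other input.

```python
from collections import defaultdict

# The 14 rows of the PARENT table, as (CHILD, PARENT) pairs.
ROWS = [
    ("tom", "lucy"), ("tom", "jack"), ("jone", "lucy"), ("jone", "jack"),
    ("lucy", "mary"), ("lucy", "ben"), ("jack", "alice"), ("jack", "jesse"),
    ("terry", "alice"), ("terry", "jesse"), ("philip", "terry"),
    ("philip", "alma"), ("mark", "terry"), ("mark", "alma"),
]


def hash_join_grandparents(rows):
    """Evaluate u.parent = b.child with a simple build/probe hash join."""
    # Build phase: hash the "b" side on its join key, CHILD.
    parents_by_child = defaultdict(list)
    for child, parent in rows:
        parents_by_child[child].append(parent)

    # Probe phase: each "u" row looks up its PARENT value in the hash table.
    # Rows with no match on either side are discarded by the join itself,
    # so the IN-subquery filters in the SQL do not change the result.
    result = []
    for child, parent in rows:
        for grandparent in parents_by_child.get(parent, []):
            result.append((child, grandparent))

    # Plays the role of ORDER BY u.child; sorted() is stable, so ties keep probe order.
    return sorted(result, key=lambda row: row[0])


if __name__ == "__main__":
    print("GRANDCHILD GRANDPARENT")
    print("---------- -----------")
    for grandchild, grandparent in hash_join_grandparents(ROWS):
        print(f"{grandchild:<10} {grandparent}")
```

With the rows in the order shown above, this reproduces the required listing, in the same order.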

DB2's optimizer rewrote my query like this:

Optimized Statement:
-------------------
SELECT
DISTINCT Q1.CHILD AS "GRANDCHILD",
Q3.PARENT AS "GRANDPARENT",
Q3.CHILD,
Q1.PARENT
FROM
DB2INST1.PARENT AS Q1,
DB2INST1.PARENT AS Q2,
DB2INST1.PARENT AS Q3,
DB2INST1.PARENT AS Q4
WHERE
(Q1.PARENT = Q2.CHILD) AND
(Q2.CHILD = Q4.PARENT) AND
(Q2.CHILD = Q3.CHILD)
ORDER BY
Q1.CHILD
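
My reading of this rewrite (an interpretation, not something the explain output states) is that the two IN subqueries became the extra joins with Q2 and Q4. Unlike IN, a join returns one row per match, which would explain the added DISTINCT. The sketch below checks this with sqlite3 as a stand-in for DB2. It runs the rewritten four-way join with and without DISTINCT, and compares the result with a plain two-way self-join. That self-join is a hypothetical simplification that does not appear in the original post.

```python
import sqlite3

# The 14 rows of the PARENT table, as (CHILD, PARENT) pairs.
ROWS = [
    ("tom", "lucy"), ("tom", "jack"), ("jone", "lucy"), ("jone", "jack"),
    ("lucy", "mary"), ("lucy", "ben"), ("jack", "alice"), ("jack", "jesse"),
    ("terry", "alice"), ("terry", "jesse"), ("philip", "terry"),
    ("philip", "alma"), ("mark", "terry"), ("mark", "alma"),
]

# FROM/WHERE of the optimizer's rewritten statement (Q1..Q4 are four copies of PARENT).
REWRITTEN_FROM_WHERE = """
FROM parent AS q1, parent AS q2, parent AS q3, parent AS q4
WHERE q1.parent = q2.child
  AND q2.child = q4.parent
  AND q2.child = q3.child
"""
SELECT_LIST = "q1.child AS grandchild, q3.parent AS grandparent, q3.child, q1.parent"

# Hypothetical simplification: a plain self-join with no subqueries.
SELF_JOIN_SQL = """
SELECT c.child AS grandchild, g.parent AS grandparent
FROM parent AS c
JOIN parent AS g ON c.parent = g.child
ORDER BY c.child
"""


def run_checks():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (child VARCHAR(10), parent VARCHAR(10))")
    conn.executemany("INSERT INTO parent VALUES (?, ?)", ROWS)
    without_distinct = conn.execute(f"SELECT {SELECT_LIST} {REWRITTEN_FROM_WHERE}").fetchall()
    with_distinct = conn.execute(f"SELECT DISTINCT {SELECT_LIST} {REWRITTEN_FROM_WHERE}").fetchall()
    self_join = conn.execute(SELF_JOIN_SQL).fetchall()
    conn.close()
    return without_distinct, with_distinct, self_join


if __name__ == "__main__":
    without_distinct, with_distinct, self_join = run_checks()
    print("rewritten join without DISTINCT:", len(without_distinct), "rows")
    print("rewritten join with DISTINCT:   ", len(with_distinct), "rows")
    print("plain self-join:                ", len(self_join), "rows")
```

On this data, each middle person (lucy, jack, terry) has two children and two parents. Without DISTINCT, every grandchild/grandparent pair therefore comes back 2 x 2 = 4 times through Q2 and Q4, giving 48 rows instead of 12. The plain self-join returns the same 12 pairs without needing the extra joins.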

I still have a lot to learn about how to optimize SQL...