如何删除复杂连接中的重复项

时间:2021-02-28 04:40:54

I've an EMPLOYEE table which has built-in hierarchy structure as well (using manager column)

我有一个员工表,它也有内置的层次结构(使用管理器列)

如何删除复杂连接中的重复项

I've another REGION table which represents manager-region relation

我还有一个表示管理-区域关系的区域表

如何删除复杂连接中的重复项

I'm trying to create a SQL which will show which employees fall under which region by following up the hierarchy chain.

我正在尝试创建一个SQL,通过跟踪层次结构链来显示哪些员工属于哪个区域。

Constraints / Rules:

约束/规则:

  • It's possible that employee's immediate manager may not have a region - so I need to continue up the chain.
  • 有可能员工的直属经理没有区域——所以我需要继续沿着这个链条往上走。
  • I have guarantee that 4 level up the chain someone will have a region.

    我保证4级以上的人会有一个区域。

  • If region is found before the 4th level then use lower manager's region

    如果在第4级之前发现区域,则使用低级管理器区域

This is the naive SQL I came up with (but results have duplicates - 3rd Rule failed)

这是我提出的简单SQL(但是结果重复-第三条规则失败)

select distinct e.name, r.region
from employee e 
left outer join employee mgr1 on mgr1.id = e.manager 
left outer join employee mgr2 on mgr2.id = mgr1.manager 
left outer join employee mgr3 on mgr3.id = mgr2.manager 
left outer join employee mgr4 on mgr4.id = mgr3.manager 
left outer join REGION r on 
      (  r.id = mgr1.id 
      or r.id = mgr2.id 
      or r.id = mgr3.id 
      or r.id = mgr4.id  ) 

where e.IS_MANAGER = 'N'; //only interested in users for now; assume a flag

And this is the ResultSet:

这是结果集:

如何删除复杂连接中的重复项

How do I conditionally stop the left outer join if I already found a region?

如果已经找到一个区域,如何有条件地停止左外部连接?

3 个解决方案

#1


1  

Try this:

试试这个:

select distinct e.name, COALESCE(r1.region, r2.region, r3.region, r4.region, 'No Region') region
from employee e left outer join 
                region r1 on e.manager = r1.id
left outer join employee mgr1 on mgr1.id = e.manager left outer join
                region r2 on mgr1.manager = r2.id
left outer join employee mgr2 on mgr2.id = mgr1.manager left outer join
                region r3 on mgr2.manager = r3.id
left outer join employee mgr3 on mgr3.id = mgr2.manager left outer join
                region r4 on mgr3.manager = r4.id
where e.IS_MANAGER = 'N'; //only interested in users for now; assume a flag

I'm not sure the COALESCE function is supported on all mysql versions but you could find an equivalent (it returns the first non null argument).

我不确定所有mysql版本都支持合并函数,但是您可以找到一个等价的函数(它返回第一个非空参数)。

#2


1  

I had to modify your script a bit, but this works:

我不得不修改一下你的剧本,但这是可行的:

select distinct e.Name,
  CASE
    WHEN r1.RegionName IS NOT NULL THEN r1.RegionName
    WHEN r2.RegionName IS NOT NULL THEN r2.RegionName
    WHEN r3.RegionName IS NOT NULL THEN r3.RegionName
    WHEN r4.RegionName IS NOT NULL THEN r4.RegionName
    ELSE 'NA'
  END AS 'RegionName'
from employee e 
left outer join employee mgr1 on mgr1.id = e.Manager 
left outer join employee mgr2 on mgr2.id = mgr1.Manager 
left outer join employee mgr3 on mgr3.id = mgr2.Manager 
left outer join employee mgr4 on mgr4.id = mgr3.Manager 
left outer join Region r1 on r1.id = mgr1.RegionID 
left outer join Region r2 on r2.id = mgr2.RegionID 
left outer join Region r3 on r3.id = mgr3.RegionID 
left outer join Region r4 on r4.id = mgr4.RegionID 

where e.IS_MANAGER = 'N';

Here's the SQL Fiddle: http://sqlfiddle.com/#!9/93b45/5

下面是SQL Fiddle: http://sqlfiddle.com/#

#3


0  

DB2 (almost all versions) supports recursive CTEs, which are made to handle this kind of hierarchical data (some versions also support Oracles CONNECT BY, but I don't have any familiarity with that). Using it may make the connections easier to reason about:

DB2(几乎所有版本)都支持递归cte,它是用来处理这种分层数据的(有些版本也支持Oracles连接BY,但我对此并不熟悉)。使用它可以使联系更容易:

WITH Employee_Region AS (SELECT name, manager, CAST(null AS VARCHAR(2)) AS region
                         FROM Employee
                         WHERE is_manager = 'N'
                         UNION ALL
                         SELECT ER.name, Manager.manager, Region.regionName
                         FROM Employee_Region ER
                         JOIN Employee Manager
                           ON Manager.id = ER.manager
                         LEFT JOIN Region 
                                ON Region.id = Manager.regionId
                         WHERE ER.region IS NULL)
SELECT name, region
FROM Employee_Region
WHERE region IS NOT NULL

SQL Fiddle Example
(Fiddle base taken from @PhilWalton - thanks! PostgreSQL requires the RECURSIVE keyword, but DB2 doesn't)

SQL小提琴例子(小提琴基础取自@PhilWalton -谢谢!PostgreSQL需要递归关键字,但DB2不需要)

The query does start from the bottom (assuming you have a flag), but it's possible to reverse it and start from the top-level managers.
For the recursing section (everything in the CTE after UNION ALL):

查询确实从底部开始(假设您有一个标志),但是可以将其反转并从*管理器开始。递归部分(联合后CTE中的所有内容):

  • we self-JOIN to the Employee table to get the next manager up
  • 我们自己加入员工表,让下一个经理上来
  • because we don't know if the current manager has a region, we LEFT JOIN to that table.
  • 因为我们不知道当前管理器是否有一个区域,所以我们将JOIN放在该表中。
  • if we didn't find a region, try again, using the next manager up.
  • 如果我们没有找到一个区域,再试一次,使用下一个管理器。
  • we exclude all rows who, in the previous iteration, received a region (will grab the 'lowest' region). Otherwise, termination stops at the top-level manager.
  • 我们排除了在前一个迭代中接收到一个区域(将获取“最低”区域)的所有行。否则,终止将在*管理器处停止。

Finally, in the main query, we exclude those rows where an employee has no region. Mostly this will remove the intermediate rows generated wile searching until a manager with a region is found, although if some tree has no region (somehow) it would exclude them.
Using a WHERE is almost certainly cheaper than the hashing function required with DISTINCT, although I'm not sure what effect the recursive portion will have (if most immediate managers have a region, or within one hop, it's likely to perform better than doing four joins)

最后,在主查询中,我们排除那些员工没有区域的行。大多数情况下,这将删除生成的中间行,直到找到一个具有区域的管理器为止,尽管如果某个树没有区域(以某种方式),它将排除它们。使用WHERE几乎肯定比使用DISTINCT所需的散列函数要便宜,尽管我不确定递归部分会有什么影响(如果大多数直接管理器都有一个区域,或者在一个跃点内,它的性能可能比使用四个连接要好)

#1


1  

Try this:

试试这个:

select distinct e.name, COALESCE(r1.region, r2.region, r3.region, r4.region, 'No Region') region
from employee e left outer join 
                region r1 on e.manager = r1.id
left outer join employee mgr1 on mgr1.id = e.manager left outer join
                region r2 on mgr1.manager = r2.id
left outer join employee mgr2 on mgr2.id = mgr1.manager left outer join
                region r3 on mgr2.manager = r3.id
left outer join employee mgr3 on mgr3.id = mgr2.manager left outer join
                region r4 on mgr3.manager = r4.id
where e.IS_MANAGER = 'N'; //only interested in users for now; assume a flag

I'm not sure the COALESCE function is supported on all mysql versions but you could find an equivalent (it returns the first non null argument).

我不确定所有mysql版本都支持合并函数,但是您可以找到一个等价的函数(它返回第一个非空参数)。

#2


1  

I had to modify your script a bit, but this works:

我不得不修改一下你的剧本,但这是可行的:

select distinct e.Name,
  CASE
    WHEN r1.RegionName IS NOT NULL THEN r1.RegionName
    WHEN r2.RegionName IS NOT NULL THEN r2.RegionName
    WHEN r3.RegionName IS NOT NULL THEN r3.RegionName
    WHEN r4.RegionName IS NOT NULL THEN r4.RegionName
    ELSE 'NA'
  END AS 'RegionName'
from employee e 
left outer join employee mgr1 on mgr1.id = e.Manager 
left outer join employee mgr2 on mgr2.id = mgr1.Manager 
left outer join employee mgr3 on mgr3.id = mgr2.Manager 
left outer join employee mgr4 on mgr4.id = mgr3.Manager 
left outer join Region r1 on r1.id = mgr1.RegionID 
left outer join Region r2 on r2.id = mgr2.RegionID 
left outer join Region r3 on r3.id = mgr3.RegionID 
left outer join Region r4 on r4.id = mgr4.RegionID 

where e.IS_MANAGER = 'N';

Here's the SQL Fiddle: http://sqlfiddle.com/#!9/93b45/5

下面是SQL Fiddle: http://sqlfiddle.com/#

#3


0  

DB2 (almost all versions) supports recursive CTEs, which are made to handle this kind of hierarchical data (some versions also support Oracles CONNECT BY, but I don't have any familiarity with that). Using it may make the connections easier to reason about:

DB2(几乎所有版本)都支持递归cte,它是用来处理这种分层数据的(有些版本也支持Oracles连接BY,但我对此并不熟悉)。使用它可以使联系更容易:

WITH Employee_Region AS (SELECT name, manager, CAST(null AS VARCHAR(2)) AS region
                         FROM Employee
                         WHERE is_manager = 'N'
                         UNION ALL
                         SELECT ER.name, Manager.manager, Region.regionName
                         FROM Employee_Region ER
                         JOIN Employee Manager
                           ON Manager.id = ER.manager
                         LEFT JOIN Region 
                                ON Region.id = Manager.regionId
                         WHERE ER.region IS NULL)
SELECT name, region
FROM Employee_Region
WHERE region IS NOT NULL

SQL Fiddle Example
(Fiddle base taken from @PhilWalton - thanks! PostgreSQL requires the RECURSIVE keyword, but DB2 doesn't)

SQL小提琴例子(小提琴基础取自@PhilWalton -谢谢!PostgreSQL需要递归关键字,但DB2不需要)

The query does start from the bottom (assuming you have a flag), but it's possible to reverse it and start from the top-level managers.
For the recursing section (everything in the CTE after UNION ALL):

查询确实从底部开始(假设您有一个标志),但是可以将其反转并从*管理器开始。递归部分(联合后CTE中的所有内容):

  • we self-JOIN to the Employee table to get the next manager up
  • 我们自己加入员工表,让下一个经理上来
  • because we don't know if the current manager has a region, we LEFT JOIN to that table.
  • 因为我们不知道当前管理器是否有一个区域,所以我们将JOIN放在该表中。
  • if we didn't find a region, try again, using the next manager up.
  • 如果我们没有找到一个区域,再试一次,使用下一个管理器。
  • we exclude all rows who, in the previous iteration, received a region (will grab the 'lowest' region). Otherwise, termination stops at the top-level manager.
  • 我们排除了在前一个迭代中接收到一个区域(将获取“最低”区域)的所有行。否则,终止将在*管理器处停止。

Finally, in the main query, we exclude those rows where an employee has no region. Mostly this will remove the intermediate rows generated wile searching until a manager with a region is found, although if some tree has no region (somehow) it would exclude them.
Using a WHERE is almost certainly cheaper than the hashing function required with DISTINCT, although I'm not sure what effect the recursive portion will have (if most immediate managers have a region, or within one hop, it's likely to perform better than doing four joins)

最后,在主查询中,我们排除那些员工没有区域的行。大多数情况下,这将删除生成的中间行,直到找到一个具有区域的管理器为止,尽管如果某个树没有区域(以某种方式),它将排除它们。使用WHERE几乎肯定比使用DISTINCT所需的散列函数要便宜,尽管我不确定递归部分会有什么影响(如果大多数直接管理器都有一个区域,或者在一个跃点内,它的性能可能比使用四个连接要好)