How can I efficiently group data in a hierarchical format in T-SQL?

Time: 2021-09-10 01:27:31

I have data like this:


Task   | Hours
1.1    |    40
2      |    40
2.1    |    60
2.1.1  |    15
15.9   |    24
16     |     5
19.1   |    40
19.1.1 |     8
19.1.2 |    12
19.2   |     6
19.2.1 |    21
19.2.2 |    15
19.2.3 |     2
19.3   |    64

I would like to group based on the first two levels of the Task, producing this result:


Task   | Hours
1.1    |    40
2      |    40
2.1    |    75
15.9   |    24
16     |     5
19.1   |    60
19.2   |    44
19.3   |    64

I want the 16 to not roll up what's beneath it, but I need all the other levels to roll up. This is SQL Server 2005. I would normally do a split on the decimal, and break it out that way, but I was wondering if there's a better way to do it in SQL.


4 Solutions

#1


2  

Is changing the model an option? If your task column is really meant to represent a hierarchy, you should really be representing the hierarchy properly in your relational model.


If the number of levels deep is fixed at three, another option might be to add three columns to represent each of the "parts" of the task column independently.


If that's not an option, I think you can achieve this with a series of CASE statements that parse the string (plus SUM and GROUP BY).


UPDATE:


Ok, this seemed like a fun challenge, so I came up with this:


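SELECT
    main_task,
    SUM(hours) AS hours
FROM
    (
    SELECT
        task,
        -- LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) is the position of the last dot
        -- (or LEN(task) + 1 when task contains no dot).
        CASE
            -- The last dot is also the first dot: task has exactly two levels, so keep it.
            WHEN LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) = CHARINDEX('.', task) THEN task
            -- Otherwise cut at the last dot (this leaves a dot-free task unchanged).
            ELSE LEFT(task, LEN(task) + 1 - CHARINDEX('.', REVERSE(task)) - 1)
        END AS main_task,
        hours
    FROM
        #temp
    ) sub
GROUP BY
    main_task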
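
To try the query above, the question's sample data can be loaded into a temp table first. This is only a sketch: the table name #temp comes from the query, but the column types are assumptions. SQL Server 2005 does not support multi-row VALUES lists, so the rows are inserted with UNION ALL.

```sql
-- Assumed column types; only the names #temp, task and hours come from the query above.
CREATE TABLE #temp (
    task  VARCHAR(20) NOT NULL,
    hours INT         NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO #temp (task, hours)
SELECT '1.1',    40 UNION ALL
SELECT '2',      40 UNION ALL
SELECT '2.1',    60 UNION ALL
SELECT '2.1.1',  15 UNION ALL
SELECT '15.9',   24 UNION ALL
SELECT '16',      5 UNION ALL
SELECT '19.1',   40 UNION ALL
SELECT '19.1.1',  8 UNION ALL
SELECT '19.1.2', 12 UNION ALL
SELECT '19.2',    6 UNION ALL
SELECT '19.2.1', 21 UNION ALL
SELECT '19.2.2', 15 UNION ALL
SELECT '19.2.3',  2 UNION ALL
SELECT '19.3',   64;
```

With this data, the query above should return the eight rows shown in the question's expected result. Add an ORDER BY if the row order matters.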

#2


1  

Another route is to add some computed columns which break the various task levels apart, then group and sum as you wish.

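
A rough sketch of what that could look like, using a single computed column that holds the first two levels of the task. The table name dbo.TaskHours, the column name task_group, and the column types are all hypothetical; the answer does not specify them.

```sql
-- Hypothetical table; names and types are assumptions for illustration only.
CREATE TABLE dbo.TaskHours (
    task  VARCHAR(20) NOT NULL,
    hours INT         NOT NULL,
    -- CHARINDEX('.', task, CHARINDEX('.', task) + 1) is the position of the
    -- second dot, or 0 when task has fewer than two dots.
    task_group AS (
        CASE
            WHEN CHARINDEX('.', task, CHARINDEX('.', task) + 1) > 0
                -- Three or more levels: keep everything before the second dot.
                THEN LEFT(task, CHARINDEX('.', task, CHARINDEX('.', task) + 1) - 1)
            -- One or two levels: keep the task as is.
            ELSE task
        END
    )
);

-- Grouping then needs no string parsing in the query itself.
SELECT task_group, SUM(hours) AS hours
FROM dbo.TaskHours
GROUP BY task_group;
```

The parsing logic lives in one place this way, and every query that needs the roll-up can just group on task_group.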

#3


1  

Assuming the structure of the task field is consistent, so that the first two levels always occupy exactly four characters (as in 19.1.2), you could use the following:

select left(task,4) as Task,sum(hours) as Hours
from test1
group by left(task,4)

Here is a slightly modified version


select LEFT(task,charindex('.',task+'.')+1),SUM(hours)
from test1
group by LEFT(task,charindex('.',task+'.')+1)
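
The modified version still assumes the second level is a single digit, so a task like 19.10.3 would be grouped under 19.1. A possible variation on the same padding trick, offered as a sketch rather than a tested drop-in: append two dots so that a second dot always exists, then cut just before it. It assumes task is a VARCHAR column with no trailing spaces; test1 is the table name from the answer.

```sql
-- task + '..' guarantees at least two dots, so the inner CHARINDEX finds the
-- first dot and the outer CHARINDEX finds the second one.
-- LEFT stops at the end of task when the cut point lies in the padding.
SELECT
    LEFT(task, CHARINDEX('.', task + '..', CHARINDEX('.', task + '..') + 1) - 1) AS Task,
    SUM(hours) AS Hours
FROM test1
GROUP BY
    LEFT(task, CHARINDEX('.', task + '..', CHARINDEX('.', task + '..') + 1) - 1);
```

For example, 2 and 2.1 are returned unchanged, 2.1.1 becomes 2.1, and 19.10.3 becomes 19.10.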

#4


1  

I was thinking about this on my drive home, and I wanted to propose this solution:


Create a table that stores the hierarchy, and then do a join grabbing the task's parent.


TaskStructureTable:


task  | task_group
1     | 1
1.1   | 1.1
1.1.1 | 1.1
1.1.2 | 1.1
1.1.3 | 1.1
1.2   | 1.2
1.2.1 | 1.2

Then I could do something like this:


SELECT SUM(d.Hours) AS "Hours", t.task_group
FROM Data d
JOIN TaskStructureTable t ON d.Task = t.task
GROUP BY t.task_group
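
For reference, a minimal sketch of how the lookup table above might be created and filled. The table and column names come from this answer; the VARCHAR(20) types and the primary key are assumptions. The primary key on task gives the join an index to use.

```sql
-- Assumed types and key; only the table and column names come from the answer.
CREATE TABLE TaskStructureTable (
    task       VARCHAR(20) NOT NULL PRIMARY KEY,
    task_group VARCHAR(20) NOT NULL
);

-- Rows copied from the example table above
-- (UNION ALL because SQL Server 2005 lacks multi-row VALUES).
INSERT INTO TaskStructureTable (task, task_group)
SELECT '1',     '1'   UNION ALL
SELECT '1.1',   '1.1' UNION ALL
SELECT '1.1.1', '1.1' UNION ALL
SELECT '1.1.2', '1.1' UNION ALL
SELECT '1.1.3', '1.1' UNION ALL
SELECT '1.2',   '1.2' UNION ALL
SELECT '1.2.1', '1.2';
```

Every task that appears in Data needs a row here, because an inner join silently drops tasks that are missing from the lookup table.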

Do you think this would be faster than using CHARINDEX? (Yes, I know I can measure it to find out for sure.)

