根据聚合创建大小均匀的组

Probably a newbie question, but I'm looking to split our server inventory up into several evenly sized groups based on total database size, and am stumped figuring out how to group them. I think NTILE will work, maybe, but I just can't wrap my head around splitting the groups evenly. My example below is just ordering the servers randomly. I would like the results to be 3 groups of fairly even size (obviously won't be exact).

可能是一个新手问题,但我希望根据总数据库大小将我们的服务器库存分成几个大小相等的组,并且难以弄清楚如何对它们进行分组。我认为NTILE可能会起作用,但我无法将我的头围绕均匀地分开。我下面的例子是随机排序服务器。我希望结果是3组相当均匀的大小(显然不准确)。

Using SQL Server 2012. Any help is appreciated. Thanks.

使用SQL Server 2012.任何帮助表示赞赏。谢谢。

declare @Servers table (ServerName sysname, TotalSizeGB decimal (12,2))
insert into @Servers values
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)

select GroupNumber, sum(TotalSizeGB) as TotalSizeGB
from (
     select ServerName, sum(TotalSizeGB) as TotalSizeGB, ntile(3) over (order by newid()) as GroupNumber
     from (
          select ServerName, TotalSizeGB from @Servers
          ) x 
     group by ServerName
     ) y
group by GroupNumber

The expected output here would be three groups of about 2000GB each. I expect it won't be exact, but at least close. If grouping per server, it might look like this:

这里的预期产量将是三组,每组约2000GB。我希望它不会准确,但至少接近。如果按服务器分组,它可能如下所示:

ServerName  TotalSizeGB GroupNumber
Server10    1023.35 1  
Server1 123.45  1
Server5 567.89  1
Server3 345.67  1
Server4 456.78  2
Server7 789.01  2
Server6 678.90  2
Server2 234.56  3
Server9 901.23  3
Server8 890.12  3

If I was taking a sum per group, it would look like this:

如果我每组拿一笔钱,它看起来像这样:

GroupNumber TotalSizeGB
1   2060.36
2   1924.69
3   2025.91

4 个解决方案

#1

SELECT  *
FROM(
    SELECT  y.TotalSizeGB,
            CASE 
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=0 THEN 2
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=1 THEN 1
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=2 THEN 0
                ELSE y.PseudoGrpNumber
            END GrpNumber
    FROM(
        SELECT 
            x.ServerName,
            x.TotalSizeGB,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))%3 PseudoGrpNumber,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))/3 AnotherGrp,
            ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC) RowNum
        FROM    @Servers x
    )y
)z
PIVOT( SUM(z.TotalSizeGB) FOR z.GrpNumber IN([0],[1],[2]) ) pvt;

Results:

0       1       2
------- ------- -------
2048.02 1925.80 2037.14

Some explanations:

The idea is to sort data descending on TotalSizeGB column. Then every 3 sequential rows are grouped together (column AnotherGrp) first in DESC order and then in ASC order (column PseudoGroNumber and GrpNumber). If it's executed SELECT * FROM () y derivate table then the results will be:

我们的想法是对TotalSizeGB列上的数据进行排序。然后,每个3个连续的行首先按DESC顺序组合在一起(列AnotherGrp),然后按ASC顺序(列PseudoGroNumber和GrpNumber)组合在一起。如果它执行SELECT * FROM()y derivate table,那么结果将是:

ServerName TotalSizeGB  PseudoGrpNumber AnotherGrp GrpNumber RowNum
---------- ------------ --------------- ---------- --------- ------
Server10   1023.35      0               1          0         1
Server9    901.23       1               1          1         2
Server8    890.12       2               1          2         3
Server7    789.01       0               2          2         4
Server6    678.90       1               2          1         5
Server5    567.89       2               2          0         6
Server4    456.78       0               3          0         7
Server3    345.67       1               3          1         8
Server2    234.56       2               3          2         9
Server1    123.45       0               4          2         10

#2

This task is scientific actually (Packing problem, or a kind of), and may be better suits math.stackexchange :)

这个任务实际上是科学的(包装问题,或者一种),可能更适合math.stackexchange :)

My solution is in two steps (as many optimization problems are) - find some initial solution and try to refine it.

我的解决方案有两个步骤(尽可能多的优化问题) - 找到一些初始解决方案并尝试对其进行优化。

Initial solution:

ServerName GroupNo     TotalSizeGB
---------- ----------- -----------
Server1    3           123.45
Server2    3           234.56
Server3    2           345.67
Server4    1           456.78
Server5    2           567.89
Server6    1           678.90
Server7    3           789.01
Server8    3           890.12
Server9    1           901.23
Server10   2           1023.35

GroupNo     GroupSizeGb
----------- -----------
1           2036.91
2           1936.91
3           2037.14

Optimized:

ServerName GroupNo     TotalSizeGB
---------- ----------- -----------
Server1    3           123.45
Server2    3           234.56
Server3    2           345.67
Server4    1           456.78
Server5    3           567.89
Server6    1           678.90
Server7    2           789.01
Server8    2           890.12
Server9    1           901.23
Server10   3           1023.35

GroupNo     GroupSizeGb
----------- -----------
1           2036.91
2           2024.80
3           1949.25

Unfortunately, I was not able to set it up on SQLFiddle, because of explicit transactions are used.

不幸的是,由于使用了显式事务,我无法在SQLFiddle上进行设置。

set nocount on

-- Parameters
declare
  @nGroups int, -- Number of groups to split servers to
  @tolerance float, -- let say 0.0 ... 0.1 (0.1 mean that (+/-)10% deviation allowed from target group size)
  @nTries int, -- refinement tries 100, 1000, 10000 or as much as you can wait if you are not satisfied with initial solution
  @mFactor float, -- refinement param 0.0 ... 1.0
  @tolerance2 float -- let say 0.1 ... 0.3

set @nGroups = 3
set @tolerance = 0
set @nTries = 1000
set @mFactor = 0.3
set @tolerance2 = 0.3


-- Initial Data
create table #Servers (ID int identity, ServerName sysname, TotalSizeGB decimal (12,2), primary key clustered(ID))

insert into #Servers (ServerName, TotalSizeGB) values
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)

create table #Groups (GroupNo int not NULL, primary key clustered (GroupNo))
insert into #Groups (GroupNo)
select N from (select row_number() over (order by @@spid) from sys.all_columns) S(N) where N <= @nGroups

create table #ServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered(ServerID))
create index #IX_GroupServers_GroupNo on #ServerGroups (GroupNo)

declare
    @srvCnt int,
    @grSize decimal (12,2),
    @grNo int,
    @grSz decimal (12,2),
    @srvID int

select @srvCnt = count(1), @grSize = sum(TotalSizeGB) / @nGroups from #Servers
select @grSize as [Target approx. group size]

-- Find initial solution
while (select count(1) from #ServerGroups) < @srvCnt
begin
    select top 1 @grNo = g.GroupNo
    from #Groups g
        left join #ServerGroups sg on sg.GroupNo = g.GroupNo
        left join #Servers s on s.ID = sg.ServerID
    group by g.GroupNo
    order by sum(s.TotalSizeGB)

    select @grSz = IsNull(sum(s.TotalSizeGB), 0)
    from #Groups g
        left join #ServerGroups sg on sg.GroupNo = g.GroupNo
        left join #Servers s on s.ID = sg.ServerID
    where g.GroupNo = @grNo

    select top 1 @srvID = ID
    from #Servers s
    where not exists (select 1 from #ServerGroups where ServerID = s.ID)
    order by abs(@grSize - @grSz - s.TotalSizeGB)

    insert into #ServerGroups (ServerID, GroupNo) values (@srvID, @grNo)
end

select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
from #Groups g
    join #ServerGroups sg on sg.GroupNo = g.GroupNo
    join #Servers s on s.ID = sg.ServerID
group by g.GroupNo


-- Refinement
declare @fTarg float

select @fTarg = sum(abs(case when abs(re) > @tolerance then re else 0 end))
from (
    select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
    from #Groups g
        join #ServerGroups sg on sg.GroupNo = g.GroupNo
        join #Servers s on s.ID = sg.ServerID
    group by g.GroupNo
) t
cross apply (select (GroupSizeGb - @grSize)/@grSize re) p

print @fTarg

if @fTarg > 0
begin

create table #MServerGroups (ServerID int not NULL, GroupNo int not NULL, primary key clustered (ServerID))
insert into #MServerGroups
select ServerID, GroupNo from #ServerGroups

while @nTries > 0
begin
    set @nTries = @nTries - 1

    begin transaction

    ;with MS as (
        select top (100*@mFactor) percent ServerID, GroupNo
        from #MServerGroups
        order by checksum(newid())
    )
    update msg
    set
        msg.GroupNo = case when msg.ServerID = tt.ServerID1 then tt.NewNo1 else tt.NewNo2 end
    from
        #MServerGroups msg
        join (
            select ServerID1, NewNo1, ServerID2, NewNo2
            from (
                select MS.ServerID as ServerID1, SS.GroupNo as NewNo1, SS.ServerID as ServerID2, MS.GroupNo as NewNo2, row_number() over (partition by SS.ServerID order by @@spid) as rn
                from MS
                    join #Servers s on s.ID = MS.ServerID
                    cross apply (
                        select top 1 *
                        from
                            #Servers s2
                            join #MServerGroups ms2 on ms2.ServerID = s2.ID
                        where
                            s2.ID != MS.ServerID and ms2.GroupNo != MS.GroupNo and abs(s2.TotalSizeGB - s.TotalSizeGB)/s.TotalSizeGB < @tolerance2
                        order by checksum(newid())
                    ) SS
            ) t
            where rn = 1
        )tt on msg.ServerID in (tt.ServerID1, tt.ServerID2)

    if @@rowcount = 0
    begin
        rollback transaction
        continue;
    end

    declare @fT float

    select @fT = sum(abs(case when abs(re) > @tolerance then re else 0 end))
    from (
        select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
        from #Groups g
            join #MServerGroups sg on sg.GroupNo = g.GroupNo
            join #Servers s on s.ID = sg.ServerID
        group by g.GroupNo
    ) t
    cross apply (select (GroupSizeGb - @grSize)/@grSize re) p

    if @fT < @fTarg
    begin
        set @fTarg = @ft
        print @fTarg -- the less this number, the better solution is

        commit transaction
    end
    else
        rollback transaction
end

update s
set s.GroupNo = m.GroupNo
from #MServerGroups m
    join #ServerGroups s on s.ServerID = m.ServerID

select g.GroupNo, SUM(s.TotalSizeGB) GroupSizeGb
from #Groups g
    join #ServerGroups sg on sg.GroupNo = g.GroupNo
    join #Servers s on s.ID = sg.ServerID
group by g.GroupNo

drop table #MServerGroups

end
else
    print 'No refinement needed'

drop table #Groups
drop table #ServerGroups
drop table #Servers

I suggest to start with @nTries = 0 and reasonable @tolerance (e.g. 0.1, 0.05).

我建议从@nTries = 0和合理的@tolerance开始(例如0.1,0.05)。

#3

Check this, hope this will help. i'm not that sure what you meant by 'evenly sized groups. But what i did here is first allocate an even size to a group then if any remaining then allocate it to a group where is has items more than general group size. Rather than ntile i would recommend you to decide the group numbers (might be using a sp) and allocate size for each server. but below might be fine for the described problem. and note that i didnt test for all the scenarios.

检查一下,希望这会有所帮助。我不确定你的'均匀大小的团体意味着什么。但我在这里做的是首先为一个组分配一个偶数大小,然后如果有任何剩余,则将其分配给一个具有超过一般组大小的项目的组。我会建议您决定组号(可能使用sp)并为每个服务器分配大小,而不是ntile。但是对于所描述的问题,下面可能没问题。并注意我没有测试所有场景。

  declare @TotalSizeGB decimal;
  select  @TotalSizeGB = sum(TotalSizeGB) from @Servers;

  declare @Count int;
  select  @Count = count(TotalSizeGB) from @Servers;

  declare @GroupSize int;
  select  @GroupSize = 3;

  declare @NoofGroups int;
  select  @NoofGroups = 3;

  declare @UnitSizeGB decimal
  Set @UnitSizeGB =(@TotalSizeGB/@Count)*@NoofGroups;

  Declare @Remainder decimal;
  Set @Remainder = @TotalSizeGB-(@UnitSizeGB*@NoofGroups)  

Select GroupNumber,
   CASE 
      WHEN gcount = @GroupSize THEN @UnitSizeGB
      WHEN gcount > @GroupSize THEN @UnitSizeGB+@Remainder
   END 
 From (
 Select 
   GroupNumber,count(ServerName) as gcount,  @UnitSizeGB as UnitSizeGB from(
     Select ServerName,ntile(@GroupSize) over (order by newid()) as GroupNumber
     from (
          select ServerName, TotalSizeGB from @Servers ) x 
     group by ServerName  ) as d
     group by GroupNumber ) as ff

This will give a output

这将给出一个输出

 GroupNumber      Size
 1                2405
 2                1803
 3                1803

#4

Here's a solution which generates the same results as @i-one's code but perhaps a little easier to understand (at least for me). I use 'chunk' instead of 'group' to avoid keyword conflicts.

这是一个解决方案,它产生与@ i-one代码相同的结果,但也许更容易理解(至少对我来说)。我使用'chunk'而不是'group'来避免关键字冲突。

The premise is as follows. To create n evenly-sized chunks:

前提如下。要创建n个大小均匀的块:

Sort all records in decreasing order

按降序排列所有记录

Assign the first n records to their chunks by row number

按行号将前n个记录分配给它们的块

Loop through the remainder, always assigning to the smallest chunk

循环遍历其余部分,始终分配给最小的块

I've uploaded the code to SQLFiddle but it doesn't seem to like the table variable. Here's the link anyways.

我已经将代码上传到SQLFiddle但它似乎不喜欢表变量。无论如何,这是链接。

-- Source data:
DECLARE @Servers TABLE (ServerName SYSNAME, TotalSizeGB DECIMAL (12,2))
INSERT INTO @Servers VALUES
('Server1',123.45),
('Server2',234.56),
('Server3',345.67),
('Server4',456.78),
('Server5',567.89),
('Server6',678.90),
('Server7',789.01),
('Server8',890.12),
('Server9',901.23),
('Server10',1023.35)


-- Solution start
DECLARE @ServersChunked TABLE (
        ServerName SYSNAME,
        TotalSizeGB DECIMAL (12,2),
        RowNum INT,
        ChunkNo INT
    );
DECLARE
    @ChunkCount INT = 3,
    @MinRowNum INT,
    @SmallestChunk INT;


-- Copy table into variable (skip this if the original table can be amended to include the RowNum and ChunkNo fields)
INSERT INTO @ServersChunked
SELECT 
    *, 
    RowNum = ROW_NUMBER() OVER (ORDER BY TotalSizeGB DESC), 
    ChunkNo = NULL
FROM @Servers

-- Assign the initial chunks to largest tables
UPDATE @ServersChunked
SET ChunkNo = RowNum
WHERE RowNum <= @ChunkCount


-- Assign chunks to remaining tables
WHILE EXISTS (SELECT 1 FROM @ServersChunked WHERE ChunkNo IS NULL) BEGIN

    -- Find the next table (by descending row count)
    SELECT @MinRowNum = MIN(RowNum) FROM @ServersChunked WHERE ChunkNo IS NULL

    -- Find the smallest chunk
    SELECT TOP 1 @SmallestChunk = ChunkNo
    FROM @ServersChunked
    WHERE ChunkNo IS NOT NULL
    GROUP BY ChunkNo
    ORDER BY Sum(TotalSizeGB) ASC

    -- Assign the table to the chunk
    UPDATE @ServersChunked
    SET ChunkNo = @SmallestChunk
    WHERE RowNum = @MinRowNum
END

Here are the results:

结果如下:

ChunkNo SumTotalSizeGB
1       1936.91
2       2036.91
3       2037.14

#1

SELECT  *
FROM(
    SELECT  y.TotalSizeGB,
            CASE 
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=0 THEN 2
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=1 THEN 1
                WHEN y.AnotherGrp%2=0 AND y.PseudoGrpNumber=2 THEN 0
                ELSE y.PseudoGrpNumber
            END GrpNumber
    FROM(
        SELECT 
            x.ServerName,
            x.TotalSizeGB,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))%3 PseudoGrpNumber,
            (2+ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC))/3 AnotherGrp,
            ROW_NUMBER() OVER(ORDER BY x.TotalSizeGB DESC) RowNum
        FROM    @Servers x
    )y
)z
PIVOT( SUM(z.TotalSizeGB) FOR z.GrpNumber IN([0],[1],[2]) ) pvt;