Puzzle: distributing numbers evenly across groups

Time: 2021-12-08 11:40:47

This is more of a puzzle really. It's probably been asked elsewhere before, but I couldn't find anything, so I thought I'd share the question.

I'm trying to implement some kind of load balancing in an application and have reduced the problem to what I believe should be a simple T-SQL exercise (the application lives predominantly in SQL Server, specifically SQL Server 2008 R2).

Basically I have a table with two integer columns: a unique, sequential Id and a non-unique Value. The table could hold any number of records. I'd like to produce a table of data where the n largest Values are each placed in a separate 'grouping', then the next n largest Values are each placed in a separate 'grouping', and so on.

I've got a first draft working below but I believe it can be improved...

DECLARE @GroupCount INT = 5

-- Set up the test data
DECLARE @test TABLE (Id INT IDENTITY(1, 1), Value INT)
INSERT  @Test (Value)
VALUES  (100), (456), (121), (402), (253), (872), (765), (6529), (1029), (342), (98), (1), (0), (4), (46), (23), (456), (416), (2323), (4579)


--Order by Value descending
;WITH cte AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum
    FROM    @Test
)
--use modulus to split into grouping
, cte2 AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (PARTITION BY RowNum % @GroupCount ORDER BY RowNum DESC) Rnk
    FROM    cte
)
SELECT  ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Value DESC) AS 'Grouping'
    ,Value
    ,Id
FROM    cte2
ORDER BY [Grouping], Value ASC

This works and produces the following dataset:

Grouping,   Value,      Id
========    =====       ==
1           46          15
1           342         10
1           765         7
1           6529        8
2           23          16
2           253         5
2           456         2
2           4579        20
3           4           14
3           121         3
3           456         17
3           2323        19
4           1           12
4           100         1
4           416         18
4           1029        9
5           0           13
5           98          11
5           402         4
5           872         6

The dataset returned is correct in that the n largest values are split into separate groupings, and so on, but the totals differ widely between groupings (compare grouping 1 with grouping 5, for example).

When grouped and SUMmed we can see the uneven spread:

Grouping,   SummedValues
========    ============
1           7682
2           5311
3           2904
4           1546
5           1372
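
For reference, the same round-robin assignment can be reproduced outside SQL to put a number on the imbalance. This is only an illustrative Python sketch, not part of the original query; the names `round_robin_totals`, `ideal` and `spread` are made up for the example.

```python
values = [100, 456, 121, 402, 253, 872, 765, 6529, 1029, 342,
          98, 1, 0, 4, 46, 23, 456, 416, 2323, 4579]
group_count = 5

ordered = sorted(values, reverse=True)
# Grouping g takes every group_count-th value, starting at the g-th largest.
round_robin_totals = [sum(ordered[g::group_count]) for g in range(group_count)]

ideal = sum(values) / group_count  # a perfectly even share per grouping
spread = max(round_robin_totals) - min(round_robin_totals)

print(round_robin_totals)  # [7682, 5311, 2904, 1546, 1372]
print(ideal, spread)       # 3763.0 6310
```

Note that the largest single Value (6529) is already above the ideal share of 3763, so no assignment can bring the heaviest grouping below 6529 on this data.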

In as few lines as possible, how can I better balance the Values so that the totals in each grouping are more evenly spread?

5 solutions

#1


1  

This is flawed, but not terrible given the example data. Your mileage may vary.

declare @groupcount int = 5;
create table t (id int identity(1, 1), value int);
insert  t values 
    (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342)
  , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579);
;with cte as (
  select *
      , rn = row_number() over (order by value asc)
      , pct = value/sum(value+.0) over()
      , target = 1.0 / @groupcount 
  from t
)
, remaining as (
select id, value, rn
  , grp = convert(int,(sum(value) over (order by rn)/sum(value+.0) over())*@groupCount)+1
from cte
)
select
    grp = row_number() over (order by sum(value) desc)
  , sumValue = sum(value)
from remaining
group by grp

rextester demo: http://rextester.com/UNV61100

results:

+-----+----------+
| grp | sumValue |
+-----+----------+
|   1 |     6529 |
|   2 |     4579 |
|   3 |     3483 |
|   4 |     2323 |
|   5 |     1901 |
+-----+----------+


SQL Server 2008 compatible version (the running SUM() OVER (ORDER BY ...) used above requires SQL Server 2012 or later):

declare @groupcount int = 5;
create table t (id int identity(1, 1), value int);
insert  t values 
    (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342)
  , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579);
;with cte as (
  select *
      , rn = row_number() over (order by value asc)
      , pct = value/tv.TotalValue
      , target = 1.0 / @groupcount 
  from t
    cross join (select TotalValue = sum(value+.0) from t) tv
)
, remaining as (
select id, value, rn
  , grp = convert(int,((x.sumValueOver/TotalValue)*@groupcount)+1)
from cte
  outer apply (
    select sumValueOver = sum(value) 
    from cte i
    where i.rn <= cte.rn
      ) x
)
select
    grp = row_number() over (order by sum(value) desc)
  , sumValue = sum(value)
from remaining
group by grp

rextester demo: http://rextester.com/DEUDJ77007

returns:

+-----+----------+
| grp | sumValue |
+-----+----------+
|   1 |     6529 |
|   2 |     4579 |
|   3 |     3483 |
|   4 |     2323 |
|   5 |     1901 |
+-----+----------+

#2


1  

The NTILE function in SQL Server may help you here.

DECLARE @GroupCount INT = 5

-- Set up the test data
DECLARE @test TABLE (Id INT IDENTITY(1, 1), Value INT)
INSERT  @Test (Value)
SELECT  100
UNION ALL
SELECT  456
UNION ALL
SELECT  121
UNION ALL
SELECT  402
UNION ALL
SELECT  253
UNION ALL
SELECT  872
UNION ALL
SELECT  765
UNION ALL
SELECT  6529
UNION ALL
SELECT  1029
UNION ALL
SELECT  342
UNION ALL
SELECT  98
UNION ALL
SELECT  1
UNION ALL
SELECT  0
UNION ALL
SELECT  4
UNION ALL
SELECT  46
UNION ALL
SELECT  23
UNION ALL
SELECT  456
UNION ALL
SELECT  416
UNION ALL
SELECT  2323
UNION ALL
SELECT  4579

;With cte
AS
(
    SELECT *, NTILE(@GroupCount) OVER(ORDER BY Value DESC) AS GroupNo FROM @Test
)
SELECT GroupNo, SUM(Value) AS SummedValues FROM cte
GROUP BY GroupNo

And I get this result:

GroupNo SummedValues
--------------------
1       14460
2       2549
3       1413
4       365
5       28

#3


1  

A slightly better way to do this would be to "snake" the selections. You're lining up the 1st, 6th, 11th highest - of course that's way higher than 5th, 10th, 15th.

Better would be 1st, 10th, 11th, versus 5th, 6th, 15th. Still not perfect, and with your particular data still very poor, but slightly better than yours.

DECLARE @GroupCount INT = 5

-- Set up the test data
DECLARE @test TABLE (Id INT IDENTITY(1, 1), Value INT)
INSERT  @Test (Value)
SELECT  100
UNION ALL
SELECT  456
UNION ALL
SELECT  121
UNION ALL
SELECT  402
UNION ALL
SELECT  253
UNION ALL
SELECT  872
UNION ALL
SELECT  765
UNION ALL
SELECT  6529
UNION ALL
SELECT  1029
UNION ALL
SELECT  342
UNION ALL
SELECT  98
UNION ALL
SELECT  1
UNION ALL
SELECT  0
UNION ALL
SELECT  4
UNION ALL
SELECT  46
UNION ALL
SELECT  23
UNION ALL
SELECT  456
UNION ALL
SELECT  416
UNION ALL
SELECT  2323
UNION ALL
SELECT  4579


--Order by Value descending
;WITH cte AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum
    FROM    @Test
)
--use modulus to split into grouping
, cte2 AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (PARTITION BY RowNum % (@GroupCount*2 ) ORDER BY RowNum DESC) Rnk
    FROM    cte
)
select [Grouping], SUM(value) from (
SELECT  floor(abs(@GroupCount - (ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Value DESC) - 0.5)) + 0.5) AS 'Grouping'
    ,Value
    ,Id
FROM    cte2
--ORDER BY [Grouping], Value ASC
) a group by [Grouping]
  order by [Grouping] ASC
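
The snake assignment in the query above can be sketched in a few lines of Python to make the index arithmetic easier to follow. This is an illustrative sketch only; `snake_totals` is a hypothetical helper, and it numbers groups from 0 with the largest value landing in group 0, whereas the SQL puts the largest value in grouping 5.

```python
def snake_totals(values, group_count):
    """Sum per group when values are dealt out in a back-and-forth order."""
    totals = [0] * group_count
    for i, value in enumerate(sorted(values, reverse=True)):
        pos = i % (2 * group_count)
        # Forward pass: 0, 1, ..., n-1. Backward pass: n-1, ..., 1, 0.
        group = pos if pos < group_count else 2 * group_count - 1 - pos
        totals[group] += value
    return totals


values = [100, 456, 121, 402, 253, 872, 765, 6529, 1029, 342,
          98, 1, 0, 4, 46, 23, 456, 416, 2323, 4579]
snake = snake_totals(values, 5)
print(snake)  # [7273, 5249, 2904, 1608, 1781]
```

Group 0 receives the 1st, 10th, 11th and 20th largest values, which is the pairing described above.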

Ultimately, though, I think random assignment is likely better than this, perhaps random assignment combined with a check that a grouping's sum isn't already twice its even share, i.e. 2 * (total / number of groupings).

Really, I think this is not a problem well solved by T-SQL or any SQL; languages that can control flow on a row-by-row basis will serve you better: Python, C#, SAS, or whatever other tool is in your toolbox. (PL/SQL is the one place I'd consider going here...)

Anything that would let you say, on a row-level basis, "Having kept track of what I've assigned so far, assign this particular case to the bucket with the lowest number so far" really would work better.

Grouping Summed Values
---------------------

1       1781
2       1608
3       2904
4       5249
5       7273
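
The row-by-row idea described in this answer ("assign this particular case to the bucket with the lowest number so far") could look roughly like the following in Python. This is a minimal sketch under assumptions, not the answerer's code: it processes the values largest first, and `greedy_groups` is a hypothetical name.

```python
import heapq


def greedy_groups(rows, group_count):
    """rows: iterable of (id, value). Returns {group_no: [(id, value), ...]}."""
    # Min-heap of (running total, group number); ties go to the lower number.
    heap = [(0, g) for g in range(1, group_count + 1)]
    heapq.heapify(heap)
    groups = {g: [] for g in range(1, group_count + 1)}
    # Assumption: handle the largest values first, then fill in with small ones.
    for row_id, value in sorted(rows, key=lambda r: r[1], reverse=True):
        total, g = heapq.heappop(heap)  # group with the lowest total so far
        groups[g].append((row_id, value))
        heapq.heappush(heap, (total + value, g))
    return groups


values = [100, 456, 121, 402, 253, 872, 765, 6529, 1029, 342,
          98, 1, 0, 4, 46, 23, 456, 416, 2323, 4579]
rows = list(enumerate(values, start=1))  # Ids 1..20, like the IDENTITY column
groups = greedy_groups(rows, 5)
totals = {g: sum(v for _, v in members) for g, members in groups.items()}
for g in sorted(totals):
    print(g, totals[g])
```

On the sample data this gives totals of 6529, 4579, 2576, 2566 and 2565. The two largest values each sit alone in a grouping because they already exceed the even share of 3763, and the remaining three groupings end up within 11 of each other. Unlike the SQL approaches, the groupings no longer contain the same number of rows.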

#4


1  

Use the ntile and row_number window functions together, not only to split the rows into even groups (even by count, not by sum) but also to make a better decision about which values go into each group, so that the totals even out as much as possible.

Answer:

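-- @ntile_cnt is not declared in the original post; the totals shown below
-- correspond to 4 tiles (20 rows / 5 groups), so that value is assumed here.
-- @test is the table variable from the question.
declare @ntile_cnt int = 4
select case b.grp_split when 1 then b.grp_split_rnk_desc else grp_split_rnk_asc end as [grouping]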
, b.value
, b.id
from (
    select a.id
    , a.value
    , a.grp_split
    , row_number() over (partition by a.grp_split order by a.value desc) grp_split_rnk_desc
    , row_number() over (partition by a.grp_split order by a.value asc) grp_split_rnk_asc
    from (
        select t.id
        , t.value
        , ntile(@ntile_cnt) over (order by t.value desc) as grp_split
        from @test as t
        ) as a
    ) as b
order by case b.grp_split when 1 then b.grp_split_rnk_desc else grp_split_rnk_asc end asc
, b.value asc

Results:

Not perfect, but slightly closer.

Group   Total
1       7029
2       5096
3       2904
4       1761
5       2025

#5


0  

The result is primarily determined by the largest values, so you can try ordering all the rest in reverse order:

WITH cte AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum
    FROM    @Test
)
--use modulus to split into grouping
, cte2 AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (PARTITION BY RowNum % @GroupCount ORDER BY RowNum ) Rnk
    FROM    cte
)
,cte3 AS
(SELECT  ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY case rnk when 1 then Value else -Value end DESC) AS [Grouping]
    ,Value
    ,Id
FROM    cte2
 )
select [Grouping],sum(value)
from cte3
group by [Grouping]
order by [Grouping];

The result

Grouping   (No column name)
1          7029
2          5096
3          2904
4          1761
5          2025
