sql - For each unique value in a column, sample 2 distinct values from another column

Time: 2022-01-20 13:41:08

I am stuck on a difficult SQL aggregation problem.

Consider the following table/view:

Column1  Column2
1        2564
2        6550
1        3578
2        6548
2        4789
1        9876
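
For anyone who wants to reproduce this, here is a minimal setup sketch. The table name t matches the one used in the answer's queries; the int column types are an assumption, since the real schema is not shown.

```sql
-- Hypothetical setup: the table name t is taken from the answer's queries;
-- the int column types are an assumption.
create table t (
    column1 int not null,
    column2 int not null
);

-- The six rows from the example table above.
insert into t (column1, column2)
values (1, 2564),
       (2, 6550),
       (1, 3578),
       (2, 6548),
       (2, 4789),
       (1, 9876);
```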

I would like to design a query to do the following:

For every distinct Column1 value, sample 2 records. The sampling strategy could be some sort of bootstrapping/resampling, as there might not be many data points for each value.

Thus the table will become:

Column1     Column2
1           9876
1           3578
2           6548
2           6550

Platform: MS SQL

Any answers are appreciated.

1 solution

#1


3  

For a random sample without replacement:

select t.*
from (select t.*,
             row_number() over (partition by column1 order by newid()) as seqnum
      from t
     ) t
where seqnum <= 2;

Or, alternatively:

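-- top (1) with ties keeps every row whose sort key ties with the first row,
-- i.e. all rows ranked 1 or 2 within their column1 group.
select top (1) with ties t.*
from t
order by case when row_number() over (partition by column1 order by newid()) <= 2
              then 0 else 1 end;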

For a random sample with replacement:

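-- Two independent draws of one random row per column1 value;
-- the same row can be picked by both draws.
select x.*
from (select top (1) with ties t.*
      from t
      order by row_number() over (partition by column1 order by newid())
     ) x
union all
select y.*
from (select top (1) with ties t.*
      from t
      order by row_number() over (partition by column1 order by newid())
     ) y;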
