I am stuck on a difficult SQL aggregate problem.
Consider the following table/view:
Column1 Column2
1 2564
2 6550
1 3578
2 6548
2 4789
1 9876
I would like to design a query to do the following:
For every distinct Column1 value, sample 2 records. The sampling strategy could be some sort of bootstrapping/resampling, as there might not be many data points per value.
Thus the table will become:
Column1 Column2
1 9876
1 3578
2 6548
2 6550
Platform: MS SQL
Any answers are appreciated.
1 Solution
#1
For a random sample without replacement:
select t.*
from (select t.*,
             row_number() over (partition by column1 order by newid()) as seqnum
      from t
     ) t
where seqnum <= 2;
Or, alternatively:
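-- Integer division maps seqnum 1 and 2 to 0, so the rows tied on 0
-- are exactly two random rows per column1 value.
select top (1) with ties t.*
from t
order by (row_number() over (partition by column1 order by newid()) - 1) / 2;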
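To try the without-replacement approach on the data from the question, here is a minimal sketch that inlines the six sample rows as a CTE. The CTE name t and the alias s are only illustrative, and it assumes SQL Server 2008 or later for the multi-row values constructor.

```sql
-- Sample data from the question, inlined as a CTE named t so no real table is needed.
with t (column1, column2) as (
    select v.column1, v.column2
    from (values (1, 2564), (2, 6550), (1, 3578),
                 (2, 6548), (2, 4789), (1, 9876)) as v (column1, column2)
)
select s.column1, s.column2
from (select t.*,
             -- newid() yields a fresh random GUID per row, so this numbering
             -- is a random shuffle within each column1 group
             row_number() over (partition by column1 order by newid()) as seqnum
      from t
     ) s
where s.seqnum <= 2      -- keep the first two rows of each shuffled group
order by s.column1;
```

This should return four rows, two for each column1 value. Which rows come back changes from run to run, and within a single run the same row is never picked twice.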
For a random sample with replacement:
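-- Each branch independently draws one random row per column1 value,
-- so the same row can show up twice in the result.
select *
from ((select top (1) with ties t.*
       from t
       order by row_number() over (partition by column1 order by newid())
      )
      union all
      (select top (1) with ties t.*
       from t
       order by row_number() over (partition by column1 order by newid())
      )
     ) x;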