Oracle SQL: How to get random records for each group with a predefined contribution

Date: 2020-12-16 12:22:17

This is with reference to the earlier question described here: Oracle SQL: How to get Random Records by each group

Question:

Is it possible to get a random sample with a given ratio of different categories?

Example: if I need a random sample of 132 records across 3 categories (approved, denied, canceled), how do I draw the sample according to the ratios below?

total sample = 132

category     samples %  sample Size
approved     50%        66
denied       30%        40
canceled     20%        26

Note: I need the raw data, not the count

1 solution

#1


Let's create some sample data first. The approved category gets 132 records, so a 50% sample yields 66 rows.

create table task as
select 'approved' category, rownum task_id from dual connect by level <= 132 union all
select 'denied' category, rownum task_id from dual connect by level <= 134 union all
select 'canceled' category, rownum task_id from dual connect by level <= 130 
;

The key step is to define the column RAND_PERC, which holds a value between 0 and 1 for each row within its category. If you want a sample of, say, 50%, select all rows in the category with a value less than or equal to .5.

The column is calculated by first assigning a row number in random order (independently for each category) and then dividing it by the number of rows in that category.

select CATEGORY, TASK_ID, 
 ( row_number() over (partition by task.category order by dbms_random.value)) / 
 ( count(*) over (partition by task.category)) as rand_perc
from task
order by 1,3;

CATEGORY    TASK_ID  RAND_PERC
-------- ---------- ----------
approved         56 ,00757575758 
approved        129 ,0151515152 
approved         61 ,0227272727 

To draw the sample, simply define the WHERE condition as required; see the example below.

with rnd as (
select CATEGORY, TASK_ID, 
 ( row_number() over (partition by task.category order by dbms_random.value)) / 
 ( count(*) over (partition by task.category)) as rand_perc
from task
)
select CATEGORY, count(*) cnt
from rnd
where 
category = 'approved' and rand_perc <= .5  or /* take 50% from approved */
category = 'denied' and rand_perc <= .3  or
category = 'canceled' and rand_perc <= .2
group by CATEGORY
;

This gives the required sample sizes:

CATEGORY        CNT
-------- ----------
canceled         26 
denied           40 
approved         66
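
The question asks for the raw data rather than the counts. A minimal sketch, assuming the same TASK table and RND subquery as above: drop the GROUP BY and select the sampled rows themselves. The parentheses around each condition are optional (AND already binds tighter than OR) and are added only for readability.

```sql
-- Sketch: same RND subquery as above, but returning the sampled rows
-- instead of a count per category.
with rnd as (
  select category, task_id,
         row_number() over (partition by category order by dbms_random.value)
           / count(*) over (partition by category) as rand_perc
  from task
)
select category, task_id
from rnd
where (category = 'approved' and rand_perc <= .5)
   or (category = 'denied'   and rand_perc <= .3)
   or (category = 'canceled' and rand_perc <= .2)
order by category, task_id;
```

With the sample table above this should return the 66 + 40 + 26 = 132 rows counted in the previous query. Because the ordering uses dbms_random.value, a different set of rows is drawn on each execution.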
