Create rows based on column values and classify them

Time: 2022-02-15 07:37:49

I am trying to assign a classification to a row of data based on whether certain values exist. Using the sample code below, I have gotten to the point where I'm stuck.

proc sql;
/* sample input: one row per task */
create table test
  (id char(4),
   task char(4),
   id2 char(4),
   status char(10),
   seconds num);

/* . is the SAS numeric missing value */
insert into test
  values('1','A','1','COMP',15)
  values('1','B','2','WORK',20)
  values('1','C','3','COMP',50)
  values('1','D','3','COMP',.)
  values('2','A','1','COMP',15)
  values('2','B','2','COMP',520)
  values('2','C','2','COMP',.)
  values('2','D','3','COMP',221)
  values('2','E','3','COMP',.)
  values('2','F','3','COMP',.);
quit;

proc sql;
create table test2 as 
select 
ID,
ID2,
STATUS,
SUM(SECONDS) AS SECONDS,
sum(case when task='A' THEN 1 ELSE 0 END) AS A,
sum(case when task='B' THEN 1 ELSE 0 END) AS B,
sum(case when task='C' THEN 1 ELSE 0 END) AS C,
sum(case when task='D' THEN 1 ELSE 0 END) AS D,
sum(case when task='E' THEN 1 ELSE 0 END) AS E,
sum(case when task='F' THEN 1 ELSE 0 END) AS F
from
test
GROUP BY
ID,
ID2,
STATUS
;
quit;

Ultimately, I would like to add a column to each row created in the second step (test2) that looks at the values in the lettered columns (A-F) and labels the row accordingly. So when a row has a 1 in column A only, it would be labeled 'A', but when a row has a 1 in multiple columns, such as D, E and F, I would like it to be labeled 'D_E_F'.
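For reference, this is the result the question asks for on the sample data, worked out by hand from the ten rows inserted into TEST (one row per ID/ID2/STATUS group, with SECONDS summed and missing values ignored). The table name expected is hypothetical; it is only meant as something to check an answer's output against, for example with PROC COMPARE.

```sas
/* Hypothetical reference table: expected labels for the sample data, */
/* derived by hand from the ten rows inserted into TEST.              */
data expected;
  length id $4 id2 $4 status $10 classifier $32;
  input id $ id2 $ status $ seconds classifier $;
  datalines;
1 1 COMP 15 A
1 2 WORK 20 B
1 3 COMP 50 C_D
2 1 COMP 15 A
2 2 COMP 520 B_C
2 3 COMP 221 D_E_F
;
run;
```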

2 Solutions

#1

The best way to do this is in a DATA step:

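data test3;
length classifier $32;            /* declare before first use; 32 is plenty for 'A_B_C_D_E_F' */
set test2;
array vars[6] A B C D E F;        /* the six task-count columns */

classifier = "";
do i=1 to 6;
    if vars[i] then               /* true for any nonzero, nonmissing count */
        classifier = catx("_",classifier,vname(vars[i]));
end;
drop i;
run;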

I create a character variable, CLASSIFIER, with a length of 32.
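Declaring the length before the first assignment matters here. The SAS documentation for CATX states that, in a DATA step, a variable that receives the CATX result without a previously assigned length is given 200 bytes. A quick check using VLENGTH, which returns a variable's declared length (the variable names are illustrative):

```sas
data _null_;
  length sized $32;                /* length declared before first use */
  sized   = catx('_', 'D', 'E');   /* stored in a 32-byte variable      */
  unsized = catx('_', 'D', 'E');   /* no LENGTH statement for this one  */
  len_sized   = vlength(sized);
  len_unsized = vlength(unsized);
  put len_sized= len_unsized=;
run;
```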

I define an array that groups columns A through F, which makes it easy to loop over them.
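The array can also be declared with [*] so that SAS counts the elements, and the loop can run to DIM(vars) instead of a hard-coded 6; adding a task column then only means extending the variable list. A self-contained sketch on a hypothetical two-row stand-in for TEST2 (the data set names demo and demo_labeled are illustrative):

```sas
/* Hypothetical stand-in for TEST2: task counts only */
data demo;
  input A B C D E F;
  datalines;
0 0 1 1 0 0
0 0 0 1 1 1
;
run;

data demo_labeled;
  length classifier $32;
  set demo;
  array vars[*] A B C D E F;       /* [*]: SAS counts the elements */
  classifier = '';
  do i = 1 to dim(vars);           /* DIM(vars) is 6 here */
    if vars[i] then classifier = catx('_', classifier, vname(vars[i]));
  end;
  drop i;
run;
```

The two rows should come out as C_D and D_E_F.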

Initialize the CLASSIFIER variable to blank.

Loop over the array. If the value is 1 (the IF test is actually true for any nonzero, nonmissing value), add the name of the variable to the CLASSIFIER string.
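The test "if vars[i] then" is a little broader than "the value = 1": in SAS a numeric condition is false only for 0 and missing, so a count of 2 (a task appearing twice in one group) would also add the column name. A small illustration with made-up values:

```sas
/* Which values make "if x then" true? (illustration only) */
data _null_;
  input x;
  if x then put x= 'counts as true';
  else put x= 'counts as false';
  datalines;
1
2
0
.
;
run;
```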

CATX(delim, str1, str2) concatenates str1 and str2 with delim in between. It also removes leading and trailing blanks from each argument.
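CATX also skips arguments that are blank or missing, together with their delimiter. That is why CLASSIFIER can start out blank without producing a leading underscore on the first pass. A hand-written trace of the loop for the D/E/F row:

```sas
data _null_;
  length s $32;
  s = '';                    /* blank, like CLASSIFIER before the loop */
  s = catx('_', s, 'D');     /* blank first argument is skipped: 'D'   */
  s = catx('_', s, 'E');     /* 'D_E'                                  */
  s = catx('_', s, 'F');     /* 'D_E_F'                                */
  put s=;
run;
```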

VNAME(array[i]) returns the name of the variable that array[i] refers to.
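Because VNAME is applied to the array reference, inside the loop it yields the column name (A, B, ...) rather than the count stored in that column. A small illustration with made-up values:

```sas
data _null_;
  A = 1; B = 0; C = 1;
  array vars[3] A B C;
  length name $32;                 /* SAS variable names are at most 32 characters */
  do i = 1 to dim(vars);
    name  = vname(vars[i]);        /* the variable's name, e.g. 'B' */
    value = vars[i];               /* the value stored in it        */
    put i= name= value=;
  end;
run;
```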

Finally, drop the loop variable i, unless you really want it in your output.

#2

I know it is ugly, but you can do it with CASE expressions that accumulate the wanted result in another column. Here is the SQL Fiddle.

Note that if the concatenation can be empty, you will have to check for that condition before taking the substring: LEN of an empty string is 0, so the length argument becomes -1, which SUBSTRING does not accept.

select 
  ID,
  ID2,
  STATUS,
  SUM(SECONDS) AS SECONDS,
  sum(case when task='A' THEN 1 ELSE 0 END) AS A,
  sum(case when task='B' THEN 1 ELSE 0 END) AS B,
  sum(case when task='C' THEN 1 ELSE 0 END) AS C,
  sum(case when task='D' THEN 1 ELSE 0 END) AS D,
  sum(case when task='E' THEN 1 ELSE 0 END) AS E,
  sum(case when task='F' THEN 1 ELSE 0 END) AS F,  
  substring(
  case when sum(case when task='A' THEN 1 ELSE 0 END) = 1 then '_A' else '' end 
  + case when sum(case when task='B' THEN 1 ELSE 0 END) = 1 then '_B' else '' end 
  + case when sum(case when task='C' THEN 1 ELSE 0 END) = 1 then '_C' else '' end 
  + case when sum(case when task='D' THEN 1 ELSE 0 END) = 1 then '_D' else '' end 
  + case when sum(case when task='E' THEN 1 ELSE 0 END) = 1 then '_E' else '' end 
  + case when sum(case when task='F' THEN 1 ELSE 0 END) = 1 then '_F' else '' end,  
  2, len(case when sum(case when task='A' THEN 1 ELSE 0 END) = 1 then '_A' else '' end 
  + case when sum(case when task='B' THEN 1 ELSE 0 END) = 1 then '_B' else '' end 
  + case when sum(case when task='C' THEN 1 ELSE 0 END) = 1 then '_C' else '' end 
  + case when sum(case when task='D' THEN 1 ELSE 0 END) = 1 then '_D' else '' end 
  + case when sum(case when task='E' THEN 1 ELSE 0 END) = 1 then '_E' else '' end 
  + case when sum(case when task='F' THEN 1 ELSE 0 END) = 1 then '_F' else '' end) - 1) as wantedOutput
from
test
GROUP BY
ID,
ID2,
STATUS
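The query above appears to target SQL Server (T-SQL): it relies on + for string concatenation and on LEN, neither of which SAS PROC SQL accepts for this purpose. A minimal PROC SQL sketch of the same CASE-based idea lets CATX do the concatenation instead. Because CATX skips blank arguments, there is no leading underscore to strip with SUBSTRING and no separate empty-string check. The sketch tests > 0 rather than = 1, an assumption that a task appearing twice in a group should still be labeled. The data step at the top only recreates TEST with the question's ten rows so the block runs on its own; the output name test3_sql is hypothetical.

```sas
/* Recreate the question's TEST table (same ten rows) */
data test;
  length id $4 task $4 id2 $4 status $10;
  input id $ task $ id2 $ status $ seconds;
  datalines;
1 A 1 COMP 15
1 B 2 WORK 20
1 C 3 COMP 50
1 D 3 COMP .
2 A 1 COMP 15
2 B 2 COMP 520
2 C 2 COMP .
2 D 3 COMP 221
2 E 3 COMP .
2 F 3 COMP .
;
run;

proc sql;
  create table test3_sql as
  select id, id2, status,
         sum(seconds) as seconds,
         sum(case when task='A' then 1 else 0 end) as A,
         sum(case when task='B' then 1 else 0 end) as B,
         sum(case when task='C' then 1 else 0 end) as C,
         sum(case when task='D' then 1 else 0 end) as D,
         sum(case when task='E' then 1 else 0 end) as E,
         sum(case when task='F' then 1 else 0 end) as F,
         /* CATX skips the blank arguments: no leading '_',     */
         /* and a group with no tasks simply gets a blank label */
         catx('_',
              case when calculated A > 0 then 'A' else ' ' end,
              case when calculated B > 0 then 'B' else ' ' end,
              case when calculated C > 0 then 'C' else ' ' end,
              case when calculated D > 0 then 'D' else ' ' end,
              case when calculated E > 0 then 'E' else ' ' end,
              case when calculated F > 0 then 'F' else ' ' end)
           as classifier length=32
  from test
  group by id, id2, status;
quit;
```

On the sample data this should give the same labels as the DATA step answer: A, B, C_D, A, B_C and D_E_F.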
