I will try to describe my issue as clearly as possible.
我将尽可能清楚地描述我的问题。
I have a dataset of unique 1000 clients, say ##temp1.
我有一个独特的1000个客户的数据集,比如## temp1。
I have another dataset which holds the information related to the 1000 clients from ##temp1 across the past 7 years. Lets call this dataset ##temp2. There are 6 specific columns in this second dataset (##temp2) that I am interested in, lets call them column A, B, C, D, E, F. Just for information, the information that columns A, C, E hold is year of some form in data type float (2012, 2013, 2014..) and the information that columns B, D, F hold is ratings of some form (1,2,3,..upto 5) in data type float. Both year and ratings columns have NULL values which I have converted to 0 for now.
我有另一个数据集,它保存了过去7年来## temp1中1000个客户的相关信息。让我们称这个数据集为## temp2。我感兴趣的第二个数据集(## temp2)中有6个特定列,我们称之为A,B,C,D,E,F列。仅供参考,A,C,E列保留的信息数据类型为float(2012,2013,2014 ..)的某种形式的年份,列B,D,F所持有的信息是数据类型float中某种形式(1,2,3,...至5)的等级。年份和评级列都有NULL值,我现在已将其转换为0。
My eventual goal is to create a report which holds the information for the 1000 clients in ##temp1, such that each row should hold the information in the following form,
我最终的目标是创建一个报告,其中包含## temp1中1000个客户端的信息,这样每行应该按以下形式保存信息,
ClientID | ClientName | ColA_Latest_Year1 | ColB_Corresponding_Rating_Year_1 | ColC_Latest_Year2 | ColD_Corresponding_Rating_Year_2 | ColE_Latest_Year3 | ColF_Corresponding_Rating_Year3.
ColA_Latest_Year1 should hold the latest year for that particular client from dataset ##temp2 and ColB_Corresponding_Rating_Year_1 should hold the rating from Column B corresponding to the year pulled in from Column A. Same goes for the other columns.
ColA_Latest_Year1应该从数据集## temp2保存该特定客户端的最新年份,并且ColB_Corresponding_Rating_Year_1应该保持对应于从A列中提取的年份的B列的评级。对于其他列也是如此。
The approach which I have taken so far, was,
到目前为止,我采取的方法是,
- Create
##temp1
as needed - Create
##temp2
as needed -
##temp1 LEFT JOIN ##temp2
on client ids to retrieve the year and rating information for all the clients in##temp1
and put all that information in##temp3
. There will be multiple rows for every client in##temp3
because the data in##temp3
is for multiple years. - Ranked the year column (B,D,F) partition by client_ids and put in in
##temp4
,
根据需要创建## temp1
根据需要创建## temp2
## temp1 LEFT JOIN ## temp2在客户端ID上检索## temp1中所有客户端的年份和评级信息,并将所有信息放在## temp3中。 ## temp3中的每个客户端都会有多行,因为## temp3中的数据是多年的。
通过client_ids对年份列(B,D,F)进行排序,并将其放入## temp4中,
What I have now is something like this,
我现在拥有的是这样的,
Rnk_A | Rnk_C | Rnk_F | ColA | ColB | ColC | ColD | ColE | ColF | Client_id | Client_name
2 | 1 | 1 | 0 | 0 | 0 | 0 | 2014 | 1 | 111 | 'ABC'
1 | 2 | 1 | 2012 | 1 | 0 | 0 | 0 | 0 | 111 | 'ABC'
My goal is
我的目标是
Rnk_A | Rnk_C | Rnk_F | ColA | ColB | ColC | ColD | ColE | ColF | Client_id | Client_name
1 | 1 | 1 | 2012| 1 | 0 | 0 | 2014| 1 | 111 | 'ABC'
Any help is appreciated.
任何帮助表示赞赏。
1 个解决方案
#1
0
This answer assumes you don't have any duplicates per client in columns A, C, E. If you do have duplicates you'd need to find a way to differentiate them and make the necessary changes.
此答案假设您在A,C,E列中没有每个客户端的任何重复项。如果您确实有重复项,则需要找到区分它们并进行必要更改的方法。
The hurdle that you've failed to overcome in your attempt (as described) is that you're trying to join from temp1
to temp2
only once for lookup information that could come from 3 distinct rows of temp2
. This cannot work as you hope. You must perform separate joins for each pair [A,B] [C,D] and [E,F]. The following demonstrates a solution using CTEs to derive the lookup data for each pair.
你在尝试中未能克服的障碍(如上所述)是你试图从temp1到temp2只加入一次查找信息,这些信息可能来自3个不同的temp2行。这不能像你希望的那样工作。您必须为每对[A,B] [C,D]和[E,F]执行单独的连接。以下演示了使用CTE导出每对的查找数据的解决方案。
/********* Prepare sample tables and data ***********/
declare @t1 table (
ClientId int,
ClientName varchar(50)
)
declare @t2 table (
ClientId int,
ColA datetime,
ColB float,
ColC datetime,
ColD float,
ColE datetime,
ColF float
)
insert into @t1
select 1, 'Client 1' union all
select 2, 'Client 2' union all
select 3, 'Client 3' union all
select 4, 'Client 4'
insert into @t2
select 1, '20001011', 1, '20010101', 7, '20130101', 14 union all
select 1, '20040101', 4, '20170101', 1, '20120101', 1 union all
select 1, '20051231', 0, '20020101', 15, '20110101', 1 union all
select 2, '20060101', 2, NULL, 15, '20110101', NULL union all
select 2, '20030101', 3, NULL, NULL, '20100101', 17 union all
select 3, NULL, NULL, '20170101', 42, NULL, NULL
--select * from @t1
--select * from @t2
/********* Solution ***********/
;with MaxA as (
select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColA DESC) rn,
t2.ClientId, t2.ColA, t2.ColB
from @t2 t2
--where t2.ColA is not null and t2.ColB is not null
), MaxC as (
select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColC DESC) rn,
t2.ClientId, t2.ColC, t2.ColD
from @t2 t2
--where t2.ColC is not null and t2.ColD is not null
), MaxE as (
select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColE DESC) rn,
t2.ClientId, t2.ColE, t2.ColF
from @t2 t2
--where t2.ColE is not null and t2.ColF is not null
)
select t1.ClientId, t1.ClientName, a.ColA, a.ColB, c.ColC, c.ColD, e.ColE, e.ColF
from @t1 t1
left join MaxA a on
a.ClientId = t1.ClientId
and a.rn = 1
left join MaxC c on
c.ClientId = t1.ClientId
and c.rn = 1
left join MaxE e on
e.ClientId = t1.ClientId
and e.rn = 1
If you run this you may notice some peculiar results for Client 2 in columns C and F. This is because (as per your question) there may be some NULL values. ColC date is "unknown" and ColF rating is "unknown".
如果你运行这个,你可能会注意到客户端2在C和F列中的一些特殊结果。这是因为(根据你的问题)可能有一些NULL值。 ColC日期为“未知”,ColF评级为“未知”。
My solution preserves NULL
values instead of converting them to zeroes. This allows you to handle them explicitly if you so choose. I commented out lines in the above query that could be used to ignore NULL dates and ratings if necessary.
我的解决方案保留NULL值而不是将它们转换为零。如果您愿意,这允许您明确地处理它们。我在上面的查询中注释掉了可用于在必要时忽略NULL日期和评级的行。
#1
0
This answer assumes you don't have any duplicates per client in columns A, C, E. If you do have duplicates you'd need to find a way to differentiate them and make the necessary changes.
此答案假设您在A,C,E列中没有每个客户端的任何重复项。如果您确实有重复项,则需要找到区分它们并进行必要更改的方法。
The hurdle that you've failed to overcome in your attempt (as described) is that you're trying to join from temp1
to temp2
only once for lookup information that could come from 3 distinct rows of temp2
. This cannot work as you hope. You must perform separate joins for each pair [A,B] [C,D] and [E,F]. The following demonstrates a solution using CTEs to derive the lookup data for each pair.
你在尝试中未能克服的障碍(如上所述)是你试图从temp1到temp2只加入一次查找信息,这些信息可能来自3个不同的temp2行。这不能像你希望的那样工作。您必须为每对[A,B] [C,D]和[E,F]执行单独的连接。以下演示了使用CTE导出每对的查找数据的解决方案。
/********* Prepare sample tables and data ***********/
declare @t1 table (
ClientId int,
ClientName varchar(50)
)
declare @t2 table (
ClientId int,
ColA datetime,
ColB float,
ColC datetime,
ColD float,
ColE datetime,
ColF float
)
insert into @t1
select 1, 'Client 1' union all
select 2, 'Client 2' union all
select 3, 'Client 3' union all
select 4, 'Client 4'
insert into @t2
select 1, '20001011', 1, '20010101', 7, '20130101', 14 union all
select 1, '20040101', 4, '20170101', 1, '20120101', 1 union all
select 1, '20051231', 0, '20020101', 15, '20110101', 1 union all
select 2, '20060101', 2, NULL, 15, '20110101', NULL union all
select 2, '20030101', 3, NULL, NULL, '20100101', 17 union all
select 3, NULL, NULL, '20170101', 42, NULL, NULL
--select * from @t1
--select * from @t2
/********* Solution ***********/
;with MaxA as (
select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColA DESC) rn,
t2.ClientId, t2.ColA, t2.ColB
from @t2 t2
--where t2.ColA is not null and t2.ColB is not null
), MaxC as (
select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColC DESC) rn,
t2.ClientId, t2.ColC, t2.ColD
from @t2 t2
--where t2.ColC is not null and t2.ColD is not null
), MaxE as (
select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColE DESC) rn,
t2.ClientId, t2.ColE, t2.ColF
from @t2 t2
--where t2.ColE is not null and t2.ColF is not null
)
select t1.ClientId, t1.ClientName, a.ColA, a.ColB, c.ColC, c.ColD, e.ColE, e.ColF
from @t1 t1
left join MaxA a on
a.ClientId = t1.ClientId
and a.rn = 1
left join MaxC c on
c.ClientId = t1.ClientId
and c.rn = 1
left join MaxE e on
e.ClientId = t1.ClientId
and e.rn = 1
If you run this you may notice some peculiar results for Client 2 in columns C and F. This is because (as per your question) there may be some NULL values. ColC date is "unknown" and ColF rating is "unknown".
如果你运行这个,你可能会注意到客户端2在C和F列中的一些特殊结果。这是因为(根据你的问题)可能有一些NULL值。 ColC日期为“未知”,ColF评级为“未知”。
My solution preserves NULL
values instead of converting them to zeroes. This allows you to handle them explicitly if you so choose. I commented out lines in the above query that could be used to ignore NULL dates and ratings if necessary.
我的解决方案保留NULL值而不是将它们转换为零。如果您愿意,这允许您明确地处理它们。我在上面的查询中注释掉了可用于在必要时忽略NULL日期和评级的行。