I'm trying to work out a fairly complex query in SQL Server 2008, and I'd like some input from the SQL experts here.
Imagine I have a Payments table with these fields:
    PaymentID int, CustomerID int, PaymentDate datetime, Amount decimal
So essentially, it is a table of payments made by customers on specific dates. An important thing to note is that in some cases a payment amount can be negative. So, over time, the total amount paid by any given customer can go up or down.
What we're trying to figure out is the SQL to calculate the high point of the total amount paid per customer.
So, say Fred made 3 payments: the first for $5, the second for $5, the third for -$3. The report should show that Fred's peak total paid amount was $10 (on his second payment), and his final paid amount was $7.
We need to run this report for a hundred thousand customers (who've potentially made a hundred to a thousand payments each), so it's got to be fast.
Is there a good way to structure this query without storing the running totals in the db? We'd like to avoid storing precalculated values if at all possible.
5 Answers
#1
Your question seems to be asking for this:
    SELECT CustomerID, SUM(Amount) FROM Payments WHERE Amount > 0 GROUP BY CustomerID
    SELECT CustomerID, SUM(Amount) FROM Payments GROUP BY CustomerID
However, I think you mean that you want a table that looks like this:
    Customer  Payment  HighPoint  RunningTotal
    123       5        5          5
    123       5        10         10
    123       -3       10         7
In which case I would create a view that combines the two selects above, something like this:
    SELECT p.CustomerID,
           p.PaymentDate,
           p.Amount,
           -- sum of this customer's positive payments up to this date
           (SELECT SUM(a.Amount)
              FROM Payments AS a
             WHERE a.Amount > 0
               AND a.PaymentDate <= p.PaymentDate
               AND a.CustomerID = p.CustomerID) AS HighPoint,
           -- sum of all of this customer's payments up to this date
           (SELECT SUM(a.Amount)
              FROM Payments AS a
             WHERE a.CustomerID = p.CustomerID
               AND a.PaymentDate <= p.PaymentDate) AS RunningTotal
      FROM Payments AS p
Also, you may consider a non-unique index on the Amount column of the table to speed up the view.
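Both subqueries in the view look rows up by CustomerID and PaymentDate, so an index keyed on those columns is another option worth testing alongside the one on Amount. The following is only a sketch: the index name is made up, and whether it helps depends on the actual data and execution plan.

```sql
-- Hypothetical index name; key columns match the subquery predicates,
-- and Amount is included so the sums can be served from the index alone.
CREATE NONCLUSTERED INDEX IX_Payments_Customer_Date
    ON Payments (CustomerID, PaymentDate)
    INCLUDE (Amount);
```

Comparing the execution plan before and after creating the index should show whether the subqueries switch from scans to seeks.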
#2
The operation is linear in the number of payments for each customer. So, you are going to have to go over each payment, keeping a running total and a high water mark, and at the end of all the payments you will have your answer. Whether you do that in a CLR stored procedure (which immediately jumped to mind for me) or use a cursor or temp table or whatever, it's probably not going to be fast.
If you have to run this report over and over again, you should seriously consider keeping a high water mark field and updating it (or not) whenever a payment comes in. That way your report will be trivial, but this is what data marts are for.
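One way the stored high water mark could look is sketched below. The summary table and the procedure are hypothetical names, not part of the question's schema, and the sketch assumes PaymentID is an identity column and that payments are recorded one at a time in chronological order. It does not guard against two concurrent first payments for the same customer.

```sql
-- Hypothetical summary table: one row per customer.
CREATE TABLE CustomerPaymentSummary (
    CustomerID   int            NOT NULL PRIMARY KEY,
    RunningTotal decimal(18, 2) NOT NULL,
    PeakTotal    decimal(18, 2) NOT NULL
);
GO

-- Hypothetical procedure: records one payment and maintains the summary row.
CREATE PROCEDURE AddPayment
    @CustomerID  int,
    @PaymentDate datetime,
    @Amount      decimal(18, 2)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll the whole transaction back on any error
    BEGIN TRANSACTION;

    -- Assumes PaymentID is generated by an identity column.
    INSERT INTO Payments (CustomerID, PaymentDate, Amount)
    VALUES (@CustomerID, @PaymentDate, @Amount);

    -- Column references on the right-hand side see the pre-update values,
    -- so RunningTotal + @Amount is the new total in both assignments.
    UPDATE CustomerPaymentSummary
       SET PeakTotal    = CASE WHEN RunningTotal + @Amount > PeakTotal
                               THEN RunningTotal + @Amount
                               ELSE PeakTotal END,
           RunningTotal = RunningTotal + @Amount
     WHERE CustomerID = @CustomerID;

    -- First payment for this customer: create the summary row.
    IF @@ROWCOUNT = 0
        INSERT INTO CustomerPaymentSummary (CustomerID, RunningTotal, PeakTotal)
        VALUES (@CustomerID, @Amount, @Amount);

    COMMIT TRANSACTION;
END;
GO
```

With this in place the report reduces to SELECT CustomerID, PeakTotal, RunningTotal FROM CustomerPaymentSummary. For payments of 5, 5 and -3, the row ends with PeakTotal 10 and RunningTotal 7.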
#3
As an alternative to subqueries, you can use a running total query. Here's how I set one up for this case. First create some test data:
    create table #payments (
        paymentid int identity,
        customerid int,
        paymentdate datetime,
        amount decimal
    )
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-01',1.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-02',2.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-03',-1.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-04',2.00)
    insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-05',-3.00)
    insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-01',10.00)
    insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-02',-5.00)
    insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-03',7.00)
Now you can execute the running total query, which calculates the balance for each customer after each payment:
    select cur.customerid, cur.paymentdate, sum(prev.amount) as balance
    from #payments cur
    inner join #payments prev
        on cur.customerid = prev.customerid
        and cur.paymentdate >= prev.paymentdate
    group by cur.customerid, cur.paymentdate
    order by cur.customerid, cur.paymentdate
This generates the following data:
    Customer  Paymentdate  Balance after payment
    1         2009.01.01   1
    1         2009.01.02   3
    1         2009.01.03   2
    1         2009.01.04   4
    1         2009.01.05   1
    2         2009.01.01   10
    2         2009.01.02   5
    2         2009.01.03   12
To look at the maximum, you can do a group by on the running total query:
    select customerid, max(balance) as max_balance
    from (
        select cur.customerid, cur.paymentdate, balance = sum(prev.amount)
        from #payments cur
        inner join #payments prev
            on cur.customerid = prev.customerid
            and cur.paymentdate >= prev.paymentdate
        group by cur.customerid, cur.paymentdate
    ) runningtotal
    group by customerid
Which gives:
    Customer  Max balance
    1         4
    2         12
Hope this is useful.
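The join above compares dates only, so two payments by the same customer on the same date collapse into a single running-total row. A possible variant is sketched below. It assumes paymentid increases in the order payments were made, which the question does not state, and uses it to break ties. It also returns the final total next to the peak, since the report asks for both. It runs against the #payments test table created above.

```sql
select rt.customerid,
       max(rt.balance) as peak_paid,
       tot.total_paid
from (
    -- one running-total row per payment rather than per date
    select cur.customerid, cur.paymentid, sum(prev.amount) as balance
    from #payments cur
    inner join #payments prev
        on prev.customerid = cur.customerid
        and (prev.paymentdate < cur.paymentdate
             or (prev.paymentdate = cur.paymentdate
                 and prev.paymentid <= cur.paymentid))
    group by cur.customerid, cur.paymentid
) rt
inner join (
    -- final amount paid per customer
    select customerid, sum(amount) as total_paid
    from #payments
    group by customerid
) tot
    on tot.customerid = rt.customerid
group by rt.customerid, tot.total_paid
order by rt.customerid
```

Against the test data this should return a peak of 4 and a total of 1 for customer 1, and a peak of 12 and a total of 12 for customer 2. It is still a self-join that grows quadratically with the number of payments per customer, so it is unlikely to be faster than the original.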
#4
    list = list of amounts ordered by date
    running = 0
    high = null
    foreach amount in list
        running += amount
        if high is null or running >= high
            high = running
To keep it fast, you will need a stored running total per customer that a trigger increments by each payment's amount, plus a stored high value per customer (which can also be updated by a trigger, making the re-query even simpler).
I don't think you can do this type of thing without code (stored procedures are code).
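The loop above could be written in T-SQL as a single cursor pass over all payments, resetting the counters whenever the customer changes. This is a sketch only: the variable, cursor and result names are illustrative, and it assumes CustomerID and Amount are never NULL and that PaymentID is a sensible tie-breaker for payments on the same date.

```sql
-- One result row per customer.
DECLARE @result TABLE (
    CustomerID int PRIMARY KEY,
    PeakPaid   decimal(18, 2),
    TotalPaid  decimal(18, 2)
);

DECLARE @cust int, @amount decimal(18, 2);
DECLARE @prevCust int, @running decimal(18, 2), @high decimal(18, 2);

DECLARE pay_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT CustomerID, Amount
    FROM Payments
    ORDER BY CustomerID, PaymentDate, PaymentID;

OPEN pay_cursor;
FETCH NEXT FROM pay_cursor INTO @cust, @amount;

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @prevCust IS NULL OR @cust <> @prevCust
    BEGIN
        -- New customer: save the previous one, then restart the counters.
        IF @prevCust IS NOT NULL
            INSERT INTO @result VALUES (@prevCust, @high, @running);
        SET @prevCust = @cust;
        SET @running = @amount;
        SET @high = @amount;
    END
    ELSE
    BEGIN
        SET @running = @running + @amount;
        IF @running > @high SET @high = @running;
    END

    FETCH NEXT FROM pay_cursor INTO @cust, @amount;
END

-- Save the last customer.
IF @prevCust IS NOT NULL
    INSERT INTO @result VALUES (@prevCust, @high, @running);

CLOSE pay_cursor;
DEALLOCATE pay_cursor;

SELECT CustomerID, PeakPaid, TotalPaid FROM @result ORDER BY CustomerID;
```

This touches each payment once, which matches the linear cost described in answer #2, but row-by-row cursors carry their own overhead, so it would need to be measured against the set-based queries.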
#5
Like Andomar's answer, you can compute the running total for each payment and then find the peak:
    with
    rt as (
        -- running total per payment: earlier payments plus the current one
        select
            cur.CustomerID, cur.PaymentID, cur.PaymentDate,
            isnull(sum(p.Amount), 0) + cur.Amount as running
        from
            Payments cur
            left outer join Payments p on cur.CustomerID = p.CustomerID
                and p.PaymentDate <= cur.PaymentDate
                and p.PaymentID < cur.PaymentID
        group by
            cur.CustomerID, cur.PaymentID, cur.PaymentDate, cur.Amount
    ),
    highest as (
        -- the payment at which each customer's running total peaked
        select
            CustomerID, PaymentID, running as peak_paid
        from
            rt
        where
            PaymentID = (select top 1 rt2.PaymentID
                         from rt rt2
                         where rt2.CustomerID = rt.CustomerID
                         order by rt2.running desc, rt2.PaymentDate, rt2.PaymentID)
    )
    select
        *,
        (select sum(Amount) from Payments where Payments.CustomerID = highest.CustomerID) as total_paid
    from
        highest;
However, since you have tens of millions of payments (100,000 customers with 100 to 1,000 payments each), this could be quite slow. As others are saying, you would want to store the CustomerID, PaymentID and peak_paid in a separate table. This table could be updated on each Payment insert or by a scheduled SQL job.
Updated the query to use a join instead of subqueries. Since the PaymentDate does not have a time, I use the PaymentID to order multiple payments made on the same day.