SQL Query获取最新价格

时间:2021-08-04 12:19:58

I have a table containing prices for a lot of different "things" in a MS SQL 2005 table. There are hundreds of records per thing per day and the different things gets price updates at different times.

我有一个表格,其中包含MS SQL 2005表格中许多不同“事物”的价格。每件事每天有数百条记录,不同的东西在不同的时间获得价格更新。

ID uniqueidentifier not null,
ThingID int NOT NULL,
PriceDateTime datetime NOT NULL,
Price decimal(18,4) NOT NULL

I need to get today's latest prices for a group of things. The below query works but I'm getting hundreds of rows back and I have to loop trough them and only extract the latest one per ThingID. How can I (e.g. via a GROUP BY) say that I want the latest one per ThingID? Or will I have to use subqueries?

我需要获得今天最新的一组价格。下面的查询有效,但我要回收数百行,我必须循环它们,并且每个ThingID只提取最新的一行。我怎样才能(例如通过GROUP BY)说我想要每个ThingID最新的一个?或者我必须使用子查询?

SELECT * 
FROM Thing
WHERE ThingID IN (1,2,3,4,5,6)
  AND PriceDate > cast( convert(varchar(20), getdate(), 106) as DateTime) 

UPDATE: In an attempt to hide complexity I put the ID column in a an int. In real life it is GUID (and not the sequential kind). I have updated the table def above to use uniqueidentifier.

更新:为了隐藏复杂性,我将ID列放在一个int中。在现实生活中它是GUID(而不是顺序类)。我已经更新了上面的表def以使用uniqueidentifier。

10 个解决方案

#1


20  

I think the only solution with your table structure is to work with a subquery:

我认为表结构的唯一解决方案是使用子查询:

SELECT *
   FROM Thing
   WHERE ID IN (SELECT max(ID) FROM Thing 
                   WHERE ThingID IN (1,2,3,4)
                   GROUP BY ThingID)

(Given the highest ID also means the newest price)

(鉴于最高ID也意味着最新价格)

However I suggest you add a "IsCurrent" column that is 0 if it's not the latest price or 1 if it is the latest. This will add the possible risk of inconsistent data, but it will speed up the whole process a lot when the table gets bigger (if it is in an index). Then all you need to do is to...

但是我建议你添加一个“IsCurrent”列,如果它不是最新价格,则为0,如果是最新价格,则为1。这将增加数据不一致的可能风险,但是当表变大时(如果它在索引中),它将大大加快整个过程。那么你需要做的就是......

SELECT *
   FROM Thing
   WHERE ThingID IN (1,2,3,4)
     AND IsCurrent = 1

UPDATE

UPDATE

Okay, Markus updated the question to show that ID is a uniqueid, not an int. That makes writing the query even more complex.

好的,Markus更新了问题,以证明ID是唯一的ID,而不是int。这使得编写查询变得更加复杂。

SELECT T.* 
   FROM Thing T
   JOIN (SELECT ThingID, max(PriceDateTime)
            WHERE ThingID IN (1,2,3,4)
            GROUP BY ThingID) X ON X.ThingID = T.ThingID 
                                AND X.PriceDateTime = T.PriceDateTime
   WHERE ThingID IN (1,2,3,4)

I'd really suggest using either a "IsCurrent" column or go with the other suggestion found in the answers and use "current price" table and a separate "price history" table (which would ultimately be the fastest, because it keeps the price table itself small).

我真的建议使用“IsCurrent”列或者使用答案中的其他建议并使用“当前价格”表和单独的“价格历史”表(最终将是最快的,因为它保持价格表本身很小)。

(I know that the ThingID at the bottom is redundant. Just try if it is faster with or without that "WHERE". Not sure which version will be faster after the optimizer did its work.)

(我知道底部的ThingID是多余的。只要尝试使用或不使用“WHERE”就更快。不确定优化器完成其工作后哪个版本会更快。)

#2


2  

If the subquery route was too slow I would look at treating your price updates as an audit log and maintaining a ThingPrice table - perhaps as a trigger on the price updates table:

如果子查询路由太慢,我会将您的价格更新视为审计日志并维护ThingPrice表 - 可能作为价格更新表的触发器:

ThingID int not null,
UpdateID int not null,
PriceDateTime datetime not null,
Price decimal(18,4) not null

The primary key would just be ThingID and "UpdateID" is the "ID" in your original table.

主键只是ThingID,“UpdateID”是原始表中的“ID”。

#3


2  

I would try something like the following subquery and forget about changing your data structures.

我会尝试类似下面的子查询,忘记改变你的数据结构。

SELECT
 *
FROM
 Thing
WHERE 
 (ThingID, PriceDateTime) IN 
 (SELECT 
   ThingID, 
   max(PriceDateTime ) 
  FROM 
   Thing 
  WHERE 
   ThingID IN (1,2,3,4)
  GROUP BY 
   ThingID
 )

Edit the above is ANSI SQL and i'm now guessing having more than one column in a subquery doesnt work for T SQL. Marius, I can't test the following but try;

编辑上面是ANSI SQL,我现在猜测在子查询中有多个列不适用于T SQL。马吕斯,我不能测试以下但是试试;

SELECT
 p.*
FROM
 Thing p,
 (SELECT ThingID, max(PriceDateTime ) FROM Thing WHERE ThingID IN (1,2,3,4) GROUP BY ThingID) m
WHERE 
 p.ThingId = m.ThingId
 and p.PriceDateTime = m.PriceDateTime

another option might be to change the date to a string and concatenate with the id so you have only one column. This would be slightly nasty though.

另一种选择可能是将日期更改为字符串并与id连接,因此您只有一列。但这有点令人讨厌。

#4


2  

Since you are using SQL Server 2005, you can use the new (CROSS|OUTTER) APPLY clause. The APPLY clause let's you join a table with a table valued function.

由于您使用的是SQL Server 2005,因此可以使用新的(CROSS | OUTTER)APPLY子句。 APPLY子句允许您使用表值函数连接表。

To solve the problem, first define a table valued function to retrieve the top n rows from Thing for a specific id, date ordered:

要解决这个问题,首先要定义一个表值函数来从Thing中检索前面的n行,以获取特定的id,日期排序:

CREATE FUNCTION dbo.fn_GetTopThings(@ThingID AS GUID, @n AS INT)
  RETURNS TABLE
AS
RETURN
  SELECT TOP(@n) *
  FROM Things
  WHERE ThingID= @ThingID
  ORDER BY PriceDateTime DESC
GO

and then use the function to retrieve the top 1 records in a query:

然后使用该函数检索查询中的前1条记录:

SELECT *
   FROM Thing t
CROSS APPLY dbo.fn_GetTopThings(t.ThingID, 1)
WHERE t.ThingID IN (1,2,3,4,5,6)

The magic here is done by the APPLY clause which applies the function to every row in the left result set then joins with the result set returned by the function then retuns the final result set. (Note: to do a left join like apply, use OUTTER APPLY which returns all rows from the left side, while CROSS APPLY returns only the rows that have a match in the right side)

这里的魔力是由APPLY子句完成的,该子句将函数应用于左结果集中的每一行,然后与函数返回的结果集连接,然后返回最终结果集。 (注意:要执行类似apply的左连接,请使用从左侧返回所有行的OUTTER APPLY,而CROSS APPLY仅返回右侧匹配的行)

BlaM: Because I can't post comments yet( due to low rept points) not even to my own answers ^^, I'll answer in the body of the message: -the APPLY clause even, if it uses table valued functions it is optimized internally by SQL Server in such a way that it doesn't call the function for every row in the left result set, but instead takes the inner sql from the function and converts it into a join clause with the rest of the query, so the performance is equivalent or even better (if the plan is chosen right by sql server and further optimizations can be done) than the performance of a query using subqueries), and in my personal experience APPLY has no performance issues when the database is properly indexed and statistics are up to date (just like a normal query with subqueries behaves in such conditions)

BlaM:因为我还不能发表评论(由于低评价点)甚至不是我自己的答案^^,我将在消息正文中回答: - 即使APPLY子句,如果它使用表值函数它SQL Server在内部优化它不会为左结果集中的每一行调用函数,而是从函数中获取内部sql并将其转换为与查询其余部分的join子句,所以性能相当甚至更好(如果计划是由sql server选择并且可以进行进一步的优化)而不是使用子查询的查询性能,并且根据我的个人经验,当数据库正确时,APPLY没有性能问题索引和统计信息是最新的(就像子查询在这种情况下的常规查询一样)

#5


1  

It depends on the nature of how your data will be used, but if the old price data will not be used nearly as regularly as the current price data, there may be an argument here for a price history table. This way, non-current data may be archived off to the price history table (probably by triggers) as the new prices come in.

这取决于您的数据使用方式的性质,但如果旧的价格数据几乎不会像当前价格数据那样经常使用,那么这里可能存在价格历史表的参数。这样,当新价格进入时,非当前数据可以存档到价格历史表(可能通过触发器)。

As I say, depending on your access model, this could be an option.

正如我所说,根据您的访问模型,这可能是一个选项。

#6


1  

I'm converting the uniqueidentifier to a binary so that I can get a MAX of it. This should make sure that you won't get duplicates from multiple records with identical ThingIDs and PriceDateTimes:

我正在将uniqueidentifier转换为二进制文件,以便我可以得到它的MAX。这应该确保您不会从具有相同ThingID和PriceDateTimes的多个记录中获取重复项:

SELECT * FROM Thing WHERE CONVERT(BINARY(16),Thing.ID) IN
(
 SELECT MAX(CONVERT(BINARY(16),Thing.ID))
  FROM Thing
  INNER JOIN
   (SELECT ThingID, MAX(PriceDateTime) LatestPriceDateTime FROM Thing
    WHERE PriceDateTime >= CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)
    GROUP BY ThingID) LatestPrices
  ON Thing.ThingID = LatestPrices.ThingID
   AND Thing.PriceDateTime = LatestPrices.LatestPriceDateTime
 GROUP BY Thing.ThingID, Thing.PriceDateTime
) AND Thing.ThingID IN (1,2,3,4,5,6)

#7


1  

Since ID is not sequential, I assume you have a unique index on ThingID and PriceDateTime so only one price can be the most recent for a given item.

由于ID不是顺序的,我假设您在ThingID和PriceDateTime上有唯一索引,因此对于给定项目,只有一个价格可以是最新的。

This query will get all of the items in the list IF they were priced today. If you remove the where clause for PriceDate you will get the latest price regardless of date.

此查询将获取列表中的所有项目,如果它们今天定价。如果您删除PriceDate的where子句,无论日期如何,您都将获得最新价格。

SELECT * 
FROM Thing thi
WHERE thi.ThingID IN (1,2,3,4,5,6)
  AND thi.PriceDateTime =
     (SELECT MAX(maxThi.PriceDateTime)
      FROM Thing maxThi
      WHERE maxThi.PriceDateTime >= CAST( CONVERT(varchar(20), GETDATE(), 106) AS DateTime)
        AND maxThi.ThingID = thi.ThingID)

Note that I changed ">" to ">=" since you could have a price right at the start of a day

请注意,我将“>”更改为“> =”,因为您可以在一天开始时定价

#8


0  

Try this (provided you only need the latest price, not the identifier or datetime of that price)

试试这个(前提是你只需要最新的价格,而不是那个价格的标识符或日期时间)

SELECT ThingID, (SELECT TOP 1 Price FROM Thing WHERE ThingID = T.ThingID ORDER BY PriceDateTime DESC) Price
FROM Thing T
WHERE ThingID IN (1,2,3,4) AND DATEDIFF(D, PriceDateTime, GETDATE()) = 0
GROUP BY ThingID

#9


0  

It must work without using a global PK column (for complex primary keys for example):

它必须在不使用全局PK列的情况下工作(例如,对于复杂的主键):

SELECT t1.*, t2.PriceDateTime AS bigger FROM Prices t1 
LEFT JOIN Prices t2 ON t1.ThingID = t2.ThingID AND t1.PriceDateTime < t2.PriceDateTime 
HAVING t2.PriceDateTime IS NULL

#10


-1  

maybe i missunderstood the taks but what about a:

也许我很想念他们但是怎么样:

SELECT ID, ThingID, max(PriceDateTime), Price FROM Thing GROUP BY ThingID

SELECT ID,ThingID,max(PriceDateTime),价格FROM Thing GROUP BY ThingID

#1


20  

I think the only solution with your table structure is to work with a subquery:

我认为表结构的唯一解决方案是使用子查询:

SELECT *
   FROM Thing
   WHERE ID IN (SELECT max(ID) FROM Thing 
                   WHERE ThingID IN (1,2,3,4)
                   GROUP BY ThingID)

(Given the highest ID also means the newest price)

(鉴于最高ID也意味着最新价格)

However I suggest you add a "IsCurrent" column that is 0 if it's not the latest price or 1 if it is the latest. This will add the possible risk of inconsistent data, but it will speed up the whole process a lot when the table gets bigger (if it is in an index). Then all you need to do is to...

但是我建议你添加一个“IsCurrent”列,如果它不是最新价格,则为0,如果是最新价格,则为1。这将增加数据不一致的可能风险,但是当表变大时(如果它在索引中),它将大大加快整个过程。那么你需要做的就是......

SELECT *
   FROM Thing
   WHERE ThingID IN (1,2,3,4)
     AND IsCurrent = 1

UPDATE

UPDATE

Okay, Markus updated the question to show that ID is a uniqueid, not an int. That makes writing the query even more complex.

好的,Markus更新了问题,以证明ID是唯一的ID,而不是int。这使得编写查询变得更加复杂。

SELECT T.* 
   FROM Thing T
   JOIN (SELECT ThingID, max(PriceDateTime)
            WHERE ThingID IN (1,2,3,4)
            GROUP BY ThingID) X ON X.ThingID = T.ThingID 
                                AND X.PriceDateTime = T.PriceDateTime
   WHERE ThingID IN (1,2,3,4)

I'd really suggest using either a "IsCurrent" column or go with the other suggestion found in the answers and use "current price" table and a separate "price history" table (which would ultimately be the fastest, because it keeps the price table itself small).

我真的建议使用“IsCurrent”列或者使用答案中的其他建议并使用“当前价格”表和单独的“价格历史”表(最终将是最快的,因为它保持价格表本身很小)。

(I know that the ThingID at the bottom is redundant. Just try if it is faster with or without that "WHERE". Not sure which version will be faster after the optimizer did its work.)

(我知道底部的ThingID是多余的。只要尝试使用或不使用“WHERE”就更快。不确定优化器完成其工作后哪个版本会更快。)

#2


2  

If the subquery route was too slow I would look at treating your price updates as an audit log and maintaining a ThingPrice table - perhaps as a trigger on the price updates table:

如果子查询路由太慢,我会将您的价格更新视为审计日志并维护ThingPrice表 - 可能作为价格更新表的触发器:

ThingID int not null,
UpdateID int not null,
PriceDateTime datetime not null,
Price decimal(18,4) not null

The primary key would just be ThingID and "UpdateID" is the "ID" in your original table.

主键只是ThingID,“UpdateID”是原始表中的“ID”。

#3


2  

I would try something like the following subquery and forget about changing your data structures.

我会尝试类似下面的子查询,忘记改变你的数据结构。

SELECT
 *
FROM
 Thing
WHERE 
 (ThingID, PriceDateTime) IN 
 (SELECT 
   ThingID, 
   max(PriceDateTime ) 
  FROM 
   Thing 
  WHERE 
   ThingID IN (1,2,3,4)
  GROUP BY 
   ThingID
 )

Edit the above is ANSI SQL and i'm now guessing having more than one column in a subquery doesnt work for T SQL. Marius, I can't test the following but try;

编辑上面是ANSI SQL,我现在猜测在子查询中有多个列不适用于T SQL。马吕斯,我不能测试以下但是试试;

SELECT
 p.*
FROM
 Thing p,
 (SELECT ThingID, max(PriceDateTime ) FROM Thing WHERE ThingID IN (1,2,3,4) GROUP BY ThingID) m
WHERE 
 p.ThingId = m.ThingId
 and p.PriceDateTime = m.PriceDateTime

another option might be to change the date to a string and concatenate with the id so you have only one column. This would be slightly nasty though.

另一种选择可能是将日期更改为字符串并与id连接,因此您只有一列。但这有点令人讨厌。

#4


2  

Since you are using SQL Server 2005, you can use the new (CROSS|OUTTER) APPLY clause. The APPLY clause let's you join a table with a table valued function.

由于您使用的是SQL Server 2005,因此可以使用新的(CROSS | OUTTER)APPLY子句。 APPLY子句允许您使用表值函数连接表。

To solve the problem, first define a table valued function to retrieve the top n rows from Thing for a specific id, date ordered:

要解决这个问题,首先要定义一个表值函数来从Thing中检索前面的n行,以获取特定的id,日期排序:

CREATE FUNCTION dbo.fn_GetTopThings(@ThingID AS GUID, @n AS INT)
  RETURNS TABLE
AS
RETURN
  SELECT TOP(@n) *
  FROM Things
  WHERE ThingID= @ThingID
  ORDER BY PriceDateTime DESC
GO

and then use the function to retrieve the top 1 records in a query:

然后使用该函数检索查询中的前1条记录:

SELECT *
   FROM Thing t
CROSS APPLY dbo.fn_GetTopThings(t.ThingID, 1)
WHERE t.ThingID IN (1,2,3,4,5,6)

The magic here is done by the APPLY clause which applies the function to every row in the left result set then joins with the result set returned by the function then retuns the final result set. (Note: to do a left join like apply, use OUTTER APPLY which returns all rows from the left side, while CROSS APPLY returns only the rows that have a match in the right side)

这里的魔力是由APPLY子句完成的,该子句将函数应用于左结果集中的每一行,然后与函数返回的结果集连接,然后返回最终结果集。 (注意:要执行类似apply的左连接,请使用从左侧返回所有行的OUTTER APPLY,而CROSS APPLY仅返回右侧匹配的行)

BlaM: Because I can't post comments yet( due to low rept points) not even to my own answers ^^, I'll answer in the body of the message: -the APPLY clause even, if it uses table valued functions it is optimized internally by SQL Server in such a way that it doesn't call the function for every row in the left result set, but instead takes the inner sql from the function and converts it into a join clause with the rest of the query, so the performance is equivalent or even better (if the plan is chosen right by sql server and further optimizations can be done) than the performance of a query using subqueries), and in my personal experience APPLY has no performance issues when the database is properly indexed and statistics are up to date (just like a normal query with subqueries behaves in such conditions)

BlaM:因为我还不能发表评论(由于低评价点)甚至不是我自己的答案^^,我将在消息正文中回答: - 即使APPLY子句,如果它使用表值函数它SQL Server在内部优化它不会为左结果集中的每一行调用函数,而是从函数中获取内部sql并将其转换为与查询其余部分的join子句,所以性能相当甚至更好(如果计划是由sql server选择并且可以进行进一步的优化)而不是使用子查询的查询性能,并且根据我的个人经验,当数据库正确时,APPLY没有性能问题索引和统计信息是最新的(就像子查询在这种情况下的常规查询一样)

#5


1  

It depends on the nature of how your data will be used, but if the old price data will not be used nearly as regularly as the current price data, there may be an argument here for a price history table. This way, non-current data may be archived off to the price history table (probably by triggers) as the new prices come in.

这取决于您的数据使用方式的性质,但如果旧的价格数据几乎不会像当前价格数据那样经常使用,那么这里可能存在价格历史表的参数。这样,当新价格进入时,非当前数据可以存档到价格历史表(可能通过触发器)。

As I say, depending on your access model, this could be an option.

正如我所说,根据您的访问模型,这可能是一个选项。

#6


1  

I'm converting the uniqueidentifier to a binary so that I can get a MAX of it. This should make sure that you won't get duplicates from multiple records with identical ThingIDs and PriceDateTimes:

我正在将uniqueidentifier转换为二进制文件,以便我可以得到它的MAX。这应该确保您不会从具有相同ThingID和PriceDateTimes的多个记录中获取重复项:

SELECT * FROM Thing WHERE CONVERT(BINARY(16),Thing.ID) IN
(
 SELECT MAX(CONVERT(BINARY(16),Thing.ID))
  FROM Thing
  INNER JOIN
   (SELECT ThingID, MAX(PriceDateTime) LatestPriceDateTime FROM Thing
    WHERE PriceDateTime >= CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)
    GROUP BY ThingID) LatestPrices
  ON Thing.ThingID = LatestPrices.ThingID
   AND Thing.PriceDateTime = LatestPrices.LatestPriceDateTime
 GROUP BY Thing.ThingID, Thing.PriceDateTime
) AND Thing.ThingID IN (1,2,3,4,5,6)

#7


1  

Since ID is not sequential, I assume you have a unique index on ThingID and PriceDateTime so only one price can be the most recent for a given item.

由于ID不是顺序的,我假设您在ThingID和PriceDateTime上有唯一索引,因此对于给定项目,只有一个价格可以是最新的。

This query will get all of the items in the list IF they were priced today. If you remove the where clause for PriceDate you will get the latest price regardless of date.

此查询将获取列表中的所有项目,如果它们今天定价。如果您删除PriceDate的where子句,无论日期如何,您都将获得最新价格。

SELECT * 
FROM Thing thi
WHERE thi.ThingID IN (1,2,3,4,5,6)
  AND thi.PriceDateTime =
     (SELECT MAX(maxThi.PriceDateTime)
      FROM Thing maxThi
      WHERE maxThi.PriceDateTime >= CAST( CONVERT(varchar(20), GETDATE(), 106) AS DateTime)
        AND maxThi.ThingID = thi.ThingID)

Note that I changed ">" to ">=" since you could have a price right at the start of a day

请注意,我将“>”更改为“> =”,因为您可以在一天开始时定价

#8


0  

Try this (provided you only need the latest price, not the identifier or datetime of that price)

试试这个(前提是你只需要最新的价格,而不是那个价格的标识符或日期时间)

SELECT ThingID, (SELECT TOP 1 Price FROM Thing WHERE ThingID = T.ThingID ORDER BY PriceDateTime DESC) Price
FROM Thing T
WHERE ThingID IN (1,2,3,4) AND DATEDIFF(D, PriceDateTime, GETDATE()) = 0
GROUP BY ThingID

#9


0  

It must work without using a global PK column (for complex primary keys for example):

它必须在不使用全局PK列的情况下工作(例如,对于复杂的主键):

SELECT t1.*, t2.PriceDateTime AS bigger FROM Prices t1 
LEFT JOIN Prices t2 ON t1.ThingID = t2.ThingID AND t1.PriceDateTime < t2.PriceDateTime 
HAVING t2.PriceDateTime IS NULL

#10


-1  

maybe i missunderstood the taks but what about a:

也许我很想念他们但是怎么样:

SELECT ID, ThingID, max(PriceDateTime), Price FROM Thing GROUP BY ThingID

SELECT ID,ThingID,max(PriceDateTime),价格FROM Thing GROUP BY ThingID