Retrieve the last record in each group from a database - SQL Server 2005/2008

Time: 2023-02-05 01:42:57

I have done some searching but can't seem to get the results I am looking for. Basically, we have four different management systems in place throughout our company, and I am in the process of combining the data from each system on a regular basis. My goal is to load the data into a central database every hour. Here is a sample data set I am working with:

COMPUTERNAME | SERIALNUMBER | USERNAME | LASTIP | LASTUPDATE | SOURCE
TEST1 | 1111 | BOB | 1.1.1.1 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST1 | 1111 | BOB | 1.1.1.1 | 1/18/2011 01:00:00 | MGMT_SYSTEM_2
TEST1 | 1111 | PETER | 1.1.1.11 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST2 | 2222 | GEORGE | 1.1.1.2 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST3 | 3333 | TOM | 1.1.1.3 | 1/19/2011 01:00:00 | MGMT_SYSTEM_2
TEST4 | 4444 | MIKE   | 1.1.1.4 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST4 | 4444 | MIKE   | 1.1.1.41 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST5 | 5555 | SUSIE  | 1.1.1.5 | 1/19/2011 01:00:00 | MGMT_SYSTEM_1

So I want to query this master table and retrieve only the latest record for each computer (based on LASTUPDATE), so that I get the latest info about that system. The problem is that the same computer may appear in each management system, but of course the records will never have exactly the same update time.

I would expect to get something like this:

TEST1 | 1111 | PETER | 1.1.1.11 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST2 | 2222 | GEORGE | 1.1.1.2 | 1/17/2011 01:00:00 | MGMT_SYSTEM_1
TEST3 | 3333 | TOM | 1.1.1.3 | 1/19/2011 01:00:00 | MGMT_SYSTEM_2
TEST4 | 4444 | MIKE   | 1.1.1.41 | 1/19/2011 01:00:00 | MGMT_SYSTEM_3
TEST5 | 5555 | SUSIE  | 1.1.1.5 | 1/19/2011 01:00:00 | MGMT_SYSTEM_1

I have tried using the MAX function, but with that I can only retrieve one column. And I can't use it in a subquery because I don't have a unique ID field that would give me the last updated record. One of the systems is a MySQL database, and the MAX function in MySQL actually works the way I need it to, returning only one record per GROUP BY, but that doesn't work in SQL Server.

I'm thinking I need to use MAX and a LEFT JOIN, but my attempts so far have failed.

Your help would be greatly appreciated. I have been racking my brain for the past 3-4 hours trying to get a working query. This master table is located on a SQL Server 2005 server.

Thanks!

2 solutions

#1


42  

;with cteRowNumber as (
    select COMPUTERNAME, SERIALNUMBER, USERNAME, LASTIP, LASTUPDATE, SOURCE,
           row_number() over(partition by COMPUTERNAME order by LASTUPDATE desc) as RowNum
        from YourTable
)
select COMPUTERNAME, SERIALNUMBER, USERNAME, LASTIP, LASTUPDATE, SOURCE
    from cteRowNumber
    where RowNum = 1
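
To try the query above without a SQL Server instance, here is a minimal sketch that runs it against the sample data from the question. It rests on several assumptions that are not in the original post: SQLite (3.25 or later, for window functions) stands in for SQL Server, the table is given the hypothetical name `inventory`, LASTUPDATE is stored as ISO-8601 text so that it sorts chronologically, the leading `;` is dropped, and an `order by` is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question. LASTUPDATE is rewritten as ISO-8601 text
# (an assumption) so that string ordering matches chronological ordering.
ROWS = [
    ("TEST1", "1111", "BOB",    "1.1.1.1",  "2011-01-17 01:00:00", "MGMT_SYSTEM_1"),
    ("TEST1", "1111", "BOB",    "1.1.1.1",  "2011-01-18 01:00:00", "MGMT_SYSTEM_2"),
    ("TEST1", "1111", "PETER",  "1.1.1.11", "2011-01-19 01:00:00", "MGMT_SYSTEM_3"),
    ("TEST2", "2222", "GEORGE", "1.1.1.2",  "2011-01-17 01:00:00", "MGMT_SYSTEM_1"),
    ("TEST3", "3333", "TOM",    "1.1.1.3",  "2011-01-19 01:00:00", "MGMT_SYSTEM_2"),
    ("TEST4", "4444", "MIKE",   "1.1.1.4",  "2011-01-17 01:00:00", "MGMT_SYSTEM_1"),
    ("TEST4", "4444", "MIKE",   "1.1.1.41", "2011-01-19 01:00:00", "MGMT_SYSTEM_3"),
    ("TEST5", "5555", "SUSIE",  "1.1.1.5",  "2011-01-19 01:00:00", "MGMT_SYSTEM_1"),
]

# The CTE from the answer, pointed at the hypothetical table `inventory`.
QUERY = """
with cteRowNumber as (
    select COMPUTERNAME, SERIALNUMBER, USERNAME, LASTIP, LASTUPDATE, SOURCE,
           row_number() over (partition by COMPUTERNAME
                              order by LASTUPDATE desc) as RowNum
        from inventory
)
select COMPUTERNAME, SERIALNUMBER, USERNAME, LASTIP, LASTUPDATE, SOURCE
    from cteRowNumber
    where RowNum = 1
    order by COMPUTERNAME
"""


def latest_by_row_number(rows):
    """Load rows into an in-memory table and return the newest row per computer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table inventory (COMPUTERNAME text, SERIALNUMBER text, "
        "USERNAME text, LASTIP text, LASTUPDATE text, SOURCE text)"
    )
    conn.executemany("insert into inventory values (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_by_row_number(ROWS):
        print(row)
```

On the sample data this yields one row per COMPUTERNAME, matching the result the question asks for.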

#2


4  

In SQL Server, the most performant solution is often a correlated subquery:

select t.*
from t
where t.lastupdate = (select max(t2.lastupdate)
                      from t t2
                      where t2.computername = t.computername
                     );

In particular, this can take advantage of an index on (computername, lastupdate). Conceptually, this is faster than row_number() because the query simply filters out the rows that don't match. The row_number() version needs to attach a row number to every row before it filters, which is more data processing.
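
As a rough illustration of the correlated subquery together with the composite index, the sketch below runs it on a subset of the question's sample rows. The setup is assumed rather than taken from the thread: SQLite stands in for SQL Server, the index name `ix_t_computername_lastupdate` is made up, LASTUPDATE is stored as ISO-8601 text, and an `order by` is added for predictable output. It says nothing about relative performance on SQL Server, which would have to be measured there.

```python
import sqlite3

# Subset of the question's sample rows; LASTUPDATE as ISO-8601 text (assumption).
SAMPLE = [
    ("TEST1", "1111", "BOB",   "1.1.1.1",  "2011-01-17 01:00:00", "MGMT_SYSTEM_1"),
    ("TEST1", "1111", "BOB",   "1.1.1.1",  "2011-01-18 01:00:00", "MGMT_SYSTEM_2"),
    ("TEST1", "1111", "PETER", "1.1.1.11", "2011-01-19 01:00:00", "MGMT_SYSTEM_3"),
    ("TEST4", "4444", "MIKE",  "1.1.1.4",  "2011-01-17 01:00:00", "MGMT_SYSTEM_1"),
    ("TEST4", "4444", "MIKE",  "1.1.1.41", "2011-01-19 01:00:00", "MGMT_SYSTEM_3"),
]


def latest_by_correlated_subquery(rows):
    """Return every row whose lastupdate equals the maximum for its computername."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (computername text, serialnumber text, username text, "
        "lastip text, lastupdate text, source text)"
    )
    # Composite index mentioned in the answer; the index name is hypothetical.
    conn.execute(
        "create index ix_t_computername_lastupdate on t (computername, lastupdate)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        select t.*
        from t
        where t.lastupdate = (select max(t2.lastupdate)
                              from t t2
                              where t2.computername = t.computername)
        order by t.computername
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_by_correlated_subquery(SAMPLE):
        print(row)
```

One difference from the row_number() query is worth keeping in mind: if two rows for the same computer share the maximum LASTUPDATE, the correlated subquery returns both, whereas row_number() keeps exactly one. The question states that update times never coincide, so this should not matter for the data described.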
