SQL Query: How to return only the first and last instance?

Date: 2023-01-19 08:35:20

I have a table that shows the status of each case, with multiple jobs being performed simultaneously. I would like the results to show only the first and last instance (mainly I want to know when the job was first started and what its last known status is).

I've managed to get the results with two similar MIN, MAX, and GROUP BY queries combined with UNION. But is there a simpler way?

Also, would it be possible to display the two instances on one line instead of two separate lines? The date from the first instance would be the start date and the date from the last instance the end date. I don't really care about the first status because it's always pending; I just want to know the last known status.

The first table shows the unfiltered results and the second table shows the desired results (but if the first and last instance can be combined on one line, that would be even better).

ID  Status      Date       Job  Note
1   pending     1-Jul       A   abc
1   pending     2-Jul       A   xyz
1   pending     2-Jul       A   abc
1   done        3-Jul       B   xyz
1   done        4-Jul       A   abc
2   pending     1-Jul       A   abc
2   done        2-Jul       A   xyz
2   done        2-Jul       A   abc
2   pending     3-Jul       C   xyz
2   pending     4-Jul       C   xyz
2   pending     5-Jul       C   xyz
2   pending     6-Jul       C   xyz
3   pending     2-Jul       D   xyz
3   done        3-Jul       D   abc
3   pending     4-Jul       D   abc
3   pending     1-Jul       E   xyz
3   done        3-Jul       E   xyz

ID  Status      Date       Job  Note
1   pending     1-Jul       A   abc
1   done        3-Jul       B   xyz
1   done        4-Jul       A   abc
2   pending     1-Jul       A   abc
2   done        2-Jul       A   abc
2   pending     3-Jul       C   xyz
2   pending     6-Jul       C   xyz
3   pending     2-Jul       D   xyz
3   pending     4-Jul       D   abc
3   pending     1-Jul       E   xyz
3   done        3-Jul       E   xyz

Thank you very much in advance

3 solutions

#1


1  

One way to do it is to use the ROW_NUMBER function twice, once in ascending and once in descending order, to get the first and last rows of each group. See SQL Fiddle.

WITH
CTE
AS
(
  SELECT
    ID
    ,Status
    ,dt
    ,Job
    ,Note
    ,ROW_NUMBER() OVER (PARTITION BY ID, Job ORDER BY dt ASC) AS rnASC
    ,ROW_NUMBER() OVER (PARTITION BY ID, Job ORDER BY dt DESC) AS rnDESC
  FROM T
)
SELECT 
    ID
    ,Status
    ,dt
    ,Job
    ,Note
FROM CTE
WHERE rnASC = 1 OR rnDESC = 1
ORDER BY ID, Job, dt

This variant scans the whole table, calculates the row numbers, and discards the rows that don't satisfy the filter.
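The same row-numbering idea can also cover the follow-up in the question, where the start date, end date, and last known status should appear on one line per (ID, Job). The following is only a sketch run through Python's sqlite3 module, assuming a SQLite build with window function support (3.25 or later). The table name T and the column names come from the query above; the output names start_dt, end_dt, and last_status, the ISO date strings, and the year are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows modelled on the question; the year is arbitrary.
rows = [
    (1, "pending", "2023-07-01", "A", "abc"),
    (1, "pending", "2023-07-02", "A", "xyz"),
    (1, "pending", "2023-07-02", "A", "abc"),
    (1, "done",    "2023-07-03", "B", "xyz"),
    (1, "done",    "2023-07-04", "A", "abc"),
    (2, "pending", "2023-07-01", "A", "abc"),
    (2, "done",    "2023-07-02", "A", "xyz"),
    (2, "done",    "2023-07-02", "A", "abc"),
    (2, "pending", "2023-07-03", "C", "xyz"),
    (2, "pending", "2023-07-06", "C", "xyz"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (ID INTEGER, Status TEXT, dt TEXT, Job TEXT, Note TEXT)"
)
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?, ?)", rows)

query = """
WITH CTE AS (
  SELECT
    ID, Job, Status, dt,
    -- earliest date in the (ID, Job) group, repeated on every row of the group
    MIN(dt) OVER (PARTITION BY ID, Job) AS start_dt,
    -- 1 marks the latest row of the group (ties are broken arbitrarily)
    ROW_NUMBER() OVER (PARTITION BY ID, Job ORDER BY dt DESC) AS rnDESC
  FROM T
)
SELECT ID, Job, start_dt, dt AS end_dt, Status AS last_status
FROM CTE
WHERE rnDESC = 1
ORDER BY ID, Job
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

When several rows share the latest date, as for ID 2 and Job A here, ROW_NUMBER picks one of them arbitrarily, so an extra tie-breaking column in the ORDER BY would be needed if those rows could carry different statuses.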

The second variant is to use CROSS APPLY, which may be more efficient if (a) your main table has millions of rows, (b) you have a small table listing all IDs and Jobs, and (c) the main table has an appropriate index. In that case, instead of reading every row of the main table, you can do an index seek for each (ID, Job) pair: two seeks, one for the first row and one for the last.
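The shape of this per-group lookup can be sketched as follows. SQLite has no CROSS APPLY, so this example swaps in correlated subqueries with ORDER BY ... LIMIT 1 as a stand-in; in SQL Server the same idea would typically be written as CROSS APPLY over a SELECT TOP (1) subquery. The Jobs table, the index name IX_T_ID_Job_dt, the output column names, and the sample dates are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (ID INTEGER, Status TEXT, dt TEXT, Job TEXT, Note TEXT);
-- Index that lets each lookup go straight to one (ID, Job) group ordered by dt.
CREATE INDEX IX_T_ID_Job_dt ON T (ID, Job, dt);
-- Hypothetical small table listing every (ID, Job) pair.
CREATE TABLE Jobs (ID INTEGER, Job TEXT, PRIMARY KEY (ID, Job));
""")

# Hypothetical sample rows modelled on ID 3 in the question; the year is arbitrary.
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?, ?)", [
    (3, "pending", "2023-07-02", "D", "xyz"),
    (3, "done",    "2023-07-03", "D", "abc"),
    (3, "pending", "2023-07-04", "D", "abc"),
    (3, "pending", "2023-07-01", "E", "xyz"),
    (3, "done",    "2023-07-03", "E", "xyz"),
])
conn.executemany("INSERT INTO Jobs VALUES (?, ?)", [(3, "D"), (3, "E")])

query = """
SELECT
  J.ID,
  J.Job,
  -- first row of the group: earliest dt
  (SELECT dt FROM T
    WHERE T.ID = J.ID AND T.Job = J.Job
    ORDER BY dt ASC LIMIT 1) AS start_dt,
  -- last row of the group: latest dt
  (SELECT dt FROM T
    WHERE T.ID = J.ID AND T.Job = J.Job
    ORDER BY dt DESC LIMIT 1) AS end_dt,
  -- status taken from that same latest row
  (SELECT Status FROM T
    WHERE T.ID = J.ID AND T.Job = J.Job
    ORDER BY dt DESC LIMIT 1) AS last_status
FROM Jobs AS J
ORDER BY J.ID, J.Job
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because a scalar subquery returns a single column, this stand-in issues one lookup per output column rather than the two seeks described above; CROSS APPLY avoids that by returning the whole first or last row at once.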

#2


0  

Try this:

SELECT A.ID, A.JOB, A.STATUS, B.START_DATE, CASE WHEN A.STATUS = 'done' THEN C.END_DATE ELSE NULL END AS END_DATE
FROM <JOBS_TABLE> A
JOIN (SELECT ID, JOB, MIN(DATE) AS START_DATE FROM <JOBS_TABLE> GROUP BY ID, JOB) B
ON A.ID = B.ID
AND A.JOB = B.JOB
JOIN (SELECT ID, JOB, MAX(DATE) AS END_DATE FROM <JOBS_TABLE> GROUP BY ID, JOB) C
ON A.ID = C.ID
AND A.JOB = C.JOB
AND A.DATE = C.END_DATE

You'll need to replace <JOBS_TABLE> with whatever your table name is. Ideally, this should combine the data from the first and last rows for each distinct pair of ID and JOB values. If the job is not finished, it will not show an END_DATE.
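To see what this join-based approach returns, it can be tried on a small data set. The sketch below runs a syntactically complete version of the query (the CASE expression closed with END, and the table placeholder replaced by a hypothetical table named jobs) through Python's sqlite3 module. The sample rows, the ISO date strings, and the added ORDER BY are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for <JOBS_TABLE>.
conn.execute(
    "CREATE TABLE jobs (ID INTEGER, STATUS TEXT, DATE TEXT, JOB TEXT, NOTE TEXT)"
)
# Hypothetical sample rows: one finished job and one still pending.
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?)", [
    (1, "pending", "2023-07-01", "A", "abc"),
    (1, "done",    "2023-07-04", "A", "abc"),
    (2, "pending", "2023-07-03", "C", "xyz"),
    (2, "pending", "2023-07-06", "C", "xyz"),
])

query = """
SELECT A.ID, A.JOB, A.STATUS, B.START_DATE,
       CASE WHEN A.STATUS = 'done' THEN C.END_DATE ELSE NULL END AS END_DATE
FROM jobs A
JOIN (SELECT ID, JOB, MIN(DATE) AS START_DATE FROM jobs GROUP BY ID, JOB) B
  ON A.ID = B.ID
 AND A.JOB = B.JOB
JOIN (SELECT ID, JOB, MAX(DATE) AS END_DATE FROM jobs GROUP BY ID, JOB) C
  ON A.ID = C.ID
 AND A.JOB = C.JOB
 AND A.DATE = C.END_DATE   -- keeps only the latest row of each (ID, JOB)
ORDER BY A.ID, A.JOB
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

One thing to watch for: if several rows in a group share the latest date, the join on A.DATE = C.END_DATE matches all of them, so that group appears more than once in the output.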

#3


0  

I don't think there's much wrong with your UNION idea. Is this what you have?

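-- status and note are not aggregated: MySQL (without ONLY_FULL_GROUP_BY) and SQLite accept this, most other databases reject it
select id, job, status, max(date), note, 'max' as type from test1 group by id, job
UNION
select id, job, status, min(date), note, 'min' as type from test1 group by id, job;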
