SQL select with inner join, subselect and limit

Time: 2021-02-20 23:31:33

I've been working on this SQL problem for about 2 days now. I suspect I'm very close to resolving it, but I can't seem to find a solution that completely works.

What I'm attempting to do is a selective join on two tables, application_info and application_status, which store information about open access journal article funding requests.

application_info holds general information about the applicant and uses an auto-incrementing field called Application_ID as its key. application_status tracks the ongoing status of the application (received, under review, funded, denied, withdrawn, etc.) as well as the status of the journal article (submitted, accepted, resubmitted, published, or rejected). It contains an Application_ID field, an auto-incrementing field called Status_ID, and status text and status date fields.

Because we want to keep a running log of application, article, and funding status changes, we don't want to overwrite existing rows in application_status with updated values; instead, we want to show only the most recent status values. Because an application will eventually have more than one status change, we need to apply some sort of limit to the join of the status data onto the application data, so that only one row is returned for each application ID.

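To make the problem concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The table and column names come from the question; the Applicant_Name column and all rows are hypothetical. It shows how an append-only status log makes a plain join return one row per status change instead of one row per application.

```python
import sqlite3

# In-memory stand-in database. Table/column names follow the question;
# Applicant_Name and every row below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application_info (
        Application_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Applicant_Name TEXT
    );
    CREATE TABLE application_status (
        Status_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Application_ID INTEGER,
        Status_State_Text TEXT,
        Status_State_Date TEXT
    );
    INSERT INTO application_info (Applicant_Name) VALUES ('Ada'), ('Grace'), ('Linus');
    -- Status changes are appended as new rows, never updated in place.
    INSERT INTO application_status (Application_ID, Status_State_Text, Status_State_Date) VALUES
        (1, 'Article Status: submitted', '2021-01-05'),
        (1, 'Funding Status: under review', '2021-01-06'),
        (1, 'Article Status: accepted', '2021-02-01'),
        (2, 'Article Status: submitted', '2021-01-10');
""")

# A plain LEFT JOIN yields one row per matching status row: application 1
# appears twice (two article statuses), application 3 once with NULLs.
naive_rows = conn.execute("""
    SELECT ai.Application_ID, s.Status_ID, s.Status_State_Text
    FROM application_info AS ai
    LEFT JOIN application_status AS s
      ON s.Application_ID = ai.Application_ID
     AND s.Status_State_Text LIKE 'Article Status%'
    ORDER BY ai.Application_ID, s.Status_ID
""").fetchall()

for row in naive_rows:
    print(row)
```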
Here's an example of what I am attempting to do in a query that currently throws an error:

-- simplified example
SELECT 
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
    SELECT 
    Status_ID,
    Application_ID,
    Status_State_Text,
    Status_State_Date,
    Status_State_InitiatedBy,
    Status_State_ChangebBy,
    Status_State_Notes
    FROM application_status 
    WHERE Status_State_Text LIKE 'Article Status%'
    AND Application_ID = application_info.Application_ID -- how to pass the current application_info.Application_ID from the ON clause to here?
    -- and Application_ID = 29 -- this would be an option for specific IDs, but not an option for getting a complete list of application IDs with status
    -- GROUP BY Application_ID -- reduces the sub query to 1 row (Yeah!) but returns the first row encountered before the ORDER BY comes into play
    ORDER BY Status_ID DESC
    -- a GROUP BY after the ORDER BY might resolve the issue if we could do a sort first
    LIMIT 1 -- only want to get the first (most recent) row, only works correctly if passing an Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- need to get all IDs with status values as well as for specific ID requests
;

Removing the AND Application_ID = application_info.Application_ID portion of the subquery, along with the LIMIT, makes the select work, but it then returns a row for every status of a given application ID. I've tried using the MIN/MAX functions, but noticed that when the query runs they return unpredictable rows from the application_status table.

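If this is MySQL, one likely source of the unpredictable rows is that, with ONLY_FULL_GROUP_BY disabled, non-aggregated columns selected next to MAX() under GROUP BY may come from any row in the group, not necessarily the row holding the maximum. A deterministic pattern is to aggregate only the key, MAX(Status_ID), and join back on it. This is a sketch on SQLite through Python's sqlite3 with hypothetical sample data, and it assumes, as the question's own ORDER BY Status_ID DESC does, that a higher Status_ID means a more recent status.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY AUTOINCREMENT, Applicant_Name TEXT);
    CREATE TABLE application_status (Status_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Application_ID INTEGER, Status_State_Text TEXT, Status_State_Date TEXT);
    INSERT INTO application_info (Applicant_Name) VALUES ('Ada'), ('Grace'), ('Linus');
    INSERT INTO application_status (Application_ID, Status_State_Text, Status_State_Date) VALUES
        (1, 'Article Status: submitted', '2021-01-05'),
        (1, 'Funding Status: under review', '2021-01-06'),
        (1, 'Article Status: accepted', '2021-02-01'),
        (2, 'Article Status: submitted', '2021-01-10');
""")

rows = conn.execute("""
    SELECT ai.Application_ID, s.Status_ID, s.Status_State_Text
    FROM application_info AS ai
    LEFT JOIN (
        -- Aggregate only the key: MAX(Status_ID) is well defined per group.
        SELECT Application_ID, MAX(Status_ID) AS Max_Status_ID
        FROM application_status
        WHERE Status_State_Text LIKE 'Article Status%'
        GROUP BY Application_ID
    ) AS latest ON latest.Application_ID = ai.Application_ID
    -- Join back on the primary key to fetch the other columns of that exact row.
    LEFT JOIN application_status AS s ON s.Status_ID = latest.Max_Status_ID
    ORDER BY ai.Application_ID
""").fetchall()

for row in rows:
    print(row)
```

Because the second join is on Status_ID, every returned column comes from the same status row, so nothing depends on how the server picks values inside a group.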
I've also tried putting subselects in the ON clause of the join, but I don't know how to make that work, because the result would always need to return an Application_ID. (Can both Application_ID and Status_ID be returned and used?)

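On the ON-clause question: the subquery only needs to return Status_ID, because Status_ID is the key of application_status and identifies the whole row. A correlated scalar subquery in the ON clause can reference the table on the left of the join, so ORDER BY ... LIMIT 1 sees the current Application_ID there. A sketch written for SQLite through Python's sqlite3 with hypothetical data; MySQL also accepts subqueries in ON conditions, but check the exact form on your server.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY AUTOINCREMENT, Applicant_Name TEXT);
    CREATE TABLE application_status (Status_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Application_ID INTEGER, Status_State_Text TEXT, Status_State_Date TEXT);
    INSERT INTO application_info (Applicant_Name) VALUES ('Ada'), ('Grace'), ('Linus');
    INSERT INTO application_status (Application_ID, Status_State_Text, Status_State_Date) VALUES
        (1, 'Article Status: submitted', '2021-01-05'),
        (1, 'Funding Status: under review', '2021-01-06'),
        (1, 'Article Status: accepted', '2021-02-01'),
        (2, 'Article Status: submitted', '2021-01-10');
""")

rows = conn.execute("""
    SELECT ai.Application_ID, s.Status_ID, s.Status_State_Text
    FROM application_info AS ai
    LEFT JOIN application_status AS s
      ON s.Status_ID = (
          -- Correlated with ai, the left-hand table of this join;
          -- returns the newest article Status_ID or NULL if there is none.
          SELECT x.Status_ID
          FROM application_status AS x
          WHERE x.Application_ID = ai.Application_ID
            AND x.Status_State_Text LIKE 'Article Status%'
          ORDER BY x.Status_ID DESC
          LIMIT 1
      )
    ORDER BY ai.Application_ID
""").fetchall()

for row in rows:
    print(row)
```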
Any hints on how to get this to work as I'm intending? Can this even be done?

Further edit: the working query is below. The key was to move the subquery in the join one level deeper, so that the current Application_ID can be passed to it and it returns just a single status ID.

-- simplified example (now working)
SELECT 
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
    SELECT 
    Status_ID,
    Application_ID,
    Status_State_Text,
    Status_State_Date,
    Status_State_InitiatedBy,
    Status_State_ChangebBy,
    Status_State_Notes
    FROM application_status AS artstatus_int
    WHERE 
    -- sub query moved one level deeper so current join Application_ID can be passed
    -- order by and limit can now be used
    Status_ID = (
        SELECT status_ID FROM application_status WHERE Application_ID = artstatus_int.Application_ID
        AND status_State_Text LIKE 'Article Status%'
        ORDER BY Status_ID DESC
        LIMIT 1
    )
    ORDER BY Application_ID, Status_ID DESC
    -- no need for GROUP BY or LIMIT here because only one row is returned per Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- works for specific application ID as well

-- more LEFT JOINS follow
;
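A quick check of the working query's logic, trimmed to the columns of a hypothetical sample schema (Status_State_InitiatedBy, Status_State_ChangebBy and Status_State_Notes are omitted, as is the inner ORDER BY). It runs on SQLite through Python's sqlite3. The innermost subquery is correlated only with artstatus_int, the table in its own enclosing FROM clause, which is why it is legal where the first attempt, which referenced application_info from inside the derived table, was not.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY AUTOINCREMENT, Applicant_Name TEXT);
    CREATE TABLE application_status (Status_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Application_ID INTEGER, Status_State_Text TEXT, Status_State_Date TEXT);
    INSERT INTO application_info (Applicant_Name) VALUES ('Ada'), ('Grace'), ('Linus');
    INSERT INTO application_status (Application_ID, Status_State_Text, Status_State_Date) VALUES
        (1, 'Article Status: submitted', '2021-01-05'),
        (1, 'Funding Status: under review', '2021-01-06'),
        (1, 'Article Status: accepted', '2021-02-01'),
        (2, 'Article Status: submitted', '2021-01-10');
""")

rows = conn.execute("""
    SELECT
        application_info.Application_ID,
        artstatus.Status_ID AS Article_Status_ID,
        artstatus.Status_State_Text AS Article_Status_State_Text
    FROM application_info
    LEFT JOIN (
        SELECT Status_ID, Application_ID, Status_State_Text, Status_State_Date
        FROM application_status AS artstatus_int
        -- Keep a status row only if it is the newest article status
        -- for its own Application_ID.
        WHERE Status_ID = (
            SELECT Status_ID FROM application_status
            WHERE Application_ID = artstatus_int.Application_ID
              AND Status_State_Text LIKE 'Article Status%'
            ORDER BY Status_ID DESC
            LIMIT 1
        )
    ) AS artstatus
    ON application_info.Application_ID = artstatus.Application_ID
    ORDER BY application_info.Application_ID
""").fetchall()

for row in rows:
    print(row)
```

The inner subquery is evaluated for each row of application_status, which may be slow on a long status log; an index on (Application_ID, Status_ID) would likely help.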

1 answer

#1


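You can't have a correlated subquery in the FROM clause: a derived table there can't refer to columns of the other tables in the same FROM clause. (Databases that support LATERAL derived tables, such as PostgreSQL or MySQL 8.0.14 and later, are the exception.)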

Try this idea instead:

select <whatever>
from (select a.*,
             (select max(s.status_id)
              from application_status s
              where s.application_id = a.application_id
             ) as maxstatusid
      from application_info a
     ) ai left outer join -- the derived table needs an alias (ai)
     application_status aps
     on aps.status_id = ai.maxstatusid
. . .

That is, put the correlated subquery in the SELECT clause to get the most recent status ID, then join that result to the status table to get the other status information, and finish the query with the remaining details.

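A runnable version of this idea, with the aliases the outline needs (a for application_info, ai for the derived table) and the question's 'Article Status%' filter added to the scalar subquery. It is a sketch on SQLite through Python's sqlite3 with hypothetical sample data; <whatever> and the trailing . . . are filled in only with the columns needed to check the result.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY AUTOINCREMENT, Applicant_Name TEXT);
    CREATE TABLE application_status (Status_ID INTEGER PRIMARY KEY AUTOINCREMENT,
        Application_ID INTEGER, Status_State_Text TEXT, Status_State_Date TEXT);
    INSERT INTO application_info (Applicant_Name) VALUES ('Ada'), ('Grace'), ('Linus');
    INSERT INTO application_status (Application_ID, Status_State_Text, Status_State_Date) VALUES
        (1, 'Article Status: submitted', '2021-01-05'),
        (1, 'Funding Status: under review', '2021-01-06'),
        (1, 'Article Status: accepted', '2021-02-01'),
        (2, 'Article Status: submitted', '2021-01-10');
""")

rows = conn.execute("""
    SELECT ai.Application_ID, aps.Status_ID, aps.Status_State_Text
    FROM (
        -- Scalar subquery in the SELECT list may be correlated with a.
        SELECT a.*,
               (SELECT MAX(s.Status_ID)
                FROM application_status AS s
                WHERE s.Application_ID = a.Application_ID
                  AND s.Status_State_Text LIKE 'Article Status%') AS maxstatusid
        FROM application_info AS a
    ) AS ai
    LEFT OUTER JOIN application_status AS aps
        ON aps.Status_ID = ai.maxstatusid
    ORDER BY ai.Application_ID
""").fetchall()

for row in rows:
    print(row)
```

Applications with no article status get a NULL maxstatusid, so the LEFT OUTER JOIN still keeps them, with NULL status columns.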
You seem pretty adept at SQL, so it doesn't seem necessary to rewrite the whole query for you.
