SQL query with GROUP BY that returns the groups meeting LEAST/COALESCE conditions

Time: 2022-09-23 13:06:07

I want to write an SQL query that selects several columns under somewhat complicated conditions. I'm working in RStudio with the RMySQL package, and my server is MySQL.

The table looks like this.

organisation    Tour_ID             A           B           C           D  
Ikea                  a    2018-04-01  2018-05-07  2018-05-09  2018-05-01
Ikea                  a    2018-06-01  2018-05-03  2018-05-29          NA   
Ikea                  a    2018-04-02  2018-05-01  2018-07-08  2018-05-26 
Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b            NA  2018-05-05  2018-08-02  2018-06-01
Ikea                  c    2018-06-01  2018-05-07  2018-05-09  2018-05-01
Ikea                  c    2018-06-01  2018-05-03          NA          NA   
Ikea                  c    2018-08-02  2018-05-09  2018-07-08  2018-05-26

This is what I want to do:

  1. Filter the rows where organisation = Ikea.

  2. Group the rows by Tour_ID, like this:

    organisation    Tour_ID             A           B           C           D
    Ikea                  a    2018-04-01  2018-05-07  2018-05-09  2018-05-01
    Ikea                  a    2018-06-01  2018-05-03  2018-05-29          NA
    Ikea                  a    2018-04-02  2018-05-01  2018-07-08  2018-05-26

    Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
    Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
    Ikea                  b            NA  2018-05-05  2018-08-02  2018-06-01

    Ikea                  c    2018-06-01  2018-05-07  2018-05-09  2018-05-01
    Ikea                  c    2018-06-01  2018-05-03          NA          NA
    Ikea                  c    2018-08-02  2018-05-09  2018-07-08  2018-05-26

  3. In each Tour_ID group, look at the earliest date in columns A, B, C and D. If the earliest date among the four columns in the group is between 2018-05-01 and 2018-05-31, return the entire group. If a row contains NA values, I want to ignore the NAs and find the earliest date among the remaining values. For example, for the group with Tour_ID = a, the earliest date is 2018-04-01, so it doesn't meet the criteria.

In conclusion, only the groups with Tour_ID = b and Tour_ID = c meet the conditions, so the result should be:

organisation    Tour_ID             A           B           C           D
Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b    2018-06-02  2018-05-01          NA  2018-05-26
Ikea                  b            NA  2018-05-05  2018-08-02  2018-06-01
Ikea                  c    2018-06-01  2018-05-07  2018-05-09  2018-05-01
Ikea                  c    2018-06-01  2018-05-03          NA          NA   
Ikea                  c    2018-08-02  2018-05-09  2018-07-08  2018-05-26
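
The expected result can be checked outside the database with a short pure-Python sketch of the same rule. It is illustrative only: the rows are copied from the sample table, `None` stands in for NA, and `matching_groups` is a hypothetical helper name. The window is treated as inclusive on both ends, matching the "between 2018-05-01 and 2018-05-31" wording.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (organisation, Tour_ID, A, B, C, D); None means NA.
ROWS = [
    ("Ikea", "a", "2018-04-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "a", "2018-06-01", "2018-05-03", "2018-05-29", None),
    ("Ikea", "a", "2018-04-02", "2018-05-01", "2018-07-08", "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", None, "2018-05-05", "2018-08-02", "2018-06-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-03", None, None),
    ("Ikea", "c", "2018-08-02", "2018-05-09", "2018-07-08", "2018-05-26"),
]


def matching_groups(rows, organisation, start, end):
    """Return every row of each Tour_ID group whose earliest non-NA date is in [start, end]."""
    # 1. Keep only the requested organisation.
    rows = [r for r in rows if r[0] == organisation]
    # 2. Collect all non-NA dates of columns A-D per Tour_ID.
    dates_by_tour = defaultdict(list)
    for _, tour, *cols in rows:
        dates_by_tour[tour].extend(date.fromisoformat(d) for d in cols if d is not None)
    # 3. A group qualifies if its earliest date falls inside the window.
    keep = {t for t, ds in dates_by_tour.items() if ds and start <= min(ds) <= end}
    # Return whole groups, not just the row holding the earliest date.
    return [r for r in rows if r[1] in keep]


result = matching_groups(ROWS, "Ikea", date(2018, 5, 1), date(2018, 5, 31))
for row in result:
    print(row)
```

Group a drops out because of 2018-04-01 in column A, while groups b and c both have 2018-05-01 as their earliest date, so all six of their rows are kept.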

How should I write this SQL query? Here is my attempt, but I don't know how to do the GROUP BY, or how to return the entire group rather than just the rows with the earliest date.

SELECT *
FROM myTable 
WHERE organisation LIKE 'Ikea' AND
GROUP BY 'Tour_ID' AND
LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) >= '2018-05-01' AND
LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) < '2018-06-01';

('2019-01-01' is a placeholder that replaces NAs.)
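
The placeholder is needed because MySQL's LEAST() returns NULL as soon as any argument is NULL. The minimal sketch below shows the effect with Python's built-in sqlite3 module; SQLite has no LEAST(), so its multi-argument min(), which also returns NULL when any argument is NULL, is used as a stand-in. The values are taken from the first row of group b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite's multi-argument min() stands in for MySQL's LEAST():
# a single NULL argument makes the whole result NULL.
plain = conn.execute(
    "SELECT min('2018-06-02', '2018-05-01', NULL, '2018-05-26')"
).fetchone()[0]

# COALESCE swaps each NULL for a placeholder later than the May 2018 window,
# so the placeholder can only be the minimum when no real date is earlier.
patched = conn.execute(
    "SELECT min(COALESCE('2018-06-02', '2019-01-01'), COALESCE('2018-05-01', '2019-01-01'), "
    "COALESCE(NULL, '2019-01-01'), COALESCE('2018-05-26', '2019-01-01'))"
).fetchone()[0]

print(plain)    # None
print(patched)  # 2018-05-01
conn.close()
```

If a placeholder does end up as the minimum, the row or group has no date before 2019-01-01 at all, so it falls outside the window either way.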

Thank you for any help!

ADDED: Following Gordon's answer, I rewrote the SQL statement as follows.

"SELECT t.* FROM myTable JOIN (SELECT organisation, Tour_ID 
                                                   FROM myTable
                                                   WHERE organisation LIKE 'Ikea' AND
                                                   GROUP BY organisation, Tour_ID
                                                   HAVING LEAST(COALESCE(MIN(A), '2119-01-01'), 
                                                                COALESCE(MIN(B), '2119-01-01'), 
                                                                COALESCE(MIN(C), '2119-01-01'), 
                                                                COALESCE(MIN(D), '2119-01-01')) >= '2018-05-01' AND
                                                          LEAST(COALESCE(MIN(A), '2119-01-01'), 
                                                                COALESCE(MIN(B), '2119-01-01'), 
                                                                COALESCE(MIN(C), '2119-01-01'), 
                                                                COALESCE(MIN(D), '2119-01-01')) < '2018-06-01'
                                                  ) tt
                                                  ON tt.Tour_ID = t.Tour_ID AND
                                                     tt.organisation = t.organisation"

I ran it with dbGetQuery from the RMySQL package, but I get the following error. I don't understand it, because the GROUP BY part looks fine to me. Does anyone know why I'm getting this error?

dbGetQuery(conn = connection, statement = condition)

Error in .local(conn, statement, ...) : could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'GROUP BY organisation, Tour_ID HAVING LEAST(COALESCE(A' at line 1
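
The error position is the clue: MySQL quotes the text starting at `GROUP BY`, the token right after `WHERE organisation LIKE 'Ikea' AND`. The trailing `AND` leaves the WHERE condition unfinished, so the parser meets `GROUP BY` where it expects another expression. (Separately, the outer `FROM myTable` also needs the alias `t`, because the select list and the ON clause refer to `t`.) The sketch below reproduces the first problem with Python's built-in sqlite3 module standing in for MySQL, on a cut-down hypothetical table; the exact wording of the error message differs between the two databases.

```python
import sqlite3

# Python's sqlite3 stands in for the MySQL server; the table is a cut-down example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (organisation TEXT, Tour_ID TEXT, A TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("Ikea", "a", "2018-04-01"), ("Ikea", "b", "2018-05-01")],
)

# The WHERE clause ends in a dangling AND, so the parser reaches GROUP BY
# while still expecting another condition.
broken = """SELECT organisation, Tour_ID FROM myTable
WHERE organisation LIKE 'Ikea' AND
GROUP BY organisation, Tour_ID"""

try:
    conn.execute(broken)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Dropping the trailing AND makes the same statement parse and run.
fixed = broken.replace("LIKE 'Ikea' AND", "LIKE 'Ikea'")
groups = conn.execute(fixed).fetchall()
print(groups)
conn.close()
```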

1 solution

#1

First, get the Tour_IDs that match the conditions:

SELECT Tour_ID
FROM myTable 
WHERE organisation LIKE 'Ikea'
GROUP BY Tour_ID
HAVING LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) >= '2018-05-01' AND
       LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) < '2018-06-01';
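
A minimal sketch that runs this first step on the sample data, with Python's built-in sqlite3 module in place of MySQL. SQLite has no LEAST(), so its multi-argument min() is swapped in; it likewise returns NULL if any argument is NULL, which is why the COALESCE wrappers stay. The `earliest` string is a hypothetical helper that builds the repeated expression once.

```python
import sqlite3

# Sample table from the question; None is stored as SQL NULL (the NA values).
ROWS = [
    ("Ikea", "a", "2018-04-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "a", "2018-06-01", "2018-05-03", "2018-05-29", None),
    ("Ikea", "a", "2018-04-02", "2018-05-01", "2018-07-08", "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", None, "2018-05-05", "2018-08-02", "2018-06-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-03", None, None),
    ("Ikea", "c", "2018-08-02", "2018-05-09", "2018-07-08", "2018-05-26"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (organisation TEXT, Tour_ID TEXT, A TEXT, B TEXT, C TEXT, D TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Earliest date per group: aggregate MIN() per column ignores NULLs, COALESCE covers
# a column that is NULL in every row of the group, and the outer scalar min()
# plays the role of MySQL's LEAST().
earliest = "min(" + ", ".join(f"COALESCE(MIN({c}), '2019-01-01')" for c in "ABCD") + ")"

step1 = f"""
SELECT Tour_ID
FROM myTable
WHERE organisation LIKE 'Ikea'
GROUP BY Tour_ID
HAVING {earliest} >= '2018-05-01' AND {earliest} < '2018-06-01'
"""
tour_ids = [row[0] for row in conn.execute(step1)]
print(tour_ids)
conn.close()
```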

Then put this into a query to get the original rows. Here is one way:

select t.*
from myTable t join
     (SELECT organisation, Tour_ID
      FROM myTable 
      WHERE organisation LIKE 'Ikea'
      GROUP BY organisation, Tour_ID
      HAVING LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) >= '2018-05-01' AND
             LEAST(COALESCE(MIN(A), '2019-01-01'), COALESCE(MIN(B), '2019-01-01'), COALESCE(MIN(C), '2019-01-01'), COALESCE(MIN(D), '2019-01-01')) < '2018-06-01'
     ) tt
     ON tt.tour_id = t.tour_id AND
        tt.organisation = t.organisation;
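
To check that the join returns whole groups, including rows with NAs, here is a sketch that runs it on the sample data. It rests on the same assumptions as a stand-in: Python's sqlite3 replaces the MySQL server, SQLite's multi-argument min() replaces LEAST(), the `earliest` string is a hypothetical helper that builds the repeated expression once, and the trailing ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample table from the question; None is stored as SQL NULL (the NA values).
ROWS = [
    ("Ikea", "a", "2018-04-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "a", "2018-06-01", "2018-05-03", "2018-05-29", None),
    ("Ikea", "a", "2018-04-02", "2018-05-01", "2018-07-08", "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", "2018-06-02", "2018-05-01", None, "2018-05-26"),
    ("Ikea", "b", None, "2018-05-05", "2018-08-02", "2018-06-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-07", "2018-05-09", "2018-05-01"),
    ("Ikea", "c", "2018-06-01", "2018-05-03", None, None),
    ("Ikea", "c", "2018-08-02", "2018-05-09", "2018-07-08", "2018-05-26"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (organisation TEXT, Tour_ID TEXT, A TEXT, B TEXT, C TEXT, D TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# SQLite's scalar min(x, y, ...) stands in for MySQL's LEAST().
earliest = "min(" + ", ".join(f"COALESCE(MIN({c}), '2019-01-01')" for c in "ABCD") + ")"

# The derived table tt picks the qualifying groups; joining it back to the base
# table on both key columns returns every row of those groups.
query = f"""
SELECT t.*
FROM myTable t
JOIN (SELECT organisation, Tour_ID
      FROM myTable
      WHERE organisation LIKE 'Ikea'
      GROUP BY organisation, Tour_ID
      HAVING {earliest} >= '2018-05-01' AND {earliest} < '2018-06-01') tt
  ON tt.Tour_ID = t.Tour_ID AND tt.organisation = t.organisation
ORDER BY t.Tour_ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Joining on both organisation and Tour_ID, rather than Tour_ID alone, keeps rows from other organisations that happen to reuse the same Tour_ID out of the result.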
