I'm writing a cron job that runs an analysis on a flags table in my database, structured as follows:
| id | item | def | time_flagged | time_resolved | status |
+----+------+-----+--------------+---------------+---------+
| 1 | 1 | foo | 1519338608 | 1519620669 | MISSED |
| 2 | 1 | bar | 1519338608 | (NULL) | OPEN |
| 3 | 2 | bar | 1519338608 | 1519620669 | IGNORED |
| 4 | 1 | foo | 1519620700 | (NULL) | OPEN |
For each distinct def, for each unique item, I want to get the "latest" row (ordered by IFNULL(`time_resolved`, `time_flagged`) AS `time`). If no such row exists for a given def-item combination, that's okay; I just don't want any duplicates for a given def-item combination.
For the above data set, I would like to select:
| def | item | time | status |
+-----+------+------------+---------+
| foo | 1 | 1519620700 | OPEN |
| bar | 1 | 1519338608 | OPEN |
| bar | 2 | 1519620669 | IGNORED |
Row 1 is not included because it's "overridden" by row 4: both rows have the same def-item combination, and the latter has a more recent time.
The data set will have a few dozen distinct defs, a few hundred distinct items, and a very large number of flags that will only increase over time.
How can I go about doing this? I see the greatest-n-per-group tag is rife with similar questions, but I don't see any that involve my specific circumstance of needing "nested grouping" across two columns.
4 Solutions
#1
2
You could try:
select distinct def, item, IFNULL(time_resolved, time_flagged) AS time, status
from flags A
where IFNULL(time_resolved, time_flagged) = (
select MAX(IFNULL(time_resolved, time_flagged))
from flags B
where A.item = B.item and A.def = B.def
)
I know it's not the best approach, but it might work for you.
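To check the correlated-subquery approach against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite stands in for MySQL here purely as an assumption for the sake of a runnable example (both support `IFNULL`); the helper name `latest_flags` and the added `ORDER BY` are illustrative and not part of the answer above.

```python
import sqlite3

# Sample rows copied from the question:
# (id, item, def, time_flagged, time_resolved, status)
ROWS = [
    (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
    (2, 1, "bar", 1519338608, None, "OPEN"),
    (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
    (4, 1, "foo", 1519620700, None, "OPEN"),
]

# Same shape as the query above; ORDER BY is added only to make
# the output order deterministic for comparison.
QUERY = """
SELECT DISTINCT a.def, a.item,
       IFNULL(a.time_resolved, a.time_flagged) AS time,
       a.status
FROM flags AS a
WHERE IFNULL(a.time_resolved, a.time_flagged) = (
    SELECT MAX(IFNULL(b.time_resolved, b.time_flagged))
    FROM flags AS b
    WHERE b.item = a.item AND b.def = a.def
)
ORDER BY a.def, a.item
"""


def latest_flags(rows):
    """Load rows into an in-memory flags table and return the latest row per (def, item)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flags (id INTEGER PRIMARY KEY, item INTEGER, def TEXT, "
        "time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_flags(ROWS):
        print(row)
```

On the sample data this returns the three rows the question asks for, with row 1 dropped in favour of row 4. Note that if two rows in the same def-item group share the same maximum time but differ in status, both will still be returned.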
#2
1
Do you mean 'for each unique def and each unique item'? If so, a GROUP BY on multiple columns seems like it would work (shown as a derived table t), joined back to the original table to grab the rest of the data:
select
f.def,
f.item,
t.time,
f.status
from
flags f
join (select
def,
item,
max(ifnull(time_resolved, time_flagged)) as time -- latest effective time per (def, item)
from flags
group by def, item) t
on
f.def = t.def and
f.item = t.item and
ifnull(f.time_resolved, f.time_flagged) = t.time
#3
1
Depending on your version of MySQL (window functions require 8.0 or later), you can use a window function:
SELECT def, item, time, status
FROM (
SELECT
def,
item,
COALESCE(time_resolved, time_flagged) AS time,
status,
RANK() OVER(PARTITION BY def, item ORDER BY COALESCE(time_resolved, time_flagged) DESC) MyRank -- Rank the rows within each (def, item) combination by "time"
FROM flags
) src
WHERE MyRank = 1 -- Only return top-ranked (i.e. most recent) rows per (def, item) grouping
If a (def, item) combo can have several rows with the same "time" value, then change RANK() to ROW_NUMBER(). This will guarantee you only get one row per grouping.
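The difference between the two ranking functions only shows up when there is a tie, so here is a small sketch of that case. It is run against SQLite through Python's `sqlite3` module as a stand-in for MySQL, which is an assumption made so the example is runnable; it needs a bundled SQLite of 3.25 or newer for window-function support. The tied rows are hypothetical and not from the question, and the extra `id DESC` tie-breaker for `ROW_NUMBER()` is my own addition to make the surviving row predictable.

```python
import sqlite3

# Hypothetical data: both (foo, 1) rows share the same effective time 1519620700.
TIED_ROWS = [
    (1, 1, "foo", 1519338608, 1519620700, "MISSED"),
    (2, 1, "foo", 1519620700, None, "OPEN"),
    (3, 2, "bar", 1519338608, None, "OPEN"),
]


def top_rows(rows, window_fn, order_by):
    """Return the rows ranked first per (def, item) using the given window function.

    window_fn and order_by are fixed strings chosen by the caller below,
    never user input, so formatting them into the SQL text is safe here.
    """
    sql = f"""
        SELECT def, item, time, status
        FROM (
            SELECT def, item, status,
                   COALESCE(time_resolved, time_flagged) AS time,
                   {window_fn}() OVER (
                       PARTITION BY def, item
                       ORDER BY {order_by}
                   ) AS rn
            FROM flags
        ) AS src
        WHERE rn = 1
        ORDER BY def, item, status
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flags (id INTEGER PRIMARY KEY, item INTEGER, def TEXT, "
        "time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


BY_TIME = "COALESCE(time_resolved, time_flagged) DESC"

# RANK() gives every tied row rank 1, so the (foo, 1) group yields two rows.
rank_rows = top_rows(TIED_ROWS, "RANK", BY_TIME)

# ROW_NUMBER() numbers tied rows 1, 2, ...; id DESC decides which one wins.
row_number_rows = top_rows(TIED_ROWS, "ROW_NUMBER", BY_TIME + ", id DESC")

if __name__ == "__main__":
    print(len(rank_rows), "rows with RANK()")
    print(len(row_number_rows), "rows with ROW_NUMBER()")
```

Without a tie-breaker in the `ORDER BY`, `ROW_NUMBER()` still returns one row per group, but which of the tied rows it keeps is not guaranteed.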
#4
1
select f.def, f.item, a.time, f.status
from flags f
join (select
def, item, MAX(COALESCE(time_resolved, time_flagged)) as time
from flags
group by def, item) a
on f.def = a.def and
f.item = a.item and
COALESCE(f.time_resolved, f.time_flagged) = a.time