How do I get the ids corresponding to the max and min in MySQL?

Time: 2021-05-31 12:22:04

I am using MySQL to keep data for a large set of simulations I am running on an HPC cluster. Each simulation has its own entry in a table, and there is a second table which keeps the simulation time step result data. The time step result data table is quite large (tens to hundreds of millions of rows). The tables look like this:

Table: simulations

id      descriptor  notes 
1       SIM1        notes here...
2       SIM2        SIM2 Notes...
...     ...         ...
8643    SIM8643     SIM8643 Notes...

Table: simulations_ts

id         simulation_id    step        data_value
1          1                1           0.05
2          1                2           0.051
...        ...              ...         ...
1983       1                1983        0.253
1984       2                1           0.043
...        ...              ...         ...
59345435   8643             2832        0.067

I would like to be able to return the following table efficiently:

simulation_id    first_ts_id     last_ts_id  num_steps
1                1               1983        1983
2                1984            2938434     2052
...              ...             ...         ...
8643             12835283        59345435    2832

I know I can perform a query like:

SELECT
   simulation_id,
   MIN(step) AS first_step,
   MAX(step) AS last_step,
   COUNT(id) AS num_steps
FROM
   simulations_ts
GROUP BY
   simulation_id
ORDER BY
   simulation_id ASC

I also know there are ways to use sub-queries to pull the corresponding id for one aggregate, but I have found no examples that pull the corresponding ids for two aggregate functions. Is it possible to do this efficiently in a single query, or am I better off stepping through and doing the min lookup and the max lookup separately?

2 Answers

#1


2  

SELECT g.simulation_id, first.id as first_ts_id, last.id as last_ts_id, num_steps
FROM (SELECT simulation_id, MIN(step) minstep, MAX(step) maxstep, COUNT(*) num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
JOIN simulations_ts first ON first.simulation_id = g.simulation_id AND first.step = g.minstep
JOIN simulations_ts last ON last.simulation_id = g.simulation_id AND last.step = g.maxstep

#2


1  

I think this is what you're after. Note that I'm only displaying the id column from the first_sim and last_sim aliases of simulations_ts, but you could of course display other columns from that table.

SELECT
   main.simulation_id,
   first_step,
   first_sim.id as first_sim_id,
   last_step,
   last_sim.id as last_sim_id
FROM
   (SELECT
       simulation_id,
       MIN(step) AS first_step,
       MAX(step) AS last_step,
       COUNT(id) AS num_steps
    FROM
       simulations_ts
    GROUP BY
       simulation_id) as main
    JOIN simulations_ts first_sim
         ON main.simulation_id = first_sim.simulation_id
            AND main.first_step = first_sim.step
    JOIN simulations_ts last_sim
         ON main.simulation_id = last_sim.simulation_id
            AND main.last_step = last_sim.step

I start with your original query, then simply join it back to simulations_ts on the sim id and min/max step.
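
If you want to sanity-check the join-back approach before running it against the full table, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made purely so the example is self-contained. The five sample rows, the aliases first_ts and last_ts, and the UNIQUE (simulation_id, step) constraint are all hypothetical. The query assumes each simulation has at most one row per step; if steps can repeat, the joins would return more than one row per simulation.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulations_ts ("
    " id INTEGER PRIMARY KEY,"
    " simulation_id INTEGER NOT NULL,"
    " step INTEGER NOT NULL,"
    " data_value REAL,"
    # Hypothetical constraint: one row per (simulation, step).
    " UNIQUE (simulation_id, step))"
)

# Hypothetical miniature data set: simulation 1 has 3 steps, simulation 2 has 2.
rows = [
    (1, 1, 1, 0.05),
    (2, 1, 2, 0.051),
    (3, 1, 3, 0.253),
    (4, 2, 1, 0.043),
    (5, 2, 2, 0.067),
]
conn.executemany("INSERT INTO simulations_ts VALUES (?, ?, ?, ?)", rows)

# Aggregate once per simulation, then join back twice to fetch the ids
# of the rows holding the minimum and maximum step.
QUERY = """
SELECT g.simulation_id,
       first_ts.id AS first_ts_id,
       last_ts.id  AS last_ts_id,
       g.num_steps
FROM (SELECT simulation_id,
             MIN(step) AS min_step,
             MAX(step) AS max_step,
             COUNT(*)  AS num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
JOIN simulations_ts AS first_ts
  ON first_ts.simulation_id = g.simulation_id AND first_ts.step = g.min_step
JOIN simulations_ts AS last_ts
  ON last_ts.simulation_id = g.simulation_id AND last_ts.step = g.max_step
ORDER BY g.simulation_id
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On the real table, a composite index on (simulation_id, step) would likely matter a lot for both the GROUP BY and the two join-backs, though I have not measured it on data of your size.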
