I am using MySQL to keep data for a large set of simulations I am running on an HPC cluster. Each simulation has its own entry in a table, and there is a second table which keeps the simulation time step result data. The time step result data table is quite large (tens to hundreds of millions of rows). The tables look like this:
Table: simulations
id    descriptor  notes
1     SIM1        notes here...
2     SIM2        SIM2 Notes...
...   ...         ...
8643  SIM8643     SIM8643 Notes...
Table: simulations_ts
id        simulation_id  step  data_value
1         1              1     0.05
2         1              2     0.051
...       ...            ...   ...
1983      1              1983  0.253
1984      2              1     0.043
...       ...            ...   ...
59345435  8643           2832  0.067
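The answers below join simulations_ts back to itself on (simulation_id, step), so a composite index on those two columns is what keeps that lookup cheap on a table this size. Here is a minimal sketch of the two tables using Python's built-in sqlite3 module as a stand-in for MySQL. The column types, the uniqueness of (simulation_id, step), and the index name idx_sim_step are all assumptions for illustration, not taken from the question.

```python
import sqlite3

# Sketch of the two tables above; SQLite stands in for MySQL here.
# Column types are assumed: the question only shows column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE simulations (
    id         INTEGER PRIMARY KEY,
    descriptor TEXT,
    notes      TEXT
);
CREATE TABLE simulations_ts (
    id            INTEGER PRIMARY KEY,
    simulation_id INTEGER NOT NULL REFERENCES simulations(id),
    step          INTEGER NOT NULL,
    data_value    REAL
);
-- Hypothetical index (name assumed). UNIQUE assumes each step occurs once per
-- simulation; it serves MIN/MAX(step) per simulation and (simulation_id, step) lookups.
CREATE UNIQUE INDEX idx_sim_step ON simulations_ts (simulation_id, step);
""")

# PRAGMA index_info yields (seqno, cid, name) per indexed column.
index_columns = [row[2] for row in conn.execute("PRAGMA index_info('idx_sim_step')")]
print(index_columns)  # ['simulation_id', 'step']
```

In MySQL the corresponding statement would be `ALTER TABLE simulations_ts ADD UNIQUE INDEX idx_sim_step (simulation_id, step);`. Drop `UNIQUE` if a step can legitimately repeat within a simulation.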
I would like to be able to efficiently return the following table:
simulation_id  first_ts_id  last_ts_id  num_steps
1              1            1983        1983
2              1984         2938434     2052
...            ...          ...         ...
8643           12835283     59345435    2832
I know I can perform a query like:
SELECT
simulation_id,
MIN(step) AS first_step,
MAX(step) AS last_step,
COUNT(id) AS num_steps
FROM
simulations_ts
GROUP BY
simulation_id
ORDER BY
simulation_id ASC
That query returns the first and last steps, not the ids of the rows holding them. I also know there are ways to use a subquery to pull the corresponding id for one aggregate, but I have found no examples that pull the corresponding ids for two aggregate functions at once. Is it possible to do this efficiently in a single query, or am I better off stepping through the simulations and doing a separate min lookup and max lookup for each?
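One way to keep both lookups inside a single statement is a pair of correlated scalar subqueries, each ordering one simulation's rows by step and keeping the first id. This sketch is not taken from either answer below. It uses Python's sqlite3 module as a stand-in for MySQL on made-up sample rows, and it assumes an index on (simulation_id, step) so each subquery can be a short index seek rather than a scan. It has not been benchmarked on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE simulations_ts (
    id INTEGER PRIMARY KEY, simulation_id INTEGER NOT NULL,
    step INTEGER NOT NULL, data_value REAL
);
CREATE INDEX idx_sim_step ON simulations_ts (simulation_id, step);  -- assumed index
""")
# Made-up rows; simulation 2's ids are deliberately not in step order.
conn.executemany(
    "INSERT INTO simulations_ts VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 0.05), (2, 1, 2, 0.051), (3, 1, 3, 0.253),
     (4, 2, 2, 0.043), (5, 2, 1, 0.040), (6, 2, 3, 0.067)],
)

query = """
SELECT g.simulation_id,
       -- id of the row with the lowest step in this simulation
       (SELECT t.id FROM simulations_ts t
         WHERE t.simulation_id = g.simulation_id
         ORDER BY t.step ASC LIMIT 1)  AS first_ts_id,
       -- id of the row with the highest step in this simulation
       (SELECT t.id FROM simulations_ts t
         WHERE t.simulation_id = g.simulation_id
         ORDER BY t.step DESC LIMIT 1) AS last_ts_id,
       g.num_steps
FROM (SELECT simulation_id, COUNT(*) AS num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
ORDER BY g.simulation_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1, 3, 3), (2, 5, 6, 3)]
```

Simulation 2 shows why MIN(id)/MAX(id) would not be enough: its first step lives in row 5, not row 4.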
2 answers
#1
SELECT g.simulation_id, first.id AS first_ts_id, last.id AS last_ts_id, g.num_steps
-- g: one row per simulation with its lowest step, highest step and row count
FROM (SELECT simulation_id, MIN(step) minstep, MAX(step) maxstep, COUNT(*) num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
-- join back to fetch the id of the row at each simulation's lowest and highest step
JOIN simulations_ts first ON first.simulation_id = g.simulation_id AND first.step = g.minstep
JOIN simulations_ts last ON last.simulation_id = g.simulation_id AND last.step = g.maxstep
ORDER BY g.simulation_id
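To sanity-check this join-back approach, here is a minimal sketch that runs the same query shape with Python's built-in sqlite3 module standing in for MySQL, on made-up sample rows. The aliases are renamed to first_row and last_row so the sketch does not depend on how SQLite treats FIRST/LAST as identifiers. Note that the outer simulation_id must be qualified, because g, first and last all expose that column. The join returns exactly one row per simulation only if each (simulation_id, step) pair is unique. Duplicate steps would multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulations_ts ("
    "id INTEGER PRIMARY KEY, simulation_id INTEGER, step INTEGER, data_value REAL)"
)
# Made-up rows; simulation 2's ids are deliberately not in step order.
conn.executemany(
    "INSERT INTO simulations_ts VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 0.05), (2, 1, 2, 0.051), (3, 1, 3, 0.253),
     (4, 2, 2, 0.043), (5, 2, 1, 0.040), (6, 2, 3, 0.067)],
)

query = """
SELECT g.simulation_id, first_row.id AS first_ts_id, last_row.id AS last_ts_id, g.num_steps
FROM (SELECT simulation_id, MIN(step) AS minstep, MAX(step) AS maxstep, COUNT(*) AS num_steps
      FROM simulations_ts
      GROUP BY simulation_id) AS g
JOIN simulations_ts first_row
  ON first_row.simulation_id = g.simulation_id AND first_row.step = g.minstep
JOIN simulations_ts last_row
  ON last_row.simulation_id = g.simulation_id AND last_row.step = g.maxstep
ORDER BY g.simulation_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1, 3, 3), (2, 5, 6, 3)]
```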
#2
I think this is what you're after. Note that I'm only displaying the id column from the first_sim and last_sim aliases of simulations_ts, but you could of course display other columns from that table.
SELECT
    main.simulation_id,
    first_step,
    first_sim.id AS first_sim_id,
    last_step,
    last_sim.id AS last_sim_id,
    num_steps
FROM
    -- your original aggregate query: one row per simulation
    (SELECT
        simulation_id,
        MIN(step) AS first_step,
        MAX(step) AS last_step,
        COUNT(id) AS num_steps
    FROM
        simulations_ts
    GROUP BY
        simulation_id) AS main
    -- the row holding each simulation's first step
    JOIN simulations_ts first_sim
        ON main.simulation_id = first_sim.simulation_id
        AND main.first_step = first_sim.step
    -- the row holding each simulation's last step
    JOIN simulations_ts last_sim
        ON main.simulation_id = last_sim.simulation_id
        AND main.last_step = last_sim.step
ORDER BY
    main.simulation_id
I start with your original query, then join it back to simulations_ts twice: once on the simulation id and minimum step, and once on the simulation id and maximum step.
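A different technique, not used in either answer, is to rank each simulation's rows with window functions (available in MySQL 8.0+ and SQLite 3.25+) and pick the rank-1 rows with conditional aggregation. This avoids the two self-joins but still reads every row and sorts within each simulation, so whether it beats the join-back on hundreds of millions of rows is untested. The sketch below uses Python's sqlite3 module as a stand-in for MySQL on made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simulations_ts ("
    "id INTEGER PRIMARY KEY, simulation_id INTEGER, step INTEGER, data_value REAL)"
)
# Made-up rows; simulation 2's ids are deliberately not in step order.
conn.executemany(
    "INSERT INTO simulations_ts VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 0.05), (2, 1, 2, 0.051), (3, 1, 3, 0.253),
     (4, 2, 2, 0.043), (5, 2, 1, 0.040), (6, 2, 3, 0.067)],
)

query = """
SELECT simulation_id,
       MAX(CASE WHEN rn_asc = 1 THEN id END)  AS first_ts_id,  -- lowest step
       MAX(CASE WHEN rn_desc = 1 THEN id END) AS last_ts_id,   -- highest step
       COUNT(*)                               AS num_steps
FROM (SELECT id, simulation_id,
             -- rank rows within each simulation from both ends
             ROW_NUMBER() OVER (PARTITION BY simulation_id ORDER BY step ASC)  AS rn_asc,
             ROW_NUMBER() OVER (PARTITION BY simulation_id ORDER BY step DESC) AS rn_desc
      FROM simulations_ts) AS ranked
GROUP BY simulation_id
ORDER BY simulation_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1, 3, 3), (2, 5, 6, 3)]
```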