MySQL从每个组中选择最大记录并插入另一个表中

时间:2022-08-08 12:45:08

There are 4 columns in table A, id, name, create_time and content.

表A,id,name,create_time和content中有4列。

create table A
(
    id int primary key,
    name varchar(20),
    create_time datetime,
    content varchar(4000)
);
create table B like A;

I want to select max create_time records in the same name, and insert into another table B.

我想以相同的名称选择max create_time记录,并插入另一个表B.

Execute sql as follow, but the time consumption is unacceptable.

执行sql如下,但时间消耗是不可接受的。

insert into B
select A.*
from A,
    (select name, max(create_time) create_time from B group by name) tmp
where A.name = tmp.name
  and A.create_time = tmp.create_time;

A table has 1000W rows and 10GB, execute sql spend 200s.

一个表有1000W行和10GB,执行sql花费200s。

Is there any way to do this job faster, or change which parameters in MySQL Server to run faster.

有没有办法更快地完成这项工作,或更改MySQL服务器中哪些参数运行得更快。

p: table A can be any type, paration table or some else.

p:表A可以是任何类型,paration表或其他。

2 个解决方案

#1


1  

First be sure you have proper index on A (name, create_time) and B (name, create_time) then try using explicit join and on condtion

首先要确保你对A(name,create_time)和B(name,create_time)有适当的索引,然后尝试使用显式连接和condtion

insert into B 
select A.* 
from A 
inner join ( 
    select name, max(create_time) create_time 
    from B 
    group by name) tmp on  ( A.name = tmp.name and A.create_time = tmp.create_time)

#2


1  

The query you need is:

您需要的查询是:

INSERT INTO B
SELECT m.*
FROM A m                                      # m from "max"
LEFT JOIN A l                                 # l from "later"
    ON m.name = l.name                        # the same name
        AND m.create_time < l.create_time     # "l" was created later than "m"
WHERE l.name IS NULL                          # there is no "later"

How it works:

It joins A aliased as m (from "max") against itself aliased as l (from "later" than "max"). The LEFT JOIN ensures that, in the absence of a WHERE clause, all the rows from m are present in the result set. Each row from m is combined with all rows from l that have the same name (m.name = l.name) and are created after the row from m (m.create_time < l.create_time). The WHERE condition keeps into the results set only the rows from m that do not have any match in l (there is no record with the same name and greater creation time).

它将A别名作为m(来自“max”)与其自身别名为l(来自“later”而不是“max”)。 LEFT JOIN确保在没有WHERE子句的情况下,m中的所有行都出现在结果集中。 m中的每一行与l中具有相同名称(m.name = l.name)的所有行组合,并在m(m.create_time )的行之后创建。>

Discussion

If there are more than one rows in A that have the same name and creation_time, the query returns all of them. In order to keep only one of them and additional condition is required.

如果A中有多个具有相同名称和creation_time的行,则查询将返回所有这些行。为了只保留其中一个和附加条件是必需的。

Add:

OR (m.create_time = l.create_time AND m.id < l.id)

to the ON clause (right before WHERE). Adjust/replace the m.id < l.id part of the condition to suit your needs (this version favors the rows inserted earlier in the table).

到ON子句(在WHERE之前)。调整/替换条件的m.id 部分以满足您的需要(此版本支持在表格中先前插入的行)。

Make sure the table A has indexes on the columns used by the query (name and create_time). Otherwise the performance improvement compared with your original query is not significant.

确保表A在查询使用的列(name和create_time)上有索引。否则,与原始查询相比,性能提升并不重要。

#1


1  

First be sure you have proper index on A (name, create_time) and B (name, create_time) then try using explicit join and on condtion

首先要确保你对A(name,create_time)和B(name,create_time)有适当的索引,然后尝试使用显式连接和condtion

insert into B 
select A.* 
from A 
inner join ( 
    select name, max(create_time) create_time 
    from B 
    group by name) tmp on  ( A.name = tmp.name and A.create_time = tmp.create_time)

#2


1  

The query you need is:

您需要的查询是:

INSERT INTO B
SELECT m.*
FROM A m                                      # m from "max"
LEFT JOIN A l                                 # l from "later"
    ON m.name = l.name                        # the same name
        AND m.create_time < l.create_time     # "l" was created later than "m"
WHERE l.name IS NULL                          # there is no "later"

How it works:

It joins A aliased as m (from "max") against itself aliased as l (from "later" than "max"). The LEFT JOIN ensures that, in the absence of a WHERE clause, all the rows from m are present in the result set. Each row from m is combined with all rows from l that have the same name (m.name = l.name) and are created after the row from m (m.create_time < l.create_time). The WHERE condition keeps into the results set only the rows from m that do not have any match in l (there is no record with the same name and greater creation time).

它将A别名作为m(来自“max”)与其自身别名为l(来自“later”而不是“max”)。 LEFT JOIN确保在没有WHERE子句的情况下,m中的所有行都出现在结果集中。 m中的每一行与l中具有相同名称(m.name = l.name)的所有行组合,并在m(m.create_time )的行之后创建。>

Discussion

If there are more than one rows in A that have the same name and creation_time, the query returns all of them. In order to keep only one of them and additional condition is required.

如果A中有多个具有相同名称和creation_time的行,则查询将返回所有这些行。为了只保留其中一个和附加条件是必需的。

Add:

OR (m.create_time = l.create_time AND m.id < l.id)

to the ON clause (right before WHERE). Adjust/replace the m.id < l.id part of the condition to suit your needs (this version favors the rows inserted earlier in the table).

到ON子句(在WHERE之前)。调整/替换条件的m.id 部分以满足您的需要(此版本支持在表格中先前插入的行)。

Make sure the table A has indexes on the columns used by the query (name and create_time). Otherwise the performance improvement compared with your original query is not significant.

确保表A在查询使用的列(name和create_time)上有索引。否则,与原始查询相比,性能提升并不重要。