MySQL: split out records, keeping only the latest of each group of duplicates

Time: 2021-02-22 04:18:51

I've been looking for hours to find this, and while there are a lot of variations, I can't quite seem to close the loop on my specific requirement. Every time I think I've got it, it slips away from me :)

So here it is:

I've imported a bunch of records into a table with ultimately unique rows, but with some duplicate data in certain columns. I want to split the records into two tables: one table with the DISTINCT or UNIQUE 'code' records that have the latest timestamp within their 'code' group, and one table with the rest of the records.

[EDIT - Sincerest apologies, I have to rephrase, as I don't think I articulated this very clearly the first time. In fact, I got it very wrong... sorry!]

I have multiple columns where only the ROWS are unique (i.e. each column has duplicate data, but the combination of all columns in a specific row is unique, obviously excluding the primary key).

What I need is the row that contains the latest timestamp for a code within a specific area_id.
In the example below I've excluded the other columns, as I view these three as key:

TABLE#1
        code    area_id   timestamp    
         1        2      2010-02-31 00:00:00
         2        2      2012-01-31 00:00:00
         2        2      2011-02-31 00:00:00
         1        5      2010-02-31 00:00:00
         2        5      2010-02-31 00:00:00
         1        2      2011-01-31 00:00:00
         1        5      2012-01-31 00:00:00

So the structure of the answer I'm trying to phrase is:

"For the combination of code 1 & area_id 2, the latest timestamp is 2011-01-31 00:00:00" - return that row.

Repeat for each combination of code and area_id.

So:

RESULT
        code    area_id   timestamp    
         1        2      2011-01-31 00:00:00
         2        2      2012-01-31 00:00:00
         1        5      2012-01-31 00:00:00
         2        5      2010-02-31 00:00:00
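
One way to phrase that requirement is to compute MAX(timestamp) per (code, area_id) and join the result back to the table, so the whole row comes along. The following is only a sketch under some assumptions: it runs the query through Python's built-in sqlite3 module instead of MySQL so that it is self-contained, the `id` column is an assumed stand-in for the primary key, `latest_per_group` is a hypothetical helper name, and the invalid 31 February dates from the example are replaced with 28 February.

```python
import sqlite3

# Sample data from TABLE#1, with 02-31 changed to 02-28 (assumption).
ROWS = [
    (1, 2, "2010-02-28 00:00:00"),
    (2, 2, "2012-01-31 00:00:00"),
    (2, 2, "2011-02-28 00:00:00"),
    (1, 5, "2010-02-28 00:00:00"),
    (2, 5, "2010-02-28 00:00:00"),
    (1, 2, "2011-01-31 00:00:00"),
    (1, 5, "2012-01-31 00:00:00"),
]


def latest_per_group(rows):
    """Return the row with the latest timestamp for each (code, area_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 ("
        "id INTEGER PRIMARY KEY, code INTEGER, area_id INTEGER, timestamp TEXT)"
    )
    conn.executemany(
        "INSERT INTO Table1 (code, area_id, timestamp) VALUES (?, ?, ?)", rows
    )
    query = """
        SELECT t1.code, t1.area_id, t1.timestamp
        FROM Table1 t1
        JOIN (SELECT code, area_id, MAX(timestamp) AS max_timestamp
              FROM Table1
              GROUP BY code, area_id) t2
          ON t1.code = t2.code
         AND t1.area_id = t2.area_id
         AND t1.timestamp = t2.max_timestamp
        ORDER BY t1.area_id, t1.code
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in latest_per_group(ROWS):
    print(row)
```

The SELECT uses only standard SQL, so it should carry over to MySQL, although that is not exercised here. If two rows in a group share the same latest timestamp, both are returned.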

As I mentioned, there are quite a few other columns that need to come with the data when I split the rows out, but I think I can worry about that later. The first step is to get the data into a result set without having MySQL/Workbench time out on me!

JS

1 solution

#1


And this is for Table2:

INSERT INTO Table2
SELECT *
FROM Table1
WHERE (code, timestamp) NOT IN (SELECT code, MAX(timestamp)
                                FROM Table1
                                GROUP BY code);

And then this to delete those records from Table1:

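-- The extra derived table (s) is there because MySQL does not allow
-- selecting directly from the table that is being deleted from.
DELETE FROM Table1
WHERE (code, timestamp) NOT IN (SELECT * FROM (SELECT code, MAX(timestamp)
                                FROM Table1
                                GROUP BY code) s);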

Please see fiddle here (I changed the 31st of February to the 28th, and the 31st of April to the 30th; I think they were typos).
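
The queries above group by code only, while the edited question asks for the latest row per combination of code and area_id. A minimal sketch of the same NOT IN split with area_id added to the grouping might look like this. It is an illustration, not a tested MySQL script: it uses Python's built-in sqlite3 module so it can run standalone, `split_latest` is a hypothetical helper name, the tables hold only the three key columns, and the sample dates use 28 February in place of the invalid 31 February.

```python
import sqlite3

# Sample data from the question, with 02-31 changed to 02-28 (assumption).
SAMPLE_ROWS = [
    (1, 2, "2010-02-28 00:00:00"),
    (2, 2, "2012-01-31 00:00:00"),
    (2, 2, "2011-02-28 00:00:00"),
    (1, 5, "2010-02-28 00:00:00"),
    (2, 5, "2010-02-28 00:00:00"),
    (1, 2, "2011-01-31 00:00:00"),
    (1, 5, "2012-01-31 00:00:00"),
]


def split_latest(rows):
    """Keep the latest row per (code, area_id) in Table1, move the rest to Table2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (code INTEGER, area_id INTEGER, timestamp TEXT);
        CREATE TABLE Table2 (code INTEGER, area_id INTEGER, timestamp TEXT);
    """)
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    conn.commit()

    # Condition that is true for every row that is NOT the latest of its group.
    not_latest = """
        (code, area_id, timestamp) NOT IN (
            SELECT code, area_id, MAX(timestamp)
            FROM Table1
            GROUP BY code, area_id)
    """
    # Copy first, then delete, inside one transaction (rolled back on error).
    with conn:
        conn.execute("INSERT INTO Table2 SELECT * FROM Table1 WHERE" + not_latest)
        conn.execute("DELETE FROM Table1 WHERE" + not_latest)

    order = " ORDER BY area_id, code, timestamp"
    kept = conn.execute("SELECT * FROM Table1" + order).fetchall()
    moved = conn.execute("SELECT * FROM Table2" + order).fetchall()
    conn.close()
    return kept, moved


kept, moved = split_latest(SAMPLE_ROWS)
print("kept:", kept)
print("moved:", moved)
```

SQLite accepts the DELETE with a plain subquery on the same table. In MySQL the subquery would still need to be wrapped in a derived table, as in the DELETE statement above.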

Edit

Since these queries are too slow to execute, you can try the JOIN version:

INSERT INTO Table2
SELECT t1.*
FROM Table1 t1 LEFT JOIN (SELECT code, MAX(timestamp) max_timestamp
                          FROM Table1
                          GROUP BY code) t2
     ON t1.code=t2.code and t1.timestamp=t2.max_timestamp
WHERE
  t2.code IS NULL;

DELETE t1
FROM Table1 t1 LEFT JOIN (SELECT code, MAX(timestamp) max_timestamp
                           FROM Table1
                           GROUP BY code) t2
     ON t1.code=t2.code and t1.timestamp=t2.max_timestamp
WHERE
  t2.code IS NULL;

Please see fiddle here.
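
The JOIN version is an anti-join: every row is matched against the latest row of its group, and the rows that find no match (t2.code IS NULL) are the ones to move. The sketch below adapts the SELECT half to the (code, area_id) grouping from the edited question. It is a hedged illustration only: it runs in Python's built-in sqlite3 module rather than MySQL, `rows_to_move` is a hypothetical helper name, and the sample dates use 28 February in place of the invalid 31 February. The multi-table `DELETE t1 FROM ... LEFT JOIN` form is MySQL-specific and is not exercised here.

```python
import sqlite3

# Sample data from the question, with 02-31 changed to 02-28 (assumption).
SAMPLE_ROWS = [
    (1, 2, "2010-02-28 00:00:00"),
    (2, 2, "2012-01-31 00:00:00"),
    (2, 2, "2011-02-28 00:00:00"),
    (1, 5, "2010-02-28 00:00:00"),
    (2, 5, "2010-02-28 00:00:00"),
    (1, 2, "2011-01-31 00:00:00"),
    (1, 5, "2012-01-31 00:00:00"),
]


def rows_to_move(rows):
    """Return the rows that are not the latest of their (code, area_id) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (code INTEGER, area_id INTEGER, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.code, t1.area_id, t1.timestamp
        FROM Table1 t1
        LEFT JOIN (SELECT code, area_id, MAX(timestamp) AS max_timestamp
                   FROM Table1
                   GROUP BY code, area_id) t2
          ON t1.code = t2.code
         AND t1.area_id = t2.area_id
         AND t1.timestamp = t2.max_timestamp
        WHERE t2.code IS NULL          -- no match: not the latest row
        ORDER BY t1.area_id, t1.code, t1.timestamp
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in rows_to_move(SAMPLE_ROWS):
    print(row)
```

With the sample data this selects the three older rows and leaves the four rows listed in RESULT untouched.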

Also, you can try adding one or more of the following indexes:

CREATE INDEX idx1 ON Table1 (code);
CREATE INDEX idx2 ON Table1 (timestamp);
CREATE INDEX idx3 ON Table1 (code, timestamp);
