I've been looking for hours to find this and while there are a lot of variations I can't quite seem to close the loop on my specific requirement....every time I think I've got it it slips away from me :)
so here it is;
I've imported a bunch of records into a table with ultimately unique rows but with some duplicate data in certain columns.
I want to split out the records into two tables - one table with DISTINCT or UNIQUE 'code' records that have the latest timestamp within their 'code' group, and one table with the rest of the records
[EDIT - Sincerest apologies, I have to rephrase as I don't think I articulated very clearly the first time - in fact I got it very wrong...sorry!]
I have multiple columns with unique ROWS only - (i.e. each column has duplicate data, but the combination of all columns in a specific row is unique - obviously excluding the primary key)
What I need is the row that contains the latest timestamp for a code within a specific area_id.
In the example below I've excluded the other columns as I view these three as key;
TABLE#1
code  area_id  timestamp
1     2        2010-02-31 00:00:00
2     2        2012-01-31 00:00:00
2     2        2011-02-31 00:00:00
1     5        2010-02-31 00:00:00
2     5        2010-02-31 00:00:00
1     2        2011-01-31 00:00:00
1     5        2012-01-31 00:00:00
So the structure of the answer I'm trying to phrase is;
"For the combination of code 1 & area_id 2, the latest timestamp is 2011-01-31 00:00:00" - return that row.
Repeat for each combination of code and area_id.
so;
RESULT
code  area_id  timestamp
1     2        2011-01-31 00:00:00
2     2        2012-01-31 00:00:00
1     5        2012-01-31 00:00:00
2     5        2010-02-31 00:00:00
As I mentioned, there are quite a few other columns that need to come with the data when I split the rows out, but I think I can worry about that later - first step is to get the data in a result set without having mysql/workbench time out on me!
JS
1 solution
And this is for Table2, which gets the rest of the records, i.e. every row that does not have the latest timestamp for its code:
INSERT INTO Table2
SELECT *
FROM Table1
WHERE (code, timestamp) NOT IN (SELECT code, MAX(timestamp)
                                FROM Table1
                                GROUP BY code);
And then this deletes those same records from Table1:
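-- The extra derived table (s) is needed because MySQL will not let a DELETE
-- select directly from its own target table in a subquery.
DELETE FROM Table1
WHERE (code, timestamp) NOT IN (SELECT * FROM (SELECT code, MAX(timestamp)
                                               FROM Table1
                                               GROUP BY code) s);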
Please see the fiddle here (I changed the 31st of February to the 28th and the 31st of April to the 30th; I think those were typos).
Edit
Since these queries are too slow to execute, you can try the JOIN version:
INSERT INTO Table2
SELECT t1.*
FROM Table1 t1
LEFT JOIN (SELECT code, MAX(timestamp) max_timestamp
           FROM Table1
           GROUP BY code) t2
  ON t1.code = t2.code AND t1.timestamp = t2.max_timestamp
WHERE t2.code IS NULL;

DELETE t1
FROM Table1 t1
LEFT JOIN (SELECT code, MAX(timestamp) max_timestamp
           FROM Table1
           GROUP BY code) t2
  ON t1.code = t2.code AND t1.timestamp = t2.max_timestamp
WHERE t2.code IS NULL;
Please see fiddle here.
Also, you can try adding one or more of the following indexes:
CREATE INDEX idx1 ON Table1 (code);
CREATE INDEX idx2 ON Table1 (timestamp);
CREATE INDEX idx3 ON Table1 (code, timestamp);
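The edited question asks for the latest row per code within each area_id, whereas the queries above group by code alone. Here is a minimal sketch of the same LEFT JOIN idea with area_id added to the GROUP BY and to the join condition. It runs against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained. The schema, the surrogate id column and the NOT_LATEST name are assumptions for illustration, and the sample dates use 02-28 in place of 02-31. In MySQL the same derived table should slot into the INSERT ... SELECT and DELETE t1 FROM ... forms shown above, but I have not timed it on a large table.

```python
import sqlite3

# Hypothetical minimal schema: only the three columns the question treats as
# key, plus a surrogate id. The real tables would carry the other columns too.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, code INT, area_id INT, timestamp TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, code INT, area_id INT, timestamp TEXT);
""")

# Sample rows from the question, with 02-31 changed to 02-28.
rows = [
    (1, 2, "2010-02-28 00:00:00"),
    (2, 2, "2012-01-31 00:00:00"),
    (2, 2, "2011-02-28 00:00:00"),
    (1, 5, "2010-02-28 00:00:00"),
    (2, 5, "2010-02-28 00:00:00"),
    (1, 2, "2011-01-31 00:00:00"),
    (1, 5, "2012-01-31 00:00:00"),
]
conn.executemany(
    "INSERT INTO Table1 (code, area_id, timestamp) VALUES (?, ?, ?)", rows
)
conn.commit()

# Ids of rows that are NOT the latest for their (code, area_id) pair: the
# LEFT JOIN finds no partner for them among the per-group maxima, so the
# t2 side comes back NULL.
NOT_LATEST = """
    SELECT t1.id
    FROM Table1 t1
    LEFT JOIN (SELECT code, area_id, MAX(timestamp) AS max_timestamp
               FROM Table1
               GROUP BY code, area_id) t2
      ON t1.code = t2.code
     AND t1.area_id = t2.area_id
     AND t1.timestamp = t2.max_timestamp
    WHERE t2.code IS NULL
"""

# Copy and delete inside one transaction so they succeed or fail together.
with conn:
    conn.execute(f"INSERT INTO Table2 SELECT * FROM Table1 WHERE id IN ({NOT_LATEST})")
    conn.execute(f"DELETE FROM Table1 WHERE id IN ({NOT_LATEST})")

latest = conn.execute(
    "SELECT code, area_id, timestamp FROM Table1 ORDER BY area_id, code"
).fetchall()
rest = conn.execute("SELECT COUNT(*) FROM Table2").fetchone()[0]
print(latest)
print(rest)
```

Table1 ends up with the four rows from the RESULT listing in the question and Table2 with the other three. If you go this way, a composite index on (code, area_id, timestamp) is probably the first one to try. Note that if two rows tie on the latest timestamp within a pair, both stay in Table1.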