How can I delete one of two completely identical rows?

Date: 2021-12-01 09:17:40

I am cleaning out a database table without a primary key (I know, I know, what were they thinking?). I cannot add a primary key, because there is a duplicate in the column that would become the key. The duplicate value comes from one of two rows that are in all respects identical. I can't delete the row via a GUI (in this case MySQL Workbench, but I'm looking for a database-agnostic approach) because it refuses to perform tasks on tables without primary keys (or at least a UQ NN column).

How can I delete one of the twins?

13 solutions

#1


18  

One option to solve your problem is to create a new table with the same schema, and then do:

INSERT INTO new_table (SELECT DISTINCT * FROM old_table)

and then just rename the tables.

You will of course need roughly as much spare space on your disk as the table itself occupies to do this!

It's not efficient, but it's incredibly simple.

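
As a rough sketch of this copy-distinct-then-rename sequence, here it is end to end against an in-memory SQLite database using Python's standard sqlite3 module; the table and column names (old_table, new_table, logid, level) are invented for the demo:

```python
# Sketch of the copy-distinct-then-rename approach on an in-memory
# SQLite database; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE old_table (logid TEXT, level TEXT)")
cur.executemany("INSERT INTO old_table VALUES (?, ?)",
                [("ID1", "INFO"), ("ID1", "INFO"), ("ID2", "WARN")])

# Copy only the distinct rows into a fresh table with the same schema.
cur.execute("CREATE TABLE new_table (logid TEXT, level TEXT)")
cur.execute("INSERT INTO new_table SELECT DISTINCT * FROM old_table")

# Swap: drop the original and rename the deduplicated copy into its place.
cur.execute("DROP TABLE old_table")
cur.execute("ALTER TABLE new_table RENAME TO old_table")

print(cur.execute("SELECT COUNT(*) FROM old_table").fetchone()[0])  # prints 2
```

On a real server you would want to do the load and the rename inside a transaction or a maintenance window, rather than dropping the original first as this toy version does.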
#2


43  

SET ROWCOUNT 1
DELETE FROM [table] WHERE ....
SET ROWCOUNT 0

This will delete only one of the two identical rows. (Note that SET ROWCOUNT is T-SQL, i.e. SQL Server / Sybase syntax.)

#3


19  

Note that MySQL has its own extension of DELETE, which is DELETE ... LIMIT, which works in the usual way you'd expect from LIMIT: http://dev.mysql.com/doc/refman/5.0/en/delete.html

The MySQL-specific LIMIT row_count option to DELETE tells the server the maximum number of rows to be deleted before control is returned to the client. This can be used to ensure that a given DELETE statement does not take too much time. You can simply repeat the DELETE statement until the number of affected rows is less than the LIMIT value.

Therefore, you could use DELETE FROM some_table WHERE x="y" AND foo="bar" LIMIT 1; note that there isn't a simple way to say "delete everything except one" - just keep checking whether you still have row duplicates.

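
Since DELETE ... LIMIT is a MySQL extension, here is a sketch of how the same "remove exactly one twin" effect can be had on an engine without it, using SQLite's implicit rowid to pin a single physical row (table and column names are hypothetical):

```python
# Emulating MySQL's DELETE ... LIMIT 1 in SQLite by targeting one rowid;
# table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE some_table (x TEXT, foo TEXT)")
cur.executemany("INSERT INTO some_table VALUES (?, ?)",
                [("y", "bar"), ("y", "bar")])  # two fully identical rows

# In MySQL this would be:
#   DELETE FROM some_table WHERE x = 'y' AND foo = 'bar' LIMIT 1;
# Here we select one matching rowid and delete only that row.
cur.execute("""
    DELETE FROM some_table
    WHERE rowid = (SELECT rowid FROM some_table
                   WHERE x = 'y' AND foo = 'bar' LIMIT 1)
""")

print(cur.execute("SELECT COUNT(*) FROM some_table").fetchone()[0])  # prints 1
```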
#4


9  

For PostgreSQL you can do this:

DELETE FROM tablename
WHERE id IN (SELECT id
             FROM (SELECT id, ROW_NUMBER()
                          OVER (PARTITION BY column1, column2, column3 ORDER BY id) AS rnum
                   FROM tablename) t
             WHERE t.rnum > 1);

column1, column2, column3 make up the column set that has the duplicate values.

Reference here.

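
The same window-function pattern works outside PostgreSQL too; as a sketch, here it is against SQLite (which also supports ROW_NUMBER() from version 3.25 on), with hypothetical table, column names, and data:

```python
# ROW_NUMBER() dedup sketch on SQLite (needs SQLite >= 3.25 for window
# functions); table, column names, and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tablename (id INTEGER, column1 TEXT, column2 TEXT)")
cur.executemany("INSERT INTO tablename VALUES (?, ?, ?)",
                [(1, "a", "b"), (2, "a", "b"), (3, "c", "d")])

# Number the rows inside each duplicate group; keep row 1, delete the rest.
cur.execute("""
    DELETE FROM tablename
    WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (PARTITION BY column1, column2
                                      ORDER BY id) AS rnum
            FROM tablename
        ) WHERE rnum > 1
    )
""")

print(cur.execute("SELECT id FROM tablename ORDER BY id").fetchall())  # [(1,), (3,)]
```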
#5


4  

This can be accomplished using a CTE and the ROW_NUMBER() function, as below:

/* Sample Data */
    CREATE TABLE #dupes (ID INT, DWCreated DATETIME2(3))

    INSERT INTO #dupes (ID, DWCreated) SELECT 1, '2015-08-03 01:02:03.456'
    INSERT INTO #dupes (ID, DWCreated) SELECT 2, '2014-08-03 01:02:03.456'
    INSERT INTO #dupes (ID, DWCreated) SELECT 1, '2013-08-03 01:02:03.456'

/* Check sample data - returns three rows, with two rows for ID#1 */
    SELECT * FROM #dupes 

/* CTE to give each row that shares an ID a unique number */
    ;WITH toDelete AS
      (
        SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DWCreated) AS RN
        FROM #dupes 
      )

  /* Delete any row that is not the first instance of an ID */
    DELETE FROM toDelete WHERE RN > 1

/* Check the results: ID is now unique */
    SELECT * FROM #dupes

/* Clean up */
    DROP TABLE #dupes

Having a column to ORDER BY is handy, but not necessary unless you have a preference for which of the rows to delete. This will also handle all instances of duplicate records, rather than forcing you to delete one row at a time.

#6


3  

Have you tried LIMIT 1? This will delete only one of the rows that match your DELETE query:

DELETE FROM `table_name` WHERE `column_name`='value' LIMIT 1;

#7


2  

delete top(1) works on Microsoft SQL Server (T-SQL).

#8


1  

In my case I could get the GUI to give me a string of values of the row in question (alternatively, I could have done this by hand). On the suggestion of a colleague, in whose debt I remain, I used this to create an INSERT statement:

INSERT INTO some_table
VALUES ('ID1219243408800307444663', '2004-01-20 10:20:55', 'INFORMATION', 'admin' /* ... */);

I tested the insert statement, so that I now had triplets. Finally, I ran a simple DELETE to remove all of them...

DELETE FROM some_table WHERE logid = 'ID1219243408800307444663';

followed by the INSERT one more time, leaving me with a single row, and the bright possibilities of a primary key.

#9


1  

If you can add a column, like

  ALTER TABLE yourtable ADD IDCOLUMN bigint NOT NULL IDENTITY (1, 1)

do so.

Then count rows grouped by your problem column, keeping groups where the count is > 1; this will identify your twins (or triplets, or whatever).

Then select from your problem column where its content equals the duplicated content identified above, and check the IDs in IDCOLUMN.

Delete from your table where IDCOLUMN equals one of those IDs.

#10


1  

You could use a MAX, which was relevant in my case:

DELETE FROM [table] where id in 
(select max(id) from [table] group by id, col2, col3 having count(id) > 1)

Be sure to test your results first, and put a limiting condition in your HAVING clause. With such a sweeping delete query you might want to back up your database first.

#11


1  

This works for PostgreSQL:

DELETE FROM tablename WHERE id = 123 AND ctid IN (SELECT ctid FROM tablename WHERE id = 123 LIMIT 1)

#12


0  

I added a Guid column to the table and set it to generate a new id for each row. Then I could delete the rows using a GUI.

#13


0  

In PostgreSQL there is an implicit column called ctid. See the wiki. So you are free to use the following:

WITH cte1 as(
    SELECT unique_column, max( ctid ) as max_ctid
    FROM table_1
    GROUP BY unique_column
    HAVING count(*) > 1
), cte2 as(
    SELECT t.ctid as target_ctid
    FROM table_1 t
    JOIN cte1 USING( unique_column )
    WHERE t.ctid != max_ctid
)
DELETE FROM table_1
WHERE ctid IN( SELECT target_ctid FROM cte2 )

I'm not sure how safe this is to use when concurrent updates are possible. So one may find it sensible to run LOCK TABLE table_1 IN ACCESS EXCLUSIVE MODE; before actually doing the cleanup.
