Delete all rows from all tables that are no longer used in any FK relation

In order to trim a production database for loading into a test system, we've deleted rows in many tables. This has left us with cruft in a couple of tables, namely rows which aren't used in any FK relation anymore. What I want to achieve is like garbage collection in Java.

Or to put it another way: I have M tables in the database, and N of them (i.e. most but not all) have foreign key relations. I've deleted a couple of high-level rows (i.e. rows which only have outgoing FK relations) via SQL. This leaves the rows in the related tables behind.

Does someone have a SQL stored procedure or a Java program which finds the N tables and then follows all the FK relations to delete rows which are no longer needed?

If finding the N tables is too complex, I could probably provide the script with a list of tables to scan or, preferably, a negative list of tables to ignore.

Also note:

- We have some tables which are used in many (>50) FK relations, i.e. A, B, C, ... all use rows in Z.
- All FK relations use the technical PK column, which is always a single column.

3 solutions

#1

Even simple stored procedures are usually a little ugly, and this was an interesting exercise in pushing stored procedures well beyond the point where it's easy to take them.

To use the code below, launch your MySQL shell, use your target database, paste the big block of stored procedures from below, and then execute

CALL delete_orphans_from_all_tables();

to delete all orphaned rows from all tables in your database.

To provide a zoomed-out overview:

- delete_orphans_from_all_tables is the entry point. All other sprocs are prefixed with dofat to make clear that they relate to delete_orphans_from_all_tables and to make it less noisy to have them kicking around.
- delete_orphans_from_all_tables works by calling dofat_delete_orphans_from_all_tables_iter repeatedly until there are no more rows to delete.
- dofat_delete_orphans_from_all_tables_iter works by looping over all the tables that are targets of foreign key constraints, and for each table deleting all rows that currently aren't referenced from anywhere.
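
To see why a single pass over the tables is not enough, here is a minimal sketch. The tables a, b and c and their columns are made up for illustration and are not part of the original question; a references b, and b references c. The DELETE statements use the same predicate form that the procedures generate.

```sql
-- Hypothetical schema: a -> b -> c (each arrow is a foreign key).
CREATE TABLE c (id INT PRIMARY KEY);
CREATE TABLE b (
    id INT PRIMARY KEY,
    c_id INT,
    FOREIGN KEY (c_id) REFERENCES c (id)
);
CREATE TABLE a (
    id INT PRIMARY KEY,
    b_id INT,
    FOREIGN KEY (b_id) REFERENCES b (id)
);

-- State after the manual clean-up: the row in a is already gone.
INSERT INTO c VALUES (1);
INSERT INTO b VALUES (10, 1);

-- Pass 1: c is visited first, but c.1 is still referenced by b.10.
DELETE FROM c WHERE NOT (c.id IN (SELECT c_id FROM b) <=> 1);  -- 0 rows
DELETE FROM b WHERE NOT (b.id IN (SELECT b_id FROM a) <=> 1);  -- 1 row

-- Pass 2: b.10 is gone, so c.1 has now become an orphan.
DELETE FROM c WHERE NOT (c.id IN (SELECT c_id FROM b) <=> 1);  -- 1 row
DELETE FROM b WHERE NOT (b.id IN (SELECT b_id FROM a) <=> 1);  -- 0 rows

-- Pass 3 would delete nothing, which is the stopping condition.
```

The order in which the tables are visited is not controlled, so the number of passes needed depends on how deep the FK chains are.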

Here's the code:

delimiter //
CREATE PROCEDURE dofat_store_tables_targeted_by_foreign_keys ()
BEGIN
    -- This procedure creates a temporary table called TargetTableNames
    -- containing the names of all tables that are the target of any foreign
    -- key relation.

    SET @db_name = DATABASE();

    DROP TEMPORARY TABLE IF EXISTS TargetTableNames;
    CREATE TEMPORARY TABLE TargetTableNames (
        table_name VARCHAR(255) NOT NULL
    );

    PREPARE stmt FROM 
   'INSERT INTO TargetTableNames(table_name)
    SELECT DISTINCT referenced_table_name
    FROM INFORMATION_SCHEMA.key_column_usage
    WHERE referenced_table_schema = ?';

    EXECUTE stmt USING @db_name;
END//

CREATE PROCEDURE dofat_deletion_clause_for_table(
    IN table_name VARCHAR(255), OUT result text
)
DETERMINISTIC
BEGIN
    -- Given a table Foo, where Foo.col1 is referenced by Bar.col1, and
    -- Foo.col2 is referenced by Qwe.col3, this will return a string like:
    --
    -- NOT (Foo.col1 IN (SELECT col1 FROM Bar) <=> 1) AND
    -- NOT (Foo.col2 IN (SELECT col3 FROM Qwe) <=> 1)
    --
    -- This is used by dofat_delete_orphans_from_table to target only orphaned
    -- rows.
    --
    -- The odd-looking `NOT (x IN y <=> 1)` construct is used in favour of the
    -- more obvious (x NOT IN y) construct to handle nulls properly; note that
    -- (x NOT IN y) will evaluate to NULL if either x is NULL or if x is not in
    -- y and *any* value in y is NULL.

    SET @db_name = DATABASE();
    SET @table_name = table_name;

    PREPARE stmt FROM 
   'SELECT GROUP_CONCAT(
        CONCAT(
            \'NOT (\', @table_name, \'.\', referenced_column_name, \' IN (\',
            \'SELECT \', column_name, \' FROM \', table_name, \')\',
            \' <=> 1)\'
        )
        SEPARATOR \' AND \'
    ) INTO @result
    FROM INFORMATION_SCHEMA.key_column_usage 
    WHERE
        referenced_table_schema = ?
        AND referenced_table_name = ?';
    EXECUTE stmt USING @db_name, @table_name;

    SET result = @result;
END//

CREATE PROCEDURE dofat_delete_orphans_from_table (table_name varchar(255))
BEGIN
    -- Takes as an argument the name of a table that is the target of at least
    -- one foreign key.
    -- Deletes from that table all rows that are not currently referenced by
    -- any foreign key.

    CALL dofat_deletion_clause_for_table(table_name, @deletion_clause);
    SET @stmt = CONCAT(
       'DELETE FROM ', @table_name,
       ' WHERE ', @deletion_clause
    );

    PREPARE stmt FROM @stmt;
    EXECUTE stmt;
END//

CREATE PROCEDURE dofat_delete_orphans_from_all_tables_iter(
    OUT rows_deleted INT
)
BEGIN    
    -- dofat_store_tables_targeted_by_foreign_keys must be called before this
    -- will work.
    --
    -- Loops ONCE over all tables that are currently referenced by a foreign
    -- key. For each table, deletes all rows that are not currently referenced.
    -- Note that this is not guaranteed to leave all tables without orphans,
    -- since the deletion of rows from a table late in the sequence may leave
    -- rows from a table early in the sequence orphaned.
    DECLARE loop_done BOOL;

    -- Variable name needs to differ from the column name we use to populate it
    -- because of bug http://bugs.mysql.com/bug.php?id=28227
    DECLARE table_name_ VARCHAR(255); 

    DECLARE curs CURSOR FOR SELECT table_name FROM TargetTableNames;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET loop_done = TRUE;

    SET rows_deleted = 0;
    SET loop_done = FALSE;

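    OPEN curs;
    fetch_loop: LOOP
        FETCH curs INTO table_name_;
        -- Leave as soon as the cursor is exhausted, so that the last table is
        -- not processed a second time and nothing runs for an empty list.
        IF loop_done THEN
            LEAVE fetch_loop;
        END IF;
        CALL dofat_delete_orphans_from_table(table_name_);
        SET rows_deleted = rows_deleted + ROW_COUNT();
    END LOOP fetch_loop;
    CLOSE curs;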
END//

CREATE PROCEDURE delete_orphans_from_all_tables ()
BEGIN    
    CALL dofat_store_tables_targeted_by_foreign_keys();
    REPEAT
        CALL dofat_delete_orphans_from_all_tables_iter(@rows_deleted);
    UNTIL @rows_deleted = 0 END REPEAT;
END//
delimiter ;
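
The NULL handling described in the comments of dofat_deletion_clause_for_table can be checked directly in the MySQL shell. This is only a sketch with literal values standing in for the subquery. It matters in practice whenever a referencing FK column is nullable, because the subquery result then contains NULLs.

```sql
-- Plain NOT IN evaluates to NULL (which WHERE treats as false) when the
-- value is not found and the list contains a NULL, or when the value itself
-- is NULL. An unreferenced row would therefore not be matched.
SELECT 1 NOT IN (2, NULL);            -- NULL
SELECT NULL NOT IN (2, 3);            -- NULL

-- The null-safe form turns "unknown" into a definite answer.
SELECT NOT (1 IN (2, NULL) <=> 1);    -- 1: not referenced, delete
SELECT NOT (NULL IN (2, 3) <=> 1);    -- 1: not referenced, delete
SELECT NOT (2 IN (2, NULL) <=> 1);    -- 0: referenced, keep
```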

As an aside, this exercise has taught me about a few things that make writing code of this level of complexity using MySQL sprocs a frustrating business. I mention all these only because they may help you, or a curious future reader, understand what look like crazy stylistic choices in the code above.

Grossly verbose syntax and boilerplate for simple things, e.g.:
- needing to declare and assign on different lines
- needing to set delimiters around procedure definitions
- needing to use a PREPARE/EXECUTE combo to use dynamic SQL
Utter lack of referential transparency:
- PREPARE stmt FROM CONCAT( ... ); is a syntax error, while SET @foo = CONCAT( ... ); PREPARE stmt FROM @foo; is not.
- EXECUTE stmt USING @foo is fine, but EXECUTE stmt USING foo where foo is a procedure variable is a syntax error.
- A SELECT statement and a procedure whose last statement is a SELECT statement both return a result set, but pretty much everything you'd ever like to do with a result set (like looping over it or checking if something is IN it) can only be targeted at a SELECT statement, not a CALL statement.
- You can pass a session variable as an OUT parameter to a sproc, but you can't pass a sproc variable as an OUT parameter to a sproc.
Totally arbitrary restrictions and bizarre behaviours that blindside you:
- No dynamic SQL allowed in functions, only in procedures
- Using a cursor to fetch from a column into a procedure variable of the same name always sets the variable to NULL but throws no warning or error
Lack of ability to cleanly pass result sets between procedures

Result sets are a basic type in SQL; they're what SELECTs return, and you think about them as objects when using SQL from the application layer. But within a MySQL sproc, you can't assign them to variables or pass them from one sproc to another. If you truly need this functionality, you have to have one sproc write a result set into a temporary table so that another sproc can read it.
Eccentric and unfamiliar constructs and idioms:
- Three equivalent ways of assigning to a variable: SET @foo = bar, SELECT @foo := bar and SELECT bar INTO @foo.
- You'd expect that you should use procedure variables for all your state and avoid session variables for the same reasons that you avoid globals in a normal programming language. But in fact you need to use session variables everywhere because so many language constructs (like OUT params and EXECUTE) won't accept any other kind of variable.
- The syntax for using a cursor to loop over a result set just looks alien.
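
As a minimal sketch of the dynamic SQL boilerplate mentioned in the list above, this is roughly what the PREPARE/EXECUTE pattern looks like on its own. The names @sql_text, @schema_name and stmt are placeholders chosen for this example, and the query is an arbitrary one against INFORMATION_SCHEMA.

```sql
-- Build the statement text in a session variable first; handing the
-- CONCAT(...) expression directly to PREPARE is a syntax error.
SET @sql_text = CONCAT(
    'SELECT COUNT(*) FROM INFORMATION_SCHEMA.tables ',
    'WHERE table_schema = ?'
);
PREPARE stmt FROM @sql_text;

-- Values bound with EXECUTE ... USING are passed as session (@) variables.
SET @schema_name = DATABASE();
EXECUTE stmt USING @schema_name;

-- Release the prepared statement once it is no longer needed.
DEALLOCATE PREPARE stmt;
```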

Despite these obstacles, you can still piece together small programs like this with sprocs if you are determined.

#2

This issue is addressed in the MySQL Performance blog, http://www.percona.com/blog/2011/11/18/eventual-consistency-in-mysql/

The author provides the following meta-query, which generates queries that identify orphaned nodes:

SELECT CONCAT(
 'SELECT ', GROUP_CONCAT(DISTINCT CONCAT(K.CONSTRAINT_NAME, '.', P.COLUMN_NAME,
  ' AS `', P.TABLE_SCHEMA, '.', P.TABLE_NAME, '.', P.COLUMN_NAME, '`') ORDER BY P.ORDINAL_POSITION), ' ',
 'FROM ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',
 'LEFT OUTER JOIN ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',
 ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),
 ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',
 'WHERE ', K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME, ' IS NULL;'
 ) AS _SQL
 FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K
 INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P
 ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)
 AND P.CONSTRAINT_NAME = 'PRIMARY'
 WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
 GROUP BY K.CONSTRAINT_NAME;

I adapted this to find childless parents, producing:

SELECT CONCAT(
 'SELECT ', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ' ',

 'FROM ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',
 'LEFT OUTER JOIN ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',
 ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),
 ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',
 'WHERE ', K.CONSTRAINT_NAME, '.', K.COLUMN_NAME, ' IS NULL;'
 ) AS _SQL
 FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K
 INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P
 ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)
 AND P.CONSTRAINT_NAME = 'PRIMARY'
 WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
 GROUP BY K.CONSTRAINT_NAME;
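
For orientation, this is roughly the shape of one query generated by the childless-parents variant. The schema mydb, the tables b and c, and the constraint name fk_b_c are hypothetical, and the whitespace has been tidied by hand.

```sql
-- Assumes a hypothetical constraint fk_b_c on mydb.b (c_id) that references
-- mydb.c (id). The child table is aliased with the constraint name.
SELECT c.id
FROM mydb.c AS c
LEFT OUTER JOIN mydb.b AS fk_b_c
    ON (fk_b_c.c_id) = (c.id)
WHERE fk_b_c.c_id IS NULL;
```

One such query is produced per foreign key constraint, so each result only lists parents that have no child through that particular constraint. When several tables reference the same parent table, a row is presumably only safe to delete if it shows up in the result for every one of those constraints. Note also that the meta-query joins against the child table's PRIMARY key, so it generates nothing for child tables without one.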

#3

Since I ran into some weird SQL syntax errors, here is a solution which combines the SQL from the accepted answer with Groovy. Use orphanedNodeStatistics() to print the number of nodes per table which would be deleted, dumpOrphanedNodes(String tableName) to dump the PKs of nodes which would be deleted, and deleteOrphanedNodes(String tableName) to delete them.

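To delete all of them, iterate over the set returned by tablesTargetedByForeignKeys() and call deleteOrphanedNodes() for each table. Since deleting rows from one table can orphan rows in another, repeat this until nothing more is deleted.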

import groovy.sql.Sql

class OrphanNodesTool {

    Sql sql;
    String schema;

    Set<String> tablesTargetedByForeignKeys() {
        def query = '''\
SELECT referenced_table_name
FROM INFORMATION_SCHEMA.key_column_usage
WHERE referenced_table_schema = ?
'''
        def result = new TreeSet()
        sql.eachRow( query, [ schema ] ) { row ->
            result << row[0]
        }
        return result
    }

    String conditionsToFindOrphans( String tableName ) {
        List<String> conditions = []

        def query = '''\
SELECT referenced_column_name, column_name, table_name
FROM INFORMATION_SCHEMA.key_column_usage
WHERE referenced_table_schema = ?
    AND referenced_table_name = ?
'''
        sql.eachRow( query, [ schema, tableName ] ) { row ->
            conditions << "NOT (${tableName}.${row.referenced_column_name} IN (SELECT ${row.column_name} FROM ${row.table_name}) <=> 1)"
        }

        return conditions.join( '\nAND ' )
    }

    // NOTE: assumes each table's PK column is named <tableName>_ID.
    List<Long> listOrphanedNodes( String tableName ) {
        def query = """\
SELECT ${tableName}.${tableName}_ID
FROM ${tableName}
WHERE ${conditionsToFindOrphans(tableName)}
""".toString()

        def result = []
        sql.eachRow( query ) { row ->
            result << row[0]
        }
        return result
    }

    void dumpOrphanedNodes( String tableName ) {
        def pks = listOrphanedNodes( tableName )
        println( String.format( "%8d %s", pks.size(), tableName ) )
        if( pks.size() < 10 ) {
            pks.each {
                println( String.format( "%16d", it as long ) )
            }
        } else {
            pks.collate( 20 ).each { chunk ->
                chunk.each {
                    print( String.format( "%16d ", it as long ) )
                }
                println()
            }
        }
    }

    int countOrphanedNodes( String tableName ) {
        def query = """\
SELECT COUNT(*)
FROM ${tableName}
WHERE ${conditionsToFindOrphans(tableName)}
""".toString()

        int result;
        sql.eachRow( query ) { row ->
                result = row[0]
        }
        return result
    }

    int deleteOrphanedNodes( String tableName ) {
        def query = """\
DELETE
FROM ${tableName}
WHERE ${conditionsToFindOrphans(tableName)}
""".toString()

        // executeUpdate() returns the number of affected rows;
        // execute() only returns a boolean.
        int result = sql.executeUpdate( query )
        return result
    }

    void orphanedNodeStatistics() {
        def tableNames = tablesTargetedByForeignKeys()
        for( String tableName : tableNames ) {
            int n = countOrphanedNodes( tableName )
            println( String.format( "%8d %s", n, tableName ) )
        }
    }
}
