Reclaiming disk space with LOAD DATA after DELETE

Date: 2022-09-24 10:42:17

I have a DB schema composed of MyISAM tables, and I want to delete old records from some of the tables from time to time.

I know that DELETE does not reclaim the space, but as I found in the description of the DELETE command, INSERTs may reuse the space left by deleted rows:

In MyISAM tables, deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions.

I would like to know whether the LOAD DATA command also reuses the deleted space.

UPDATE

I am also interested in how the index space is reclaimed.

UPDATE 2012-12-03 23:11

Some more information, based on the answer received from @RolandoMySQLDBA:

After executing the suggested query below, I got different results for the different tables whose space needs to be reused or reclaimed:

SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable1';

> Dynamic

SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable2';

> Fixed

UPDATE 2012-12-09 08:06

LOAD DATA does reuse previously deleted space (I checked this by running a short script) if and only if the row format is Fixed, or the row format is Dynamic and there is a deleted row of exactly the same size.

It seems that if the row_format is Dynamic, a full lookup over the deleted list is made for each record. If no deleted row of exactly the right size is found, the deleted space is not reused and the table's space usage grows. In addition, LOAD DATA takes much longer to import the records.
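A check along these lines can be sketched in SQL. This is not the script mentioned above, only a minimal illustration: the table name mydb.reuse_test and the file path /tmp/reuse_test.txt are hypothetical, and LOAD DATA INFILE needs the FILE privilege and a path the server is allowed to read. DATA_LENGTH and DATA_FREE in information_schema.tables report the size of the data file and the allocated but unused bytes in it.

```sql
-- Hypothetical test table; fixed-width columns only, so the row format should be Fixed.
CREATE TABLE mydb.reuse_test (
    id      INT      NOT NULL,
    payload CHAR(32) NOT NULL
) ENGINE=MyISAM;

INSERT INTO mydb.reuse_test VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d');

-- Free two row positions in the middle of the data file.
DELETE FROM mydb.reuse_test WHERE id IN (2,3);

-- Size of the data file and the unused bytes left by the DELETE.
SELECT row_format, data_length, data_free
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='reuse_test';

-- Assumed input: a tab-separated file holding two rows, for example (5, e) and (6, f).
LOAD DATA INFILE '/tmp/reuse_test.txt' INTO TABLE mydb.reuse_test;

-- Run the same query again and compare the two results.
SELECT row_format, data_length, data_free
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='reuse_test';
```

If data_length is unchanged and data_free has dropped after the load, the deleted positions were reused; if data_length grew, the rows were appended. On servers that cache table statistics, running ANALYZE TABLE before each SELECT may be needed to see current values.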

I will accept the answer given here, since it describes the whole process perfectly.

1 Answer

#1


4  

For a MySQL table called mydb.mytable, just run the following:

OPTIMIZE TABLE mydb.mytable;

You could also do this in stages:

CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
DROP TABLE mydb.mytable_old;
ANALYZE TABLE mydb.mytable;

In either case, the table ends up with no fragmentation.
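To see which tables hold reusable gaps before and after defragmenting, one option is to look at DATA_FREE in information_schema.tables. A minimal sketch, assuming the schema is named mydb as in the examples above:

```sql
-- List MyISAM tables in mydb that still contain allocated but unused bytes.
SELECT table_name, row_format, data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema='mydb'
  AND engine='MyISAM'
  AND data_free > 0
ORDER BY data_free DESC;
```

A table that has just been rebuilt with either method would be expected to drop out of this list.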

Give it a try!

UPDATE 2012-12-03 12:50 EDT

If you are concerned about whether rows are reused during bulk INSERTs via LOAD DATA INFILE, please note the following:

When you created the MyISAM table, I assumed the default row format would be Dynamic. You can check what it is with either

SHOW CREATE TABLE mydb.mytable\G

or

SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';

Since the row format of your table is Dynamic, the fragmented rows are of various sizes. The MyISAM storage engine would have to keep checking the length of each deleted row to see whether the next row being inserted will fit. If the incoming data cannot fit in any of the deleted rows, the new row data is appended.

The presence of such rows can make myisamchk struggle.

This is why I recommended running OPTIMIZE TABLE. That way, data would be appended quicker.

UPDATE 2012-12-03 12:58 EDT

Here is something interesting you can also do: Try setting concurrent_insert to 2. That way, you are always appending to a MyISAM table without checking for gaps in the table. This will speed up INSERTs dramatically but leave all known gaps alone.
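For reference, the setting can be inspected and changed at runtime roughly as follows. This is a sketch that assumes an account allowed to set global variables; a value set this way is lost on server restart unless it is also placed in the option file.

```sql
-- Show the current value.
SHOW GLOBAL VARIABLES LIKE 'concurrent_insert';

-- Switch to mode 2 for all MyISAM tables on this server.
SET GLOBAL concurrent_insert = 2;
```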

You could still defragment your table at your earliest convenience using OPTIMIZE TABLE.

UPDATE 2012-12-03 13:40 EDT

Why not run my second suggestion?

CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ANALYZE TABLE mydb.mytable;

This will give you an idea of:

  • How long OPTIMIZE TABLE would take to run
  • How much smaller the .MYD and .MYI would be after running OPTIMIZE TABLE

After you run my second suggestion, you can compare the sizes of the two tables with:

SELECT
    A.mydsize,B.mydsize,A.mydsize - B.mydsize myd_diff,
    A.myisize,B.myisize,A.myisize - B.myisize myi_diff
FROM
(
    SELECT data_length mydsize,index_length myisize
    FROM information_schema.tables
    WHERE table_schema='mydb' AND table_name='mytable'
) A,
(
    SELECT data_length mydsize,index_length myisize
    FROM information_schema.tables
    WHERE table_schema='mydb' AND table_name='mytable_new'
) B;

UPDATE 2012-12-03 16:42 EDT

Any table whose ROW_FORMAT is Fixed has the luxury of allocating a row of the same length every time. If MyISAM tables maintain a list of deleted rows, the very first row in the list should always be selected as the next place to insert data. There would be no need to traverse the whole list until a gap of sufficient length is found. Each deleted row is quickly added to the list after a DELETE, and each INSERT would pick the first row in the list of deleted rows.
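One informal way to watch this on a Fixed-format table is to delete a row and then insert another. A minimal sketch, with a hypothetical table mydb.fixed_demo; it relies on a scan of an unindexed MyISAM table usually returning rows in data-file order, which is an observation rather than a guarantee.

```sql
-- Hypothetical demo table with fixed-width columns and no indexes.
CREATE TABLE mydb.fixed_demo (
    id   INT     NOT NULL,
    note CHAR(8) NOT NULL
) ENGINE=MyISAM ROW_FORMAT=FIXED;

INSERT INTO mydb.fixed_demo VALUES (1,'first'),(2,'second'),(3,'third');

-- Put the position of row 2 on the deleted-row list.
DELETE FROM mydb.fixed_demo WHERE id = 2;

-- If the freed position is reused, this row lands between rows 1 and 3.
INSERT INTO mydb.fixed_demo VALUES (4,'fourth');

-- No ORDER BY on purpose: the aim is to see the physical order.
SELECT id, note FROM mydb.fixed_demo;
```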

We can assume these things because MyISAM tables can do concurrent inserts. For this feature to be available via the concurrent_insert option, INSERTs into a MyISAM table must be able to detect one of three things:

  1. The presence of a list of deleted rows, thus choosing from the list
    • Row_Format=Dynamic : list of deleted rows, each with a different length
    • Row_Format=Fixed : list of deleted rows, all with the same length
  2. The absence of a list of deleted rows, thus appending
  3. Bypass checking for the presence of a list of deleted rows (set concurrent_insert to 2)

For detection #1 to be as fast as possible, a MyISAM table's row_format must be Fixed. If it is Dynamic, a list traversal is quite likely to be necessary.
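If a table has no BLOB or TEXT columns, a Fixed row format can be requested explicitly. This is a sketch, not a recommendation: fixed-length rows can make the data file considerably larger, and whether the server honors the request depends on the column types, so the result is worth checking afterwards.

```sql
-- Request fixed-length rows; this rebuilds the table.
ALTER TABLE mydb.mytable ROW_FORMAT=FIXED;

-- Confirm which format the table actually ended up with.
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';
```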
