I've been playing around with database programming lately, and I noticed something a little bit alarming.
I took a binary flat file saved in a proprietary, non-compressed format that holds several different types of records, built schemas to represent the same records, and uploaded the data into a Firebird database. The original flat file was about 7 MB. The database is over 70 MB!
I can understand that there's some overhead to describe the tables themselves, and I've got a few minimal indices (mostly PKs) and FKs on various tables, and all that is going to take up some space, but a factor of 10 just seems a little bit ridiculous. Does anyone have any ideas as to what could be bloating up this database so badly, and how I could bring the size down?
3 Answers
#1
2
From the Firebird FAQ:
Many users wonder why they don't get their disk space back when they delete a lot of records from database.
The reason is that it is an expensive operation: it would require a lot of disk writes and memory, just like defragmenting a hard disk partition. The parts of the database (pages) that were used by such data are marked as empty, and Firebird will reuse them the next time it needs to write new data.
If disk space is critical for you, you can get the space back by doing a backup and then a restore. Since you're doing the backup to restore right away, it's wise to use the "inhibit garbage collection" or "don't use garbage collection" switch (-G in gbak), which will make the backup go A LOT FASTER. Garbage collection is used to clean up your database, and as it is a maintenance task, it's often done together with backup (as backup has to go through the entire database anyway). However, you're soon going to ditch that database file, and there's no need to clean it up.
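The backup-then-restore cycle described in the FAQ can be sketched as a small Python helper that only builds the two gbak command lines. This is a minimal sketch, not the FAQ's own procedure: the function name, the file names, and the SYSDBA/masterkey credentials are placeholders assumed for illustration, and nothing is executed unless you pass the lists to something like subprocess.run yourself.

```python
import shlex
from typing import List, Tuple


def backup_restore_commands(
    source_db: str,
    backup_file: str,
    restored_db: str,
    user: str = "SYSDBA",          # placeholder credentials (assumption)
    password: str = "masterkey",   # replace with your own
) -> Tuple[List[str], List[str]]:
    """Return (backup_cmd, restore_cmd) argument lists for gbak."""
    credentials = ["-user", user, "-password", password]
    # -b: back up the database; -g: inhibit garbage collection during backup
    backup_cmd = ["gbak", "-b", "-g", *credentials, source_db, backup_file]
    # -c: create a new database file from the backup
    restore_cmd = ["gbak", "-c", *credentials, backup_file, restored_db]
    return backup_cmd, restore_cmd


if __name__ == "__main__":
    # Hypothetical file names, only to show the resulting command lines.
    for cmd in backup_restore_commands("big.fdb", "big.fbk", "compact.fdb"):
        print(shlex.join(cmd))
```

Restoring into a new file rather than over the original keeps the old database around until you have checked the restored copy.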
#2
1
Gstat is the tool for examining table sizes and the like; it may give you some hints about what's using the space.
In addition, you may have multiple snapshots (old record versions) or other garbage in the database file, depending on how you add data to the database. The database file never shrinks automatically, but a backup/restore cycle gets rid of the junk and empty space.
#3
1
Firebird does not fill pages completely; it leaves part of each page free. For example, a db page can hold 70% data and 30% free space, which speeds up future record updates and deletes because they don't have to move to a new db page. Here is the gstat output for one table:
CONFIGREVISIONSTORE (213)
Primary pointer page: 572, Index root page: 573
Data pages: 2122, data page slots: 2122, average fill: 82%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 79
80 - 99% = 2042
The same applies to indexes.
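To put numbers on the gstat excerpt above, here is a rough Python sketch that parses the "Data pages" line and the fill distribution and estimates how much of the allocated space is actually free. The function names are made up for this example, and the 4096-byte page size is an assumption (the excerpt doesn't show it; check the header page of your own database).

```python
import re

# The gstat excerpt from this answer, copied verbatim.
SAMPLE = """\
CONFIGREVISIONSTORE (213)
Primary pointer page: 572, Index root page: 573
Data pages: 2122, data page slots: 2122, average fill: 82%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 79
80 - 99% = 2042
"""


def parse_data_pages(text):
    """Extract page count, slot count and average fill from gstat text."""
    m = re.search(
        r"Data pages:\s*(\d+),\s*data page slots:\s*(\d+),"
        r"\s*average fill:\s*(\d+)%",
        text,
    )
    if m is None:
        raise ValueError("no 'Data pages' line found")
    pages, slots, fill = (int(g) for g in m.groups())
    return {"data_pages": pages, "slots": slots, "average_fill_pct": fill}


def parse_fill_distribution(text):
    """Map each (low, high) fill bucket to its page count."""
    rows = re.findall(r"^\s*(\d+) - (\d+)% = (\d+)\s*$", text, re.MULTILINE)
    return {(int(lo), int(hi)): int(n) for lo, hi, n in rows}


def estimate_bytes(stats, page_size):
    """Return (allocated, used, free) bytes for the table's data pages."""
    allocated = stats["data_pages"] * page_size
    used = allocated * stats["average_fill_pct"] // 100
    return allocated, used, allocated - used


if __name__ == "__main__":
    stats = parse_data_pages(SAMPLE)
    # 4096 is an assumed page size, not taken from the excerpt.
    print(stats, estimate_bytes(stats, page_size=4096))
    print(sum(parse_fill_distribution(SAMPLE).values()), "pages in buckets")
```

Summing the bucket counts is a quick sanity check: they should add up to the number of data pages reported on the line above them.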
You can see the real size of the db by doing a backup and restore with the option
-USE_ALL_SPACE
The database will then be restored without this reserved space. Note also that it's not only pages with data that are allocated; some pages are preallocated (empty) for fast future use, avoiding expensive disk allocation and fragmentation.
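As a rough illustration of that restore, the sketch below builds a gbak restore command with -use_all_space and compares file sizes afterwards. The helper names, file names, and credentials are hypothetical, and the code only constructs the command; running it is left to you.

```python
import os
from typing import List


def restore_full_pages_command(
    backup_file: str,
    target_db: str,
    user: str = "SYSDBA",          # placeholder credentials (assumption)
    password: str = "masterkey",
) -> List[str]:
    """Build a gbak restore command that does not reserve space on pages."""
    return [
        "gbak", "-c", "-use_all_space",
        "-user", user, "-password", password,
        backup_file, target_db,
    ]


def size_ratio(original_db: str, restored_db: str) -> float:
    """Restored size as a fraction of the original database file size."""
    return os.path.getsize(restored_db) / os.path.getsize(original_db)


if __name__ == "__main__":
    # Hypothetical file names, only to show the command line.
    print(" ".join(restore_full_pages_command("big.fbk", "packed.fdb")))
```

Once both files exist, size_ratio("big.fdb", "packed.fdb") shows how much of the original size was reserved space and garbage rather than data.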
As "Peter G." says, a database is much more than a flat file and is optimized to speed things up.
And as "Harriv" says, you can get details about the database file with gstat.
Use a command like gstat; here are details about its output.