What's the fastest way to import a large mysql database backup?

Time: 2021-02-23 19:13:31

What's the fastest way to export/import a mysql database using innodb tables?

I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysqldump -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".
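That round trip can be sketched as a pair of one-liners (user names, database name, and file paths are placeholders; the `--single-transaction` flag is my addition, since it gives a consistent snapshot of InnoDB tables without locking them):

```shell
# Production side: dump the database and compress it.
# --single-transaction gives a consistent InnoDB snapshot without table locks.
mysqldump -u backup_user -p --single-transaction dbname | gzip > backup.gz

# Development side: decompress and stream the dump straight into mysql.
gunzip -c backup.gz | mysql -u root dbname
```

Piping through gzip on both ends keeps disk usage down and, since the dump is mostly text, compresses well.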

From what I can tell from reading "mysqldump --help", mysqldump runs with --opt by default, which turns on a number of things that should make imports faster, such as disabling keys during the load and writing each table's data as one massive extended INSERT statement.
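One thing --opt cannot do for you is relax consistency checks on the importing server. A common trick is to wrap the dump in a few SET statements for the duration of the load. This is a hedged sketch: the wrapper is my suggestion, and it is only safe for a disposable development copy, not for production:

```shell
# Defer constraint checking and per-statement commits around the import.
# Only appropriate for a throwaway dev copy where partial loads are acceptable.
(
  echo "SET autocommit=0;"
  echo "SET unique_checks=0;"
  echo "SET foreign_key_checks=0;"
  gunzip -c backup.gz
  echo "COMMIT;"
) | mysql -u root dbname
```

Skipping foreign key and unique checks avoids per-row lookups on InnoDB secondary indexes, which is often where large imports spend most of their time.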

Are there better ways to do this, or further optimizations we should be doing?

Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent MacBook Pro, with lots of RAM). Backup time and network transfer time currently aren't big issues.

Update:

To answer some questions posed in the answers:

  • The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.

  • We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.

  • I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize, we want to optimize the import onto the developer workstation.

  • We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.

2 solutions

#1


It depends on how you define "fastest".

As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.

The pertinent questions are:

How often does your production database schema change?

Note: I'm referring to adding, removing or renaming tables, columns, views and the like, i.e. things that will break actual code.

How often do you need to put production data into a development environment?

In my experience, not very often at all. I've generally found that once a month is more than sufficient.

How long does mysqldump take?

If it's less than 8 hours it can be done overnight as a cron job. Problem solved.

Do you need all the data?

Another way to optimize this is simply to take a relevant subset of the data. Of course, this requires writing a custom script to extract a subset of entities along with all related entities, but it will yield the quickest end result. The script will also need to be maintained through schema changes, so this is a time-consuming approach that should be treated as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and to surface any potential performance problems.
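For simple cases you can get part of the way there without a full custom script, using mysqldump's --where option. A hedged sketch, where the table names, column names, and the 30-day cutoff are all hypothetical:

```shell
# Dump only recent rows from one large table (all names are illustrative).
mysqldump -u backup_user -p dbname orders \
  --where="created_at > NOW() - INTERVAL 30 DAY" > orders_subset.sql

# Related tables need their matching rows too; the --where clause is pasted
# into a SELECT against that table, so a subquery can follow the foreign key.
mysqldump -u backup_user -p dbname customers \
  --where="id IN (SELECT customer_id FROM orders
                  WHERE created_at > NOW() - INTERVAL 30 DAY)" > customers_subset.sql
```

Chasing every foreign key relationship this way is exactly the maintenance burden the paragraph above warns about, which is why a full subsetting script tends to be a last resort.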

Conclusion

Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.

#2


Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows catching up even if you have to shut down the slave. You could also use a parallel MySQL instance on your normal server that replicates the data to MyISAM tables, which support online backup. MySQL allows this as long as the tables have the same definition.
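A minimal setup sketch, using the classic pre-8.0 replication syntax; the server IDs, hostname, credentials, and binlog coordinates are all placeholders you would take from your own environment:

```shell
# my.cnf additions (illustrative) -- the master needs binary logging:
#   [mysqld]
#   server-id = 1
#   log-bin   = mysql-bin
# and the replica needs a distinct id:
#   [mysqld]
#   server-id = 2

# Point the replica at the master and start replicating.
mysql -u root <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='prod.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',          -- placeholder credentials
  MASTER_LOG_FILE='mysql-bin.000001', -- from SHOW MASTER STATUS
  MASTER_LOG_POS=4;
START SLAVE;
SQL
```

The replica is seeded from a normal dump once; after that, only the incremental changes flow over the wire, which is what makes this attractive when the import itself is the bottleneck.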

Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. I haven't looked at it myself, though, so I won't vouch for its stability or that it's even a workable solution for your problem.
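The basic workflow, assuming a modern XtraBackup release (at the time this answer was written the tool was driven via an innobackupex wrapper instead; the paths below are illustrative):

```shell
# Take a hot physical backup of a running InnoDB server.
xtrabackup --backup --target-dir=/data/backups/full

# Apply the copied redo log so the backup is consistent and restorable.
xtrabackup --prepare --target-dir=/data/backups/full

# Restore on the target machine (mysqld stopped, datadir empty).
xtrabackup --copy-back --target-dir=/data/backups/full
```

Because this copies InnoDB data files directly instead of replaying millions of INSERT statements, restoring is typically far faster than importing a logical mysqldump of the same data.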
