I want to give a serious answer to the last question someone asked me after last week's sharing session: which one is No.1, PG or MySQL? My tone at the time was not good, because I found the question pointless, and for that tone I apologize. My view, however, has not changed.
As technical people, spending our days arguing whether PG or MySQL is No.1, instead of raising the bar on our own skills, is nothing to be proud of. It proves only one thing: incompetence. Whoever asks this question is quite possibly, in technical terms, a physically challenged individual.
When you ask this question, what you are really telling everyone is that there is one of these two databases you cannot operate, whether that is PG or MySQL. Why not improve yourself and master both? The you of that day would not ask such a question, because you would have the confidence to tell anyone: "Who cares about which one is NO.1? I can operate both of them." That's all.
——————————————————————————————
Technical content
Apology done. I have recently been re-reviewing the built-in logical backup tools of PostgreSQL and MySQL, pg_dump and mysqldump, and frankly, mysqldump simply cannot match pg_dump. This shows mainly in the following aspects:
1 Backup speed
2 Multi-threaded (parallel) backup
3 Compression built directly into the backup command
4 Consistent backup supported directly by the command, even in parallel mode (no extra flag required)
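As a rough illustration, the pg_dump side of that list maps onto concrete flags. A sketch only; the database name "mydb" and the output path are made-up examples:

```shell
# Sketch: an example pg_dump invocation exercising the features listed above.
# -Fd  directory output format (required for parallel dumps)
# -j 4 dump with 4 parallel worker jobs
# -Z 5 compression level 5
PG_DUMP_CMD="pg_dump -Fd -j 4 -Z 5 -f /tmp/mydb.dump mydb"
echo "$PG_DUMP_CMD"
```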
Still, to be fair: MySQL's market share is simply higher, so both tools are worth keeping sharp.
Of course, the mysqldump of the early MySQL 5.0 I was using at the time had no --single-transaction; as I recall, that consistency option only arrived in 5.1.1.0, and back then the only way to back up the database was --lock-tables.
When backing up MySQL, the dump should be run with --single-transaction. Without this option, what you get can only be called an export, not a backup: without it, the tables written to the output are not captured at a single point in time, so as a backup the result is simply broken.
Here we run a simple backup and trace the whole process, to see exactly what a database goes through during a dump.
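The trace below comes from the server's general query log. A sketch of how such a trace can be reproduced; the paths, credentials and exact flag choices here are assumptions, not necessarily the invocation used:

```shell
# Sketch: enable the general log, then run a consistent dump of the "test"
# schema. Everything below is an assumed/example invocation.
PREP_SQL="SET GLOBAL general_log = ON; SET GLOBAL log_output = 'FILE';"
DUMP_CMD="mysqldump --single-transaction --source-data=2 test"

echo "mysql -e \"${PREP_SQL}\""
echo "${DUMP_CMD} > /tmp/test.sql"
```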
2023-12-30T15:22:50.379211-00:00 8 Connect root@localhost on using Socket
2023-12-30T15:22:50.379506-00:00 8 Query /*!40100 SET @@SQL_MODE='' */
2023-12-30T15:22:50.379631-00:00 8 Query /*!40103 SET TIME_ZONE='+00:00' */
2023-12-30T15:22:50.379739-00:00 8 Query /*!80000 SET SESSION information_schema_stats_expiry=0 */
2023-12-30T15:22:50.379925-00:00 8 Query SET SESSION NET_READ_TIMEOUT= 86400, SESSION NET_WRITE_TIMEOUT= 86400
2023-12-30T15:22:50.380446-00:00 8 Query SHOW VARIABLES LIKE 'gtid_mode'
2023-12-30T15:22:50.391288-00:00 8 Query FLUSH /*!40101 LOCAL */ TABLES
2023-12-30T15:22:50.391924-00:00 8 Query FLUSH TABLES WITH READ LOCK
2023-12-30T15:22:50.392361-00:00 8 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2023-12-30T15:22:50.392899-00:00 8 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2023-12-30T15:22:50.393304-00:00 8 Query SHOW BINARY LOG STATUS
2023-12-30T15:22:50.393518-00:00 8 Query UNLOCK TABLES
2023-12-30T15:22:50.393714-00:00 8 Query SELECT LOGFILE_GROUP_NAME, FILE_NAME, TOTAL_EXTENTS, INITIAL_SIZE, ENGINE, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE ENGINE = 'ndbcluster' AND FILE_TYPE = 'UNDO LOG' AND FILE_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IN (SELECT DISTINCT LOGFILE_GROUP_NAME FROM INFORMATION_SCHEMA.FILES WHERE ENGINE = 'ndbcluster' AND FILE_TYPE = 'DATAFILE' AND TABLESPACE_NAME IN (SELECT DISTINCT TABLESPACE_NAME FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA IN ('test'))) GROUP BY LOGFILE_GROUP_NAME, FILE_NAME, ENGINE, TOTAL_EXTENTS, INITIAL_SIZE ORDER BY LOGFILE_GROUP_NAME
2023-12-30T15:22:50.404327-00:00 8 Query SELECT DISTINCT TABLESPACE_NAME, FILE_NAME, LOGFILE_GROUP_NAME, EXTENT_SIZE, INITIAL_SIZE, ENGINE FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE' AND TABLESPACE_NAME IN (SELECT DISTINCT TABLESPACE_NAME FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA IN ('test')) ORDER BY TABLESPACE_NAME, LOGFILE_GROUP_NAME
2023-12-30T15:22:50.406627-00:00 8 Query SHOW VARIABLES LIKE 'ndbinfo\_version'
2023-12-30T15:22:50.411160-00:00 8 Init DB test
2023-12-30T15:22:50.411488-00:00 8 Query SAVEPOINT sp
2023-12-30T15:22:50.411854-00:00 8 Query show tables
2023-12-30T15:22:50.416006-00:00 8 Query show table status like 'test'
2023-12-30T15:22:50.417961-00:00 8 Query SET SQL_QUOTE_SHOW_CREATE=1
2023-12-30T15:22:50.418353-00:00 8 Query SET SESSION character_set_results = 'binary'
2023-12-30T15:22:50.418666-00:00 8 Query show create table `test`
2023-12-30T15:22:50.426797-00:00 8 Query SET SESSION character_set_results = 'utf8mb4'
2023-12-30T15:22:50.427208-00:00 8 Query show fields from `test`
2023-12-30T15:22:50.431178-00:00 8 Query show fields from `test`
2023-12-30T15:22:50.432739-00:00 8 Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`
2023-12-30T15:22:50.433439-00:00 8 Query SET SESSION character_set_results = 'binary'
2023-12-30T15:22:50.433681-00:00 8 Query use `test`
2023-12-30T15:22:50.433969-00:00 8 Query select @@collation_database
2023-12-30T15:22:50.434119-00:00 8 Query SHOW TRIGGERS LIKE 'test'
2023-12-30T15:22:50.439390-00:00 8 Query SET SESSION character_set_results = 'utf8mb4'
2023-12-30T15:22:50.439750-00:00 8 Query SET SESSION character_set_results = 'binary'
2023-12-30T15:22:50.440197-00:00 8 Query SELECT COLUMN_NAME, JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"') FROM information_schema.COLUMN_STATISTICS WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'test'
2023-12-30T15:22:50.441997-00:00 8 Query SET SESSION character_set_results = 'utf8mb4'
2023-12-30T15:22:50.442242-00:00 8 Query ROLLBACK TO SAVEPOINT sp
2023-12-30T15:22:50.442521-00:00 8 Query RELEASE SAVEPOINT sp
Query FLUSH /*!40101 LOCAL */ TABLES
Query FLUSH TABLES WITH READ LOCK
Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
Query SELECT @@GLOBAL.GTID_EXECUTED
Query SHOW BINARY LOG STATUS
Query UNLOCK TABLES
The backup breaks down into several phases.
1 As shown above, mysqldump first takes a read lock on the tables, acquiring a global lock as preparation for a complete backup; at this point the database's tables become read-only. It immediately sets the backup session to REPEATABLE READ, to prepare for data consistency across the tables during the backup, then opens a transaction WITH CONSISTENT SNAPSHOT, reads the current GTID set and the current binlog file, and finally releases the global lock.
Query SAVEPOINT sp
Query show tables
Query show table status like 'test'
Query SET SQL_QUOTE_SHOW_CREATE=1
Query SET SESSION character_set_results = 'binary'
Query show create table `test`
Query SET SESSION character_set_results = 'utf8mb4'
Query show fields from `test`
Query show fields from `test`
Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`
Query SET SESSION character_set_results = 'binary'
Query use `test`
Query select @@collation_database
Query SHOW TRIGGERS LIKE 'test'
Query SET SESSION character_set_results = 'utf8mb4'
Query SET SESSION character_set_results = 'binary'
Query SELECT COLUMN_NAME,
JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"')
FROM information_schema.COLUMN_STATISTICS
WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'test'
Query SET SESSION character_set_results = 'utf8mb4'
Query ROLLBACK TO SAVEPOINT sp
Query RELEASE SAVEPOINT sp
2 Then the per-table backup begins. mysqldump first creates a savepoint in the current transaction so it can roll back to it later. After setting the savepoint it fetches each table's CREATE statement, character set and row data, all of which can be seen in the trace above, and obtains the table's columns from the system tables; it then rolls back to the point where the savepoint was set, and the backup of the database is complete.
This part of the mysqldump source is exactly the operations shown above: it sets the savepoint, enumerates all the tables in the database, and dumps them one by one in a loop.
if (flush_logs) {
  if (mysql_refresh(mysql, REFRESH_LOG))
    DB_error(mysql, "when doing refresh");
  /* We shall continue here, if --force was given */
  else
    verbose_msg("-- dump_all_tables_in_db : logs flushed successfully!\n");
}
if (opt_single_transaction && mysql_get_server_version(mysql) >= 50500) {
  verbose_msg("-- Setting savepoint...\n");
  if (mysql_query_with_error_report(mysql, nullptr, "SAVEPOINT sp")) return 1;
}
while ((table = getTableName(0))) {
  char *end = my_stpcpy(afterdot, table);
  if (include_table(hash_key, end - hash_key)) {
    dump_table(table, database);
    if (opt_dump_triggers && mysql_get_server_version(mysql) >= 50009) {
      if (dump_triggers_for_table(table, database)) {
        if (path) my_fclose(md_result_file, MYF(MY_WME));
        maybe_exit(EX_MYSQLERR);
      }
    }
The next part handles what happens after each table has been dumped under the savepoint: it rolls the database back to the savepoint, and once the loop is done it releases the savepoint.
    if (opt_single_transaction && mysql_get_server_version(mysql) >= 50500) {
      verbose_msg("-- Rolling back to savepoint sp...\n");
      if (mysql_query_with_error_report(mysql, nullptr,
                                        "ROLLBACK TO SAVEPOINT sp"))
        maybe_exit(EX_MYSQLERR);
    }
  }
}
if (opt_single_transaction && mysql_get_server_version(mysql) >= 50500) {
  verbose_msg("-- Releasing savepoint...\n");
  if (mysql_query_with_error_report(mysql, nullptr, "RELEASE SAVEPOINT sp"))
    return 1;
}
The advantage of this design is that, by creating a transaction savepoint and rolling back to it after each table, the lock taken on a dumped table is released, so backing up the database's tables does not block DDL operations.
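The per-table pattern implied by that source, written out as plain SQL (a sketch; `t1` is a placeholder table name). Rolling back to the savepoint is what releases the metadata lock that the SELECT acquired on the table:

```shell
# Sketch of the statement pattern mysqldump issues per table inside the
# snapshot transaction ("t1" is a placeholder table name).
PATTERN=$(cat <<'SQL'
SAVEPOINT sp;
SELECT /*!40001 SQL_NO_CACHE */ * FROM `t1`;
ROLLBACK TO SAVEPOINT sp;
SQL
)
echo "$PATTERN"
```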
To recap this part:
1 When backing up with --single-transaction, if the database has many tables and a lot of data, the backup thread runs in RR mode the whole time, so the snapshot versions of the many transactions running during that window must be kept in the undo log. If the workload is busy and the time between SAVEPOINT and ROLLBACK TO SAVEPOINT stretches out, make sure the undo log tablespace has enough room to store the transaction history that cannot be purged during that window.
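One way to watch for that undo pressure during a long dump is InnoDB's history list length. A sketch; the connection options are assumed:

```shell
# Sketch: query InnoDB's history list length; a value that keeps growing
# while the dump runs means purge is being held back by the dump's snapshot.
CHECK_SQL="SELECT NAME, COUNT FROM information_schema.INNODB_METRICS WHERE NAME = 'trx_rseg_history_len';"
echo "mysql -N -e \"${CHECK_SQL}\""
```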
2 The reason FLUSH TABLES WITH READ LOCK appears in the backup is the concern that DDL might run at the very moment the binlog information is captured; the lock is held so that no DDL can happen at that instant.
Also, given that mysqldump backups of large databases can fail, note the following:
1 Increase innodb_undo_logs to raise the number of rollback segments in the undo tablespace and improve concurrency.
2 While the read lock is held, it is mutually exclusive with the DML of high-concurrency workloads running against the MySQL database.
So for larger databases my advice is not to use mysqldump for data backups at all; and when you are merely exporting data, do not use the --single-transaction option.
Here we repeat the backup with the --single-transaction option removed, and the trace below shows that:
1 mysqldump no longer sets the RR isolation level for the session; the session simply runs at the server's default isolation level.
2 No SAVEPOINT is set during the backup.
3 Finally, when the data is exported, as far as I can tell from the source code and the trace, MySQL performs no LOCK TABLES operation, which compares favorably with PG here and reduces the chance of conflicts.
A companion analysis of PG's pg_dump went out a couple of days ago; if you are interested, see "What are the advantages and characteristics of PG's pg_dump".
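For reference, the second trace below was produced by re-running the dump without the flag, along these lines (again an assumed invocation; paths are examples):

```shell
# Sketch: the same dump, minus --single-transaction; per the discussion
# above, what comes out is an export rather than a consistent backup.
EXPORT_CMD="mysqldump --source-data=2 test"
echo "${EXPORT_CMD} > /tmp/test_export.sql"
```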
2023-12-30T16:43:41.860155-00:00 10 Connect root@localhost on using Socket
2023-12-30T16:43:41.860568-00:00 10 Query /*!40100 SET @@SQL_MODE='' */
2023-12-30T16:43:41.861053-00:00 10 Query /*!40103 SET TIME_ZONE='+00:00' */
2023-12-30T16:43:41.861413-00:00 10 Query /*!80000 SET SESSION information_schema_stats_expiry=0 */
2023-12-30T16:43:41.861663-00:00 10 Query SET SESSION NET_READ_TIMEOUT= 86400, SESSION NET_WRITE_TIMEOUT= 86400
2023-12-30T16:43:41.862033-00:00 10 Query SHOW VARIABLES LIKE 'gtid_mode'
2023-12-30T16:43:41.865299-00:00 10 Query FLUSH /*!40101 LOCAL */ TABLES
2023-12-30T16:43:41.866062-00:00 10 Query FLUSH TABLES WITH READ LOCK
2023-12-30T16:43:41.866303-00:00 10 Query SHOW BINARY LOG STATUS
2023-12-30T16:43:41.866803-00:00 10 Query SELECT LOGFILE_GROUP_NAME, FILE_NAME, TOTAL_EXTENTS, INITIAL_SIZE, ENGINE, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE ENGINE = 'ndbcluster' AND FILE_TYPE = 'UNDO LOG' AND FILE_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IS NOT NULL AND LOGFILE_GROUP_NAME IN (SELECT DISTINCT LOGFILE_GROUP_NAME FROM INFORMATION_SCHEMA.FILES WHERE ENGINE = 'ndbcluster' AND FILE_TYPE = 'DATAFILE' AND TABLESPACE_NAME IN (SELECT DISTINCT TABLESPACE_NAME FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA IN ('test'))) GROUP BY LOGFILE_GROUP_NAME, FILE_NAME, ENGINE, TOTAL_EXTENTS, INITIAL_SIZE ORDER BY LOGFILE_GROUP_NAME
2023-12-30T16:43:41.871592-00:00 10 Query SELECT DISTINCT TABLESPACE_NAME, FILE_NAME, LOGFILE_GROUP_NAME, EXTENT_SIZE, INITIAL_SIZE, ENGINE FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE' AND TABLESPACE_NAME IN (SELECT DISTINCT TABLESPACE_NAME FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA IN ('test')) ORDER BY TABLESPACE_NAME, LOGFILE_GROUP_NAME
2023-12-30T16:43:41.874070-00:00 10 Query SHOW VARIABLES LIKE 'ndbinfo\_version'
2023-12-30T16:43:41.877178-00:00 10 Init DB test
2023-12-30T16:43:41.877564-00:00 10 Query show tables
2023-12-30T16:43:41.880012-00:00 10 Query show table status like 'test'
2023-12-30T16:43:41.882630-00:00 10 Query SET SQL_QUOTE_SHOW_CREATE=1
2023-12-30T16:43:41.883007-00:00 10 Query SET SESSION character_set_results = 'binary'
2023-12-30T16:43:41.883134-00:00 10 Query show create table `test`
2023-12-30T16:43:41.883536-00:00 10 Query SET SESSION character_set_results = 'utf8mb4'
2023-12-30T16:43:41.883816-00:00 10 Query show fields from `test`
2023-12-30T16:43:41.886349-00:00 10 Query show fields from `test`
2023-12-30T16:43:41.887888-00:00 10 Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`
2023-12-30T16:43:41.888263-00:00 10 Query SET SESSION character_set_results = 'binary'
2023-12-30T16:43:41.888522-00:00 10 Query use `test`
2023-12-30T16:43:41.888736-00:00 10 Query select @@collation_database
2023-12-30T16:43:41.889022-00:00 10 Query SHOW TRIGGERS LIKE 'test'
2023-12-30T16:43:41.890816-00:00 10 Query SET SESSION character_set_results = 'utf8mb4'
2023-12-30T16:43:41.891073-00:00 10 Query SET SESSION character_set_results = 'binary'
2023-12-30T16:43:41.891334-00:00 10 Query SELECT COLUMN_NAME, JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"') FROM information_schema.COLUMN_STATISTICS WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'test'
2023-12-30T16:43:41.891976-00:00 10 Query SET SESSION character_set_results = 'utf8mb4'
2023-12-30T16:43:41.893632-00:00 10 Quit
The MySQL version used here is 8.2.