Full Restore in Greenplum with gpdbrestore

Date: 2021-11-18 12:53:44
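The gpdbrestore command is a wrapper around gp_restore that offers more flexible options, for example restoring from the files written by automated gpcrondump backups. Restoring with gpdbrestore requires that:

1. Backup files produced by a gpcrondump run exist.
2. The GPDB system is running.
3. The GPDB system being restored has the same number of instances as the GPDB system had when it was backed up with gp_dump (which gpcrondump calls under the hood).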
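The first prerequisite can be checked before starting a restore. The sketch below is a minimal, hypothetical helper (has_dump_set is a made-up name, not a Greenplum tool). It assumes the db_dumps/<YYYYMMDD>/ layout described under -b and -t below, with the 14-digit timestamp key appearing in each dump file name, and it only looks at the local host.

```shell
# Hypothetical preflight helper: succeed only if a dump set for TIMESTAMP
# appears to exist under BACKUP_DIR/db_dumps/<YYYYMMDD>.
# Assumption: dump file names contain the timestamp key. File contents
# are not inspected.
has_dump_set() {
    backup_dir=$1
    ts=$2
    day=${ts%??????}                      # YYYYMMDDHHMMSS -> YYYYMMDD
    dump_dir="$backup_dir/db_dumps/$day"
    [ -d "$dump_dir" ] || return 1
    for f in "$dump_dir"/*"$ts"*; do
        [ -e "$f" ] && return 0           # at least one matching file
    done
    return 1                              # glob matched nothing
}
```

For example, on the master: has_dump_set /home/gpadmin/backup 20160713160238 || echo "no dump set found". Segment hosts hold their own dump files, so the same check would have to be repeated on each of them.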

Common gpdbrestore options
-a (do not prompt) 

 Do not prompt the user for confirmation.

-b <YYYYMMDD> 

 Looks for dump files in the segment data directories on the Greenplum
 Database array of hosts in db_dumps/<YYYYMMDD>.

-d <master_data_directory>

 Optional. The master host data directory. If not specified, the value
 set for $MASTER_DATA_DIRECTORY will be used. 

-e (drop target database before restore) 

 Drops the target database before doing the restore and then recreates
 it. 

-G [include|only]

 Restores global objects such as roles and tablespaces if the global
 object dump file db_dumps/<date>/gp_global_1_1_<timestamp> is found in
 the master data directory.

 Specify either "-G only" to only restore the global objects dump file
 or "-G include" to restore global objects along with a normal restore.
 Defaults to "include" if neither argument is provided.

-l <logfile_directory>

 The directory to write the log file. Defaults to ~/gpAdminLogs. 

-m (restore metadata only)

 Performs a restore of database metadata (schema and table definitions, SET
 statements, and so forth) without restoring data.  If the --restore-stats or
 -G options are provided as well, statistics or globals will also be restored.

 The --noplan and --noanalyze options are not supported in conjunction with
 this option, as they affect the restoration of data and no data is restored.

--prefix <prefix_string> 

 If you specified the gpcrondump option --prefix <prefix_string> to create
 the backup, you must specify this option with the <prefix_string> when
 restoring the backup. 

 If you created a full backup of a set of tables with gpcrondump and
 specified a prefix, you can use gpcrondump with the options
 --list-filter-tables and --prefix <prefix_string> to list the tables
 that were included or excluded for the backup. 
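The restore log later in this article shows how the prefix appears on disk: the create-database file was read from /home/gpadmin/backup/db_dumps/<date>/lottu_gp_cdatabase_1_1_20160713160238, and gp_restore was called with --prefix=lottu_. Assuming that pattern holds in general, a hypothetical helper (cdatabase_file is an illustrative name) can compute the expected path. The unprefixed form gp_cdatabase_1_1_<timestamp> is likewise an assumption.

```shell
# Hypothetical helper: print the expected "create database" dump file,
# following the pattern seen in the restore log:
#   <backup_dir>/db_dumps/<YYYYMMDD>/<prefix>_gp_cdatabase_1_1_<timestamp>
# An empty prefix yields gp_cdatabase_1_1_<timestamp> (assumed).
cdatabase_file() {
    backup_dir=$1; prefix=$2; ts=$3
    p=${prefix:+${prefix}_}               # "lottu" -> "lottu_", "" -> ""
    printf '%s/db_dumps/%s/%sgp_cdatabase_1_1_%s\n' \
        "$backup_dir" "${ts%??????}" "$p" "$ts"
}
```

If the printed file does not exist, the --prefix value passed to gpdbrestore probably does not match the one used with gpcrondump.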

--restore-stats [include|only]

 Restores optimizer statistics if the statistics dump file
 db_dumps/<date>/gp_statistics_1_1_<timestamp> is found in the master data
 directory. Setting this option automatically skips the final analyze step,
 so it is not necessary to also set the --noanalyze flag in conjunction with
 this one.
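Whether -G and --restore-stats are worth passing depends on whether the corresponding dump files exist. A hypothetical helper (optional_restore_flags is not part of Greenplum) can look for them in the master's db_dumps/<date> directory and print the matching flags. The base file names come from the help text above; the extra * around them, meant to tolerate a --prefix and a compression suffix, is an assumption.

```shell
# Hypothetical helper: print the optional gpdbrestore flags whose dump
# files are present in DUMP_DIR (a master db_dumps/<YYYYMMDD> directory).
optional_restore_flags() {
    dump_dir=$1
    ts=$2
    flags=
    # Global objects (roles, tablespaces): gp_global_1_1_<timestamp>
    for f in "$dump_dir"/*gp_global_1_1_"$ts"*; do
        [ -e "$f" ] && { flags="-G include"; break; }
    done
    # Optimizer statistics: gp_statistics_1_1_<timestamp>
    for f in "$dump_dir"/*gp_statistics_1_1_"$ts"*; do
        [ -e "$f" ] && { flags="${flags:+$flags }--restore-stats include"; break; }
    done
    printf '%s\n' "$flags"
}
```

An empty result means neither file was found, so neither flag would have anything to restore.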

-t <timestamp_key>

 The 14 digit timestamp key that uniquely identifies a backup set of data
 to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files
 matching this timestamp key in the segment data directories db_dumps
 directory on the Greenplum Database array of hosts. 
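The relationship between -t and -b can be made concrete with a small function. This is an illustrative sketch (dump_day_of is a hypothetical name). It checks only that the key is 14 digits, not that it is a real calendar date, and prints the leading YYYYMMDD part, which names the db_dumps/<YYYYMMDD> directory holding that dump set.

```shell
# Hypothetical helper: validate a -t timestamp key (YYYYMMDDHHMMSS) and
# print its YYYYMMDD part. Returns non-zero for malformed keys.
dump_day_of() {
    ts=$1
    case $ts in
        *[!0-9]*|'') return 1 ;;          # empty or contains non-digits
    esac
    [ ${#ts} -eq 14 ] || return 1        # must be exactly 14 digits
    printf '%s\n' "${ts%??????}"          # drop HHMMSS
}
```

Usage: day=$(dump_day_of 20160713160238) && ls "/home/gpadmin/backup/db_dumps/$day"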

-T <schema>.<table_name>

 Table names to restore, specify multiple times for multiple tables. The
 named table(s) must exist in the backup set of the database being restored.
 Existing tables are not automatically truncated before data is restored
 from backup. If your intention is to replace existing data in the table
 from backup, truncate the table prior to running gpdbrestore -T. 

-S <schema>

 Schema names to restore, specify multiple times for multiple schemas.
 Existing tables are not automatically truncated before data is restored
 from backup. If your intention is to replace existing data in the table
 from backup, truncate the table prior to running gpdbrestore -S. 
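Because -T and -S add restored rows to whatever is already in the table, a table-level restore that is meant to replace data needs a TRUNCATE first. The hypothetical dry-run helper below (plan_table_restore is an illustrative name) only prints commands: one psql TRUNCATE per table, then a single gpdbrestore call with one -T per table. Nothing is executed.

```shell
# Hypothetical dry-run helper: print the commands for replacing the data
# of the given tables (schema.table) in database DB from backup TIMESTAMP.
plan_table_restore() {
    db=$1; ts=$2; shift 2
    t_args=
    for t in "$@"; do
        # Existing rows are not removed by gpdbrestore -T, so clear them first
        printf 'psql -d %s -c "TRUNCATE TABLE %s;"\n' "$db" "$t"
        t_args="$t_args -T $t"            # -T may be given once per table
    done
    printf 'gpdbrestore -a -t %s%s\n' "$ts" "$t_args"
}
```

If the backup was taken with --prefix or -u, the same options would also have to be added to the printed gpdbrestore line, as in the full restore later in this article.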

--truncate

 Truncate table data before restoring data to the table from the backup.
 This option is supported only when restoring a set of tables with the
 option -T or --table-file.
 This option is not supported with the -e option.
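The option conflicts stated in this help text (--truncate needs -T or --table-file and excludes -e; -m does not support --noplan or --noanalyze) can be checked before calling gpdbrestore. The function below is a hypothetical sketch, not part of Greenplum, and it only recognizes options written as separate tokens ("-a -e", not "-ae").

```shell
# Hypothetical pre-flight check for option rules from the gpdbrestore help.
# Prints an error to stderr and returns 1 on the first conflict found.
check_restore_args() {
    truncate=0; tables=0; drop_db=0; meta_only=0; plan_opts=0
    for a in "$@"; do
        case $a in
            --truncate) truncate=1 ;;
            -T|--table-file|--table-file=*) tables=1 ;;
            -e) drop_db=1 ;;
            -m) meta_only=1 ;;
            --noplan|--noanalyze) plan_opts=1 ;;
        esac
    done
    if [ "$truncate" -eq 1 ] && [ "$tables" -eq 0 ]; then
        echo "error: --truncate requires -T or --table-file" >&2; return 1
    fi
    if [ "$truncate" -eq 1 ] && [ "$drop_db" -eq 1 ]; then
        echo "error: --truncate cannot be used with -e" >&2; return 1
    fi
    if [ "$meta_only" -eq 1 ] && [ "$plan_opts" -eq 1 ]; then
        echo "error: -m does not support --noplan/--noanalyze" >&2; return 1
    fi
    return 0
}
```

Usage: check_restore_args "$@" && gpdbrestore "$@"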

-u <backup_directory> 

 Specifies the absolute path to the directory containing the db_dumps
 directory on each host. If not specified, defaults to the data directory
 of each instance to be backed up. Specify this option if you specified a
 backup directory with the gpcrondump option -u when creating a backup
 set. 

 If <backup_directory> is not writable, backup operation report status
 files are written to segment data directories. You can specify a
 different location where report status files are written with the
 --report-status-dir option. 
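The effect of -u on where gpdbrestore looks for dumps can be summarized in a few lines. This is a hypothetical sketch (dump_root and the example data directory are illustrative): with -u the db_dumps directory sits under the given backup directory, otherwise under each instance's data directory. It also prints a note when the -u directory is not writable, the case in which the help text says report status files fall back to the segment data directories.

```shell
# Hypothetical helper: print the db_dumps directory used on one host.
#   $1  instance data directory
#   $2  value passed to -u (may be empty)
dump_root() {
    data_dir=$1
    backup_dir=$2
    if [ -z "$backup_dir" ]; then
        printf '%s/db_dumps\n' "$data_dir"
        return 0
    fi
    if [ ! -w "$backup_dir" ]; then
        echo "note: $backup_dir is not writable; consider --report-status-dir" >&2
    fi
    printf '%s/db_dumps\n' "$backup_dir"
}
```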
A first restore, using the backup taken with the commands from the previous article
1. Drop a table to simulate data loss
[gpadmin@mdw ~]$ psql lottu gpadmin
psql ()
Type "help" for help.

lottu=# \dt
                    List of relations
 Schema |        Name        | Type  |  Owner  | Storage
--------+--------------------+-------+---------+---------
 public | gpcrondump_history | table | gpadmin | heap
 public | lottu01            | table | gpadmin | heap
 public | lottu02            | table | gpadmin | heap
(3 rows)

lottu=# drop table lottu01;
DROP TABLE
lottu=# \q
[gpadmin@mdw ~]$ psql lottu gpadmin
psql ()
Type "help" for help.

lottu=# select * from lottu01;
ERROR:  relation "lottu01" does not exist
LINE 1: select * from lottu01
2. Run the restore
gpdbrestore -a -e --prefix lottu -u /home/gpadmin/backup --restore-stats include --report-status-dir /home/gpadmin/backup -t 20160713160238
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Starting gpdbrestore with args: -a -e --prefix lottu -u /home/gpadmin/backup --restore-stats include --report-status-dir /home/gpadmin/backup -t
:::: gpdbrestore:mdw:gpadmin-[INFO]:-------------------------------------------
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Greenplum database restore parameters
:::: gpdbrestore:mdw:gpadmin-[INFO]:-------------------------------------------
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Restore type               = Full Database
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Database to be restored    = lottu
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Drop and re-create db      = On
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Restore method             = Restore specific timestamp
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Restore timestamp          =
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Restore compressed dump    = On
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Restore global objects     = Off
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Array fault tolerance      = f
:::: gpdbrestore:mdw:gpadmin-[INFO]:-------------------------------------------
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Dropping Database lottu
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Dropped Database lottu
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Invoking sql file: /home/gpadmin/backup/db_dumps//lottu_gp_cdatabase_1_1_20160713160238
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Creating gp_toolkit schema for database "lottu"
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Adding --prefix
:::: gpdbrestore:mdw:gpadmin-[INFO]:-gp_restore commandline: gp_restore -i -h mdw -p  -U gpadmin --gp-i --prefix=lottu_ --gp-k= --gp-l=p --gp-d=/home/gpadmin/backup/db_dumps/ --gp-r=/home/gpadmin/backup --status=/home/gpadmin/backup --gp-c -d "lottu":
:::: gpdbrestore:mdw:gpadmin-[WARNING]:-gpdbrestore finished but ERRORS were found, please check the restore report file for details
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Updating AO/CO statistics on master
:::: gpdbrestore:mdw:gpadmin-[INFO]:-No AO/CO tables restored, skipping statistics update...
:::: gpdbrestore:mdw:gpadmin-[INFO]:-Commencing restore of statistics
Note the [WARNING] line above. Let's set it aside for now and come back to it later.
3. Verify that the data was restored
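To track down a warning like this without reading the whole output, the log can be filtered. A minimal sketch follows (restore_problems is a hypothetical name). gpdbrestore writes its log under ~/gpAdminLogs by default (see -l), but the exact log and report file names are not shown in this article, so the path is passed in explicitly.

```shell
# Hypothetical helper: list WARNING and ERROR lines (with line numbers)
# from a gpdbrestore log or restore report file.
# Returns 1 (grep's status) when nothing matches.
restore_problems() {
    grep -n -E '\[WARNING\]|\[ERROR\]|ERROR:' "$1"
}
```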
[gpadmin@mdw ~]$ psql lottu gpadmin
psql ()
Type "help" for help.

lottu=# select * from lottu01;
 id |  name
----+---------
   | lottu1
   | lottu3
   | lottu5
   | lottu7
   | lottu9
   | lottu2
   | lottu4
   | lottu6
   | lottu8
  | lottu10
(10 rows)

lottu=# \dt
              List of relations
 Schema |  Name   | Type  |  Owner  | Storage
--------+---------+-------+---------+---------
 public | lottu01 | table | gpadmin | heap
 public | lottu02 | table | gpadmin | heap
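(2 rows)
Step 2 included a "recreate database" operation. Checking the data shows that the table contents match what they were before the drop, so the data itself was restored correctly. However, the gpcrondump_history table is gone, and step 2 reported "[WARNING]:-gpdbrestore finished but ERRORS were found, please check the restore report file for details".
The restore log shows where the error occurred: ERROR:  constraint "lottu01_pkey" does not exist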
lottu=# \d lottu01
           Table "public.lottu01"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               | not null
 name   ) |
Indexes:
    "lottu01_pkey" PRIMARY KEY, btree (id)
Distributed by: (id)
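After the restore, the lottu01 table does have its primary key constraint. So is the error a false alarm? It appears to be a bug in the tool. If a constraint really is missing after a restore, you can recreate it by manually executing the corresponding *_post_data file.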
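The manual fix described above could look like the following dry-run sketch. plan_post_data is a hypothetical name, and so is the file-name pattern: the article only says "*_post_data", so the helper assumes the timestamp key appears in the name before a _post_data suffix, and that a .gz file (the restore log shows "Restore compressed dump = On") should be decompressed with gunzip -c before being fed to psql. It prints commands instead of running them, so the file can be inspected first.

```shell
# Hypothetical dry-run helper: print psql commands that would replay the
# post-data dump file(s) for TIMESTAMP found in DUMP_DIR into database DB.
# Assumed name pattern: *<timestamp>*_post_data*  (optionally .gz)
plan_post_data() {
    dump_dir=$1; ts=$2; db=$3
    found=1
    for f in "$dump_dir"/*"$ts"*_post_data*; do
        [ -e "$f" ] || continue           # glob matched nothing
        found=0
        case $f in
            *.gz) printf 'gunzip -c %s | psql -d %s\n' "$f" "$db" ;;
            *)    printf 'psql -d %s -f %s\n' "$db" "$f" ;;
        esac
    done
    return $found                         # 1 if no file was found
}
```

Replaying the whole file will also try to re-create constraints and indexes that already exist, so errors for those objects are expected; extracting only the missing statements may be the safer route.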
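Important notes
1. The gpdbrestore -e option runs DROP DATABASE and then CREATE DATABASE before the restore. So if the target environment does not have the corresponding database, do not pass -e, otherwise the restore fails. Do not use -e for table-level restores either.
2. If gpcrondump was run with -C, the restore first runs DROP TABLE and then re-creates each table.
3. If gpcrondump was run without -C and you want to clear existing data before restoring, use gpdbrestore's --truncate option (--truncate works only in table-level restore mode, i.e. together with -T or --table-file).
4. Greenplum does not allow dropping template databases, so restoring a template database with -e fails. One workaround is to modify the gpcrondump code to handle template databases specially, for example by cleaning them with DROP SCHEMA and skipping the errors from DROP DATABASE and CREATE DATABASE.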
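The -e rules in the notes above can be folded into a small decision helper. This is a hypothetical sketch (drop_flag_for is an illustrative name): it prints -e only when the target database already exists, the restore is not table-level, and the target is not template0 or template1. Existing database names are read from standard input, one per line; one way to produce that list is psql -At -d postgres -c "SELECT datname FROM pg_database".

```shell
# Hypothetical helper: print "-e" when letting gpdbrestore drop and
# re-create DB looks safe, otherwise print nothing.
#   $1  target database
#   $2  "table" for a table-level restore, anything else for a full restore
#   stdin: existing database names, one per line
drop_flag_for() {
    db=$1; mode=$2
    case $db in
        template0|template1) return 0 ;;   # template DBs cannot be dropped
    esac
    [ "$mode" = "table" ] && return 0      # no -e for table-level restores
    if grep -qxF -- "$db"; then            # target database already exists
        printf '%s\n' -e                   # printf, since echo may eat "-e"
    fi
}
```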

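In summary, the options to pass to gpdbrestore depend on which options were chosen for the gpcrondump backup. This method can also be used for database migration, but only between databases with the same configured architecture.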

Reference: https://yq.aliyun.com/articles/30331?spm=5176.8067842.tagmain.48.etfAn9