Objectives:
1. Accept three arguments: the source path, the destination path, and an MD5 data file.
2. Perform a full backup every Monday and an incremental backup on all other days.
1. Given the path passed in, recursively list all directories and files under it
Method 1: use os.listdir
The code is as follows:
#!/usr/bin/env python
#coding:utf8
import os, sys

def lsdir(folder):
    contents = os.listdir(folder)
    print "%s\n%s\n" % (folder, contents)
    for path in contents:
        full_path = os.path.join(folder, path)
        if os.path.isdir(full_path):
            lsdir(full_path)

if __name__ == "__main__":
    lsdir(sys.argv[1])
• Run the code; the output looks like this:
[root@localhost python]# python listdir.py /a
/a
['b', 'a.txt']

/a/b
['c', 'b.txt']

/a/b/c
['c.txt']
Method 2: use os.walk
The code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os, sys

def lsdir(folder):
    contents = os.walk(folder)
    for path, folders, files in contents:
        print "%s\n%s\n" % (path, folders + files)

if __name__ == "__main__":
    lsdir(sys.argv[1])
• Run the code and check the output:
[root@localhost python]# python listdir1.py /a
/a
['b', 'a.txt']

/a/b
['c', 'b.txt']

/a/b/c
['c.txt']
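Step 3 below relies on exactly this os.walk pattern, joining the directory portion of each tuple with every file name to get a full path that can be hashed. Here is a minimal sketch of that idea (the script name and output format are just for illustration and are not part of the original examples):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sketch: print the full path of every file under a directory by joining the
# directory yielded by os.walk with each file name in that directory.
import os, sys

def list_files(folder):
    for path, folders, files in os.walk(folder):
        for fname in files:
            print os.path.join(path, fname)

if __name__ == "__main__":
    list_files(sys.argv[1])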
2. How to compute a file's MD5 value (read 4 KB at a time until the whole file has been consumed, then return the hexadecimal MD5 digest)
The code is as follows:
[root@localhost python]# cat md5.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import hashlib
import sys

def md5(fname):
    m = hashlib.md5()
    with open(fname) as fobj:
        while True:
            data = fobj.read(4096)
            if not data:
                break
            m.update(data)
    return m.hexdigest()

if __name__ == "__main__":
    print md5(sys.argv[1])
• Run the code and check the output:
[root@localhost python]# python md5.py a.txt
c33da92372e700f98b006dfa5325cf0d
[root@localhost python]# md5sum a.txt
c33da92372e700f98b006dfa5325cf0d a.txt
* Note: the MD5 value computed by our Python script is identical to the one produced by the md5sum utility that ships with Linux.
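The backup script in step 3 stores a dictionary mapping each file path to its MD5 value with cPickle, so the next run can load the old values and detect which files changed. A minimal standalone sketch of that mechanism (the file name /tmp/md5.data and the sample entry are placeholders, not values from the original script):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sketch: persist a path -> md5 dictionary with cPickle and read it back.
import cPickle as p

md5dict = {'/a/a.txt': 'c33da92372e700f98b006dfa5325cf0d'}

# dump the dictionary to disk
with open('/tmp/md5.data', 'w') as fobj:
    p.dump(md5dict, fobj)

# load it back on the next run and compare against freshly computed values
with open('/tmp/md5.data') as fobj:
    md5old = p.load(fobj)
print md5old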
3. Write the full and incremental backup script
The code is as follows:
#!/usr/bin/env python
#coding:utf8
import time
import os
import tarfile
import cPickle as p
import hashlib

def md5check(fname):
    # Compute the file's MD5 digest, reading 4 KB at a time.
    m = hashlib.md5()
    with open(fname) as fobj:
        while True:
            data = fobj.read(4096)
            if not data:
                break
            m.update(data)
    return m.hexdigest()

def full_backup(src_dir, dst_dir, md5file):
    # Archive the whole source directory and record every file's MD5 value.
    par_dir, base_dir = os.path.split(src_dir.rstrip('/'))
    back_name = '%s_full_%s.tar.gz' % (base_dir, time.strftime('%Y%m%d'))
    full_name = os.path.join(dst_dir, back_name)
    md5dict = {}
    tar = tarfile.open(full_name, 'w:gz')
    tar.add(src_dir)
    tar.close()
    for path, folders, files in os.walk(src_dir):
        for fname in files:
            full_path = os.path.join(path, fname)
            md5dict[full_path] = md5check(full_path)
    with open(md5file, 'w') as fobj:
        p.dump(md5dict, fobj)

def incr_backup(src_dir, dst_dir, md5file):
    # Archive only the files whose MD5 value changed since the last run,
    # then overwrite the MD5 data file with the new values.
    par_dir, base_dir = os.path.split(src_dir.rstrip('/'))
    back_name = '%s_incr_%s.tar.gz' % (base_dir, time.strftime('%Y%m%d'))
    full_name = os.path.join(dst_dir, back_name)
    md5new = {}
    for path, folders, files in os.walk(src_dir):
        for fname in files:
            full_path = os.path.join(path, fname)
            md5new[full_path] = md5check(full_path)
    with open(md5file) as fobj:
        md5old = p.load(fobj)
    with open(md5file, 'w') as fobj:
        p.dump(md5new, fobj)
    tar = tarfile.open(full_name, 'w:gz')
    for key in md5new:
        if md5old.get(key) != md5new[key]:
            tar.add(key)
    tar.close()

if __name__ == '__main__':
    src_dir = '/Users/xkops/gxb/'
    dst_dir = '/tmp/'
    md5file = '/Users/xkops/md5.data'
    # Full backup on Monday, incremental backup on every other day.
    if time.strftime('%a') == 'Mon':
        full_backup(src_dir, dst_dir, md5file)
    else:
        incr_backup(src_dir, dst_dir, md5file)
• Run the code to test it (before running, change the files and paths to match what you want to back up); after it finishes, check whether the day's backup archive has been created under /tmp.
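Objective 1 asks for the three paths to be passed in as arguments rather than hard-coded. One possible way to do that, sketched here under the assumption that the script above is saved as backup.py (this is not part of the original script), is to read them from sys.argv in the __main__ block:

# Hypothetical replacement for the hard-coded __main__ block above; it assumes
# the time import and the full_backup/incr_backup functions from that script.
import sys

if __name__ == '__main__':
    if len(sys.argv) != 4:
        print 'usage: python backup.py <src_dir> <dst_dir> <md5file>'
        sys.exit(1)
    src_dir, dst_dir, md5file = sys.argv[1], sys.argv[2], sys.argv[3]
    # Full backup on Monday, incremental backup on every other day.
    if time.strftime('%a') == 'Mon':
        full_backup(src_dir, dst_dir, md5file)
    else:
        incr_backup(src_dir, dst_dir, md5file)

It could then be invoked, for example, as: python backup.py /a /tmp /tmp/md5.data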