I want to back up some folders and files on my Hadoop cluster. I ran this command:
hadoop distcp -p -update -f hdfs://cluster1:8020/srclist hdfs://cluster2:8020/hdpBackup/
My srclist file:
hdfs://cluster1:8020/user/user1/folder1
hdfs://cluster1:8020/user/user1/folder2
hdfs://cluster1:8020/user/user1/file1
folder1 contains two files: part-00000 and part-00001.
folder2 contains two files: file and file_old.
That command works, but it flattens the contents of all the folders.
Result:
--hdpBackup
- part-00000
- part-00001
- file1
- file
- file_old
But I want the result to be:
--hdpBackup
- folder1
- folder2
- file1
I cannot use hdfs://cluster1:8020/user/user1/* because user1 contains many other folders and files.
How can I solve this problem?
1 solution
#1
Use the shell script below:
#!/bin/sh
# Read each source path (the first field of every line) from the local srclist file.
while read -r line _; do
    # The last path component becomes the name of the copy under backup1/.
    line1=$(echo "$line" | awk -F'/' '{print $NF}')
    echo "source: $line  dest: backup1/$line1"
    hadoop distcp "$line" "hdfs://10.20.53.157/user/root/backup1/$line1"
done < /home/Desktop/distcp/srclist
The srclist file needs to be on the local file system and contains paths like:
hdfs://10.20.53.157/user/root/Wholefileexaple_1
hdfs://10.20.53.157/user/root/Wholefileexaple_2
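Applied to the original question, a minimal sketch of the same approach, assuming the backup root hdfs://cluster2:8020/hdpBackup from the question, carrying over its -p -update flags, and using basename instead of awk to take the last path component; because each source is copied to a destination named after itself, the folders keep their names instead of being flattened:

#!/bin/sh
# Hypothetical adaptation of the answer's script to the question's paths.
SRCLIST=/home/Desktop/distcp/srclist    # local file listing the HDFS sources
DEST=hdfs://cluster2:8020/hdpBackup     # backup root on the target cluster

while read -r src _; do
    name=$(basename "$src")             # folder1, folder2, file1
    hadoop distcp -p -update "$src" "$DEST/$name"
done < "$SRCLIST"

With the srclist from the question, the expected layout under /hdpBackup would be folder1/ (part-00000, part-00001), folder2/ (file, file_old), and file1, which is the tree asked for.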