So I'm attempting to create an alias/script to download all files with a specific extension from a website/directory using wget, but I feel like there must be an easier way than what I've come up with.
Right now, the code I've come up with from searching Google and the man pages is:
wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/
So in the example above, I'm trying to download all the .tar.gz files from the OpenVZ precreated templates directory.
The above code works correctly, but I have to manually specify --cut-dirs=2, which cuts out the /template/precreated/ directory structure that would normally be created, and it also downloads the robots.txt file.
Now this isn't necessarily a problem, and it's easy enough to just remove the robots.txt file, but I was hoping I'd simply missed something in the man pages that would let me do the same thing without having to specify the directory structure to cut out...
Thanks in advance for any help, it's greatly appreciated!
2 Answers
#1
7
Use the -R option

-R robots.txt,unwanted-file.txt

as a reject list of files you don't want (comma-separated).
As for scripting this:
URL=http://download.openvz.org/template/precreated/
# Count the path components after the host (assumes a trailing slash),
# so --cut-dirs strips the whole /template/precreated/ structure
CUTS=$(echo "${URL#http://}" | awk -F '/' '{print NF - 2}')
wget -r -l1 -nH --cut-dirs=${CUTS} --no-parent -A.tar.gz --no-directories -R robots.txt ${URL}
That should work based on the subdirectories in your URL.
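If you end up doing this for different sites, the snippet above could be wrapped into a small reusable function. This is just a sketch — the name wget_ext is mine, not part of the answer, and it assumes an http:// URL with a trailing slash, as in the question:

```shell
#!/bin/sh
# Hypothetical helper (the name "wget_ext" is mine): download all files
# matching an extension from a directory URL, computing --cut-dirs
# automatically from the URL's path depth as in the answer above.
wget_ext() {
    url=$1    # e.g. http://download.openvz.org/template/precreated/
    ext=$2    # e.g. .tar.gz
    # Strip the scheme, then count path components after the host.
    # Assumes the URL ends with a trailing slash.
    cuts=$(echo "${url#http://}" | awk -F '/' '{print NF - 2}')
    wget -r -l1 -nH --cut-dirs="$cuts" --no-parent -A"$ext" \
         --no-directories -R robots.txt "$url"
}

# Usage:
# wget_ext http://download.openvz.org/template/precreated/ .tar.gz
```

The only moving part is the awk expression: for the example URL it sees four /-separated fields (host, template, precreated, and an empty field from the trailing slash), so NF - 2 yields the 2 you were typing by hand.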
#2
2
I would suggest, if this is really annoying and you have to do it a lot, just writing a really short two-line script to delete it for you:
wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/
rm robots.txt