I'm trying to write a shell script that reads a list of download URLs and checks whether they're still active. I'm not sure what's wrong with my current script (I'm new to this), and any pointers would be a huge help!
user@pc:~/test# cat sites.list
http://www.google.com/images/srpr/logo3w.png
http://www.google.com/doesnt.exist
notasite
Script:
#!/bin/bash
for i in `cat sites.list`
do
wget --spider $i -b
if grep --quiet "200 OK" wget-log; then
echo $i >> ok.txt
else
echo $i >> notok.txt
fi
rm wget-log
done
As is, the script outputs everything to notok.txt (the first Google URL should go to ok.txt). But if I run:
wget --spider http://www.google.com/images/srpr/logo3w.png -b
And then do:
grep "200 OK" wget-log
It greps the string without any problems. What noob mistake did I make in the syntax? Thanks!
2 Answers
#1
6
The -b option is sending wget to the background, so you're doing the grep before wget has finished.
Try without the -b option:
if wget --spider "$i" 2>&1 | grep --quiet "200 OK" ; then
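The 2>&1 matters because wget writes its status lines to stderr, not stdout, so without the redirection grep has nothing to match against. A minimal sketch of the difference, using a stand-in function (fake_wget, invented here) that logs to stderr the way wget does:

```shell
#!/bin/bash
# Stand-in for wget --spider: writes its status line to stderr,
# the way wget does. (fake_wget is invented for this demo.)
fake_wget() {
    echo "HTTP request sent, awaiting response... 200 OK" >&2
}

# Without 2>&1, the pipe carries only stdout, so grep finds nothing.
if fake_wget 2>/dev/null | grep --quiet "200 OK"; then
    echo "saw 200 without redirect"
else
    echo "missed 200 without redirect"
fi

# With 2>&1, stderr is merged into the pipe and grep can match.
if fake_wget 2>&1 | grep --quiet "200 OK"; then
    echo "saw 200 with redirect"
fi
```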
#2
4
There are a few issues with what you're doing.
- Your for i in will have problems with lines that contain whitespace. Better to use while read to read individual lines of a file.
- You aren't quoting your variables. What if a line in the file (or a word in a line) starts with a hyphen? Then wget will interpret that as an option. You have a potential security risk here, as well as an error.
- Creating and removing files isn't really necessary. If all you're doing is checking whether a URL is reachable, you can do that without temp files and the extra code to remove them.
- wget isn't necessarily the best tool for this. I'd advise using curl instead.
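To make the first point concrete, here's a quick demonstration (using a throwaway file created with mktemp) of how for-over-cat splits lines on whitespace while while read hands you each line intact:

```shell
#!/bin/bash
# Throwaway demo file; the first line contains a space.
demo=$(mktemp)
printf 'one two\nthree\n' > "$demo"

# The for loop word-splits, so "one two" becomes two iterations.
for i in $(cat "$demo"); do
    echo "for saw: $i"
done

# while read hands you each full line intact.
while read -r line; do
    echo "read saw: $line"
done < "$demo"

rm -f "$demo"
```

With the two-line file above, the for loop runs three times but the while loop runs only twice, once per line.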
So here's a better way to handle this...
#!/bin/bash
sitelist="sites.list"
curl="/usr/bin/curl"
# Some errors, for good measure...
if [[ ! -f "$sitelist" ]]; then
echo "ERROR: Sitelist is missing." >&2
exit 1
elif [[ ! -s "$sitelist" ]]; then
echo "ERROR: Sitelist is empty." >&2
exit 1
elif [[ ! -x "$curl" ]]; then
echo "ERROR: I can't work under these conditions." >&2
exit 1
fi
# Allow more advanced pattern matching (for case..esac below).
# Note: the @( ) and ?( ) patterns need extglob, not globstar.
shopt -s extglob
while read -r url; do
# remove comments
url=${url%%#*}
# skip empty lines
if [[ -z "$url" ]]; then
continue
fi
# Handle just ftp, http and https.
# We could do full URL pattern matching, but meh.
case "$url" in
@(f|ht)tp?(s)://*)
# Get just the numeric HTTP response code
http_code=$($curl -sL -w '%{http_code}' "$url" -o /dev/null)
case "$http_code" in
200|226)
# You'll get a 226 in ${http_code} from a valid FTP URL.
# If all you really care about is that the response is in the 200's,
# you could match against "2??" instead.
echo "$url" >> ok.txt
;;
*)
# You might want different handling for redirects (301/302).
echo "$url" >> notok.txt
;;
esac
;;
*)
# If we're here, we didn't get a URL we could read.
echo "WARNING: invalid url: $url" >&2
;;
esac
done < "$sitelist"
This is untested. For educational purposes only. May contain nuts.
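As a quick sanity check of the case pattern above (no network needed), you can feed it a few sample lines. This sketch wraps the same pattern in a small helper (classify, invented here); remember that the @( ) and ?( ) forms require extglob:

```shell
#!/bin/bash
shopt -s extglob   # needed for the @( ) and ?( ) patterns below

# Classify a line using the same pattern the script above uses.
classify() {
    case "$1" in
        @(f|ht)tp?(s)://*) echo "url: $1" ;;
        *)                 echo "not a url: $1" ;;
    esac
}

classify "http://www.google.com/images/srpr/logo3w.png"
classify "https://example.com/"
classify "ftp://example.com/pub/file"
classify "notasite"
```

The first three lines are accepted as http, https, and ftp URLs; the last is rejected, matching the sites.list example from the question.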