I've been playing around with a little shell script to get some info out of an HTML page downloaded with lynx.
My problem is that I get this string: <span class="val3">MPPTN: 0.9384</span></td>
I can trim the first part of that using:
trimmed_info=`echo ${info/'<span class="val3">'/}`
And the string becomes: "MPPTN: 0.9384</span></td>"
But how can I trim the last part? It seems like the "/" is messing up the echo command... I tried:
echo ${finalt/'</span></td>'/};
4 solutions
#1
4
The behavior of ${VARIABLE/PATTERN/REPLACEMENT} depends on what shell you're using, and for bash what version. Under ksh, or under recent enough (I think ≥ 4.0) versions of bash, ${finalt/'</span></td>'/} strips that substring as desired. Under older versions of bash, the quoting is rather quirky; you need to write ${finalt/<\/span><\/td>/} (which still works in newer versions).
Since you're stripping a suffix, you can use the ${VARIABLE%PATTERN} or ${VARIABLE%%PATTERN} construct instead (% removes the shortest matching suffix, %% the longest). Here, you're removing everything from the first </ onward, i.e. the longest suffix that matches the pattern </*. Similarly, you can strip the leading HTML tags with ${VARIABLE##PATTERN}.
trimmed=${finalt%%</*}; trimmed=${trimmed##*>}
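To make the difference between the operators concrete, here is a minimal sketch that runs them on the string from the question. The variable names info, short and trimmed are only illustrative.

```shell
# Sample input copied from the question.
info='<span class="val3">MPPTN: 0.9384</span></td>'

# % removes the SHORTEST suffix matching "</*", so only "</td>" goes away.
short=${info%</*}

# %% removes the LONGEST suffix matching "</*": everything from the first "</".
trimmed=${info%%</*}

# ## removes the longest prefix matching "*>": everything up to the last ">".
trimmed=${trimmed##*>}

printf '%s\n' "$short"     # prints: <span class="val3">MPPTN: 0.9384</span>
printf '%s\n' "$trimmed"   # prints: MPPTN: 0.9384
```

Nothing here is bash-specific, so it should behave the same under sh, dash, ksh, bash and zsh.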
Added benefit: unlike ${…/…/…}, which is specific to bash/ksh/zsh and works slightly differently in all three, ${…#…} and ${…%…} are fully portable across POSIX shells. They don't do as much, but here they're sufficient.
Side note: although it didn't cause any problem in this particular instance, you should always put double quotes around variable substitutions, e.g.
echo "${finalt/'</span></td>'/}"
Otherwise the shell will perform word splitting and wildcard expansion on the result. The simple rule is that if you don't have a good reason to leave the double quotes out, you put them in.
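A small sketch of what goes wrong without the quotes, assuming the default IFS. The variable value and its run of spaces are made up for illustration.

```shell
# Hypothetical value containing a run of four spaces.
value='MPPTN:    0.9384'

# Unquoted: the shell splits the value into two words and echo
# joins them with a single space, so the spacing is lost.
unquoted=$(echo $value)

# Quoted: the value reaches echo as one unchanged argument.
quoted=$(echo "$value")

printf '[%s]\n' "$unquoted"   # prints: [MPPTN: 0.9384]
printf '[%s]\n' "$quoted"     # prints: [MPPTN:    0.9384]
```

The same unquoted expansion would also replace a word such as * with the file names in the current directory, which is why quoting is the safe default.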
#2
6
Not sure if using sed is OK, but one way to extract the number could be something like:
echo '<span class="val3">MPPTN: 0.9384</span></td>' | sed 's/^[^:]*..//' | sed 's/<.*$//'
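The two sed processes can probably be merged into one by passing both expressions with -e. This is a variant sketch, not the command above: the first expression is changed to match the colon and any following spaces explicitly, and the names line and number are illustrative.

```shell
# Sample input copied from the question.
line='<span class="val3">MPPTN: 0.9384</span></td>'

# First expression: drop everything up to and including ": ".
# Second expression: drop everything from the first remaining "<".
number=$(printf '%s\n' "$line" | sed -e 's/^[^:]*: *//' -e 's/<.*$//')

printf '%s\n' "$number"   # prints: 0.9384
```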
#3
2
The solution largely depends on what exactly you want to do. If all your strings are going to be of the form <span class="val3">XXXXX: X.XXXX</span></td>, then the simplest solution is
echo "$info" | cut -c 20-32
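For anyone wondering where 20 and 32 come from, here is a rough sketch that derives the start column from the length of the opening tag. The variables prefix, start and value are illustrative, and the end column 32 only fits the 13-character sample text.

```shell
# Sample input copied from the question.
info='<span class="val3">MPPTN: 0.9384</span></td>'

# The opening tag is 19 characters long, so the text starts at column 20.
prefix='<span class="val3">'
start=$(( ${#prefix} + 1 ))

# Columns 20 through 32 cover the 13 characters of "MPPTN: 0.9384".
value=$(printf '%s\n' "$info" | cut -c "${start}-32")

printf '%s\n' "$start"   # prints: 20
printf '%s\n' "$value"   # prints: MPPTN: 0.9384
```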
If they're of the form <span class="val3">variable length</span></td>, then the simplest solution is
echo "$info" | sed 's/<span class="val3">//' | sed 's/<\/span><\/td>//'
If it's more general, you can use regexes as in Sai's answer.
#4
1
I'd recommend using the sed command for this kind of thing:
echo "$string" | sed "s/$regex/$replace/"
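One caveat with this template: if $regex contains a "/", as the closing tags in the question do, sed sees too many delimiters and fails. A possible workaround is to pick a delimiter that does not occur in the pattern. The sketch below uses "|" and made-up sample values, and assumes neither variable contains "|" and that $replace holds no "&" or backslash.

```shell
# Hypothetical sample values; the pattern contains "/" characters.
string='MPPTN: 0.9384</span></td>'
regex='</span></td>'
replace=''

# "|" serves as the s-command delimiter, so the slashes in $regex
# are treated as ordinary characters.
result=$(printf '%s\n' "$string" | sed "s|$regex|$replace|")

printf '%s\n' "$result"   # prints: MPPTN: 0.9384
```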