How can I trim "/" using a shell script?

Time: 2022-10-16 00:13:34

I've been playing around with a little shell script to get some info out of an HTML page downloaded with lynx.

My problem is that I get this string: <span class="val3">MPPTN: 0.9384</span></td>

I can trim the first part of that using:

trimmed_info=`echo ${info/'<span class="val3">'/}`

And the string becomes: "MPPTN: 0.9384</span></td>"

But how can I trim the last part? It seems like the "/" is messing with the echo command... I tried:

echo ${finalt/'</span></td>'/};

4 Solutions

#1


4  

The behavior of ${VARIABLE/PATTERN/REPLACEMENT} depends on what shell you're using, and for bash what version. Under ksh, or under recent enough (I think ≥ 4.0) versions of bash, ${finalt/'</span></td>'/} strips that substring as desired. Under older versions of bash, the quoting is rather quirky; you need to write ${finalt/<\/span><\/td>/} (which still works in newer versions).

Since you're stripping a suffix, you can use the ${VARIABLE%PATTERN} or ${VARIABLE%%PATTERN} construct instead. Here, you're removing everything from the first </ onward, i.e. the longest suffix that matches the pattern </*. Similarly, you can strip the leading HTML tags with ${VARIABLE##PATTERN}.

trimmed=${finalt%%</*}; trimmed=${trimmed##*>}

Added benefit: unlike ${…/…/…}, which is specific to bash/ksh/zsh and works slightly differently in all three, ${…#…} and ${…%…} are fully portable. They don't do as much, but here they're sufficient.
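
As a rough illustration of the portable form, the two expansions can be wrapped in a small function that should run under any POSIX sh. The name strip_tags is hypothetical (it is not part of the original answer), and the sketch assumes the value sits between a single opening tag and the first closing tag, as in the question's sample string:

```shell
#!/bin/sh
# Hypothetical helper, not from the original answer: keeps only the text
# between the opening tag and the first closing tag.
strip_tags() {
    s=$1
    s=${s%%</*}   # drop the longest suffix that starts at the first "</"
    s=${s##*>}    # drop the longest prefix that ends at a ">"
    printf '%s\n' "$s"
}

# Sample input copied from the question.
info='<span class="val3">MPPTN: 0.9384</span></td>'
trimmed=$(strip_tags "$info")
printf '%s\n' "$trimmed"
```

Because only ${...%%...} and ${...##...} are used, nothing here depends on bash, ksh, or zsh extensions.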

Side note: although it didn't cause any problem in this particular instance, you should always put double quotes around variable substitutions, e.g.

echo "${finalt/'</span></td>'/}"

Otherwise the shell will split the result into words at whitespace and expand any wildcards in it. The simple rule is that if you don't have a good reason to leave the double quotes out, put them in.
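
A small sketch of what the missing quotes change, assuming a POSIX sh. The helper count_args and the sample value are made up for illustration; the function just reports how many arguments it received:

```shell
#!/bin/sh
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

# Made-up sample value containing a run of spaces.
value='MPPTN:   0.9384'

unquoted=$(count_args $value)    # unquoted: split into two words
quoted=$(count_args "$value")    # quoted: passed as one argument, spaces kept

echo "unquoted: $unquoted, quoted: $quoted"
```

With the unquoted form, the run of spaces is also lost when the words are passed on, which is why echo $var can print something different from the variable's actual contents.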

#2


6  

Not sure if using sed is ok -- one way to extract out the number could be something like ...

 echo '<span class="val3">MPPTN: 0.9384</span></td>' | sed 's/^[^:]*..//' | sed 's/<.*$//'
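
The same extraction can probably be done with a single sed process by passing both substitutions with -e. This is only a variation on the command above, under the assumption that the label is always followed by a colon and one space:

```shell
#!/bin/sh
# Sample input copied from the question.
line='<span class="val3">MPPTN: 0.9384</span></td>'

# First expression: delete everything up to and including the first ": ".
# Second expression: delete everything from the first remaining "<" onward.
number=$(printf '%s\n' "$line" | sed -e 's/^[^:]*: //' -e 's/<.*$//')

printf '%s\n' "$number"
```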

#3


2  

The solution largely depends on what exactly you want to do. If all your strings are going to be of the form <span class="val3">XXXXX: X.XXXX</span></td>, then the simplest solution is

echo "$info" | cut -c 20-32

If they're of the form <span class="val3">variable length</span></td>, then the simplest solution is

echo "$info" | sed 's/<span class="val3">//' | sed 's/<\/span><\/td>//'

If it's more general, you can use regexes like in Sai's answer.

#4


1  

I'd recommend using the sed command for this kind of thing:

echo "$string" | sed "s/$regex/$replace/"
