Regular expressions in the Linux sed command

I have a shell variable:

all_apk_file="a 1 2.apk x.apk y m.apk"

I want to replace the a 1 2.apk with TEST, using the command:

echo $all_apk_file | sed 's/(.*apk ){1}/TEST/g'

The .*apk means end with apk, {1} means only match one time, but it doesn't work; I only got the original variable as output: a 1 2.apk x.apk y m.apk

Can anyone tell me why?

3 solutions

#1

One part of the problem is that in regular sed, the () and {} are ordinary characters in patterns until escaped with backslashes. Since there are no parentheses in the variable's value, the regex never matches. With GNU sed, you can also enable extended regular expressions with the -r flag. If you fix that problem, you will then run into the problem that .* is greedy, and the g modifier actually doesn't change anything:

$ echo $all_apk_file | sed 's/\(.*apk \)\{1\}/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/g'
TESTy m.apk
$ echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/'
TESTy m.apk
$

It only stops there because there isn't a space after m.apk in the echoed value of the variable.
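
Both points can be checked with a small sketch. The test string "(x.apk ){1}" and the variable names literal, unchanged and greedy are made up for illustration, and -E is assumed to behave like the -r flag used above:

```shell
all_apk_file="a 1 2.apk x.apk y m.apk"

# In a basic regex, ( ) { } are literal, so the pattern only matches
# input that really contains those characters.
literal=$(printf '%s\n' "(x.apk ){1}" | sed 's/(.*apk ){1}/TEST/')
unchanged=$(printf '%s\n' "$all_apk_file" | sed 's/(.*apk ){1}/TEST/')

# With a trailing space after m.apk, the greedy .* runs to the last "apk ".
greedy=$(printf '%s\n' "$all_apk_file " | sed -E 's/(.*apk ){1}/TEST/')

printf '%s\n' "$literal" "$unchanged" "$greedy"
```

The first and third results should be TEST on its own, while the second is the untouched original string.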

The issue now is: what is it that you want replaced? It sounds like 'everything up to and including the first occurrence of apk at the end of a word'. This is probably most easily done with trailing context or non-greedy matching as found in Perl regular expressions. If switching to Perl is an option, do so. If not, it is not trivial in normal sed regular expressions.

$ echo $all_apk_file | sed 's/^[^.]* [^.][^.]*\.apk /TEST /'
TEST x.apk y m.apk
$

This looks for anything without dots in it, followed by a blank, followed by no dots again, and .apk; this means that the first dot allowed is the one in 2.apk. It works for the sample data; it would not work if the variable contained:

all_apk_file="a 1.2 2.apk m.apk y.apk 37"

You'll need to tune this to meet your requirements.
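
A quick sketch of that failure case, using the same sed command on both values (the names good, bad, out_good and out_bad are only for illustration):

```shell
good="a 1 2.apk x.apk y m.apk"
bad="a 1.2 2.apk m.apk y.apk 37"

# Same anchored pattern as above, applied to both inputs.
out_good=$(printf '%s\n' "$good" | sed 's/^[^.]* [^.][^.]*\.apk /TEST /')
out_bad=$(printf '%s\n' "$bad" | sed 's/^[^.]* [^.][^.]*\.apk /TEST /')

printf '%s\n' "$out_good" "$out_bad"
```

The second value comes back unchanged, because the dot in 1.2 appears before any word ending in .apk, so the anchored pattern cannot match.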

#2

First, to enable the regular expressions you're familiar with in sed, you need to use the -r switch (sed -r ...):

echo $all_apk_file | sed -r 's/(.*apk ){1}/TEST/g'
# returns TESTy m.apk

Look at what it returns: TESTy m.apk. This is because the .* is greedy, so it matches as much as it possibly can. That is, the .* matches 'a 1 2.apk x.', so the whole pattern matches 'a 1 2.apk x.apk ' (including the trailing space), and replacing that with 'TEST' results in TESTy m.apk. Note the space after 'apk' in your regular expression: it is why the match doesn't extend all the way to the last '.apk', which has no space following it.

Usually one could change the .* to .*? to make it non-greedy, but this behaviour is not supported in sed.

So, to fix it you just have to make your regex more restrictive.

It is hard to tell what you want to do - remove the first three words where the third ends in '.apk' and replace with 'TEST'? In that case, one could use the regular expression:

[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk

in combination with the 'i' switch (case insensitive).

You will have to give your logic for deciding what to remove (first three words, any number of words up to the first '.apk' word, etc) in order for us to help you further with the regex.
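
If the rule turns out to be "any number of words up to and including the first word ending in '.apk'", one possible sketch avoids sed altogether and uses POSIX shell parameter expansion instead. This is a swapped-in technique, not something from the question, and the names rest and result are illustrative:

```shell
all_apk_file="a 1 2.apk x.apk y m.apk"

# "#" strips the SHORTEST prefix matching the glob "*.apk ",
# i.e. everything through the first ".apk" that is followed by a space.
rest=${all_apk_file#*.apk }

if [ "$rest" != "$all_apk_file" ]; then
    result="TEST $rest"
else
    result=$all_apk_file   # no ".apk " found: leave the value alone
fi

printf '%s\n' "$result"
```

For the sample value this should give TEST x.apk y m.apk.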

Secondly, you've put the 'g' switch in your regex. This means that all matching patterns will be replaced, and you seem to only want the first to be replaced. So remove the 'g' switch.
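
A small sketch of what the 'g' switch does in general, on a made-up input where the pattern can match twice (the names first_only and every_match are illustrative):

```shell
# Without g only the first match on the line is replaced.
first_only=$(printf '%s\n' "x.apk y.apk" | sed 's/apk/TEST/')

# With g every non-overlapping match is replaced.
every_match=$(printf '%s\n' "x.apk y.apk" | sed 's/apk/TEST/g')

printf '%s\n' "$first_only" "$every_match"
```

The first result should be x.TEST y.apk and the second x.TEST y.TEST.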

Finally, all of these in combination:

echo $all_apk_file | sed -r 's/[a-z0-9]+ +[a-z0-9]+ +[a-z0-9]+\.apk/TEST/i'
# TEST x.apk y m.apk

#3

This might work for you:

echo "$all_apk_file" | sed 's/apk/\n/;s/.*\n/TEST/'
TEST x.apk y m.apk

As to why your regexp did not work see @mathematical.coffee and @Jonathan Leffler's excellent explanations.

s/apk/\n/ is synonymous with s/apk/\n/1, which means replace the first occurrence of apk with \n. As sed uses \n as the record separator, we know that it cannot occur in any initial string passed to the sed commands. With these two facts under our belts, we can split strings.
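
To see the split on its own, here is a sketch of just the first step, assuming GNU sed (where \n in the replacement inserts a newline; other seds may not). The names split and expected are illustrative:

```shell
all_apk_file="a 1 2.apk x.apk y m.apk"

# Step 1 only: the first "apk" becomes a newline marker.
split=$(printf '%s\n' "$all_apk_file" | sed 's/apk/\n/')

# What the pattern space is expected to hold after step 1.
expected=$(printf 'a 1 2.\n x.apk y m.apk')

printf '%s\n' "$split"
```

The second command, s/.*\n/TEST/, then replaces everything up to and including that marker.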

N.B. If you wanted to replace up to the second apk, then s/apk/\n/2 would fit the bill. Of course, for the last occurrence of apk, .*apk comes into play.
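
A sketch of the second-occurrence variant, again assuming GNU sed for \n in the replacement (the name second is illustrative):

```shell
all_apk_file="a 1 2.apk x.apk y m.apk"

# Mark the SECOND "apk" with a newline, then replace everything up to it.
second=$(printf '%s\n' "$all_apk_file" | sed 's/apk/\n/2;s/.*\n/TEST/')

printf '%s\n' "$second"
```

For the sample value this should leave TEST y m.apk.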
