How do you specify non-capturing groups in sed?

Date: 2022-01-09 12:11:22

Is it possible to specify non-capturing groups in sed?

If so, how?

4 answers

#1


26  

Parentheses can be used for grouping alternatives. For example:

sed 's/a\(bc\|de\)f/X/'

says to replace "abcf" or "adef" with "X", but the parentheses also capture. sed has no facility for grouping without also capturing. If you have a complex regex that does both alternative grouping and capturing, you simply have to be careful to select the correct capture group in your replacement. (Note that \| inside a basic regular expression is a GNU extension.)
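A minimal sketch of that bookkeeping, assuming a sed that accepts -E for extended regular expressions (the input string and the variable name kept are made up for illustration). The first pair of parentheses exists only for the alternation, so the replacement skips \1 and refers to \2:

```shell
# Group 1 only selects between "bc" and "de"; group 2 is the text we want to keep.
kept=$(printf 'adef-123\n' | sed -E 's/a(bc|de)f-([0-9]+)/\2/')
printf '%s\n' "$kept"
```

Group 1 is still captured; the replacement just never refers to it.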

Perhaps you could say more about what you're trying to accomplish (what you need non-capturing groups for) and why you want to avoid capture groups.

Edit:

There is a non-capturing group syntax, (?:pattern), that is part of Perl-Compatible Regular Expressions (PCRE). sed does not support it, but grep -P does.

#2


5  

The answer is that, as of writing, you can't: sed does not support it. sed supports BRE and ERE, but not PCRE.

(Note: another answer points out that plain sed uses BRE, and that ERE is enabled in GNU sed with -r, also spelled -E. The point remains that sed does not support PCRE.)

Perl will work, on Windows or Linux.

Examples here:

https://superuser.com/questions/416419/perl-for-matching-with-regular-expressions-in-terminal

For example, this is from Cygwin on Windows:

$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\1/s'
a

$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\2/s'
c

There is also a program, albeit for Windows, that does search and replace on the command line and supports PCRE. It's called rxrepl. It's not sed, of course, but it does search and replace with PCRE support.

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\1"
a

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\3"
c

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(?:c)" -r "\3"
Invalid match group requested.

C:\blah\rxrepl>echo abc | rxrepl -s "(a)(?:b)(c)" -r "\2"
c


The author (not me) mentioned the program in an answer here: https://superuser.com/questions/339118/regex-replace-from-command-line

It has a really good syntax.

The standard thing to use would be Perl, or almost any other programming language in common use.

#3


3  

I'll assume you are speaking of the backreference syntax, which uses parentheses ( ), not brackets [ ].

By default, sed interprets ( ) literally and does not create a backreference from them. You need to escape them, as in \( \), to make them special. Only with the GNU sed -r option is the escaping reversed: with sed -r, unescaped ( ) create backreferences and escaped \( \) are treated as literals. Examples follow:

POSIX sed

$ echo "foo(###)bar" | sed 's/foo(.*)bar/@@@@/'
@@@@

$ echo "foo(###)bar" | sed 's/foo(.*)bar/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe

$ echo "foo(###)bar" | sed 's/foo\(.*\)bar/\1/'
(###)

GNU sed -r

$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/@@@@/'
@@@@

$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/\1/'
(###)

$ echo "foo(###)bar" | sed -r 's/foo\(.*\)bar/\1/'
sed: -e expression #1, char 18: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe

Update

From the comments:

Group-only, non-capturing parentheses, which would let you apply something like an interval {n,m} to a group without creating a backreference \1, don't exist. Intervals themselves are part of POSIX basic regular expressions, where they are written \{n,m\}; the GNU -r option lets you write them unescaped as {n,m}. Either way, any grouping parentheses also capture for backreference use. Examples:

$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###/'
###789

$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###\1/'
###456.789
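For comparison, here is a sketch of the same substitution written both ways, assuming a sed that accepts -E for extended regular expressions (the variable names bre and ere are only for illustration). In the basic form, the group and the interval are both escaped:

```shell
# Basic regular expression: \( \) groups (and captures), \{n\} is the interval.
bre=$(printf '123.456.789\n' | sed 's/\([0-9]\{3\}\.\)\{2\}/###/')
# Extended regular expression: the same pattern with the escaping reversed.
ere=$(printf '123.456.789\n' | sed -E 's/([0-9]{3}\.){2}/###/')
printf '%s\n%s\n' "$bre" "$ere"
```

In both spellings the parentheses capture, so \1 is available in the replacement whether or not it is used.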

#4


0  

As said, it is not possible to have non-capturing groups in sed. It may be obvious, but non-capturing groups are not a necessity: you can simply use the capturing groups you want and ignore the others as if they were non-capturing. For reference, nested capturing groups are numbered by the position of their opening "(".

For example:

echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\1x/g"

applex and bananasx and monkeys (note: the "s" in bananas is kept, because \1 is the first, larger group)

vs

echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\2x/g"

applex and bananax and monkeys (note: no "s" in bananas, because \2 is the second, smaller group)
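To see the numbering directly, a small sketch (assuming a sed that accepts -E; the variable name numbered and the bracketed output format are made up for illustration) prints both groups for a single match:

```shell
# Groups are numbered by their opening parenthesis: the outer group is \1, the inner one is \2.
numbered=$(printf 'bananas\n' | sed -E 's/((apple|banana)s?)/1=[\1] 2=[\2]/')
printf '%s\n' "$numbered"
```

The outer group holds the whole word including the optional "s", while the inner group holds only the alternative that matched.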
