How can I use an awk regular expression to extract a substring between parentheses?

In the following Bash command line, I am able to obtain the index for the substring, when the substring is between double quotes.

text='123ABCabc((XYZabc((((((abc123(((123'

echo "$text" | awk '{ print index($0, "((((a" )}'  # 20 is the result.

However, in my application I will not know which character occupies the position of the "a" in this example. I therefore thought I could replace the "a" with a regex that accepts any character other than "(", and that /[^(]/ would be what I needed. However, I have been unable to get awk's index function to work with any form of regex in place of the "((((a" in the example.

UPDATE: It was pointed out by William Pursell that the index operation does not accept a regex as the second operand.
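
To illustrate the point, here is a minimal sketch showing that index() treats its second operand as a literal string, so regex metacharacters are searched for character by character. The helper name literal_index and the NEEDLE environment variable are hypothetical, chosen only for this example.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

# Hypothetical helper: $1 = string to search, $2 = literal substring to look for.
literal_index() {
    # The needle is passed through the environment so awk sees it unchanged.
    printf '%s\n' "$1" | NEEDLE="$2" awk '{ print index($0, ENVIRON["NEEDLE"]) }'
}

# The characters "(((([^(]" never occur literally in $text, so this prints 0.
literal_index "$text" '(((([^(]'

# When the same characters really are present, index() finds them: prints 2.
literal_index 'x(((([^(]' '(((([^(]'
```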

Ultimately, what I was trying to accomplish was to extract the substring that was located after four or more "(", followed by one or more ")". Dennis Williamson provided the solution with the following code:

echo 'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf' | 
mawk '{match($0,/\(\(\(\([^()]*\)/); s = substr($0,RSTART, RLENGTH); gsub(/[()]/, "", s); print s}'

Thanks to all for their help!

3 solutions

#1

To get the position of the first non-open-parenthesis after a sequence of them:

$ echo "$text" | awk '{ print match($0, /\(\(\(\(([^(])/, arr); print arr[1, "start"]}'
20
24

This shows the position of the match for "(((([^(]" (20) and the position of the character after the parentheses (24).

Passing an array as the third argument to match() is a GNU awk (gawk) extension.
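
For awks without the array argument (mawk, for instance), the same two numbers can be derived from RSTART and RLENGTH, which match() sets in any POSIX awk. This is only a sketch using the sample text from the question; the function name paren_positions is hypothetical.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

# Hypothetical helper: print where the "((((" + non-"(" match begins,
# then where the non-"(" character itself sits.
paren_positions() {
    printf '%s\n' "$1" | awk '{
        # match() sets RSTART (start of the match) and RLENGTH (its length)
        if (match($0, /\(\(\(\([^(]/)) {
            print RSTART                  # first of the four parentheses
            print RSTART + RLENGTH - 1    # last character of the match
        }
    }'
}

paren_positions "$text"
```

With the sample text this prints 20 and then 24, the same values as the gawk version.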

Edit:

echo 'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf' | 
    mawk '{match($0,/\(\(\(\([^()]*\)/); s = substr($0,RSTART, RLENGTH); gsub(/[()]/, "", s); print s}'

#2

You want match instead of index. And you need to escape the (s. For example:

echo $text | awk '{ print match($0, /\(\(\(\([^(]/) }'

Note that this gives the index of the first ( of the match, not the index of the character after the string ((((.

#3

If you want to match four or more open-parentheses in order to find the start of yet another substring within the match, you actually have to calculate the value.

# Use GNU AWK to index the character after the end of a substring.
echo "$text" |
awk --re-interval 'match( $0, /\({4,}/ ) { print RSTART + RLENGTH }'

This should give you the correct starting index of the character following the sequence of parentheses, which in this case is 24.
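
Once that index is known, substr() can pull out the text that follows the parentheses. The sketch below assumes the wanted substring ends at the first ")", as in the asker's second example; the function name after_parens is hypothetical, and the pattern avoids interval expressions so that --re-interval is not needed.

```shell
# Hypothetical helper: print the text that follows a run of four or more "(".
after_parens() {
    printf '%s\n' "$1" | awk '{
        # three "(" followed by one or more "(" means four or more in total
        if (match($0, /\(\(\(\(+/)) {
            rest = substr($0, RSTART + RLENGTH)   # everything after the run
            sub(/\).*/, "", rest)                 # cut at the first ")" if present
            print rest
        }
    }'
}

after_parens 'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf'
```

For the input shown this prints I-WANT-THIS-SUBSTRING.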
