How do I use sed to extract text from a string?

My example string is as follows:

This is 02G05 a test string 20-Jul-2012

From the above string I want to extract 02G05. For that, I tried the following regex with sed:

$ echo "This is 02G05 a test string 20-Jul-2012" | sed -n '/\d+G\d+/p'

But the above command prints nothing, and I believe the reason is that sed cannot match anything against the pattern I supplied.

So my question is: what am I doing wrong here, and how do I correct it?

When I try the same string and pattern in Python, I get the result I expect:

>>> re.findall(r'\d+G\d+',st)
['02G05']
>>>

5 Solutions

#1

The pattern \d might not be supported by your sed. Try [0-9] or [[:digit:]] instead.

To only print the actual match (not the entire matching line), use a substitution.

sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'
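
For comparison, here is a minimal sketch of the two behaviours, using only POSIX bracket classes. The function names are made up for illustration. It assumes the token is preceded by at least one non-digit character: without that [^[:digit:]] guard, the greedy leading .* swallows the first digit and only 2G05 comes out.

```shell
# Sample input from the question.
line='This is 02G05 a test string 20-Jul-2012'

# /pattern/p prints the WHOLE line if the pattern matches anywhere in it.
# [[:digit:]] is the POSIX replacement for the unsupported \d.
print_matching_line() {
  sed -n '/[[:digit:]][[:digit:]]*G[[:digit:]][[:digit:]]*/p'
}

# s/.../\1/p replaces the whole line with the captured group and prints it.
# The [^[:digit:]] before the group keeps the greedy .* from eating
# leading digits of the token (assumes the token is not at line start).
print_match_only() {
  sed -n 's/.*[^[:digit:]]\([[:digit:]][[:digit:]]*G[[:digit:]][[:digit:]]*\).*/\1/p'
}

printf '%s\n' "$line" | print_matching_line   # prints the full input line
printf '%s\n' "$line" | print_match_only      # prints 02G05
```

Lines that do not contain the pattern produce no output at all, because -n suppresses the default printing and the p flag only fires after a successful substitution.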

#2

How about using egrep?

echo "This is 02G05 a test string 20-Jul-2012" | egrep -o '[0-9]+G[0-9]+'

#3

sed doesn't recognize \d; use [[:digit:]] instead. You will also need to escape the + (\+ is a GNU extension in basic regular expressions) or use the -r switch (-E on OS X).

Note that [0-9] works just as well for Hindu-Arabic numerals (the digits 0 through 9).
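
A rough sketch of both options, assuming a sed that accepts -E (GNU sed and the BSD/macOS sed both do). The function names are hypothetical, and the [^0-9] before the group is my own addition so that the greedy .* does not consume the token's leading digit.

```shell
# Sample input from the question.
line='This is 02G05 a test string 20-Jul-2012'

# Extended regex (-E): + means "one or more" and groups use plain ( ).
extract_ere() {
  sed -n -E 's/.*[^0-9]([0-9]+G[0-9]+).*/\1/p'
}

# Basic regex: the POSIX interval \{1,\} stands in for +,
# and groups need escaped parentheses \( \).
extract_bre() {
  sed -n 's/.*[^0-9]\([0-9]\{1,\}G[0-9]\{1,\}\).*/\1/p'
}

printf '%s\n' "$line" | extract_ere   # prints 02G05
printf '%s\n' "$line" | extract_bre   # prints 02G05
```

The \{1,\} form is the one to reach for when the script has to run on a sed that supports neither \+ nor -E.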

#4

Try this instead:

echo "This is 02G05 a test string 20-Jul-2012" | sed 's/.* \([0-9]\+G[0-9]\+\) .*/\1/'

But note: if the pattern occurs twice on one line, this prints the second occurrence, because the leading .* is greedy.
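
If the first occurrence (or every occurrence) is wanted instead, one possible workaround is to let grep list the matches and pick from there. This is only a sketch: the sample line with two tokens and the function names are invented for illustration, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical input with two tokens on one line.
line='first 02G05 then 11G22 end'

# grep -o prints each match on its own line, left to right.
all_matches() {
  grep -oE '[0-9]+G[0-9]+'
}

# Keep only the first match.
first_match() {
  grep -oE '[0-9]+G[0-9]+' | head -n 1
}

# The sed substitution with a greedy leading .* keeps the LAST match.
last_match_sed() {
  sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'
}

printf '%s\n' "$line" | all_matches      # prints 02G05 and 11G22 on separate lines
printf '%s\n' "$line" | first_match      # prints 02G05
printf '%s\n' "$line" | last_match_sed   # prints 11G22
```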

#5

Try using rextract ( https://github.com/kata198/rextract )

which will let you extract text using a regular expression and reformat it.

Example:

[$] echo "This is 02G05 a test string 20-Jul-2012" | ./rextract '([\d]+G[\d]+)' '${1}'

2G05
