My example string is as follows:
This is 02G05 a test string 20-Jul-2012
Now from the above string I want to extract 02G05. For that I tried the following regex with sed:
$ echo "This is 02G05 a test string 20-Jul-2012" | sed -n '/\d+G\d+/p'
But the above command prints nothing, and I believe the reason is that it cannot match anything against the pattern I supplied to sed.
So my question is: what am I doing wrong here, and how do I correct it?
When I try the same string and pattern in Python, I get the expected result:
>>> import re
>>> st = "This is 02G05 a test string 20-Jul-2012"
>>> re.findall(r'\d+G\d+', st)
['02G05']
5 Answers
#1
58
The pattern \d might not be supported by your sed. Try [0-9] or [[:digit:]] instead.
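To print only the actual match (not the entire matching line), use a substitution. The [^0-9] before the group stops the greedy .* from swallowing the leading digit (without it the command prints 2G05); it assumes the token is not at the very start of the line.
sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'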
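A greedy .* in front of a capture group can eat the first digits of the token. Here is a minimal sketch of that behaviour, using Python's re module as a stand-in for sed. The helper names extract_naive and extract_guarded are made up for illustration, and the guarded version assumes the token does not start the line.

```python
import re

LINE = "This is 02G05 a test string 20-Jul-2012"

def extract_naive(line):
    # Greedy .* grabs as much as it can, leaving the group only "2G05".
    return re.sub(r'.*([0-9][0-9]*G[0-9][0-9]*).*', r'\1', line)

def extract_guarded(line):
    # Requiring a non-digit right before the group stops .* from eating digits.
    # Assumption: the token is not at the very start of the line.
    return re.sub(r'.*[^0-9]([0-9][0-9]*G[0-9][0-9]*).*', r'\1', line)

print(extract_naive(LINE))    # 2G05
print(extract_guarded(LINE))  # 02G05
```

The same reasoning applies to the sed substitution, since sed's .* is greedy as well.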
#2
68
How about using egrep?
echo "This is 02G05 a test string 20-Jul-2012" | egrep -o '[0-9]+G[0-9]+'
#3
4
sed doesn't recognize \d, use [[:digit:]] instead. You will also need to escape the + or use the -r switch (-E on OS X).
Note that [0-9] works just as well for the Hindu-Arabic numerals (0 through 9).
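As a side note on \d versus [0-9]: in engines that do support \d, the two are not always equivalent. Here is a small sketch in Python 3, the language the question already uses. On a str pattern, \d also matches non-ASCII decimal digits, while [0-9] matches only the ASCII ones. The sample text is made up for illustration.

```python
import re

# "42" written with ASCII digits and with Arabic-Indic digits (U+0664, U+0662).
text = "ascii 42 and arabic-indic \u0664\u0662"

# [0-9] only covers the ten ASCII digits.
ascii_only = re.findall(r'[0-9]+', text)

# \d on a str pattern covers every Unicode decimal digit.
unicode_digits = re.findall(r'\d+', text)

print(ascii_only)           # ['42']
print(len(unicode_digits))  # 2
```

If only ASCII digits should match, [0-9] is the more predictable choice.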
#4
4
Try this instead:
echo "This is 02G05 a test string 20-Jul-2012" | sed 's/.* \([0-9]\+G[0-9]\+\) .*/\1/'
But note that if the pattern occurs twice on one line, this prints only the second occurrence.
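The caveat about multiple occurrences can be sketched in Python, again using re as a stand-in for sed. The input line is hypothetical. The greedy leading .* pushes the capture group to the last occurrence, whereas re.findall returns every occurrence.

```python
import re

# Hypothetical input with two tokens of the form <digits>G<digits>.
line = "first 02G05 then 11G22 end"

# Same shape as the sed substitution: the greedy leading .* pushes the
# capture group to the last occurrence on the line.
last_only = re.sub(r'.* ([0-9]+G[0-9]+) .*', r'\1', line)

# findall scans left to right and returns every occurrence.
all_tokens = re.findall(r'[0-9]+G[0-9]+', line)

print(last_only)   # 11G22
print(all_tokens)  # ['02G05', '11G22']
```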
#5
0
Try using rextract (https://github.com/kata198/rextract), which will let you extract text using a regular expression and reformat it.
Example:
[$] echo "This is 02G05 a test string 20-Jul-2012" | ./rextract '([\d]+G[\d]+)' '${1}'
2G05