I am new to regex and I am trying to come up with something that will match a text like below:
ABC: (z) jan 02 1999 \n
Notes:
- text will always begin with "ABC:"
- there may be zero, one or more spaces between ':' and (z).
- Variations of (z) are also possible, e.g. (zz), (zzzzzz), etc., but it is always one or more non-digit characters enclosed in "()"
- there may be zero, one or more spaces between (z) and jan
- jan could be jan, january, etc
- the date could be in any format and may or may not contain other text as part of it, so I would really like to know if there is a regex I can use to capture anything and everything that is found between '(z)' and '\n'
Any help is greatly appreciated! Thank you
3 solutions
#1
30
The following should work:
ABC: *\([a-zA-Z]+\) *(.+)
Explanation:
ABC: # match literal characters 'ABC:'
* # zero or more spaces
\([a-zA-Z]+\) # one or more letters inside of parentheses
* # zero or more spaces
(.+) # capture one or more of any character (except newlines)
To get your desired grouping based on the comments below, you can use the following:
(ABC:) *(\([a-zA-Z]+\).+)
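As a quick check, here is a minimal sketch that runs both patterns above against the sample line. Python's `re` module and the test string are my assumptions for illustration; the question does not say which regex engine is in use.

```python
import re

# Sample line from the question; the trailing "\n" is treated as a real line break (an assumption).
line = "ABC: (z) jan 02 1999 \n"

# First pattern: group 1 is everything after "(z)" and the spaces that follow it.
m = re.match(r"ABC: *\([a-zA-Z]+\) *(.+)", line)
date_text = m.group(1) if m else None

# Second pattern: group 1 is "ABC:", group 2 starts at the opening parenthesis.
m2 = re.match(r"(ABC:) *(\([a-zA-Z]+\).+)", line)
groups = m2.groups() if m2 else None

print(repr(date_text))
print(groups)
```

Note that `.+` stops at the line break but keeps the trailing space before it, so you may want to call `.strip()` on the captured text.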
#2
4
Without knowing the exact regex implementation you're using, I can only give general advice. (The syntax I will use is Perl, as that's what I know; some languages will require tweaking.)
Looking at ABC: (z) jan 02 1999 \n
- The first thing to match is ABC: so our regex starts as /ABC:/
- You say ABC is always at the start of the string, so /^ABC/ will ensure that ABC is at the start of the string.
- You can match whitespace with the \s directive (note the case: \S means non-whitespace). With any directive you can match one or more with + (or zero or more with *).
- You need to escape ( and ) as they are reserved characters, so \( and \)
- You can match any single character except a newline with .
- You can match any run of characters with .* but you need to be careful that you're not too greedy and capture everything.
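The point about greediness is easiest to see side by side. This is a small sketch only: Python's `re` module and the sample text (with a second pair of parentheses added on purpose) are assumptions, not part of the question.

```python
import re

# Hypothetical input with two parenthesised parts, to make the difference visible.
text = "ABC: (z) jan (02) 1999"

# Greedy: .+ takes as much as it can, so the match runs to the LAST ")".
greedy = re.search(r"\(.+\)", text).group(0)

# Lazy: .+? takes as little as it can, so the match ends at the FIRST ")".
lazy = re.search(r"\(.+?\)", text).group(0)

print(greedy)
print(lazy)
```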
So in order to capture what you've asked for, I would use /^ABC:\s*\(.+?\)\s*(.+)$/
Which I read as:
Begins with ABC:
may have some spaces
has (
has some characters
has )
may have some spaces
then capture everything until the end of the line (which is $).
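For what it's worth, the same pattern can be tried outside Perl. The sketch below assumes Python's `re` module and a made-up multi-line input; `re.MULTILINE` is my addition so that `^` and `$` anchor at every line rather than only at the ends of the whole string.

```python
import re

# Hypothetical multi-line input; only the lines starting with "ABC:" should match.
text = "ABC: (z) jan 02 1999\nXYZ: ignored\nABC:(zz)   january 2, 1999\n"

# Same pattern as above, compiled with MULTILINE so ^ and $ work per line.
pattern = re.compile(r"^ABC:\s*\(.+?\)\s*(.+)$", re.MULTILINE)

# With a single capture group, findall returns just the captured text of each match.
captured = pattern.findall(text)
print(captured)
```

One thing to keep in mind: `\s` also matches line breaks, so if a line could end right after the closing parenthesis, a literal space with `*` (as in the first answer) is the safer choice.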
I highly recommend keeping a copy of the following cheat sheet handy: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
#3
0
This should fulfill your requirements.
ABC:\s*(\(\D+\)\s*.*?)\\n
Here it is with some tests http://www.regexplanet.com/cookbook/ahJzfnJlZ2V4cGxhbmV0LWhyZHNyDgsSBlJlY2lwZRiEjiUM/index.html
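One detail worth checking before using this pattern: the trailing `\\n` matches a literal backslash followed by the letter n, not an actual line break. A rough sketch of the difference, assuming Python's `re` module and two hand-written inputs:

```python
import re

pattern = re.compile(r"ABC:\s*(\(\D+\)\s*.*?)\\n")

# Raw string: the input ends with a backslash followed by the letter n.
literal = r"ABC: (z) jan 02 1999 \n"
# Ordinary string: the input ends with a real line break.
real_newline = "ABC: (z) jan 02 1999 \n"

m_literal = pattern.search(literal)
m_real = pattern.search(real_newline)

print(m_literal.group(1) if m_literal else None)
print(m_real)
```

If your text contains real line breaks, the pattern would need `\n` (or `$`) at the end instead.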
Further reading on regular expressions: http://www.regular-expressions.info/characters.html