What does this regular expression mean?

/\ATo\:\s+(.*)/

Also, how do you work it out? What's the approach?

6 answers

#1

You start from the left and look for any escaped sequences (e.g. \A). The rest are normal characters. \A means the start of the input, so To: must be matched at the very beginning of the input. I think the : is escaped for nothing. \s is a character class for all whitespace (tabs, spaces, and newlines too), and the + that follows it means you must have one or more whitespace characters. After that you capture all the rest of the line in a group (marked with ( )).

If the input was

To:   progo@home

the capture group would contain "progo@home"
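
The question does not say which language the regex is used in, so here is a minimal sketch in Python, whose re module also supports \A, \s and capture groups. The variable names are only illustrative.

```python
import re

# Same pattern as in the question; the backslash before ":" is harmless.
pattern = re.compile(r"\ATo\:\s+(.*)")

match = pattern.search("To:   progo@home")

# group(1) is the text matched inside the parentheses.
captured = match.group(1) if match else None
print(repr(captured))
```

The three spaces after the colon are consumed by \s+, so they do not end up in the capture group.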

#2

In multi-line regular expressions, \A matches the start of the string (and \Z is end of string, while ^/$ matches the start/end of the string or the start/end of a line). In single line variants, you just use ^ and $ for start and end of string/line since there is no distinction.
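
To make the difference concrete, here is a small sketch in Python (an assumption, since the question names no language; Python's re.MULTILINE flag plays the role of multi-line mode). The sample text is made up for illustration.

```python
import re

# Hypothetical input where "To:" is on the second line, not the first.
text = "From: someone\nTo: paxdiablo"

# In multi-line mode, ^ matches at the start of every line...
caret = re.search(r"^To:\s+(.*)", text, re.MULTILINE)

# ...but \A still matches only at the very start of the whole string.
anchor = re.search(r"\ATo:\s+(.*)", text, re.MULTILINE)

caret_capture = caret.group(1) if caret else None
print(caret_capture, anchor)
```

The ^ version finds the header on the second line, while the \A version finds nothing because the string does not start with "To:".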

To is literal, \: is an escaped :.

\s means whitespace and the + means one or more of the preceding "characters" (white space in this case).

() is a capturing group, meaning everything in here will be stored in a "register" that you can use. Hence, this is the meat that will be extracted.

.* simply means any non newline character ., zero or more times *.

So, what this regex will do is process a string like:

To: paxdiablo
Re: you are so cool!

and return the text paxdiablo.
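
As a quick check of that claim, here is a sketch in Python (an assumption, since the regex flavour was not stated). It also shows what changes if . is allowed to match newlines, which in Python is the re.DOTALL flag.

```python
import re

message = "To: paxdiablo\nRe: you are so cool!"

# By default . stops at the newline, so only the first line is captured.
match = re.search(r"\ATo\:\s+(.*)", message)
recipient = match.group(1) if match else None

# With DOTALL, . also matches newlines and the capture runs to the end.
greedy = re.search(r"\ATo\:\s+(.*)", message, re.DOTALL)
everything = greedy.group(1) if greedy else None

print(repr(recipient))
print(repr(everything))
```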

As to how to learn how to work this out yourself, the Perl regex tutorial^(a) is a good start, and then practise, practise, practise :-)

^(a) You haven't actually stated which regex implementation you're using but most modern ones are very similar to Perl. If you can find a specific tutorial for your particular flavour, that would obviously be better.

#3

\A is a zero-width assertion and means "Match only at beginning of string".

The regex reads: In a string beginning with "To:" followed by one or more whitespace characters (\s+), capture the remainder of the line ((.*)).

#4

First, you need to know what the different anchors, character classes and quantifiers are. Anchors and character classes are mostly written as backslash-prefixed characters: \A from your regex is an anchor (a zero-width assertion) and \s is a character class. Quantifiers are symbols such as the +. There are several references on the internet, for instance this one.

Using that, we can see what happens by going left to right:

\A matches a beginning of the string.

To matches the text "To" literally

\: escapes the ":", so it is treated as "just a colon" (a colon has no special meaning here, so the escape is harmless but not needed)

\s matches whitespace (space, tab, etc)

+ means to match the previous class one or more times, so \s+ means one or more whitespace characters

() is a capture group, anything matched within the parens is saved for later use

. means "any character" (except a newline, by default)

* is like the +, but zero or more times, so .* means any number of any characters

Taking that together, the regex will match a string beginning with "To:", then at least one whitespace character, and then anything, which it will save. So, with the string "To: JaneKealum", you'll be able to extract "JaneKealum".
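
The left-to-right walk-through above can be written directly into the pattern. This is a sketch in Python (an assumption, as the question names no language), using the re.VERBOSE flag so that each piece carries its own comment; the name TO_HEADER is made up for the example.

```python
import re

TO_HEADER = re.compile(
    r"""
    \A      # start of the string
    To      # the literal text "To"
    \:      # a literal colon (the backslash is optional here)
    \s+     # one or more whitespace characters
    (.*)    # capture group: everything up to the end of the line
    """,
    re.VERBOSE,  # ignore whitespace and allow comments inside the pattern
)

match = TO_HEADER.search("To: JaneKealum")
name = match.group(1) if match else None
print(name)
```

Writing a pattern out this way is also a handy approach for working out an unfamiliar regex: split it into pieces and annotate each one.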

#5

It matches To: at the beginning of the input, followed by at least one whitespace character, followed by any number of characters captured as a group.

#6

The initial and trailing / characters delimit the regular expression.

A \ inside the expression means to treat the following character specially or treat it as a literal if it normally has a special meaning.

The \A means match only at the beginning of a string.

To means match the literal "To"

\: means match a literal ':'. A colon is already a literal with no special meaning on its own, so the escape is redundant.

\s means match a whitespace character.

+ means match as many as possible but at least one of whatever it follows, so \s+ means match one or more whitespace characters.

The ( and ) define a group of characters that will be captured and returned by the expression evaluator.

And finally the . matches any character (except a newline, by default) and the * means match as many as possible but can be zero. Therefore the (.*) will capture all characters to the end of the line, or to the end of the input string if . is allowed to match newlines.

So the pattern will match a string that starts with "To:" followed by whitespace, and capture everything from the first non-whitespace character onward, up to the end of that line.
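
A few edge cases make the behaviour of each component easier to see. This is a sketch in Python (an assumption about the flavour); the helper capture and the sample inputs are invented for illustration.

```python
import re

pattern = re.compile(r"\ATo\:\s+(.*)")

def capture(text):
    """Return the captured group, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None

results = {
    # \s+ is greedy: it swallows the space, tab and spaces before "alice".
    "padded": capture("To: \t  alice"),
    # No whitespace after the colon, so \s+ fails and nothing matches.
    "no_space": capture("To:alice"),
    # \s+ takes all the spaces and .* matches zero characters.
    "only_spaces": capture("To:   "),
    # \A anchors the match to the start, so a later "To:" is ignored.
    "not_at_start": capture("Cc: bob To: alice"),
}
print(results)
```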

The only way to really understand these things is to go through them one bit at a time and check the meaning of each component.
