Difference between the regex modifiers m and s?

Time: 2022-05-27 20:13:54

I often forget what the regular expression modifiers m and s do and how they differ. What is a good way to remember them?

As I understand them, they are:

'm' is for multiline, so that ^ and $ match at the beginning and end of each line (lines being separated by \n), not just at the beginning and end of the whole string.

's' makes the dot match even the newline character.

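Both points in that understanding can be checked quickly. The sketch below uses Python rather than Perl, on the assumption that Python's re.M (re.MULTILINE) and re.S (re.DOTALL) flags behave like Perl's /m and /s for simple cases like these; the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Without re.M, ^ anchors only at the very start of the string.
words_default = re.findall(r"^\w+", text)
# With re.M (like Perl's /m), ^ also anchors right after every "\n".
words_multiline = re.findall(r"^\w+", text, re.M)

# Without re.S, "." refuses to match "\n", so this pattern finds nothing.
dot_default = re.search(r"line.second", text)
# With re.S (like Perl's /s), "." matches the newline as well.
dot_all = re.search(r"line.second", text, re.S)

print(words_default)          # ['first']
print(words_multiline)        # ['first', 'second']
print(dot_default)            # None
print(repr(dot_all.group()))  # 'line\nsecond'
```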

Often, I just use

/some_pattern/ism

But it is probably better to use only the modifiers I actually need (usually "s" in my case).

What do you think can be a good way to remember them, instead of forgetting which is which every time?

3 answers

#1


16  

It's not uncommon to find someone who's been using regexes for years who still doesn't understand how those two modifiers work. As you observed, the names "multiline" and "singleline" are not very helpful. They sound like they must be mutually exclusive, but they're completely independent. I suggest you ignore the names and concentrate on what they do: m changes the behavior of the anchors (^ and $), and s changes the behavior of the dot (.).

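A small Python sketch of that independence, assuming re.M and re.S as stand-ins for /m and /s: one pattern depends only on an anchor, the other only on the dot, and all four flag combinations are tried against both. The sample text is illustrative.

```python
import re

text = "ab\ncd"
combos = {"none": 0, "m": re.M, "s": re.S, "ms": re.M | re.S}

results = {}
for name, flags in combos.items():
    # "^cd" can only succeed if ^ matches right after the "\n": that is re.M's job.
    anchor_hit = re.search(r"^cd", text, flags) is not None
    # "b.c" can only succeed if "." matches the "\n": that is re.S's job.
    dot_hit = re.search(r"b.c", text, flags) is not None
    results[name] = (anchor_hit, dot_hit)
    print(f"{name:>4}: anchor={anchor_hit}, dot={dot_hit}")
```

Each flag flips only its own column: m turns on the anchor match, s turns on the dot match, and ms turns on both, so neither one implies or cancels the other.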

One prominent person who mixed up the modes is the author of Ruby. He created his own regex implementation based on Perl's, except he decided to have ^ and $ always be line anchors; that is, multiline mode is always on. Unfortunately, he also incorrectly named the dot-matches-everything mode "multiline". So Ruby has no s modifier, but its m modifier does what s does in other flavors.

As for always using /ism, I recommend against it. It's mostly harmless, as you've discovered, but it sends a confusing message to anyone else who's trying to figure out what the regex was supposed to do (or even to yourself, in the future).

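One way to see both halves of that advice, sketched in Python with re.M, re.S and re.I standing in for /m, /s and /i (the log-style text is invented): with a pattern that has no ^, $ or ., the extra flags do nothing, but once an anchor appears, a habitual m quietly changes the result.

```python
import re

text = "Error: disk full\nerror: retry"

# No ^, $ or "." in the pattern: adding re.M | re.S changes nothing.
plain_default = re.findall(r"disk full", text)
plain_flagged = re.findall(r"disk full", text, re.M | re.S)

# Add a ^ and the habitual m silently changes what the pattern means.
anchored = re.findall(r"^error", text, re.I)            # ['Error']
anchored_m = re.findall(r"^error", text, re.I | re.M)   # ['Error', 'error']
```

A reader who sees the m flag on the second pattern cannot tell whether matching the second line was intended or accidental.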

#2


10  

I like the explanation in 'man perlre':

m Treat string as multiple lines.
s Treat string as single line.

With multiple lines, ^ and $ apply to individual lines (i.e. just after and just before newlines).
With a single line, ^ and $ apply to the whole string, and \n becomes just another character that . can match.

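The same "multiple lines vs. single line" distinction, sketched in Python on the assumption that re.M and re.S match Perl's /m and /s here; the sample string is made up.

```python
import re

text = "foo1\nfoo2\n"

# Multiple lines (re.M): $ may match before every "\n", so both lines qualify.
per_line = re.findall(r"foo\d$", text, re.M)    # ['foo1', 'foo2']
# Single line (no re.M): $ matches only at the end of the string or just
# before a newline that ends it, so only the last line qualifies.
whole = re.findall(r"foo\d$", text)             # ['foo2']

# With re.S, "\n" is just another character that "." can consume.
no_s = re.match(r".*", text).group()            # 'foo1'
with_s = re.match(r".*", text, re.S).group()    # 'foo1\nfoo2\n'
```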

[Wrong]By using both m and s as you described, I would expect the second one to take precedence, so you would always be in multiline mode with /ism.[/Wrong]

I didn't read far enough:
The "/s" and "/m" modifiers both override the $* setting. That is, no matter what $* contains, "/s" without "/m" will force "^" to match only at the beginning of the string and "$" to match only at the end (or just before a newline at the end) of the string. Together, as /ms, they let the "." match any character whatsoever, while still allowing "^" and "$" to match, respectively, just after and just before newlines within the string.

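A short Python sketch of the /ms combination, assuming re.M | re.S behaves like Perl's /ms here; the BEGIN/END text is made up. Pulling out the block between two marker lines needs ^ and $ as line anchors and a dot that can cross newlines, so it is found only when both flags are on.

```python
import re

config = "intro\nBEGIN\nkey = 1\nkey = 2\nEND\noutro\n"
# ^...$ pin BEGIN and END to whole lines; (.*?) must span the lines between them.
pattern = r"^BEGIN$(.*?)^END$"

both = re.search(pattern, config, re.M | re.S)
only_s = re.search(pattern, config, re.S)  # None: without re.M, ^BEGIN must be at offset 0
only_m = re.search(pattern, config, re.M)  # None: without re.S, .*? cannot cross "\n"

print(repr(both.group(1)))  # '\nkey = 1\nkey = 2\n'
```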

#3


1  

Maybe this way I will never forget:

When I want to match across lines (usually with .*? to match filler that may span multiple lines), I naturally think of "multiline", and therefore 'm'. But 'm' is not the one, so it must be 's'.

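A minimal Python illustration of that situation, assuming re.M and re.S correspond to /m and /s; the HTML fragment is invented. The pattern has no ^ or $, so reaching for m changes nothing, and only s lets .*? run across the line break.

```python
import re

html = "<p>first\nsecond</p>"
pattern = r"<p>(.*?)</p>"

with_m = re.search(pattern, html, re.M)  # None: m only affects ^ and $
with_s = re.search(pattern, html, re.S)  # matches: "." may now consume "\n"

print(with_m)                 # None
print(repr(with_s.group(1)))  # 'first\nsecond'
```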

(Since I already remember 'ism' so well, I can always remember that it is not 'm', so it must be 's'.)

Other lame attempts include:

s is for DOTALL: it lets the DOT match ALL characters, newline included.
m is for multiline: it lets ^ and $ match many times, once per line.

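Python happens to spell these mnemonics out in its flag names, which may help them stick. A small check, assuming Python's re module, where re.S is an alias for re.DOTALL, re.M for re.MULTILINE, and the inline (?s) and (?m) forms (which Perl also accepts) turn the same modes on from inside the pattern:

```python
import re

# The short flags are aliases for the descriptive long names.
assert re.S == re.DOTALL      # S: the DOT matches ALL characters, "\n" included
assert re.M == re.MULTILINE   # M: ^ and $ match at MANY places, once per line

# The same letters work as inline flags at the start of a pattern.
dot_all = re.search(r"(?s)a.b", "a\nb")   # matches 'a\nb'
multi = re.findall(r"(?m)^\w", "a\nb")    # ['a', 'b']
```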
