Dollar sign in regular expressions and the newline character

Time: 2023-02-08 21:47:26

I know that the dollar sign anchors a match to the end of the string, so that the match cannot stop in the middle of the string and has to run through to the end.

But how does it deal with the newline character? Does it match just before the newline, or does it take the newline into account?

I checked it in Eclipse's regex search. For a regex matching an array of strings, ([A-Za-z ]+)$\n worked, but the other way around, ([A-Za-z ]+\n)$, did not.

2 answers

#1


11  

Note that ^ and $ are zero-width tokens: they do not match any character, but rather a position.

  • ^ matches the position before the first character in a string.
  • $ matches the position at the end of the string, or just before a newline (only a final one by default; every one in multiline mode).

So the text matched before the $ does not include the newline. That is why your regex ([A-Za-z ]+\n)$ failed and ([A-Za-z ]+)$\n succeeded.

Put simply, your $ must be followed by a newline or by the end of the string, and by no other character.
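The difference between the two patterns can be reproduced in a short sketch. It uses Python's re module rather than Eclipse's Java engine, and it assumes multiline mode, which is my guess at how the Eclipse search treated $; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text: three lines, each ending in a newline.
text = "hello world\nfoo bar\nbaz\n"

# re.MULTILINE lets $ match before every newline, not only the last one.
# $ consumes nothing, so the following \n still has to match the newline.
anchor_then_newline = re.findall(r"([A-Za-z ]+)$\n", text, re.MULTILINE)

# Here \n is consumed first, so $ is tested at the start of the next line.
# That position qualifies only at the end of the text or before an empty line.
newline_then_anchor = re.findall(r"([A-Za-z ]+\n)$", text, re.MULTILINE)

# $ is zero-width: the match is empty and sits just before the final newline.
dollar_span = re.search(r"$", "abc\n").span()

print(anchor_then_newline)
print(newline_then_anchor)
print(dollar_span)
```

In this sketch the first pattern finds every line, while the second one finds only the last line, because that is the only place where the position after the newline is also the end of the text.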

#2


9  

If the string ends with a newline, then $ usually matches before that character. That holds at least for Perl, PCRE, Java and .NET. (Edit: as Tim Pietzker points out in a comment, \r is not considered a line break by .NET.)

This behavior was introduced because input that is read line by line is terminated with a newline (at least in Perl), which can conveniently be ignored this way.
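A minimal sketch of that convenience, using Python's re module as a stand-in for the engines named above (its default $ behaves the same way with respect to a final \n); the strings are hypothetical:

```python
import re

line = "hello\n"  # hypothetical line as read from a file, newline included

# Default mode: $ matches at the very end, or just before a final newline.
m = re.search(r"hello$", line)
trailing_span = m.span() if m else None

two_lines = "hello\nworld\n"
# The newline after "hello" is not the final one, so default-mode $ fails...
inner_default = re.search(r"hello$", two_lines)
# ...while multiline mode accepts the position before any newline.
inner_multiline = re.search(r"hello$", two_lines, re.MULTILINE)

print(trailing_span, inner_default, inner_multiline)
```

The pattern never has to mention the trailing newline, which is what makes matching freshly read lines convenient.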

Use \z to match the very end of the string (if your regex engine supports it).
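As a rough illustration of the difference, here is a sketch in Python. Note the assumption: Python's re module spells the absolute end-of-string anchor \Z, whereas Perl, PCRE, Java and .NET spell it \z (in those engines \Z still allows a final line terminator). The helper names are hypothetical.

```python
import re

def ends_with_dollar(word: str, text: str) -> bool:
    """True if text ends with word, optionally followed by one final newline."""
    return re.search(re.escape(word) + r"$", text) is not None

def ends_exactly_with(word: str, text: str) -> bool:
    """True only if nothing at all follows word, not even a newline."""
    # \Z is Python's spelling of the anchor that other engines call \z.
    return re.search(re.escape(word) + r"\Z", text) is not None

print(ends_with_dollar("hello", "hello\n"))
print(ends_exactly_with("hello", "hello\n"))
print(ends_exactly_with("hello", "hello"))
```

The strict anchor is the safer choice when validating input, since $ alone would accept a value with a stray trailing newline.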

Source
