Groups in C# regular expressions

Time: 2021-08-29 20:15:09

I'm using the following tester to try and figure out this regex: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

My input: 123stringA 456 stringB

My pattern: ([0-9]{3})(.*?)

The pattern will eventually be a date but for this question's sake, I'll keep it simple and use my simplified input.

The way I understand this pattern, it says: "give me 3 digits ([0-9]{3}), followed by any number of characters of any kind (.*), until it reaches the next match (?)".

What I want/expect out of this test is 2 matches with 2 groups each:
Match 1
   Group 1 - 123
   Group 2 - stringA
Match 2
   Group 1 - 456
   Group 2 - stringB

For some reason, the tester at the link I provided sees that there is a second group, but it's coming up blank. I have done this with PHP before and it seemed to work as I described, but in C# I'm seeing different results. Any help you can provide would be appreciated.

I should also note that the input could span multiple lines...

EDIT

Here's the actual input: 2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension 2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager

For match 1 I'm wanting to get: 2011-08-09 09:25:57 and ,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension

and for match 2: 2011-08-09 09:25:57 and ,493 [8] Orchard.Environment.Extensions.ExtensionManager

I'm trying to find a good way to parse an error log that lives in one giant text file, keeping the date each error happened together with the details that went along with it.

4 solutions

#1


2  

The first group matches 3 digits and the second group matches the remainder of the string because there's nothing in the pattern to prevent the .*? from matching the remainder of the string.

CORRECTION: The second group matches an empty string, because nothing in the pattern forces the lazy .*? to consume any characters.
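
To see the empty second group concretely, here is a minimal sketch. It uses Python's `re` module as a stand-in for the .NET engine (an assumption made for illustration; both are backtracking engines and treat a trailing `*?` the same way for this pattern). The variable names are hypothetical.

```python
import re

# Simplified input and pattern, both taken from the question.
text = "123stringA 456 stringB"
pattern = re.compile(r"([0-9]{3})(.*?)")

# Collect (group 1, group 2) for every match.
matches = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

# The lazy .*? stops as soon as the overall match can succeed, and since
# nothing follows it in the pattern, that is immediately.
print(matches)  # [('123', ''), ('456', '')]
```

Both matches are found, but group 2 is empty each time, which is what the tester in the question reports.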

#2


1  

.* means match any character zero or more times. The trailing ? makes that quantifier lazy, so it matches as few characters as possible; since nothing follows it in the pattern, it settles for zero.

Try this pattern: ([0-9]{3})([a-zA-Z]*)
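
A quick check of that suggestion against the simplified input, again sketched with Python's `re` as a stand-in for .NET (an assumption). The second pattern, with `\s*` between the groups, is a hypothetical tweak that is not part of this answer; it is included only because the sample input has a space between 456 and stringB.

```python
import re

text = "123stringA 456 stringB"

# Pattern exactly as suggested in this answer.
letters_only = re.compile(r"([0-9]{3})([a-zA-Z]*)")
# Hypothetical variant: allow optional whitespace before the letters.
with_space = re.compile(r"([0-9]{3})\s*([a-zA-Z]*)")

as_suggested = letters_only.findall(text)
with_whitespace = with_space.findall(text)

# The space after 456 is not a letter, so group 2 is empty for match 2.
print(as_suggested)     # [('123', 'stringA'), ('456', '')]
print(with_whitespace)  # [('123', 'stringA'), ('456', 'stringB')]
```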

#3


0  

According to your comment, this is what you want to match:

2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager - Error loading extension 2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager - Error loading extension

This expression captures the date in the first group and everything after it, up to the next date or the end of the string, in the second group.

(\d{4}(?:-\d{2}){2})(.*?)(?=(?:\d{4}(?:-\d{2}){2}|$))

See it here on Regexr
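
The lookahead approach can be sketched as follows, using Python's `re` as a stand-in for .NET (an assumption). `date_pattern` is the expression from this answer, unchanged. `stamp_pattern` is a hypothetical variant, not from the answer, that captures date and time together as the question asks for; it also sets `re.DOTALL` so that `.` can cross newlines, which in .NET would presumably correspond to `RegexOptions.Singleline`.

```python
import re

log = (
    "2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager"
    " - Error loading extension "
    "2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager"
    " - Error loading extension"
)

# Pattern from this answer: date, then lazily everything up to the
# next date or the end of the string (checked by the lookahead).
date_pattern = re.compile(r"(\d{4}(?:-\d{2}){2})(.*?)(?=(?:\d{4}(?:-\d{2}){2}|$))")

# Hypothetical variant (not from the answer): full timestamp in group 1.
STAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
stamp_pattern = re.compile("(" + STAMP + ")(.*?)(?=" + STAMP + "|$)", re.DOTALL)

# strip() drops the space left between one entry and the next date.
date_entries = [(m.group(1), m.group(2).strip()) for m in date_pattern.finditer(log)]
stamp_entries = [(m.group(1), m.group(2).strip()) for m in stamp_pattern.finditer(log)]

for stamp, details in stamp_entries:
    print(stamp, "|", details)
```

With the answer's pattern, group 1 holds only `2011-08-09` and the time ends up at the start of group 2. The lazy `.*?` works here because the lookahead gives it something to stop at, unlike the pattern in the question.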

#4


0  

Not sure why the tool gives you that, but you can switch to this alternative pattern, which works in .NET:

([0-9]{3})([^0-9]*)

http://regexhero.net/tester/?id=155b8e2b-b851-46b9-8a84-b82f8d6963a1

Explanation:

In your previous pattern, the non-greedy quantifier .*? was matching 0 characters.

In the new one, [^0-9] says match any character other than the range 0-9 (note the negation ^ specifier).
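
As a rough illustration of the negated class on the simplified input (Python's `re` standing in for .NET, which is an assumption; the names are hypothetical):

```python
import re

text = "123stringA 456 stringB"

# Three digits, then greedily everything that is not a digit.
pattern = re.compile(r"([0-9]{3})([^0-9]*)")

raw = pattern.findall(text)
# Group 2 keeps the surrounding spaces, so trim them if they are unwanted.
trimmed = [(digits, rest.strip()) for digits, rest in raw]

print(raw)      # [('123', 'stringA '), ('456', ' stringB')]
print(trimmed)  # [('123', 'stringA'), ('456', 'stringB')]
```

One thing to keep in mind: group 2 ends at the first digit it meets, so text that itself contains digits (such as the `[9]` in the real log line) would be cut short.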

Update: Given the actual input string (in the comments), the pattern changes to the following (it's a guess, assuming what the OP wants to do):

,([0-9]{3})([^\n]*)

http://regexhero.net/tester/?id=155b8e2b-b851-46b9-8a84-b82f8d6963a1
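
A small sketch of how the updated pattern behaves, assuming each log entry sits on its own line (the sample in the question shows both entries on one line, so the newline here is an assumption). Python's `re` is again used as a stand-in for .NET.

```python
import re

# Assumed layout: one log entry per line.
log_lines = (
    "2011-08-09 09:25:57,069 [9] Orchard.Environment.Extensions.ExtensionManager"
    " - Error loading extension\n"
    "2011-08-09 09:25:57,493 [8] Orchard.Environment.Extensions.ExtensionManager"
)

# A comma, three digits, then everything up to the end of the line.
pattern = re.compile(r",([0-9]{3})([^\n]*)")

entries = pattern.findall(log_lines)
for millis, details in entries:
    print(millis, "|", details.strip())
```

Note that group 1 here is the three digits after the comma (069, 493), not the date, and that without a newline between entries group 2 would run on to the end of the whole input.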
