Regex returning a non-capturing group?

Time: 2021-06-21 22:33:37

Okay, so I am new to regex, or at least to actually writing them, but here is what I have:

The string:

LDAP://CN=Doe\, John,OU=Users,DC=my,DC=domain

The regex (that is not working as expected):

(?:LDAP://CN=)([a-zA-Z]+\\?[,\s]?\s?[a-zA-Z]+)

Groups matched:

LDAP://CN=Doe\, John
Doe\, John

Captured group:

LDAP://CN=Doe\, John

What I want to return:

Doe, John

My understanding (which is obviously not correct) was that if I put ?: in a group, that group would not be returned in the match. Likewise, I do not want to return the \ before the , in the middle of the name, and I do not know how to exclude a character from a returned result like that. Can anyone shed some light on the matter?

#

[update]

I was able to get the result by doing the following (I'm using PowerShell, by the way):

$qryResult = "LDAP://CN=Doe\, John,OU=Users,DC=my,DC=domain"  
[regex]$re = "LDAP://CN=(.*?),OU"  
$result = $re.Match($qryResult)  
(($result.Value -replace "LDAP://CN=","") -replace "\\","") -replace ",OU",""  

But it would be nice to use regex from start to finish instead of replacing the text like so. Is that possible?

2 answers

#1


Change your PowerShell to this:

$qryResult = "LDAP://CN=Doe\, John,OU=Users,DC=my,DC=domain"  
[regex]$re = "LDAP://CN=(.*?),OU"  
$result = $re.Match($qryResult).Groups[1]
$result.Value -replace "\\",""

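It will target only your capture group: .Groups[1]. Index 0 of the Groups collection holds the entire match, so index 1 is the second entry; it already excludes the LDAP://CN= prefix and the trailing ,OU.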
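To see why ?: did not help in the original pattern, a minimal sketch may be useful. A non-capturing group only skips creating a numbered group; the text it matches is still consumed, so it remains part of the overall match (Groups[0]). The variable names below are illustrative, and the lookaround variant is one possible alternative, not something taken from the answer.

```powershell
# Sample input from the question.
$qryResult = 'LDAP://CN=Doe\, John,OU=Users,DC=my,DC=domain'

# The non-capturing prefix is still consumed, so it shows up in Groups[0].
$m = [regex]::Match($qryResult, '(?:LDAP://CN=)(.*?),OU')
$whole = $m.Groups[0].Value    # full match, prefix and ",OU" included
$name  = $m.Groups[1].Value    # only the text inside the capturing parentheses

# Assumption: a lookbehind/lookahead pair asserts the surrounding text
# without consuming it, so the match itself is just the name.
$viaLookaround = [regex]::Match($qryResult, '(?<=LDAP://CN=).*?(?=,OU)').Value

# Strip the escaping backslash in a final step.
$clean = $name -replace '\\', ''
```

Here $name and $viaLookaround should both hold the name with its escaping backslash, and $clean the name without it.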

#2


You may get the required results with a single -replace:

PS> $rx = "LDAP://CN=(\p{L}+)(?:\\?,)?(\s*\p{L}+)?,OU=.*"
PS> $res = $qryResult -replace $rx, '$1$2'
PS> $res
Doe John

Details:

  • LDAP://CN= - a literal character sequence

  • (\p{L}+) - Group 1 capturing 1+ letters

  • (?:\\?,)? - an optional sequence of an optional \ and a comma

  • (\s*\p{L}+)? - an optional Group 2 capturing 0+ whitespaces and 1+ letters

  • ,OU=.* - the literal char sequence ,OU= and then any 0+ chars other than a newline symbol.
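Note that the pattern above drops the comma along with the backslash, giving Doe John rather than the Doe, John asked for in the question. A possible variant, sketched here under the assumption that the name always has the form Last\, First and consists of letters only, moves the comma into Group 2 so that only the backslash is discarded:

```powershell
# Sample input from the question.
$qryResult = 'LDAP://CN=Doe\, John,OU=Users,DC=my,DC=domain'

# Group 1: surname.
# \\?    : optional backslash, matched but not captured, so it is dropped.
# Group 2: optional comma, whitespace and given name.
# ,OU=.* : consumes the rest of the string so nothing else survives the replace.
$rx  = 'LDAP://CN=(\p{L}+)\\?(,\s*\p{L}+)?,OU=.*'
$res = $qryResult -replace $rx, '$1$2'
```

For the sample string, $res should come out as Doe, John. Names containing hyphens, apostrophes, or several given names would need a wider character class than \p{L}.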
