Multiline regex to match config block

Time: 2022-09-30 22:28:52

I am having some issues trying to match a certain config block (multiple ones) from a file. Below is the block that I'm trying to extract from the config file:

ap71xx 00-01-23-45-67-89
 use profile PROFILE
 use rf-domain DOMAIN
 hostname ACCESSPOINT
 area inside
!

There are multiple ones just like this, each with a different MAC address. How do I match a config block across multiple lines?

3 Solutions

#1


32  

The first problem you may run into is that, to match across multiple lines, you need to process the file's contents as a single string rather than line by line. For example, if you use Get-Content to read the file, by default it gives you an array of strings, one element for each line. To match across lines you want the whole file in a single string (and hope the file isn't too huge). You can do this like so:

$fileContent = [io.file]::ReadAllText("C:\file.txt")

Or in PowerShell 3.0 and later you can use Get-Content with the -Raw parameter:

$fileContent = Get-Content c:\file.txt -Raw
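
The difference between the two ways of reading can be checked with a small sketch. The sample file name and its contents are made up for the demonstration, and the pattern is deliberately simplified:

```powershell
# Hypothetical sample file, created in the temp folder only for this demonstration
$path = Join-Path ([IO.Path]::GetTempPath()) 'ap-sample.txt'
Set-Content -Path $path -Value 'ap71xx 00-01-23-45-67-89', ' use profile PROFILE', '!'

$asLines  = Get-Content $path        # default: an array with one string per line
$asString = Get-Content $path -Raw   # PowerShell 3.0+: one string including line terminators

# Against the array, -match tests each line separately, so no single line holds a whole block
$perLine = @($asLines -match '(?s)ap71xx.*!').Count
# Against the single string, the same pattern can span the line terminators
$spansLines = $asString -match '(?s)ap71xx.*!'

"Lines read: $($asLines.Count)"
"Lines matching on their own: $perLine"
"Single string matches: $spansLines"

Remove-Item $path
```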

Then you need to specify regex options so that the pattern can match across line terminators, i.e.

  • SingleLine mode (s): . matches any char, including line feed
  • Multiline mode (m): ^ and $ match at embedded line terminators
  • e.g. (?smi), where the "i" is to ignore case

e.g.:

C:\> $fileContent | Select-String '(?smi)([0-9a-f]{2}(-|\s*$)){6}.*?!' -AllMatches |
        Foreach {$_.Matches} | Foreach {$_.Value}

00-01-23-45-67-89
 use profile PROFILE
 use rf-domain DOMAIN
 hostname ACCESSPOINT
 area inside
!
00-01-23-45-67-89
 use profile PROFILE
 use rf-domain DOMAIN
 hostname ACCESSPOINT
 area inside
!

Use the Select-String cmdlet to do the search because with -AllMatches it outputs every match, whereas the -match operator stops after the first one. That makes sense, because -match is a Boolean operator that only needs to determine whether there is a match.
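
A minimal sketch of that difference, assuming a made-up two-block sample held in a here-string instead of a real file (the second MAC address is invented):

```powershell
# Hypothetical sample text with two blocks, standing in for the file contents
$text = @"
ap71xx 00-01-23-45-67-89
 use profile PROFILE
!
ap71xx 00-01-23-45-67-8A
 use profile PROFILE
!
"@

# -match returns a Boolean and fills $Matches with the first match only
$found = $text -match '(?s)ap71xx.*?!'
$first = $Matches[0]

# Select-String -AllMatches exposes every match through the Matches property
$all = $text |
    Select-String '(?s)ap71xx.*?!' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

"-match found a match: $found"
"Select-String found $(@($all).Count) blocks"
```

If Select-String is not wanted, [regex]::Matches($text, $pattern) from .NET also returns every match.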

#2


2  

This regex will search for the text ap followed by any number of characters and new lines, ending with a !:

(?si)(ap).+?\!{1}

So I was a little bored. I wrote a script that will break up the text file as you described (as long as it only contains the lines you displayed). It might work with other random lines, as long as they don't contain the key words: ap, profile, domain, hostname, or area. It will import them, and check line by line for each of the properties (MAC, Profile, domain, hostname, area) and place them into an object that can be used later. I know this isn't what you asked for, but since I spent time working on it, hopefully it can be used for some good. Here is the script if anyone is interested. It will need to be tweaked to your specific needs:

$Lines = Get-Content "c:\test\test.txt"
$varObjs = @()
$varObj = $null
foreach ($Line in $Lines) {
    #Splits the trimmed line on whitespace so the word positions are the same for every line
    $Words = $Line.Trim() -split "\s+"
    if ($Line -match "^ap") {
        #A line starting with "ap" opens a new block; its second word is the MAC address
        $varObj = New-Object PSObject
        $varObj | Add-Member -MemberType NoteProperty -Name Mac -Value $Words[1]
    }
    elseif ($null -eq $varObj) {
        #Skips blank lines and anything else found outside a block
        continue
    }
    elseif ($Line -match "^\s*!") {
        #A ! closes the block, so the $varObj is added to $varObjs for future use
        $varObjs += $varObj
        $varObj = $null
    }
    elseif ($Line -match "profile") { $varObj | Add-Member -MemberType NoteProperty -Name Profile -Value $Words[2] }
    elseif ($Line -match "domain") { $varObj | Add-Member -MemberType NoteProperty -Name rf-domain -Value $Words[2] }
    elseif ($Line -match "hostname") { $varObj | Add-Member -MemberType NoteProperty -Name hostname -Value $Words[1] }
    elseif ($Line -match "area") { $varObj | Add-Member -MemberType NoteProperty -Name area -Value $Words[1] }
}
#displays the $varObjs
$varObjs

#3


1  

In case this may still be of value to someone and depending on the actual requirement, the regex in Keith's answer doesn't need to be that complicated. If the user simply wants to output each block the following will suffice:

$fileContent = [io.file]::ReadAllText("c:\file.txt")
$fileContent |
    Select-String '(?smi)ap71xx[^!]+!' -AllMatches |
    %{ $_.Matches } |
    %{ $_.Value }

The regex ap71xx[^!]+! will perform better, and the use of .* in a regular expression is not recommended because it can generate unexpected results. The pattern [^!]+! matches one or more characters other than the exclamation mark (line terminators included), followed by the exclamation mark.
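
One way to see the kind of unexpected result meant here is to count matches on a made-up two-block sample. This sketch uses [regex]::Matches from .NET rather than Select-String, purely to keep it short:

```powershell
# Hypothetical two-block sample; `n is a line feed inside a double-quoted string
$text = "ap71xx A`n use profile P1`n!`nap71xx B`n use profile P2`n!`n"

# Greedy .* in SingleLine mode runs to the LAST "!", merging both blocks into one match
$greedy = [regex]::Matches($text, '(?s)ap71xx.*!').Count

# Lazy .*? stops at the first "!" after each block start
$lazy = [regex]::Matches($text, '(?s)ap71xx.*?!').Count

# The negated class cannot cross a "!" and needs no s or m option to span lines
$negated = [regex]::Matches($text, 'ap71xx[^!]+!').Count

"greedy: $greedy, lazy: $lazy, negated class: $negated"
```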

If the start of the block isn't required in the output, the updated script is:

$fileContent |
    Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
    %{ $_.Matches } |
    %{ $_.Groups[1] } |
    %{ $_.Value }

Groups[0] contains the whole matched string; Groups[1] contains the text captured by the parentheses in the regex.
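
As a small illustration of the two groups, here is a sketch on a single made-up block. Pulling the MAC address out of the captured text is an extra step that is not part of the answer above:

```powershell
# Hypothetical single block used only for this illustration
$text = "ap71xx 00-01-23-45-67-89`n use profile PROFILE`n!`n"

$match = [regex]::Match($text, '(?smi)ap71xx([^!]+!)')

$whole = $match.Groups[0].Value   # entire match, including the leading "ap71xx"
$inner = $match.Groups[1].Value   # only what the parentheses captured

# Assumption: the first whitespace-separated word of the captured text is the MAC address
$mac = ($inner.Trim() -split '\s+')[0]

"Whole match starts with ap71xx: $($whole.StartsWith('ap71xx'))"
"Captured group starts with ap71xx: $($inner.StartsWith('ap71xx'))"
"MAC: $mac"
```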

If $fileContent isn't required for any further processing, the variable can be eliminated:

[io.file]::ReadAllText("c:\file.txt") |
    Select-String '(?smi)ap71xx([^!]+!)' -AllMatches |
    %{ $_.Matches } |
    %{ $_.Groups[1] } |
    %{ $_.Value }
