Findstr - Return only a regex match

Date: 2021-05-16 08:56:48

I have this string in a text file (test.txt):

BLA BLA BLA
BLA BLA
Found 11 errors and 7 warnings

I perform this command:

findstr /r "[0-9]+ errors" test.txt

I want it to return only the string 11 errors.

Instead, the output is:

Found 11 errors and 7 warnings

Can someone assist?

2 Answers

#1


1  

The findstr tool cannot return only the matched text; it always prints the whole line. It is much easier to use PowerShell for this.

Here is an example:

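# Example paths; point $input_path at the file you want to search.
$input_path = 'c:\ps\in.txt'
$output_file = 'c:\ps\out.txt'
$regex = '[0-9]+ errors'
# Emit only the matched text of every match and write it to the output file.
Select-String -Path $input_path -Pattern $regex -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object { $_.Value } > $output_file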

See the article "Windows PowerShell: Extracting Strings Using Regular Expressions" for how to use the script above.

#2


4  

findstr always returns every full line that contains a match; it is not capable of returning sub-strings only. Hence you need to do the sub-string extraction on your own. There are also some issues in your findstr command line that I want to point out:

The string parameter of findstr actually defines multiple search strings separated by white-space, so one search string is [0-9]+ and the other one is errors. The line Found 11 errors and 7 warnings in your text file is returned because of the word errors only; the numeric part is not part of the match, because findstr does not support the + character (one or more occurrences of the previous character or class). You need to change that part of the search string to [0-9][0-9]* to achieve that. To treat the whole string as one search string, you need to provide the /C option; since this defaults to literal search mode, you additionally need to add the /R option explicitly:

findstr /R /C:"[0-9][0-9]* errors" "test.txt"

With these changes, however, the command would also match strings like x5 errorse; to avoid that, you could use the word boundaries \< (beginning of word) and \> (end of word). (Alternatively, you could include a space on either side of the search string, as in /C:" [0-9][0-9]* errors ", but this fails if the match appears at the very beginning or end of the line.)

Taking all of the above into account, the corrected and improved command line looks like this:

findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"

This will return the entire line containing a match:

Found 11 errors and 7 warnings

If you want to return only lines of exactly this form and exclude lines like "2 errors are enough" or "35 warnings but less than 3 errors", you could of course extend the search string accordingly:

findstr /R /C:"^Found [0-9][0-9]* errors and [0-9][0-9]* warnings$" "test.txt"
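To see the effect of the changes discussed above, the search strings can be run side by side on a small sample. This is only a sketch: the file name demo.txt and its contents are made up for illustration, and the comments describe the behaviour that follows from the explanation above rather than captured output.

```batch
@echo off
rem Create a throwaway sample file (demo.txt is a hypothetical name).
> "demo.txt" (
    echo BLA BLA BLA
    echo Found 11 errors and 7 warnings
    echo no errors here
    echo x5 errorse
)

rem Two search strings, [0-9]+ and errors: any line containing errors qualifies.
echo --- original search string
findstr /R "[0-9]+ errors" "demo.txt"

rem One search string: a digit run followed by a space and errors.
echo --- single search string with /C
findstr /R /C:"[0-9][0-9]* errors" "demo.txt"

rem Same, but anchored at word boundaries to reject x5 errorse.
echo --- with word boundaries
findstr /R /C:"\<[0-9][0-9]* errors\>" "demo.txt"

del "demo.txt"
```

Save it as a .cmd file and run it from an empty directory; each section should list fewer lines than the one before it.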

Anyway, to extract the portion 11 errors there are several options:

  1. In a batch file, a for /F loop can parse the output of findstr and extract certain tokens:

    for /F "tokens=2-3 delims= " %%E in ('
        findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"
    ') do echo(%%E %%F
    
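  2. The sub-string replacement syntax can also be used:

    rem Store the matching line (the last one, if several match) in LINE.
    for /F "delims=" %%L in ('
        findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"
    ') do set "LINE=%%L"
    rem Remove everything up to and including the first space.
    set "LINE=%LINE:* =%"
    rem Turn " and " into a closing quote followed by a rem command, so the tail is dropped.
    set "LINE=%LINE: and =" & rem "%"
    echo(%LINE%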
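If only the number is needed, for example to react to it later in the script, the token approach from option 1 can store it in a variable. The following is a minimal sketch, assuming the matching line has the shape "Found 11 errors and 7 warnings" so that the count is the second space-separated token; the variable name ERRORS is made up for illustration.

```batch
@echo off
setlocal
set "ERRORS="

rem tokens=2 selects the second space-separated field of each matching line.
rem If several lines match, the value from the last one is kept.
for /F "tokens=2 delims= " %%N in ('findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"') do set "ERRORS=%%N"

if defined ERRORS (
    echo %ERRORS% errors
) else (
    echo No matching line found.
)
endlocal
```

For lines where the number is not the second token, the tokens= value would have to be adjusted, which is why the stricter ^Found ... search string shown earlier pairs well with this approach.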
    
