Perl script to search a text file for a specific string and copy the whole line to a new file?

Time: 2021-04-19 23:59:08

The main problem I'm having is that my script runs, opens the text file, finds the string, and copies it to a new file, but sometimes it doesn't copy the whole line. It gets cut off at different points in the line. I believe it is a problem with my regex.

A line of the text file may look like this:

E03020039: Unable to load C:\Documents and Settings\rja07\Desktop\DSMProduct\project\Database\Schema\Source\MDB_data_type.dsm into \DSM R17\projects\Databases\Schema\Source\MDB_data_type.dsm . Text file contains invalid characters .

However, when the Perl script runs, it sometimes only copies up to the words "text file" or "text file contains", and the last part of the line is cut off. I need the complete line. This is what I have so far:

if ($error =~ /E03020039/)
{
    print $error;
    open (MF, '>>G:/perl/error.txt');
    print MF $error;
    $count ++;
    }   

This is all inside a foreach loop which scans each line of the file.

I tried:

if ($error =~ /E03020039/&&/characters\s\.\n/)

but that doesn't help me at all. Thanks for any help.

6 answers

#1


While we wait for the information brian d foy suggested you provide, here are a few things you should check.

Why?

Well, looking at the code snippet you posted, style-wise at least, you appear to be using some more traditional Perlisms, instead of modern improved ones, and doing things the modern way will generally make your life easier.

Are You using Strictures?

use strict; 
use warnings; 

These 2 lines at the top of your code can help point out many silly mistakes.

If you can't afford to turn them on everywhere because you have too many errors, you can enable them within a scope, i.e.:

 blah;  #no strict or warnings

 {   # scope 

     use strict; 
     use warnings; 
     code(); # with strict and warnings

 }

 blah; # no strict or warnings

Use lexical file-handles

Bare filehandles are untidy because they're globally unique, and that can get a bit messy.

{  #scope

  open my $fh , '>' , 'bar.txt'; 
  print $fh "Hello\n";

}  # file cleaned up and closed by perl!

Use 3-Arg open where possible

Good:

open my $fh, '>', 'bar.txt'; 
open my $otherfh, '<', 'foo.txt'; 
open my $iofh , '-|' , 'ls', '-la' ; 

Not Recommended:

open my $fh, '>bar.txt'; 
open my $otherfh , '<foo.txt'; 
open my $iofh , 'ls -la |'; 

See perldoc -f open for details

Check to see if Opens actually worked or not

Generally, if open fails for any reason, the default behavior is to carry on regardless, and this can lead to confusing results later on.

There are several ways to handle this:

Option 1:

 use Carp ();
 open my $fh, '>', $filename or Carp::croak("Oh no! Can't open $filename: $!");

Option 2:

 use autodie;
 open my $fh , '>', $filename;

As For that second regex

That's probably not doing what you think it's doing.

 if ($error =~ /E03020039/&&/characters\s\.\n/)

Is fundamentally the same as

 if (  
         ( $error =~ /E03020039/ ) 
     &&  ( $_     =~ /characters\s\.\n/ ) 
 ) 

Which is probably not what you intended.

I think you meant:

 if (  
          ( $error =~ /E03020039/ ) 
      &&  ( $error =~ /characters\s\.\n/) 
 ) 

#2


I don't think your regex has anything to do with this. Are you at least getting all the right lines in your new file, even if they are truncated?

I think you need to go through the normal debugging steps:

  • Can you show us a complete but minimal program that demonstrates the error? The problem might be somewhere else.

  • What is in $error? Does it have all of the line when you print it to stdout? If not, work backward until you find the point where the text goes missing. Print its value before and after the suspect operations and work backward until you find the problem.

  • Are you sure all of that text is on one line, and that there aren't any extra weird characters in the file? What does $error contain on the next line read?

  • What happens if you print everything to the new file (i.e. match all lines)? Does all the text end up in the new file?

  • Are the lines always truncated at the same point?
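
One way to act on the checklist above is to print each line with its non-printable characters made visible, so that a stray carriage return or control character in the middle of a line shows up. This is only a minimal sketch; the helper name `visible` and the sample line are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: render control and non-ASCII characters as \x{..}
# escapes so that hidden bytes in a line become visible when printed.
sub visible {
    my ($text) = @_;
    (my $shown = $text) =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord $1)/ge;
    return $shown;
}

# Example: a line that ends in CR LF instead of a bare LF.
my $error = "E03020039: Text file contains invalid characters .\r\n";
print visible($error), "\n";
```

Printing `visible($error)` before and after each suspect operation should show where, if anywhere, the text goes missing.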

#3


If you are using a match pattern (// is the same as m//), the =~ operator does not modify the error string.

Are you 100% confident you aren't mangling it prior to the regex check? I'd stick a print line prior to the match and ensure you're accurately duplicating the input.

Are you 100% confident that you aren't running into IO buffering issues? Perl file IO is typically buffered, so if you're expecting to see the complete last line of the output file via tail -f or similar, you may be disappointed until the program exits.

See http://www.rocketaware.com/perl/perlfaq5/How_do_I_flush_unbuffer_a_fileha.htm for some options for how to enable auto-flushing for your file handle.
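
As a rough illustration of the auto-flushing idea, the sketch below opens a file for appending and turns on autoflush through the core IO::Handle module. The helper name `open_unbuffered_append` is hypothetical, and this is one possible approach rather than the one the linked FAQ necessarily recommends.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the autoflush method on filehandles

# Hypothetical helper: open a file for appending and disable output
# buffering on the handle, so every print reaches the file immediately.
sub open_unbuffered_append {
    my ($path) = @_;
    open my $out, '>>', $path or die "Unable to open $path: $!\n";
    $out->autoflush(1);
    return $out;
}
```

Calling `my $out = open_unbuffered_append('G:/perl/error.txt');` once before the loop and printing to `$out` inside it would make each matched line visible in the file as soon as it is written, at some cost in speed for very large logs.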

#4


If the intention is simply to get the job done, rather than to learn how to program in Perl, then use 'grep' to find the lines. That also assumes you aren't doing anything else in the script. If the intention is to learn about Perl, then ignore this advice and pay heed to the other answers.

#5


I see a couple of things that stand out immediately:

  1. You are using a global filehandle and not closing it when done.

  2. You are using a two-argument open (this isn't causing your issue, but it is best avoided).

  3. Your altered regex does not do anything like what you seem to think it does.

For 1 and 2:

# For loop around this:
if ($error =~ /E03020039/) {
    print $error;

    open(my $mf, '>>', 'G:/perl/error.txt') 
        or die "Unable to open error file - $!\n";

    print $mf $error;
    $count ++;

    close $mf
        or die "Unable to close error file - $!\n";
}

By using a lexical handle, you prevent any other code from touching your handle unless it has been passed explicitly. By closing the handle, you flush the handle's buffers. By checking for errors when opening and closing the handle, you prevent uncaught errors from leading to lost data.

You may wish to move the open and close outside your for loop:

my $count = 0;
open( my $mh, '>>', 'errorlog.log' ) or die "oops $!\n";
for my $error ( <$log_h> ) {

    if ( $error =~ /E23323232323/ ) {
         print $mh $error;
         print $error;
         $count++;
    } 

}
close $mh or die "oops $!\n";

Your code was reopening the same file into a global filehandle on every match. This could easily be the cause of the problems you are seeing, although it might not be. Does the complete content of $error print to STDOUT?

Regarding issue 3, $error =~ /E03020039/&&/characters\s\.\n/ is equivalent to:

($error =~ /E03020039/) && ($_ =~ /characters\s\.\n/)

If you had warnings enabled, you would (probably) have gotten the warning "Use of uninitialized value in pattern match (m//)". It may have been surprising, but it would have been a clue that something was wrong.
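
To see that warning in action, the sketch below runs the combined condition with `$_` undefined and collects whatever Perl warns about. The helper name `try_combined_condition` is made up for this example, and it assumes `$_` held nothing useful inside the asker's loop, which the post does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate the asker's combined condition with $_
# undefined and return the match result plus any warnings Perl raised.
sub try_combined_condition {
    my ($error) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    local $_;    # assumption: $_ holds nothing useful at this point

    # The second pattern has no =~, so Perl matches it against $_.
    my $matched = ( $error =~ /E03020039/ && /characters\s\.\n/ ) ? 1 : 0;
    return ( $matched, @warnings );
}

my ( $matched, @warnings ) =
    try_combined_condition("E03020039: Text file contains invalid characters .\n");
print "matched: $matched\n";
print "warning: $_" for @warnings;
```

Even though the line contains both pieces of text, the condition is false here, because the second pattern never looks at `$error`.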

I believe you wanted something like:

$error =~ /E03020039.*?characters\s\.$/

But there is no reason to extend the match, since you are not capturing any part of the match. It will have no effect on the value in $error or what will be written to the file.
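
A quick way to convince yourself of this is to compare the string before and after the match. The sketch below uses a hypothetical helper, `line_survives_match`, and a shortened sample line.

```perl
use strict;
use warnings;

# Hypothetical helper: run a plain (non-substituting) match against a line
# and report whether it matched and whether the line is still intact.
sub line_survives_match {
    my ($line) = @_;
    my $before = $line;
    my $hit    = $line =~ /E03020039/ ? 1 : 0;    # plain match, nothing captured
    return ( $hit, $line eq $before ? 1 : 0 );
}

my ( $hit, $unchanged ) =
    line_survives_match("E03020039: Text file contains invalid characters .\n");
print "matched: $hit, unchanged: $unchanged\n";    # prints "matched: 1, unchanged: 1"
```

Only s/// and tr/// change the string they are bound to, so any truncation has to come from somewhere other than the match itself.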

Unless you have a specific reason not to, always start your Perl programs with these two pragmas:

use strict;
use warnings;

Even if you have a good reason not to use them, it is nearly always best to disable these pragmas only over a limited scope:

use strict;
use warnings;

{    no warnings 'uninitialized';
     no strict 'vars';
     print "$foo\n";
}

#6


Your regex is fine.

There can be 2 other issues:

  1. Your outer foreach loop has some error.

  2. You append to error.txt using open (MF, '>>G:/perl/error.txt');. So if multiple instances of this script run in parallel, the output may be garbled when they all try to write to the file at the same time.
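
If parallel writers do turn out to be the cause, one common mitigation is to take an advisory lock before appending. The sketch below uses the core Fcntl constants with Perl's built-in flock. The helper name `append_locked` is hypothetical, and the lock only helps if every writer to the file uses the same scheme.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);    # exports LOCK_EX and related constants

# Hypothetical helper: append one line while holding an exclusive advisory
# lock, so that cooperating writers do not interleave their output.
sub append_locked {
    my ( $path, $line ) = @_;
    open my $out, '>>', $path or die "Unable to open $path: $!\n";
    flock( $out, LOCK_EX ) or die "Unable to lock $path: $!\n";
    print {$out} $line;
    # Closing the handle flushes the buffer and releases the lock.
    close $out or die "Unable to close $path: $!\n";
    return;
}
```

Inside the loop this would be called as `append_locked('G:/perl/error.txt', $error)`. Opening and locking once per line is slow but simple. For a single-instance script it is unnecessary.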

Alternatively you can use this simple Perl one-liner which will achieve what you wish to do:

perl -nle 'print if /E03020039/' inputFile.txt >> G:/perl/error.txt
