How to read a file with Perl, parse the log for error entries, and write them to a log.txt file

Time: 2020-12-31 14:01:24

I am trying to write a Perl program that reads a file of 40,000+ lines and parses each message in it to extract the error messages.

A sample of the data I am using looks like this:

--------All Messages---------
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
ERROR: there was an error transferring data .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
ERROR: there was an error transferring the data and the error message spans
more than 1 line of code and may also contain newline characters as well .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
---------END REPOSITORY---------

Each message in the log has the following in common:

1) it starts with either SUCCESS or ERROR depending on the outcome

2) all messages will end with <whitespace><period><newline>

The following is code that I have written but for some reason I can't seem to debug it. Any help is greatly appreciated.

open(FH,$filetoparse);
{
# following line is supposed to change the delimiter for the file
    $/ = " .";
# the follow statement will create an error log of all error messages in log and save it
# to a file named errorlog.txt
    while(<FH>)
    {
        push (@msgarray, $_);
    }
if ($outputtype == 1)
{
    $outputfile="errorlog.txt";
    open(OUTPUT,">>$outputfile");
    $errorcount=0;
    $errortarget="ERROR";
    print OUTPUT "-----------Error Log-----------\n";

    for ($i=0;$i<@msgarray;$i++)
    {
    if ($msgarray[$i] =~ /^$errortarget/)
    {

        print OUTPUT "$msgarray[$i]\n";
#       print OUTPUT "next code is: \n";
        $errorcount++;

    }
    print OUTPUT "\nError Count : $errorcount\n";

    close (OUTPUT);
    }
}

3 solutions

#1


Add the newline character to your delimiter. Change:

$/ = " .";

to:

$/ = " .\n";

And if you want to remove the delimiter, you can chomp.

while(<FH>)
{
    chomp;
    push (@msgarray, $_);
}
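
As a minimal sketch of this suggestion, the following reads a shortened, hypothetical copy of the sample log from an in-memory file handle, so no file on disk is needed. The variable names ($log, @messages, @errors) are illustrative and not from the original code. With $/ set to " .\n", each read returns one whole message, including the one that spans two lines, and chomp removes that terminator.

```perl
use strict;
use warnings;

# Hypothetical sample data, shortened from the log in the question
my $log = join '',
    "SUCCESS: data transferred successfully .\n",
    "ERROR: there was an error transferring the data and the error message spans\n",
    "more than 1 line .\n",
    "SUCCESS: data transferred successfully .\n";

# Open the string as an in-memory file so the sketch needs no file on disk
open my $fh, '<', \$log or die "Unable to open in-memory file: $!";

my @messages;
{
    local $/ = " .\n";          # one record per message, not per line
    while ( my $record = <$fh> ) {
        chomp $record;          # chomp strips the current value of $/
        push @messages, $record;
    }
}
close $fh;

# Keep only the messages that begin with ERROR
my @errors = grep { /^ERROR/ } @messages;

print scalar(@messages), " messages, ", scalar(@errors), " error(s)\n";
print "$_\n" for @errors;
```

Using local inside a bare block restores the default separator afterwards, so any later line-by-line reads in the same program are unaffected.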

#2


The problem with setting $/ = " ." is that each record you read will end at that closing dot, and the following record will start with the newline character after it. That means none of your records except possibly the first will start with "ERROR". They will start with "\nERROR" instead, and so your test will always fail.
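
The effect can be seen with a small sketch. The three-message log below is hypothetical, and the in-memory file handle is only there to keep the example self-contained. It prints each record with its newlines made visible, along with whether the /^ERROR/ test from the question would match it.

```perl
use strict;
use warnings;

# Hypothetical three-message log in the same format as the question
my $log = "SUCCESS: ok .\nERROR: failed .\nSUCCESS: ok .\n";

open my $fh, '<', \$log or die "Unable to open in-memory file: $!";

my @records;
{
    local $/ = " .";            # the separator from the question, without "\n"
    @records = <$fh>;
}
close $fh;

for my $record (@records) {
    # Show newlines as a literal \n so the leading one is visible
    ( my $visible = $record ) =~ s/\n/\\n/g;
    printf "%-22s %s\n", "[$visible]", $record =~ /^ERROR/ ? 'matches' : 'no match';
}
```

Without the /m modifier, ^ only matches at the very start of the string, which is why a record beginning with a newline is never recognised as an error.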

There are some other issues with your code that you will want to understand.

  • You must always use strict and use warnings, and declare all your variables with my as close as possible to their first point of use

  • You should always use lexical file handles with the three-parameter form of open. You also need to check the status of every open and put $! in the die string so that you know why it failed. So

    open(FH,$filetoparse);
    

    becomes

    open my $in_fh, '<', $filetoparse or die qq{Unable to open "$filetoparse" for input: $!};
    
  • It is better to process text files line by line unless you have good reasons to read them into memory in their entirety — for instance, if you need to do multiple passes through the data, or if you need random access to the contents instead of processing them linearly.

    It's also worth noting that, instead of writing

    while ( <$in_fh> ) {
        push @msgarray, $_;
    }
    

    you can say just

    @msgarray = <$in_fh>;
    

    which has exactly the same result

  • It is often better to iterate over the contents of an array rather than over its indices. So instead of

    for ( my $i = 0; $i < @msgarray; ++$i ) {
        # Do stuff with $msgarray[$i];
    }
    

    you could write

    for my $message ( @msgarray ) {
        # Do stuff with $message;
    }
    

Here's a rewrite of your code that demonstrates these points:

open my $in_fh, '<', $filetoparse
        or die qq{Unable to open "$filetoparse" for input: $!};

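{
    # Each message ends with " .\n", so read one whole message per iteration;
    # otherwise only the first line of a multi-line ERROR would be written
    local $/ = " .\n";
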
    if ( $outputtype == 1 ) {

        my $outputfile  = 'errorlog.txt';
        my $errorcount  = 0;
        my $errortarget = 'ERROR';

        open my $out_fh, '>>', $outputfile
                or die qq{Unable to open "$outputfile" for output: $!};

        print $out_fh "-----------Error Log-----------\n";

        while ( <$in_fh> ) {
          next unless /^\Q$errortarget/;

          s/\s*\.\s*\z//;       # Remove trailing detail
          print $out_fh "$_\n";
          ++$errorcount;
        }

        print $out_fh "\nError Count : $errorcount\n";

        close ($out_fh) or die $!;
    }
}

#3


The file handle OUTPUT is closed inside the for loop, so every iteration after the first tries to print to a handle that has already been closed. Move the close outside the loop and try again.
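
A minimal sketch of the corrected loop structure, assuming the messages have already been split into @msgarray. The sample messages are hypothetical, and the report is written to an in-memory file handle ($report) rather than errorlog.txt so that the example leaves nothing on disk.

```perl
use strict;
use warnings;

# Hypothetical messages, already split into one element per message
my @msgarray = (
    'SUCCESS: data transferred successfully',
    'ERROR: there was an error transferring data',
    'SUCCESS: data transferred successfully',
    'ERROR: another transfer failed',
);

# Write to an in-memory file so the sketch leaves nothing on disk
my $report = '';
open my $out_fh, '>', \$report or die "Unable to open in-memory file: $!";

print $out_fh "-----------Error Log-----------\n";

my $errorcount = 0;
for my $message (@msgarray) {
    next unless $message =~ /^ERROR/;
    print $out_fh "$message\n";
    ++$errorcount;
}

# The summary and the close happen once, after every message has been examined
print $out_fh "\nError Count : $errorcount\n";
close $out_fh or die "Unable to close output: $!";

print $report;
```

Printing the error count after the loop also means the total appears once at the end of the log instead of after every message.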
