I am trying to use Perl to create a program that will read in data from a file that is 40,000+ lines long and parse through each message to extract the error messages from it.
A sample of the data I am using looks like this:
--------All Messages---------
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
ERROR: there was an error transferring data .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
ERROR: there was an error transferring the data and the error message spans
more than 1 line of code and may also contain newline characters as well .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
SUCCESS: data transferred successfully .
---------END REPOSITORY---------
Each message in the log has the following in common:
1) it starts with either SUCCESS or ERROR, depending on the outcome
2) all messages end with <whitespace><period><newline>
The following is the code I have written, but for some reason it doesn't work and I can't seem to debug it. Any help is greatly appreciated.
open(FH,$filetoparse);
{
    # the following line is supposed to change the delimiter for the file
    $/ = " .";
    # the following statement will create an error log of all error messages in the log and save it
    # to a file named errorlog.txt
    while(<FH>)
    {
        push (@msgarray, $_);
    }
    if ($outputtype == 1)
    {
        $outputfile="errorlog.txt";
        open(OUTPUT,">>$outputfile");
        $errorcount=0;
        $errortarget="ERROR";
        print OUTPUT "-----------Error Log-----------\n";
        for ($i=0;$i<@msgarray;$i++)
        {
            if ($msgarray[$i] =~ /^$errortarget/)
            {
                print OUTPUT "$msgarray[$i]\n";
                # print OUTPUT "next code is: \n";
                $errorcount++;
            }
            print OUTPUT "\nError Count : $errorcount\n";
            close (OUTPUT);
        }
    }
}
3 Answers
#1
Add the newline character to your delimiter. Change:
$/ = " .";
to:
$/ = " .\n";
And if you want to remove the delimiter, you can chomp it:
while(<FH>)
{
    chomp;
    push (@msgarray, $_);
}
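A minimal, self-contained sketch of this fix, assuming the sample log is held in a string and read through an in-memory filehandle (opening a reference to a scalar) so it runs without a real file; the names $log, $fh, $msg and @errors are illustrative, not from the question. It sets $/ = " .\n" only inside a block with local, then chomps each record and keeps those that start with ERROR:

```perl
use strict;
use warnings;

# Abbreviated copy of the question's sample log (illustrative only).
my $log = <<'END_LOG';
--------All Messages---------
SUCCESS: data transferred successfully .
ERROR: there was an error transferring data .
SUCCESS: data transferred successfully .
ERROR: there was an error transferring the data and the error message spans
more than 1 line of code and may also contain newline characters as well .
SUCCESS: data transferred successfully .
---------END REPOSITORY---------
END_LOG

# An in-memory filehandle lets the string be read like a file.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @errors;
{
    # Each record now ends with " .\n", so the next record begins
    # directly with SUCCESS or ERROR instead of with a newline.
    local $/ = " .\n";
    while ( my $msg = <$fh> ) {
        chomp $msg;    # chomp removes the trailing " .\n" (the current $/)
        push @errors, $msg if $msg =~ /^ERROR/;
    }
}
close $fh;

print "$_\n---\n" for @errors;
```

Because local restores the previous $/ when the block exits, later reads in the same program go back to normal line-by-line input.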
#2
The problem with setting $/ = " ." is that the lines you read will end at that closing dot, and the following line will start with the newline character after it. That means none of your lines except possibly the first will start with "ERROR"; they will start with "\nERROR" instead, and so your test will always fail.
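A short sketch that makes this visible, assuming an abbreviated copy of the sample data in an in-memory filehandle (the variable names are illustrative). It reads with the question's $/ = " .", prints each record with its newlines shown as \n, and counts how many records match /^ERROR/ versus /^\nERROR/:

```perl
use strict;
use warnings;

# Abbreviated sample (illustrative only).
my $log = "SUCCESS: data transferred successfully .\n"
        . "ERROR: there was an error transferring data .\n"
        . "SUCCESS: data transferred successfully .\n";

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my @records = do { local $/ = " ."; <$fh> };    # the question's separator
close $fh;

# Print each record with its newlines made visible as \n.
for my $rec (@records) {
    ( my $shown = $rec ) =~ s/\n/\\n/g;
    print "[$shown]\n";
}

my $strict_hits = grep { /^ERROR/ } @records;      # the question's test
my $leading_nl  = grep { /^\nERROR/ } @records;    # what the records actually hold
print "/^ERROR/ matched $strict_hits record(s); ",
      "/^\\nERROR/ matched $leading_nl record(s)\n";
```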
There are some other issues with your code that you will want to understand.
- You must always use strict and use warnings, and declare all your variables with my as close as possible to their first point of use.
- You should always use lexical file handles with the three-parameter form of open. You also need to check the status of every open and put $! in the die string so that you know why it failed. So
open(FH,$filetoparse);
becomes
open my $in_fh, '<', $filetoparse or die qq{Unable to open "$filetoparse" for input: $!};
- It is better to process text files line by line unless you have good reasons to read them into memory in their entirety: for instance, if you need to make multiple passes through the data, or if you need random access to the contents instead of processing them linearly.
It's also worth noting that instead of writing
while ( <$in_fh> ) { push @msgarray, $_; }
you can just say
@msgarray = <$in_fh>;
which has exactly the same result.
- It is often better to iterate over the contents of an array rather than over its indices. So instead of
for ( my $i = 0; $i < @msgarray; ++$i ) {
    # Do stuff with $msgarray[$i];
}
you could write
for my $message ( @msgarray ) {
    # Do stuff with $message;
}
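The last two points can be checked with a small sketch, assuming a hypothetical three-message log in an in-memory filehandle and the " .\n" separator from the first answer; it reads the data both with the while/push loop and with list-context readline, then counts errors by iterating over the contents:

```perl
use strict;
use warnings;

# Hypothetical three-message log, one message per " .\n"-terminated record.
my $log = "SUCCESS: ok .\nERROR: failed .\nSUCCESS: ok .\n";

# The question's style: push each record in a while loop.
open my $fh1, '<', \$log or die "Cannot open in-memory log: $!";
my @via_loop;
{
    local $/ = " .\n";
    while ( <$fh1> ) { push @via_loop, $_; }
}
close $fh1;

# Readline in list context returns all remaining records at once.
open my $fh2, '<', \$log or die "Cannot open in-memory log: $!";
my @via_list = do { local $/ = " .\n"; <$fh2> };
close $fh2;

# Iterate over the contents rather than over the indices.
my $errorcount = 0;
for my $message (@via_list) {
    ++$errorcount if $message =~ /^ERROR/;
}
print scalar(@via_list), " records, $errorcount error(s)\n";
```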
Here's a rewrite of your code that demonstrates these points:
open my $in_fh, '<', $filetoparse
        or die qq{Unable to open "$filetoparse" for input: $!};
{
    # Read one complete message (ending in " .\n") per record;
    # local restores the default $/ when this block exits
    local $/ = " .\n";

    if ( $outputtype == 1 ) {
        my $outputfile  = 'errorlog.txt';
        my $errorcount  = 0;
        my $errortarget = 'ERROR';
        open my $out_fh, '>>', $outputfile
                or die qq{Unable to open "$outputfile" for output: $!};
        print $out_fh "-----------Error Log-----------\n";
        while ( <$in_fh> ) {
            next unless /^\Q$errortarget/;
            s/\s*\.\s*\z//;    # Remove the trailing " ." and newline
            print $out_fh "$_\n";
            ++$errorcount;
        }
        print $out_fh "\nError Count : $errorcount\n";
        close ($out_fh) or die $!;
    }
}
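To try the rewrite without writing a real errorlog.txt, its loop can be wrapped in a sub that takes filehandles. This is a sketch, not part of the answer: write_error_log is a hypothetical name, and setting local $/ = " .\n" inside the sub is an assumption borrowed from the first answer so that a multi-line error message arrives as a single record:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): the rewrite's loop, taking
# filehandles instead of file names so it can run without touching disk.
sub write_error_log {
    my ( $in_fh, $out_fh, $errortarget ) = @_;
    my $errorcount = 0;
    local $/ = " .\n";    # one complete message per record
    print $out_fh "-----------Error Log-----------\n";
    while ( <$in_fh> ) {
        next unless /^\Q$errortarget/;
        s/\s*\.\s*\z//;    # strip the trailing " ." and newline
        print $out_fh "$_\n";
        ++$errorcount;
    }
    print $out_fh "\nError Count : $errorcount\n";
    return $errorcount;
}

# Usage with in-memory filehandles and an abbreviated sample log.
my $log = <<'END_LOG';
SUCCESS: data transferred successfully .
ERROR: there was an error transferring data .
ERROR: there was an error transferring the data and the error message spans
more than 1 line of code and may also contain newline characters as well .
SUCCESS: data transferred successfully .
END_LOG

my $report = '';
open my $in_fh,  '<', \$log    or die "Cannot open in-memory log: $!";
open my $out_fh, '>', \$report or die "Cannot open in-memory report: $!";
my $count = write_error_log( $in_fh, $out_fh, 'ERROR' );
close $in_fh;
close $out_fh;
print $report;
```

Passing filehandles rather than file names keeps the sub independent of errorlog.txt, so the same code can write to a real file opened with '>>' or to a string in a test.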
#3
The file handle OUTPUT is closed inside the for loop, at the end of the first iteration, so every later iteration tries to print to a handle that is already closed. Move the close outside the loop and try again.
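A sketch of the question's loop with that change applied, assuming @msgarray already holds one message per element (as it would after reading with $/ = " .\n") and writing to an in-memory string instead of errorlog.txt so the result is easy to inspect; the sample messages are hypothetical. The Error Count line is moved out of the loop along with close, so it is printed once:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the messages read in with $/ = " .\n".
my @msgarray = (
    "SUCCESS: data transferred successfully .\n",
    "ERROR: there was an error transferring data .\n",
    "ERROR: a second failure .\n",
);

my $report = '';
open( OUTPUT, '>', \$report ) or die "Cannot open in-memory report: $!";
my $errorcount  = 0;
my $errortarget = 'ERROR';
print OUTPUT "-----------Error Log-----------\n";
for ( my $i = 0; $i < @msgarray; $i++ ) {
    if ( $msgarray[$i] =~ /^$errortarget/ ) {
        print OUTPUT "$msgarray[$i]\n";
        $errorcount++;
    }
}    # the for loop ends here...

# ...so the total is printed once and the handle is closed once,
# after every message has been written.
print OUTPUT "\nError Count : $errorcount\n";
close(OUTPUT);
print $report;
```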