I am working on a Perl script that reads a CSV file and does some calculations. The CSV file has only two columns, something like this:
One Two
1.00 44.000
3.00 55.000
This CSV file is very big; it can be anywhere from 10 MB to 2 GB.
Currently I am working with a 700 MB CSV file. I tried to open it in Notepad and Excel, but it looks like no software is going to open it.
I want to read maybe the last 1000 lines of the CSV file and look at the values. How can I do that? I cannot open the file in Notepad or any other program.
If I write a Perl script, it seems I need to process the complete file to get to the end and only then read the last 1000 lines.
Is there a better way to do that? I am new to Perl and any suggestions will be appreciated.
I have searched the net and there are some modules available, like File::Tail, but I don't know whether they will work on Windows.
11 answers
#1
11
On *nix, you can use the tail command.
tail -n 1000 yourfile | perl ...
That will send only the last 1000 lines to the Perl program.
On Windows, the gnuwin32 and unxutils packages both include a tail utility.
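For the receiving end of that pipe, here is a minimal sketch of what the Perl side could look like. The helper name sum_columns is hypothetical, and it assumes the two fields are separated by commas and/or whitespace and that non-numeric lines (such as the "One Two" header) should be skipped; adjust both assumptions to the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: count the data rows and sum both columns
# read from any filehandle (for example STDIN fed by tail).
# Assumption: fields are separated by commas and/or whitespace.
sub sum_columns {
    my ($fh) = @_;
    my ($rows, $sum1, $sum2) = (0, 0, 0);
    my $number = qr/^-?\d+(?:\.\d+)?$/;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+//;                       # ignore leading whitespace
        my ($one, $two) = split /[,\s]+/, $line;
        next unless defined $two;                # blank or one-field line
        next unless $one =~ $number && $two =~ $number;   # header etc.
        $rows++;
        $sum1 += $one;
        $sum2 += $two;
    }
    return ($rows, $sum1, $sum2);
}
```

Calling sum_columns(\*STDIN) at the end of the script lets it be used as the right-hand side of the tail pipe shown above.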
#2
25
The File::ReadBackwards module allows you to read a file in reverse order. This makes it easy to get the last N lines as long as you don't depend on their order. If you do, and the needed data is small enough (which it should be in your case), you could read the last 1000 lines into an array and then reverse it.
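A minimal sketch of that approach, assuming File::ReadBackwards has been installed from CPAN (it is not part of the Perl core). The helper name last_n_lines is made up for illustration; new, readline and close are the module's documented methods.

```perl
use strict;
use warnings;
use File::ReadBackwards;   # CPAN module, must be installed separately

# Hypothetical helper: return the last $n lines of $path in file order.
sub last_n_lines {
    my ($path, $n) = @_;
    my $bw = File::ReadBackwards->new($path)
        or die "Can't read $path: $!\n";
    my @lines;
    while (@lines < $n) {
        my $line = $bw->readline;    # last line first, then the one before it
        last unless defined $line;   # reached the start of the file
        push @lines, $line;
    }
    $bw->close;
    @lines = reverse @lines;         # restore the original order
    return @lines;
}
```

Only the tail of the file is read, so the cost should depend on the number of lines requested rather than on the size of the file.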
#3
9
This is only tangentially related to your main question, but when you want to check whether a module such as File::Tail works on your platform, check the results from CPAN Testers. The links at the top of the module page in CPAN Search lead you to the test results matrix.
Looking at the matrix, you see that this module does indeed have a problem on Windows on all versions of Perl tested.
#4
5
I wrote a quick backward file reader in pure Perl:
#!/usr/bin/perl
use warnings;
use strict;
use Fcntl qw(SEEK_END);

my ($file, $num_of_lines) = @ARGV;
my $count    = 0;
my $filesize = -s $file;  # used to detect reaching the start of the file while reading backward
my $offset   = -2;        # start just before the final newline (assumes the file ends with \n)

open my $fh, '<', $file or die "Can't read $file: $!\n";
binmode $fh;              # keep byte offsets exact, also on Windows

while ($count < $num_of_lines && abs($offset) <= $filesize) {
    my $line = "";
    # stop at the start of the file: seeking past it from the end would fail
    while (abs($offset) <= $filesize) {
        seek $fh, $offset, SEEK_END;  # negative offset, counted back from the end of the file
        $offset -= 1;                 # step one character further back
        my $char = getc $fh;
        last if $char eq "\n";        # reached the end of the previous line
        $line = $char . $line;        # otherwise prepend the character to the current line
    }
    $line =~ s/\r\z//;                # drop the CR of a Windows line ending
    # got the next line (lines come out last-first)
    print $line, "\n";
    $count++;
}
close $fh;
and run this script like:
$ get-x-lines-from-end.pl ./myhugefile.log 200
#5
4
Without tail, a Perl-only solution isn't that unreasonable.
One way is to seek from the end of the file, then read lines from it. If you don't have enough lines, seek even further from the end and try again.
sub last_x_lines {
    my ($filename, $lineswanted) = @_;
    my ($line, $filesize, $seekpos, $numread, @lines);
    open my $fh, '<', $filename or die "Can't read $filename: $!\n";
    $filesize = -s $filename;
    $seekpos  = 50 * $lineswanted;                  # initial guess: about 50 bytes per line
    $seekpos  = $filesize if $seekpos > $filesize;  # never seek before the start of the file
    $numread  = 0;
    while ($numread < $lineswanted) {
        @lines   = ();
        $numread = 0;
        seek($fh, $filesize - $seekpos, 0);
        <$fh> if $seekpos < $filesize;  # Discard probably fragmentary line
        while (defined($line = <$fh>)) {
            push @lines, $line;
            shift @lines if ++$numread > $lineswanted;
        }
        if ($numread < $lineswanted) {
            # We didn't get enough lines. Double the amount of space to read from next time.
            if ($seekpos >= $filesize) {
                die "There aren't even $lineswanted lines in $filename - I got $numread\n";
            }
            $seekpos *= 2;
            $seekpos = $filesize if $seekpos >= $filesize;
        }
    }
    close $fh;
    return @lines;
}
P.S. A better title would be something like "Reading lines from the end of a large file in Perl".
#6
2
perl -n -e "shift @d if (@d >= 1000); push(@d, $_); END { print @d }" < bigfile.csv
Although really, the fact that UNIX systems can simply run tail -n 1000 should convince you to install Cygwin or coLinux.
#7
1
I believe you could use the Tie::File module. It presents the lines of the file as an array, so you could get the size of the array and process elements arraySize-1000 through arraySize-1.
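A minimal sketch of the Tie::File idea, using only core modules. The helper name last_n_lines_tied is hypothetical. As far as I know, Tie::File does not hold the whole file in memory, but asking for the array size makes it scan the entire file once to locate the line boundaries, so on a 700 MB file this will probably be much slower than seeking from the end.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Hypothetical helper: return the last $n records of $path.
# Tie::File strips the line terminator from each record by default.
sub last_n_lines_tied {
    my ($path, $n) = @_;
    tie(my @file, 'Tie::File', $path, mode => O_RDONLY)
        or die "Can't tie $path: $!\n";
    my $total = @file;                          # forces a scan of the whole file
    my $first = $total > $n ? $total - $n : 0;  # index of the first wanted record
    my @tail  = $total ? @file[$first .. $total - 1] : ();
    untie @file;
    return @tail;
}
```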
Another option would be to count the number of lines in the file, then loop through the file once and start reading in values at numberOfLines-1000.
$count = `wc -l < $file`;
die "wc failed: $?" if $?;
chomp($count);
That would give you the number of lines (on most systems).
#8
0
If you know the number of lines in the file, you can do
perl -ne "print if ($. > N);" filename.csv
where N is $num_lines_in_file - $num_lines_to_print. You can count the lines with
perl -e "while (<>) {} print $.;" filename.csv
#9
0
The modules are the way to go. However, sometimes you may be writing code that has to run on a variety of machines that may be missing the more obscure CPAN modules. In that case, why not just run 'tail' and dump the output to a temp file from within Perl?
#!/usr/bin/perl
system("tail --lines=1000 /path/myfile.txt > tempfile.txt") == 0
    or die "tail failed: $?\n";
You then have something that doesn't depend on a CPAN module, in case installing one is a problem.
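If the temp file is not needed, the same idea can be sketched by reading tail's output through a pipe. This assumes a tail executable is on the PATH, and the helper name tail_lines is made up for illustration. The list form of a pipe open works on Unix-like systems; as far as I know, Windows builds of Perl only support it from 5.22 onward.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $n lines of $path by running tail.
# Assumption: a tail executable is available on the PATH.
sub tail_lines {
    my ($path, $n) = @_;
    # List form: no shell is involved, so $path needs no quoting
    open my $tail, '-|', 'tail', '-n', $n, $path
        or die "Can't run tail: $!\n";
    my @lines = <$tail>;
    close $tail
        or die "tail failed on $path (exit status $?)\n";
    return @lines;
}
```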
#10
-1
Without relying on tail (which I probably would do), if you have more than $FILESIZE [2GB?] of memory, then I'd just be lazy and do:
my @lines = <>;
my @lastKlines = @lines[-1000 .. -1];
Though the other answers involving tail or seek() are pretty much the way to go on this.
#11
-1
You should absolutely use File::Tail or, better yet, another module. It's not a script, it's a module (programming library). It likely works on Windows. As somebody said, you can check this on CPAN Testers, or often just by reading the module documentation or trying it.
You selected the tail utility as your preferred answer, but that's likely to be more of a headache on Windows than File::Tail.