How do I change only the first line of a file?

Date: 2021-12-27 23:30:33

I would like to know which pattern I can use in sed to change the first line of huge files (~2 GB). I prefer sed only because I assume it is faster than a Python or Perl script.

The files have the following structure:

field 1, field 2, ... field n
data

Because the identifier of any field is likely to contain spaces, I need to replace every space with an underscore, like this:

**BEFORE** 
the first name,the second name,the first surname,a nickname, ...
data

**AFTER**
the_first_name,the_second_name,the_first_surname,a_nickname, ...
data

Any pointers to the right pattern to use, or to another scripting solution, would be great.

5 answers

#1 (score: 20)

To edit the first 10 lines (for only the first line, use the address 1 instead of 1,10):

sed -i -e '1,10s/ /_/g' yourfile

In Perl, you can use the flip-flop operator in scalar context:

perl -i -pe 's/ /_/g if 1 .. 10' yourfile
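
Since the question only concerns the header, the address can be narrowed to line 1. The following is a minimal sketch, assuming a throwaway file called sample.csv (a hypothetical name) and writing to a temporary file instead of using -i, whose syntax differs between sed implementations. Either way, sed still reads and rewrites the entire file.

```shell
# Create a small sample file (hypothetical name) to demonstrate on.
printf 'the first name,the second name\nsome data here\n' > sample.csv

# The address "1" restricts the substitution to the first line only;
# every other line is copied through unchanged.
sed '1s/ /_/g' sample.csv > sample.csv.tmp && mv sample.csv.tmp sample.csv

cat sample.csv
```

Only the header gets underscores; the spaces in the data line are left alone.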

#2 (score: 10)

I don't think you want to use any solution that requires the data to be written to a new file.

If you're pretty sure that all you need is to change the spaces into underscores in the first line of the large text files, you only have to read the first line, swap the characters, and write it back in place. This works because the replacement leaves the line's length unchanged:

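#!/usr/bin/env perl
use strict;
use warnings;

my $filename = shift or die "usage: $0 FILE\n";
# '+<' opens for reading and writing without truncating the file.
open(my $fh, '+<', $filename) or die "can't open $filename: $!";
my $line = <$fh>;
$line =~ s/ /_/g;   # same length, so the rest of the file is untouched
seek $fh, 0, 0;     # go back to the start of the file
print {$fh} $line;  # print, not printf: a '%' in the header is not a format
close $fh or die "can't close $filename: $!";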

To use it, just pass the full path of the file to update:

# fixheader "/path/to/myfile.txt"

#3 (score: 5)

You are unlikely to notice any speed difference between Perl, Python, and sed. Your script will spend most of its time waiting for IO.

If the edited line has the same length as the original, you can edit in place; otherwise you will have to create a new file.

In Perl:

#!/usr/bin/env perl
use strict;

my $filename = shift;
open my $in_fh, '<', $filename
  or die "Cannot open $filename for reading: $!";
my $first_line = <$in_fh>;

open my $out_fh, '>', "$filename.tmp"
  or die "Cannot open $filename.tmp for writing: $!";

$first_line =~ s/some translation/goes here/;

print {$out_fh} $first_line;
print {$out_fh} $_ while <$in_fh>; # sysread/syswrite is probably better

close $in_fh;
close $out_fh;

# overwrite original with modified copy
rename "$filename.tmp", $filename
  or warn "Failed to move $filename.tmp to $filename: $!";

#4 (score: 4)

The change you mention (replacing every space with an underscore) doesn't change the line's length, so in theory it could be done in place.

Warning: untested!

head -n 1 yourfile | sed -e 's/ /_/g' > tmpfile
dd conv=nocreat,notrunc if=tmpfile of=yourfile

I'm not so sure about the conv=... parameters, but it seems that they should make dd overwrite the start of the original file with the transformed line.
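
The in-place overwrite can be sketched with a safety check added. This is an illustration under stated assumptions, not a tested recipe for production data: the sample file contents are made up, and only conv=notrunc is used, on the assumption that nocreat (a GNU extension) may not be available in every dd.

```shell
# Sample file standing in for the large data file (contents are made up).
printf 'a b c,d e\ndata row\n' > yourfile

# Transform only the header into a temporary file.
head -n 1 yourfile | sed 's/ /_/g' > tmpfile

# The overwrite is only valid if the header's byte length is unchanged.
# $(( ... )) strips any padding that wc may print around the number.
old_len=$(( $(head -n 1 yourfile | wc -c) ))
new_len=$(( $(wc -c < tmpfile) ))

if [ "$old_len" -eq "$new_len" ]; then
    # notrunc keeps everything after the header instead of truncating the file.
    dd conv=notrunc if=tmpfile of=yourfile 2>/dev/null
else
    echo "header length changed; refusing to overwrite in place" >&2
fi

rm -f tmpfile
cat yourfile
```

Only the bytes of the first line are written, so the cost does not depend on the size of the file.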

Please note that if you want to do any other transformation that could alter the line's length, do not do this. You would have to do a full copy, something like this:

head -n 1 yourfile | sed -e 's/ /_/g' > tmpfile
tail -n +2 yourfile | cat tmpfile - > transformedfile
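
For a header edit that does change the length, the full copy can be sketched as follows. The sample contents and the name/full_name substitution are hypothetical and chosen only to make the new header longer than the old one.

```shell
# Sample file standing in for the large data file (contents are made up).
printf 'name,age\nalice,30\nbob,25\n' > yourfile

# Emit the rewritten header, then everything from line 2 onward,
# into a new file; replace the original only if both steps succeed.
{
    head -n 1 yourfile | sed 's/name/full_name/'
    tail -n +2 yourfile
} > transformedfile && mv transformedfile yourfile

cat yourfile
```

This rewrites the whole file, so it needs as much free disk space as the original occupies.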

#5 (score: -1)

This could be a solution:

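use strict;
use warnings;
use Tie::File;
tie my @array, 'Tie::File', 'path_to_file'
  or die "Cannot tie path_to_file: $!";
$array[0] = "new text";  # replaces the first line of the file
untie @array;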

Tie::File is one of the modules I use the most, and it's very simple to use. Each element in the array is a line in the file. According to its documentation, Tie::File does not load the whole file into memory. The downside is that when the new first line has a different length, the rest of the file has to be rewritten, which is slow for a file of this size.
