I have been working on several Perl scripts that process large fixed-width data files, extracting small substrings from each data record. I had imagined that delegating the substring extraction to subroutine calls would be costly because of the overhead of copying the data record into the @_ array. So I ran the following to compare (a) a direct call to substr(), (b) a subroutine call passing the data record as a string, and (c) a subroutine call passing the data record by reference.
use strict;
use warnings;
use Benchmark qw(timethese);
my $RECORD = '0' x 50000;
my $direct = sub { my $v = substr( $RECORD, $_, 1) for 0..999 };
my $byVal = sub { my $v = ByVal ( $RECORD, $_) for 0..999 };
my $byRef = sub { my $v = ByRef (\$RECORD, $_) for 0..999 };
sub ByVal { return substr( $_[0], $_[1], 1) }
sub ByRef { return substr(${$_[0]}, $_[1], 1) }
timethese( 10000, {
direct => $direct,
byVal => $byVal,
byRef => $byRef,
} );
my $byVal2loc = sub { my $v = ByVal2loc( $RECORD, $_) for 0..999 };
my $byRef2loc = sub { my $v = ByRef2loc(\$RECORD, $_) for 0..999 };
sub ByVal2loc { my $arg = shift; return substr( $arg, $_[0], 1) }
sub ByRef2loc { my $arg = shift; return substr( $$arg, $_[0], 1) }
timethese( $ARGV[0], {
byVal2loc => $byVal2loc,
byRef2loc => $byRef2loc,
} );
# Produces this output:
Benchmark: timing 10000 iterations of byRef, byVal, direct...
byRef: 19 wallclock secs...
byVal: 15 wallclock secs...
direct: 4 wallclock secs...
Benchmark: timing 10000 iterations of byRef2loc, byVal2loc...
byRef2loc: 21 wallclock secs...
byVal2loc: 119 wallclock secs...
As expected, the direct method was the fastest. However, I was surprised to find no penalty related to the "copying of data" that I had been imagining. Even when I increased the width of the record to outlandish proportions (for example, a billion characters), the by-value and by-reference benchmarks were basically the same.
It seems that when passing arguments to methods, Perl does not copy data. I guess this makes sense upon further reflection about the aliasing power of @_. The arguments are passed by reference, not by value.
However, it is a limited form of by-reference passing, because the aliases in @_ cannot be bound directly to a local variable within the subroutine. Assigning an element of @_ to a variable does copy the data, as illustrated by the second set of benchmarks.
Am I understanding this correctly?
3 answers
#1
Yes, assignments copy; merely passing arguments does not. You can, however, alias lexicals to elements of @_ using Lexical::Alias. This modified benchmark shows that approach running at about a third of the speed of using a reference, but consistently so regardless of the length of $RECORD:
use strict;
use warnings;
use Benchmark qw(timethese);
use Lexical::Alias;
my $RECORD = '0' x 5000000;
my $byVal2loc = sub { my $v = ByVal2loc( $RECORD, $_) for 0..999 };
my $byRef2loc = sub { my $v = ByRef2loc(\$RECORD, $_) for 0..999 };
my $byAlias2loc = sub { my $v = ByAlias2loc( $RECORD, $_ ) for 0..999 };
sub ByVal2loc { my $arg = shift; return substr( $arg, $_[0], 1) }
sub ByRef2loc { my $arg = shift; return substr( $$arg, $_[0], 1) }
sub ByAlias2loc { my $arg; alias($_[0], $arg); return substr( $arg, $_[1], 1 ) } # nothing is shifted here, so the offset is $_[1]
timethese( $ARGV[0], {
byVal2loc => $byVal2loc,
byRef2loc => $byRef2loc,
byAlias2loc => $byAlias2loc,
} );
# output:
Benchmark: running byAlias2loc, byRef2loc, byVal2loc for at least 3 CPU seconds...
byAlias2loc: 3 wallclock secs ( 3.16 usr + 0.00 sys = 3.16 CPU) @ 430.70/s (n=1361)
byRef2loc: 4 wallclock secs ( 3.24 usr + 0.00 sys = 3.24 CPU) @ 1329.63/s (n=4308)
byVal2loc: 5 wallclock secs ( 4.95 usr + 0.01 sys = 4.96 CPU) @ 0.40/s (n=2)
(warning: too few iterations for a reliable count)
(Directly using alias_r instead of the alias helper function is marginally faster.)
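If installing a CPAN module is not an option, a foreach loop offers a similar effect in core Perl, because the loop variable is an alias of each list element rather than a copy. A minimal sketch, with hypothetical helper names (char_at, set_char_at) that are not part of the benchmark above:

```perl
use strict;
use warnings;

# Hypothetical helper: gives the first argument a name without copying it.
sub char_at {
    for my $rec ($_[0]) {    # $rec aliases the caller's scalar
        return substr($rec, $_[1], 1);
    }
}

# Hypothetical helper: writes through the alias, which shows no copy was made.
sub set_char_at {
    for my $rec ($_[0]) {
        substr($rec, $_[1], 1) = $_[2];
    }
    return;
}

my $record = 'abcdef';
print char_at($record, 2), "\n";    # prints "c"
set_char_at($record, 2, 'X');
print "$record\n";                  # prints "abXdef"
```

The single-element loop is only a naming device; how it compares in speed with the variants benchmarked above was not measured here.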
#2
IIRC, in a Perl 'sub', the @_ array is already a set of aliases (references) to the variables. If you modify $_[0], you affect the variable in the calling function.
#!/bin/perl -w
use strict;
sub x
{
print "x = $_[0]\n";
$_[0] = "pinkerton";
print "x = $_[0]\n";
}
my $y = "abc";
print "y = $y\n";
x($y);
print "y = $y\n";
The output is:
y = abc
x = abc
x = pinkerton
y = pinkerton
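The aliasing can also be seen without modifying anything, by comparing addresses. The sketch below uses hypothetical helper names and relies only on the fact that comparing two references with == compares the addresses they point to: an element of @_ lives at the same address as the caller's variable, while a variable assigned from it does not.

```perl
use strict;
use warnings;

# Hypothetical helpers: each receives a scalar plus a reference to that same scalar.
sub same_slot_via_alias { return \$_[0] == $_[1] ? 1 : 0 }
sub same_slot_via_copy  { my $copy = $_[0]; return \$copy == $_[1] ? 1 : 0 }

my $record = '0' x 50_000;
print "alias: ", same_slot_via_alias($record, \$record), "\n";    # prints "alias: 1"
print "copy: ",  same_slot_via_copy($record, \$record),  "\n";    # prints "copy: 0"
```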
#3
If you want to give the elements of @_ meaningful names, you can make aliases to them using Data::Alias, like so:
use Data::Alias;
sub foo {
alias my ($a, $b, $c) = @_;
}
You can do similar things aliasing into arrays and hashes.
alias my ($a, $b, @c) = @_;
alias my ($a, $b, %c) = @_;
In fact, aliasing into a hash
alias my (%p) = @_;
is especially powerful as it provides pass-by-reference named parameters. Nice.
(Data::Alias provides a superset of the functionality of Lexical::Alias; it's more general purpose and more powerful.)
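On Perl 5.22 or later, the experimental refaliasing feature can bind a lexical to an element of @_ without any module. This is a sketch under that assumption, not something the answers above used; the feature is flagged experimental, so its warning category is disabled explicitly, and char_at and same_slot are hypothetical names:

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

# Hypothetical helper: names the first argument without copying it.
sub char_at {
    \my $rec = \$_[0];    # $rec now refers to the caller's scalar
    return substr($rec, $_[1], 1);
}

# Hypothetical helper: checks that the lexical shares the caller's address.
sub same_slot {
    \my $rec = \$_[0];
    return \$rec == $_[1] ? 1 : 0;
}

my $record = 'abcdef';
print char_at($record, 3), "\n";             # prints "d"
print same_slot($record, \$record), "\n";    # prints "1"
```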