How do I add to one Perl array the elements of another array that it doesn't already contain?

Date: 2021-11-15 07:09:48

Given:

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

What's the quickest way in Perl to insert into @mylist2 all the elements that are in @mylist1 and not already in @mylist2 (ABCDE)?

5 answers

#1


12  

my %k;
$k{$_} = 1 for @mylist1;
$k{$_} = 1 for @mylist2;
@mylist2 = keys %k;    # note: keys returns the elements in no particular order

Alternatively:

my %k;
$k{$_} = 1 for @mylist2;
push(@mylist2, grep { !exists $k{$_} } @mylist1);    # keeps the existing order of @mylist2

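Actually, these might be wrong, because they don't account for duplicates that might exist in either of the original lists. The first version collapses all duplicates. The second version pushes every copy of a repeated element of @mylist1 that is missing from @mylist2.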

You didn't say in your question whether the lists are supposed to represent sets (which can't contain duplicates) or just plain lists. That you effectively want @mylist2 = @mylist1 U @mylist2 suggests that you are treating them as sets.
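
You may want set-like output that still keeps the original order, even when the inputs contain duplicates. A common Perl idiom for this marks each element as seen at the moment it is kept. The sketch below assumes that @mylist2's existing contents, including any duplicates already in it, should be left untouched. The list contents are hypothetical example data.

```perl
use strict;
use warnings;

# Hypothetical example data: "B" appears twice in @mylist1.
my @mylist1 = qw(A B C B);
my @mylist2 = qw(A D E);

# Seed %seen with everything already in @mylist2.
my %seen = map { $_ => 1 } @mylist2;

# $seen{$_}++ returns the old count, so only the first unseen copy passes grep.
push @mylist2, grep { !$seen{$_}++ } @mylist1;

print join(',', @mylist2), "\n";    # A,D,E,B,C
```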

EDIT: changed the increment ($k{$_}++) to an assignment ($k{$_} = 1), which saves a read of the hash value.

#2


23  

You could just use the List::MoreUtils module's uniq:

use List::MoreUtils qw(uniq);

my @mylist1;
push( @mylist1, "A" );
push( @mylist1, "B" );
push( @mylist1, "C" );

my @mylist2;
push( @mylist2, "A" );
push( @mylist2, "D" );
push( @mylist2, "E" );

@mylist2 = uniq( @mylist1, @mylist2 );

printf "%s\n", ( join ',', @mylist2 );    # A,B,C,D,E
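
List::MoreUtils is a CPAN module. Perl's core List::Util module has also provided a uniq function since List::Util 1.45. Like the List::MoreUtils version, it keeps the first occurrence of each value in its original order. The sketch below assumes a perl whose bundled List::Util is at least that version.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # uniq was added to List::Util in version 1.45

my @mylist1 = qw(A B C);
my @mylist2 = qw(A D E);

# uniq keeps the first occurrence of each string, in order.
@mylist2 = uniq(@mylist1, @mylist2);

print join(',', @mylist2), "\n";    # A,B,C,D,E
```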

#3


2  

[Original answer as of 2008-11-27 down to "Since the question"; the analysis from there on is new as of 2008-11-29.]

Quickest? Not sure. This works, though it is not pretty:

#!/usr/bin/perl -w
use strict;

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

sub value_in
{
    my($value, @array) = @_;
    foreach my $element (@array)
    {
        return 1 if $value eq $element;
    }
    return 0;
}

@mylist2 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);

print sort(@mylist2), "\n";    # parentheses keep "\n" out of the sort; sorted for display only

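This avoids converting the arrays into hashes, but for large arrays value_in may be slow. For every element of @mylist1 it copies @mylist2 into @array and scans it linearly, so the total work grows with the product of the two list sizes.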
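
The sketch below makes that cost concrete by counting the eq comparisons the linear-scan approach performs. value_in_counted is a hypothetical instrumented copy of value_in. The two non-overlapping 1000-element lists are hypothetical test data and represent the worst case for the scan.

```perl
use strict;
use warnings;

my $comparisons = 0;

# Same logic as value_in, but counts every eq comparison it performs.
sub value_in_counted {
    my ($value, @array) = @_;
    foreach my $element (@array) {
        $comparisons++;
        return 1 if $value eq $element;
    }
    return 0;
}

# Hypothetical data: two lists with no elements in common.
my @big1 = map { "x$_" } 1 .. 1000;
my @big2 = map { "y$_" } 1 .. 1000;

my @merged = (@big2, grep { !value_in_counted($_, @big2) } @big1);

print scalar(@merged), " elements, $comparisons comparisons\n";
# 2000 elements, 1000000 comparisons
```

The hash-based answers replace each inner scan with a single hash lookup. On this data they do roughly 2000 hash operations instead of a million comparisons.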

Since the question was "what is the quickest method", I did some benchmarking. To my none-too-vast surprise, my method was the slowest. Somewhat more to my surprise, the fastest method was not the one from List::MoreUtils. Here are the test code and the results, using a modified version of my original proposal.

#!/usr/bin/perl -w
use strict;
use List::MoreUtils  qw(uniq);
use Benchmark::Timer;

my @mylist1;
push(@mylist1,"A");
push(@mylist1,"B");
push(@mylist1,"C");

my @mylist2;
push(@mylist2,"A");
push(@mylist2,"D");
push(@mylist2,"E");

sub value_in
{
    my($value) = shift @_;
    return grep { $value eq $_ } @_;
}

my @mylist3;
my @mylist4;
my @mylist5;
my @mylist6;

my $t = Benchmark::Timer->new(skip=>1);
my $iterations = 10000;

for my $i (1..$iterations)
{
    $t->start('JLv2');
    @mylist3 = (@mylist2, grep { ! value_in($_, @mylist2) } @mylist1);
    $t->stop('JLv2');
}
print $t->report('JLv2');

for my $i (1..$iterations)
{
    $t->start('LMU');
    @mylist4 = uniq( @mylist1, @mylist2 );
    $t->stop('LMU');
}
print $t->report('LMU');

for my $i (1..$iterations)
{
    @mylist5 = @mylist2;
    $t->start('HV1');
    my %k;
    map { $k{$_} = 1 } @mylist5;
    push(@mylist5, grep { !exists $k{$_} } @mylist1);
    $t->stop('HV1');
}
print $t->report('HV1');

for my $i (1..$iterations)
{
    $t->start('HV2');
    my %k;
    map { $k{$_} = 1 } @mylist1;
    map { $k{$_} = 1 } @mylist2;
    @mylist6 = keys %k;
    $t->stop('HV2');
}
print $t->report('HV2');


print sort(@mylist3), "\n";
print sort(@mylist4), "\n";
print sort(@mylist5), "\n";
print sort(@mylist6), "\n";

Black JL: perl xxx.pl
9999 trials of JLv2 (1.298s total), 129us/trial
9999 trials of LMU (968.176ms total), 96us/trial
9999 trials of HV1 (516.799ms total), 51us/trial
9999 trials of HV2 (768.073ms total), 76us/trial
ABCDE
ABCDE
ABCDE
ABCDE
Black JL:

This is Perl 5.10.0 compiled for 32-bit SPARC with multiplicity on an antique Sun E450 running Solaris 10.

I believe that the test setups are fair: they all generate their answer into a new array, separate from @mylist1 and @mylist2 (so @mylist1 and @mylist2 can be reused for the next test). The answer designated HV1 (hash values 1) starts the timer after the copy into @mylist5, which I think is correct. However, when I started the timer before the copy (the HV1A line in the output below), it was still the quickest:

Black JL: perl xxx.pl
9999 trials of JLv2 (1.293s total), 129us/trial
9999 trials of LMU (938.504ms total), 93us/trial
9999 trials of HV1 (505.998ms total), 50us/trial
9999 trials of HV2 (756.722ms total), 75us/trial
ABCDE
ABCDE
ABCDE
ABCDE
9999 trials of HV1A (655.582ms total), 65us/trial
Black JL:
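
Benchmark::Timer is a CPAN module. If it is not available, Perl's core Benchmark module can run a similar comparison with cmpthese. The sketch below makes these assumptions:

- It uses the same three-element lists as the benchmark above.
- It uses List::Util's uniq (1.45 or later) in place of List::MoreUtils.
- The sub names are hypothetical.

The timings it prints will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util 1.45 qw(uniq);

my @mylist1 = qw(A B C);
my @mylist2 = qw(A D E);

# Linear scan, like JLv2 (the inner grep counts matches in boolean context).
sub linear_scan {
    return (@mylist2, grep { my $v = $_; !grep { $v eq $_ } @mylist2 } @mylist1);
}

# Core List::Util uniq, standing in for List::MoreUtils.
sub with_uniq { return uniq(@mylist1, @mylist2) }

# Hash of @mylist2, then push the missing elements (like HV1).
sub hash_push {
    my @result = @mylist2;
    my %k;
    $k{$_} = 1 for @result;
    push @result, grep { !exists $k{$_} } @mylist1;
    return @result;
}

# Hash of both lists, then take the keys (like HV2; order is unspecified).
sub hash_keys {
    my %k;
    $k{$_} = 1 for @mylist1, @mylist2;
    return keys %k;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    JLv2 => sub { my @r = linear_scan() },
    uniq => sub { my @r = with_uniq() },
    HV1  => sub { my @r = hash_push() },
    HV2  => sub { my @r = hash_keys() },
});
```

cmpthese prints a table of rates and relative percentages. With lists this small, the numbers mostly reflect per-call overhead, not how each approach scales.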

#4


1  

Because of your "(ABCDE)" comment, I'm assuming you actually meant to push onto @mylist1 those elements of @mylist2 that aren't in @mylist1. If this assumption is incorrect, you need to say what order you want the elements to end up in.

First, record in a hash which elements are in @mylist1, then push onto @mylist1 every element of @mylist2 not found in that hash.

my %in_mylist1;
@in_mylist1{@mylist1} = ();
push @mylist1, grep ! exists $in_mylist1{$_}, @mylist2;
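
The hash slice assignment creates one key per element of @mylist1 but leaves every value undef. That is why the grep tests the key with exists instead of testing the value. Here is a small sketch of the difference, using hypothetical data:

```perl
use strict;
use warnings;

my @mylist1 = qw(A B C);
my @mylist2 = qw(A D E);

my %in_mylist1;
@in_mylist1{@mylist1} = ();    # keys A, B, C, each with an undef value

# exists checks for the key, so "A" is correctly skipped: D, E
my @with_exists = grep { !exists $in_mylist1{$_} } @mylist2;

# A truth test looks at the (undef) value, so "A" wrongly gets through: A, D, E
my @with_truth = grep { !$in_mylist1{$_} } @mylist2;

print "exists:     @with_exists\n";
print "truth test: @with_truth\n";
```

If each key is given a true value instead (for example with $k{$_} = 1, as answer #1 does), a plain truth test works too.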

#5


0  

my %work;
@work{@mylist1, @mylist2} = ();    # one key per distinct element
@mylist2 = sort keys %work;        # sorted alphabetically, not in the original order