Difference of two arrays using Perl

I have two arrays. I need to check and see if the elements of one appear in the other one.

Is there a more efficient way to do it than nested loops? I have a few thousand elements in each and need to run the program frequently.

10 solutions

#1

Another way to do it is to use Array::Utils

use Array::Utils qw(:all);

my @a = qw( a b c d );
my @b = qw( c d e f );

# symmetric difference
my @diff = array_diff(@a, @b);

# intersection
my @isect = intersect(@a, @b);

# unique union
my @unique = unique(@a, @b);

# check if arrays contain same members
if ( !array_diff(@a, @b) ) {
        # do something
}

# get items from array @a that are not in array @b
my @minus = array_minus( @a, @b );

#2

perlfaq4 to the rescue:

How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:

   @union = @intersection = @difference = ();
    %count = ();
    foreach $element (@array1, @array2) { $count{$element}++ }
    foreach $element (keys %count) {
            push @union, $element;
            push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
    }

If you properly declare your variables, the code looks more like the following:

my %count;
for my $element (@array1, @array2) { $count{$element}++ }

my ( @union, @intersection, @difference );
for my $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
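
The FAQ code assumes each element is unique within its own array. If that might not hold, one possible adjustment (a sketch, not part of perlfaq4) is to collapse duplicates inside each array before counting. The sample data and the `%seen` helper hash below are made up for illustration:

```perl
use strict;
use warnings;

# Sample data (illustrative only); note the repeated elements
my @array1 = qw( a b b c );
my @array2 = qw( b c c d );

# Count each element at most once per array, so that a count of 2
# can only mean "present in both arrays".
my %count;
for my $array ( \@array1, \@array2 ) {
    my %seen;
    $count{$_}++ for grep { !$seen{$_}++ } @$array;
}

my ( @union, @intersection, @difference );
for my $element ( sort keys %count ) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "union:        @union\n";
print "intersection: @intersection\n";
print "difference:   @difference\n";
```

With this data the intersection is `b c` and the difference is `a d`, even though `b` and `c` each appear twice in one of the arrays.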

#3

You need to provide a lot more context. There are more efficient ways of doing that ranging from:

Go outside of Perl and use shell (sort + comm)

Map one array into a Perl hash and then loop over the other one checking hash membership. This has linear complexity ("M+N": basically loop over each array once) as opposed to a nested loop, which has "M*N" complexity.

Example:

```
my %second = map {$_=>1} @second;
my @only_in_first = grep { !$second{$_} } @first; 
# use a foreach loop with `last` instead of "grep" 
# if you only want yes/no answer instead of full list
```
Use a Perl module that does the last bullet point for you (List::Compare was mentioned in comments)

Do it based on timestamps of when elements were added if the volume is very large and you need to re-compare often. A few thousand elements is not really big enough, but I recently had to diff 100k sized lists.
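
For the yes/no case mentioned in the example's comment, a minimal sketch could look like the following. It assumes the question is "is every element of @first also in @second?"; the sample data and the `$all_found` flag are made up for illustration:

```perl
use strict;
use warnings;

# Sample data (illustrative only)
my @first  = qw( a b c d );
my @second = qw( c d e f );

# Build the lookup hash once: one pass over @second
my %second = map { $_ => 1 } @second;

# Stop at the first element of @first that is missing from @second
my $all_found = 1;
for my $element (@first) {
    if ( !$second{$element} ) {
        $all_found = 0;
        last;
    }
}

print $all_found
    ? "every element of \@first is in \@second\n"
    : "some element of \@first is missing from \@second\n";
```

Unlike grep, the loop does not have to visit the rest of @first once the answer is known.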

#4

You can try Array::Utils, and it makes it look nice and simple, but it's not doing any powerful magic on the back end. Here's the array_diff code:

sub array_diff(\@\@) {
    my %e = map { $_ => undef } @{$_[1]};
    return @{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } @{ $_[0] } ), keys %e ] };
}

Since Array::Utils isn't a standard module, you need to ask yourself if it's worth the effort to install and maintain this module. Otherwise, it's pretty close to DVK's answer.

There are certain things you must watch out for, and you have to define what you want to do in that particular case. Let's say:

@array1 = qw(1 1 2 2 3 3 4 4 5 5);
@array2 = qw(1 2 3 4 5);

Are these arrays the same? Or, are they different? They have the same values, but there are duplicates in @array1 and not @array2.

What about this?

@array1 = qw( 1 1 2 3 4 5 );
@array2 = qw( 1 1 2 3 4 5 );

I would say that these arrays are the same, but Array::Utils::array_diff begs to differ. This is because Array::Utils assumes that there are no duplicate entries.

And even the Perl FAQ pointed out by mob says that it assumes that each element is unique in a given array. Is this an assumption you can make?

No matter what, hashes are the answer. Looking something up in a hash is easy and quick. The problem is deciding what you want to do with duplicate values.

Here's a solid solution that assumes duplicates don't matter:

sub array_diff {
    my @array1 = @{ shift() };
    my @array2 = @{ shift() };

    my %array1_hash;
    my %array2_hash;

    # Create a hash entry for each element in @array1
    for my $element ( @array1 ) {
       $array1_hash{$element} = 1;
    }

    # Same for @array2: This time, use map instead of a loop
    %array2_hash = map { $_ => 1 } @array2;

    for my $entry ( @array2 ) {
        if ( not $array1_hash{$entry} ) {
            return 1;  #Entry in @array2 but not @array1: Differ
        }
    }
    if ( keys %array1_hash != keys %array2_hash ) {
       return 1;   #Arrays differ
    }
    else {
       return 0;   #Arrays contain the same elements
    }
}

If duplicates do matter, you'll need a way to count them. Here's using map not just to create a hash keyed by each element in the array, but also count the duplicates in the array:

my %array1_hash;
my %array2_hash;
map { $array1_hash{$_} += 1 } @array1;
map { $array2_hash{$_} += 1 } @array2;

Now, you can go through each hash and verify that not only do the keys exist, but that their entries match:

for my $key ( keys %array1_hash ) {
    if ( not exists $array2_hash{$key} 
       or $array1_hash{$key} != $array2_hash{$key} ) {
       return 1;   #Arrays differ
    }
 }

You will only exit the for loop if all of the entries in %array1_hash match their corresponding entries in %array2_hash. Now, you have to show that all of the entries in %array2_hash also match their entries in %array1_hash, and that %array2_hash doesn't have more entries. Fortunately, we can do what we did before:

if ( keys %array2_hash != keys %array1_hash ) {
     return 1;  #Arrays have a different number of keys: Don't match
}
else {
     return;    #Arrays have the same keys: They do match
}
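
Putting the duplicate-aware pieces together, one possible self-contained version might look like this. It is only a sketch: the name `arrays_differ` is hypothetical, and it keeps the same convention as above (1 means the arrays differ, 0 means they match):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the arrays differ, 0 if they hold
# the same elements the same number of times (order is ignored).
sub arrays_differ {
    my ( $array1, $array2 ) = @_;

    # Count how often each element occurs in each array
    my ( %array1_hash, %array2_hash );
    $array1_hash{$_}++ for @$array1;
    $array2_hash{$_}++ for @$array2;

    # A different number of distinct elements can never match
    return 1 if scalar( keys %array1_hash ) != scalar( keys %array2_hash );

    # Same number of keys, so checking one direction is enough
    for my $key ( keys %array1_hash ) {
        return 1 if !exists( $array2_hash{$key} );
        return 1 if $array1_hash{$key} != $array2_hash{$key};
    }
    return 0;
}

print arrays_differ( [qw(1 1 2 3 4 5)],         [qw(1 1 2 3 4 5)] ), "\n";
print arrays_differ( [qw(1 1 2 2 3 3 4 4 5 5)], [qw(1 2 3 4 5)] ),   "\n";
```

The first call compares the two identical arrays with a repeated `1` and reports a match; the second compares the doubled array against the plain one and reports a difference.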

#5

An n + n log n algorithm, if you are sure that the elements are unique within each array (as hash keys are):

my %count = (); 
foreach my $element (@array1, @array2) { 
    $count{$element}++;
}
my @difference = grep { $count{$_} == 1 } keys %count;
my @intersect  = grep { $count{$_} == 2 } keys %count;
my @union      = keys %count;

If I'm not sure the elements are unique and only want to check whether the elements of array1 are present in array2:

my %count = (); 
foreach (@array1) {
    $count{$_} = 1 ;
};
foreach (@array2) {
    $count{$_} = 2 if $count{$_};
};
# N log N
if (grep { $_ == 1 } values %count) {
    return 'Some element of array1 does not appear in array2';
} else {
    return 'All elements of array1 are in array2';
}
# N + N log N

#6

my @a = (1,2,3); 
my @b=(2,3,1); 
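# Note: ~~ (smartmatch) has been experimental since Perl 5.18.
# Parenthesize the grep so that its count is compared with the size of @b.
print "Equal" if (grep { $_ ~~ @b } @a) == @b;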
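
If you would rather avoid smartmatch, a rough alternative (a sketch, not what this answer proposed) is to compare sorted copies element by element. The helper name `same_elements` is made up, and the comparison is done on strings:

```perl
use strict;
use warnings;

my @a = ( 1, 2, 3 );
my @b = ( 2, 3, 1 );

# Hypothetical helper: true if both arrays hold the same elements,
# including duplicates, regardless of order.
sub same_elements {
    my ( $x, $y ) = @_;

    # Different lengths can never be equal
    return 0 if @$x != @$y;

    # Compare the sorted copies position by position
    my @sx = sort @$x;
    my @sy = sort @$y;
    for my $i ( 0 .. $#sx ) {
        return 0 if $sx[$i] ne $sy[$i];
    }
    return 1;
}

print "Equal\n" if same_elements( \@a, \@b );
```

This costs a sort per array, so the hash-based answers above scale better for large lists.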

#7

Try List::Compare. It has solutions for all the operations that can be performed on arrays. https://metacpan.org/pod/List::Compare

#8

You want to compare each element of @x against the element of the same index in @y, right? This will do it.

print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n" 
    for grep { $x[$_] != $y[$_] } 0 .. $#x;

...or...

foreach( 0 .. $#x ) {
    print "Index: $_ => \@x: $x[$_], \@y: $y[$_]\n" if $x[$_] != $y[$_];
}

Which you choose depends on whether you're more interested in keeping a list of the indices of the dissimilar elements, or simply in processing the mismatches one by one. The grep version is handy for getting the list of mismatches.
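
Both versions above assume numeric data and arrays of the same length. A variation that copes with strings and with arrays of different lengths could look like this (a sketch; the sample data and the `$last` and `@mismatches` names are made up for illustration):

```perl
use strict;
use warnings;

# Sample data (illustrative only): @y is one element longer than @x
my @x = qw( apple banana cherry );
my @y = qw( apple berry cherry date );

# Walk up to the last index of the longer array
my $last = $#x > $#y ? $#x : $#y;

# An index is a mismatch if only one array has an element there,
# or if both do and the strings differ.
my @mismatches = grep {
    my ( $in_x, $in_y ) = ( $_ <= $#x, $_ <= $#y );
    !( $in_x && $in_y ) || $x[$_] ne $y[$_];
} 0 .. $last;

print "Mismatched indices: @mismatches\n";
```

Here index 1 differs in content and index 3 exists only in @y, so both are reported.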

#9

You can use this to get the difference between two arrays:

#!/usr/bin/perl -w
use strict;

my @list1 = (1, 2, 3, 4, 5);
my @list2 = (2, 3, 4);

my %diff;

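# Hash slice: create a key for every element of @list1 ...
@diff{ @list1 } = undef;
# ... then delete the keys that also appear in @list2.
delete @diff{ @list2 };

# keys %diff now holds the elements found only in @list1 (1 and 5)
print join( ", ", sort keys %diff ), "\n";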
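
The hash-slice trick only gives the elements of the first list that are missing from the second. To get both directions, the same idea can be applied twice, as in this sketch (the `%only1` and `%only2` names and the extra element 6 are made up for illustration):

```perl
use strict;
use warnings;

# Sample data (illustrative only)
my @list1 = ( 1, 2, 3, 4, 5 );
my @list2 = ( 2, 3, 4, 6 );

# Elements of @list1 that are not in @list2
my %only1;
@only1{@list1} = ();
delete @only1{@list2};

# Elements of @list2 that are not in @list1
my %only2;
@only2{@list2} = ();
delete @only2{@list1};

print "only in list1: @{[ sort { $a <=> $b } keys %only1 ]}\n";
print "only in list2: @{[ sort { $a <=> $b } keys %only2 ]}\n";
```

Because hash keys are unique, duplicates within either list are collapsed automatically.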

#10

Not elegant, but easy to understand:

#!/usr/local/bin/perl 
use strict;
my $file1 = shift or die("need file1");
my $file2 = shift or die("need file2");
my @file1lines = split/\n/,`cat $file1`;
my @file2lines = split/\n/,`cat $file2`;
my %lines;
foreach my $file1line(@file1lines){
    $lines{$file1line}+=1;
}
foreach my $file2line(@file2lines){
    $lines{$file2line}+=2;
}
while(my($key,$value)=each%lines){
    if($value == 1){
        print "$key is in only $file1\n";
    }elsif($value == 2){
        print "$key is in only $file2\n";
    }elsif($value == 3){
        print "$key is in both $file1 and $file2\n";
    }
}
exit;
__END__
