使用Perl,如何使用每个数组元素中的数字值对数组进行排序?

时间:2021-01-20 14:27:03

Let's say I have an array, @theArr, which holds 1,000 or so elements such as the following:

假设我有一个数组,@ theArr,它包含大约1,000个元素,如下所示:

01  '12 16 sj.1012804p1012831.93.gz'
02  '12 16 sj.1012832p1012859.94.gz'
03  '12 16 sj.1012860p1012887.95.gz'
04  '12 16 sj.1012888p1012915.96.gz'
05  '12 16 sj.1012916p1012943.97.gz'
06  '12 16 sj.875352p875407.01.gz'
07  '12 16 sj.875408p875435.02.gz'
08  '12 16 sj.875436p875535.03.gz'
09  '12 16 sj.875536p875575.04.gz'
10  '12 16 sj.875576p875603.05.gz'
11  '12 16 sj.875604p875631.06.gz'
12  '12 16 sj.875632p875659.07.gz'
13  '12 16 sj.875660p875687.08.gz'
14  '12 16 sj.875688p875715.09.gz'
15  '12 16 sj.875716p875743.10.gz'
...

If my first set of numbers (between the 'sj.' and the 'p') was always 6 digits, I wouldn't have a problem. But, when the numbers roll over into 7 digits the default sort stops working as the larger 7 digit numbers comes before the smaller 6 digit number.

如果我的第一组数字(在'sj。'和'p'之间)总是6位数,我就不会有问题。但是,当数字翻转为7位数时,默认排序将停止工作,因为较大的7位数字位于较小的6位数字之前。

Is there a way to tell Perl to sort by that number inside the string in each array element?

有没有办法告诉Perl在每个数组元素的字符串中按该数字排序?

4 个解决方案

#1


18  

Looks like you need a Schwartzian Transform:

看起来你需要Schwartzian变换:

#!/usr/bin/perl

use strict;
use warnings;

my @a = <DATA>;

print 
    map  { $_->[1] }                #get the original value back
    sort { $a->[0] <=> $b->[0] }    #sort arrayrefs numerically on the sort value
    map  { /sj\.(.*?)p/; [$1, $_] } #build arrayref of the sort value and orig
    @a;

__DATA__
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.875604p875631.06.gz
12 16 sj.875632p875659.07.gz
12 16 sj.875660p875687.08.gz
12 16 sj.875688p875715.09.gz
12 16 sj.875716p875743.10.gz

#2


3  

You can use a regex to pull the number out of every line inside the block you pass to the sort function:

您可以使用正则表达式从传递给sort函数的块内的每一行中提取数字:

@newArray = sort { my ($anum,$bnum); $a =~ /sj\.([0-9]+)p/; $anum = $1; $b =~ /sj\.(\d+)p/; $bnum = $1; $anum <=> $bnum } @theArr;

However, Chas. Owens's solution is better, since it only does the regex matches once for every element.

但是,查斯。 Owens的解决方案更好,因为它只对正则表达式匹配每个元素一次。

#3


2  

Here's an example that sorts them ascending, assuming you don't care too much about efficiency:

假设您不太关心效率,这里有一个对它们进行排序的示例:

use strict;

my @theArr = split(/\n/, <<END_SAMPLE);
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
END_SAMPLE

my @sortedArr = sort compareBySJ @theArr;

print "Before:\n".join("\n", @theArr)."\n";
print "After:\n".join("\n", @sortedArr)."\n";

sub compareBySJ {
    # Capture the values to compare, against the expected format
    # NOTE: This could be inefficient for large, unsorted arrays
    #       since you'll be matching the same strings repeatedly
    my ($aVal) = $a =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $a";
    my ($bVal) = $b =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $a";

    # Return the numerical comparison of the values (ascending order)
    return $aVal <=> $bVal;
}

Outputs:

Before:
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
After:
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz

#4


1  

Yes. The sort function takes an optional comparison function which will be used to compare two elements. It can take the form of either a block of code, or the name of a function to call.

是。 sort函数采用可选的比较函数,用于比较两个元素。它可以采用代码块或要调用的函数名称的形式。

There is an example at the linked document that is similar to what you want to do:

链接文档中有一个与您要执行的操作类似的示例:

# inefficiently sort by descending numeric compare using
# the first integer after the first = sign, or the
# whole record case-insensitively otherwise

@new = sort {
($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
            ||
            uc($a)  cmp  uc($b)
} @old;

#1


18  

Looks like you need a Schwartzian Transform:

看起来你需要Schwartzian变换:

#!/usr/bin/perl

use strict;
use warnings;

my @a = <DATA>;

print 
    map  { $_->[1] }                #get the original value back
    sort { $a->[0] <=> $b->[0] }    #sort arrayrefs numerically on the sort value
    map  { /sj\.(.*?)p/; [$1, $_] } #build arrayref of the sort value and orig
    @a;

__DATA__
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.875604p875631.06.gz
12 16 sj.875632p875659.07.gz
12 16 sj.875660p875687.08.gz
12 16 sj.875688p875715.09.gz
12 16 sj.875716p875743.10.gz

#2


3  

You can use a regex to pull the number out of every line inside the block you pass to the sort function:

您可以使用正则表达式从传递给sort函数的块内的每一行中提取数字:

@newArray = sort { my ($anum,$bnum); $a =~ /sj\.([0-9]+)p/; $anum = $1; $b =~ /sj\.(\d+)p/; $bnum = $1; $anum <=> $bnum } @theArr;

However, Chas. Owens's solution is better, since it only does the regex matches once for every element.

但是,查斯。 Owens的解决方案更好,因为它只对正则表达式匹配每个元素一次。

#3


2  

Here's an example that sorts them ascending, assuming you don't care too much about efficiency:

假设您不太关心效率,这里有一个对它们进行排序的示例:

use strict;

my @theArr = split(/\n/, <<END_SAMPLE);
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
END_SAMPLE

my @sortedArr = sort compareBySJ @theArr;

print "Before:\n".join("\n", @theArr)."\n";
print "After:\n".join("\n", @sortedArr)."\n";

sub compareBySJ {
    # Capture the values to compare, against the expected format
    # NOTE: This could be inefficient for large, unsorted arrays
    #       since you'll be matching the same strings repeatedly
    my ($aVal) = $a =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $a";
    my ($bVal) = $b =~ /^\d+\s+\d+\s+sj\.(\d+)p/
        or die "Couldn't match against value $a";

    # Return the numerical comparison of the values (ascending order)
    return $aVal <=> $bVal;
}

Outputs:

Before:
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
After:
12 16 sj.875352p875407.01.gz
12 16 sj.875408p875435.02.gz
12 16 sj.875436p875535.03.gz
12 16 sj.875536p875575.04.gz
12 16 sj.875576p875603.05.gz
12 16 sj.1012804p1012831.93.gz
12 16 sj.1012832p1012859.94.gz
12 16 sj.1012860p1012887.95.gz
12 16 sj.1012888p1012915.96.gz
12 16 sj.1012916p1012943.97.gz

#4


1  

Yes. The sort function takes an optional comparison function which will be used to compare two elements. It can take the form of either a block of code, or the name of a function to call.

是。 sort函数采用可选的比较函数,用于比较两个元素。它可以采用代码块或要调用的函数名称的形式。

There is an example at the linked document that is similar to what you want to do:

链接文档中有一个与您要执行的操作类似的示例:

# inefficiently sort by descending numeric compare using
# the first integer after the first = sign, or the
# whole record case-insensitively otherwise

@new = sort {
($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
            ||
            uc($a)  cmp  uc($b)
} @old;