Comparing strings alphabetically in AWK

Date: 2020-12-01 01:06:54

The problem is simple. I have an AWK script and two strings (names). If they have the same length, I need to pick the one that comes first alphabetically, according to ASCII order.

first example:

 1st string = "aac", 2nd string = "aab"

result: aab

second example:

1st string = "Donald J Cat", 2nd string = "Donald J Bat"

result: Donald J Bat

Is there a simple way to do this in AWK?

3 Answers

#1


2  

With awk:

awk 'BEGIN { if ("aab" < "aac") print "aab is sooner" }'

#2


0  

Assuming the strings to compare are the first and second fields, this prints the shorter one or, if they have the same length, the one that comes first in lexical (dictionary) order:

awk '...
     len1=length($1); len2=length($2);
     f = len1<len2 || (len1==len2 && $1<$2);
     print f?$1:$2; ...'

If you want a case-insensitive comparison, change the last test to tolower($1)<tolower($2).
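
As a self-contained sketch of the same rule, the snippet below wraps it in a shell function. The name pick_sooner and the use of ARGV to pass the two strings are assumptions made for illustration and are not part of the answer. LC_ALL=C is set on the assumption that plain byte (ASCII) order is wanted regardless of the awk implementation.

```shell
# pick_sooner A B: print the shorter string; if both have the same length,
# print the one that sorts first. (Hypothetical helper name.)
pick_sooner() {
    LC_ALL=C awk 'BEGIN {
        a = ARGV[1]; b = ARGV[2]
        la = length(a); lb = length(b)
        # Appending "" forces a string comparison even when both
        # arguments happen to look like numbers.
        first = la < lb || (la == lb && (a "") < (b ""))
        print (first ? a : b)
    }' "$1" "$2"
}

pick_sooner "aac" "aab"                     # prints: aab
pick_sooner "Donald J Cat" "Donald J Bat"   # prints: Donald J Bat
```

Because the program consists only of a BEGIN block, awk does not try to open the two arguments as input files. For a case-insensitive variant, the comparison could be written as tolower(a) < tolower(b).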

#3


0  

If you are only dealing with two strings, you can use awk's behavior with string comparison and a ternary to assign the two strings to a single string in the order you describe:

$ echo "aac,aab
Donald J Cat,Donald J Bat
zoom batman,ahem Mr President
zzzzzz,a
aa,z" | awk -F, '{s=$1<$2 ? $1 "," $2 : $2 "," $1; print s}'
aab,aac
Donald J Bat,Donald J Cat
ahem Mr President,zoom batman
a,zzzzzz
aa,z

This prints the two strings in ASCIIbetical order, regardless of length: a single "a" beats a full hand of z's.

If you want to sort more than two strings and you have a recent gawk (rather than a plain POSIX awk), you can use PROCINFO["sorted_in"] to traverse an array sorted by its values:

echo "aac,aab,Donald J Cat,Donald J Bat
zoom batman,ahem Mr President,zzzzzz,a,aa,zz" | awk -F, '{s="";split("",a);
                                                    for (i=1;i<=NF;i++) a[i]=$i
                                                    PROCINFO["sorted_in"] = "@val_str_asc"
                                                    for (e in a) s=s a[e] ","
                                                    print gensub(",$","","1",s)}'
Donald J Bat,Donald J Cat,aab,aac
a,aa,ahem Mr President,zoom batman,zz,zzzzzz

Note that in ASCIIbetical order uppercase letters sort before lowercase ones, so 'D'<'a'. In gawk, it is easy to write a custom comparison function if needed.
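
A custom comparison might look like the sketch below, which sorts the fields of each line without regard to case. It requires gawk; the names sort_fields_nocase and cmp_nocase are made up for this illustration. gawk calls the function named in PROCINFO["sorted_in"] with two index/value pairs and expects a negative, zero, or positive return value.

```shell
# sort_fields_nocase: read comma-separated lines on stdin and print each
# line with its fields sorted case-insensitively. (Hypothetical helper.)
sort_fields_nocase() {
    LC_ALL=C gawk -F, '
    # Comparison function for PROCINFO["sorted_in"]: compares the two
    # values after lowercasing them and returns -1, 0 or 1.
    function cmp_nocase(i1, v1, i2, v2,    l, r) {
        l = tolower(v1); r = tolower(v2)
        return (l < r) ? -1 : (l > r)
    }
    {
        split("", a)                      # clear the array for each line
        for (i = 1; i <= NF; i++) a[i] = $i
        PROCINFO["sorted_in"] = "cmp_nocase"
        s = ""; sep = ""
        for (e in a) { s = s sep a[e]; sep = "," }
        print s
    }'
}

echo "aac,aab,Donald J Cat,Donald J Bat" | sort_fields_nocase
# prints: aab,aac,Donald J Bat,Donald J Cat
```

Values that differ only in case compare as equal here, so their relative order is not defined by this function.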
