Compare/Difference of two arrays in bash

Time: 2021-05-17 22:51:55

Is it possible to take the difference of two arrays in bash?
It would be really great if you could suggest a way to do it.

Code:

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" ) 

Array3 =diff(Array1, Array2)

Array3 ideally should be :
Array3=( "key7" "key8" "key9" "key10" )

Appreciate your help.

7 answers

#1


22  

If you strictly want Array1 - Array2 (the elements of Array1 that are not in Array2), then:

$ Array3=()
$ for i in "${Array1[@]}"; do
>     skip=
>     for j in "${Array2[@]}"; do
>         [[ $i == "$j" ]] && { skip=1; break; }   # quote $j so it is compared literally, not as a glob pattern
>     done
>     [[ -n $skip ]] || Array3+=("$i")
> done
$ declare -p Array3

Runtime could be improved with associative arrays, but I personally wouldn't bother: if you're manipulating enough data for that to matter, shell is the wrong tool.
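For reference, here is a minimal sketch of the associative-array variant mentioned above. It assumes Bash 4 or later and non-empty elements (an empty string is not a valid associative-array key); the name `seen` is only illustrative. Like the nested loop, it keeps the order of Array1.

```shell
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

# Record every element of Array2 as a key (associative arrays need Bash 4+).
declare -A seen=()
for j in "${Array2[@]}"; do
    seen[$j]=1
done

# Keep only the elements of Array1 that were not recorded.
Array3=()
for i in "${Array1[@]}"; do
    [[ -n ${seen[$i]+x} ]] || Array3+=("$i")
done

declare -p Array3
```

Each lookup is a single hash access, so the work grows with the sum of the array lengths rather than their product.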


For a symmetric difference like Dennis's answer, existing tools like comm work, as long as we massage the input and output a bit (since they work on line-based files, not shell variables).

Here, we tell the shell to use newlines to join the array into a single string, and discard tabs when reading lines from comm back into an array.

$ oldIFS=$IFS IFS=$'\n\t'
$ Array3=($(comm -3 <(echo "${Array1[*]}") <(echo "${Array2[*]}")))
comm: file 1 is not in sorted order
$ IFS=$oldIFS
$ declare -p Array3
declare -a Array3='([0]="key7" [1]="key8" [2]="key9" [3]="key10")'

It complains because, in lexicographical order, key1 < … < key9 > key10. But since both input arrays are ordered the same way, it's fine to ignore that warning here. You can use --nocheck-order to get rid of the warning, or add a | sort -u inside each <(…) process substitution if you can't guarantee the order and uniqueness of the input arrays.
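A sketch of the sorted variant, assuming Bash 4+ (for `mapfile`) and elements that contain no tabs or newlines. Sorting both inputs the same way avoids the warning, and `tr -d '\t'` removes the tab that `comm` uses to indent its second column, so IFS does not need to be changed.

```shell
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

# comm -3 hides the lines common to both inputs, leaving the symmetric difference.
mapfile -t Array3 < <(
    comm -3 \
        <(printf '%s\n' "${Array1[@]}" | sort -u) \
        <(printf '%s\n' "${Array2[@]}" | sort -u) |
    tr -d '\t'
)

declare -p Array3
```

The result comes back in sorted order, not in the original order of Array1.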

#2


82  

printf '%s\n' "${Array1[@]}" "${Array2[@]}" | sort | uniq -u

Output

key10
key7
key8
key9

You can add further sorting if you need to.
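Note that `uniq -u` keeps every line that occurs exactly once, so the pipeline above yields the symmetric difference. If only Array1 - Array2 is wanted, one common trick is to feed Array2 in twice so that none of its elements can be unique. This is only a sketch: it assumes Bash 4+, that Array1 has no duplicates of its own, and that no element contains a newline. The extra "key11" in Array2 is made up for illustration.

```shell
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" "key11" )   # "key11" is illustrative

# Every element of Array2 appears at least twice, so uniq -u drops it.
mapfile -t Array3 < <(
    printf '%s\n' "${Array1[@]}" "${Array2[@]}" "${Array2[@]}" | sort | uniq -u
)

printf '%s\n' "${Array3[@]}"
```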

#3


15  

Anytime a question pops up dealing with unique values that may not be sorted, my mind immediately goes to awk. Here is my take on it.

Code

#!/bin/bash

diff(){
  awk 'BEGIN{RS=ORS=" "}
       {NR==FNR?a[$0]++:a[$0]--}
       END{for(k in a)if(a[k])print k}' <(echo -n "${!1}") <(echo -n "${!2}")
}

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=($(diff Array1[@] Array2[@]))
echo ${Array3[@]}

Output

$ ./diffArray.sh
key10 key7 key8 key9

Note: Like other answers given, if there are duplicate keys in an array they will only be reported once; this may or may not be the behavior you are looking for. The awk code to handle that is messier.
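If only the elements of Array1 that are missing from Array2 are wanted, in Array1's original order, a common awk idiom can be used instead. This is a sketch, assuming Bash 4+ and that no element contains a newline; `seen` is an illustrative name. Elements are passed one per line, so elements containing spaces survive.

```shell
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

mapfile -t Array3 < <(
    awk 'NR == FNR { seen[$0]; next }   # first input: remember the lines of Array2
         !($0 in seen)' \
        <(printf '%s\n' "${Array2[@]}") \
        <(printf '%s\n' "${Array1[@]}")
)

echo "${Array3[@]}"
```

Here a duplicate in Array1 that is absent from Array2 is printed as many times as it occurs.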

#4


5  

In Bash 4:

declare -A temp    # associative array
for element in "${Array1[@]}" "${Array2[@]}"
do
    ((temp[$element]++))
done
for element in "${!temp[@]}"
do
    if (( ${temp[$element]} > 1 ))
    then
        unset "temp[$element]"
    fi
done
Array3=("${!temp[@]}")    # retrieve the keys as values

Edit:

ephemient pointed out a potentially serious bug. If an element exists in one array with one or more duplicates and doesn't exist at all in the other array, it will be incorrectly removed from the list of unique values. The version below attempts to handle that situation.

declare -A temp1 temp2    # associative arrays
for element in "${Array1[@]}"
do
    ((temp1[$element]++))
done

for element in "${Array2[@]}"
do
    ((temp2[$element]++))
done

for element in "${!temp1[@]}"
do
    if (( ${temp1[$element]} >= 1 && ${temp2[$element]-0} >= 1 ))
    then
        unset "temp1[$element]" "temp2[$element]"
    fi
done
Array3=("${!temp1[@]}" "${!temp2[@]}")

#5


3  

Given the arrays ARR1 and ARR2, use comm to do the job and mapfile to read its output back into the RESULT array:

ARR1=("key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10")
ARR2=("key1" "key2" "key3" "key4" "key5" "key6")

mapfile -t RESULT < \
    <(comm -23 \
        <(IFS=$'\n'; echo "${ARR1[*]}" | sort) \
        <(IFS=$'\n'; echo "${ARR2[*]}" | sort) \
    )

echo "${RESULT[@]}" # outputs "key10 key7 key8 key9"

Note that the result may not preserve the source order.

Bonus aka "that's what you are here for":

function array_diff {
    eval local ARR1=\(\"\${$2[@]}\"\)
    eval local ARR2=\(\"\${$3[@]}\"\)
    local IFS=$'\n'
    mapfile -t $1 < <(comm -23 <(echo "${ARR1[*]}" | sort) <(echo "${ARR2[*]}" | sort))
}

# usage:
array_diff RESULT ARR1 ARR2
echo "${RESULT[@]}" # outputs "key10 key7 key8 key9"

Using those tricky evals is the least bad of the options for passing array parameters in bash.
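On Bash 4.3 or later, namerefs (`local -n`) are an alternative to eval. The following is only a sketch under that assumption; the underscore-prefixed names are illustrative and must not collide with the caller's array names, and the arrays are assumed to be non-empty with no newlines in their elements.

```shell
array_diff() {
    local -n _out=$1 _a=$2 _b=$3   # namerefs to the caller's arrays (Bash 4.3+)
    local _line
    _out=()
    # comm -23 prints the lines found only in the first (sorted) input.
    while IFS= read -r _line; do
        _out+=("$_line")
    done < <(comm -23 <(printf '%s\n' "${_a[@]}" | sort) \
                      <(printf '%s\n' "${_b[@]}" | sort))
}

ARR1=("key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10")
ARR2=("key1" "key2" "key3" "key4" "key5" "key6")

RESULT=()                      # declared up front so the nameref has a target
array_diff RESULT ARR1 ARR2
echo "${RESULT[@]}"
```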

Also, take a look at the comm man page; based on this code it's very easy to implement other set operations. For example, for array_intersect just pass -12 to comm.
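As a sketch of that idea, written inline rather than as a function: `comm -12` suppresses columns 1 and 2, so only the lines common to both inputs remain. It assumes Bash 4+ for `mapfile`; the name COMMON and the extra "key11" element are made up for illustration.

```shell
ARR1=("key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10")
ARR2=("key1" "key2" "key3" "key4" "key5" "key6" "key11")   # "key11" is illustrative

# Both inputs must be sorted the same way for comm to work.
mapfile -t COMMON < <(comm -12 \
    <(printf '%s\n' "${ARR1[@]}" | sort) \
    <(printf '%s\n' "${ARR2[@]}" | sort))

echo "${COMMON[@]}"
```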

#6


2  

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=( "key1" "key2" "key3" "key4" "key5" "key6" "key11" )
a1=${Array1[@]};a2=${Array2[@]}; a3=${Array3[@]}
diff(){
    a1="$1"
    a2="$2"
    awk -va1="$a1" -va2="$a2" '
     BEGIN{
       m= split(a1, A1," ")
       n= split(a2, t," ")
       for(i=1;i<=n;i++) { A2[t[i]] }
       for (i=1;i<=m;i++){
            if( ! (A1[i] in A2)  ){
                printf "%s ", A1[i]
            }
        }
    }'
}
Array4=( $(diff "$a1" "$a2") )  #compare a1 against a2
echo "Array4: ${Array4[@]}"
Array4=( $(diff "$a3" "$a1") )  #compare a3 against a1
echo "Array4: ${Array4[@]}"

Output

$ ./shell.sh
Array4: key7 key8 key9 key10
Array4: key11

#7


-1  

It is possible to use regex too (based on another answer: Array intersection in bash):

list1=( 1 2 3 4   6 7 8 9 10 11 12)
list2=( 1 2 3   5 6   8 9    11 )

l2=" ${list2[*]} "                    # add framing blanks
for item in "${list1[@]}"; do
  if ! [[ $l2 =~ " $item " ]] ; then    # the quoted pattern is matched literally
    result+=("$item")
  fi
done
echo "${result[@]}"

Result:

$ bash diff-arrays.sh 
4 7 10 12
