比较两个不同长度的数组并显示差异

时间:2022-11-29 20:42:04

Problem:
I have two arrays that can possibly be different lengths. I need to iterate through both arrays and find similarities, additions, and deletions.

问题:我有两个可能是不同长度的数组。我需要遍历这两个数组,并查找相似、添加和删除。

What's the fastest and most efficient way to accomplish this in C#?

在c#中,最快和最有效的方法是什么?

Edit: The arrays are pre-sorted and they can contain anywhere between 50-100 items. Also, there aren't any constraints on speed and/or memory usage (however, no one likes a memory hog;)

编辑:数组是预先排序的,它们可以包含50-100个条目。此外,对于速度和/或内存使用也没有任何限制(然而,没有人喜欢内存独占;)


For example:

例如:

String[] Foo_Old = {"test1", "test2", "test3"};
String[] Foo_New = {"test1", "test2", "test4", "test5"};

AND

String[] Bar_Old = {"test1", "test2", "test4"};
String[] Bar_New = {"test1", "test3"};

Differences:

差异:

(with respect to the Foo_New array)

(关于Foo_New数组)

[Same]    "test1"
[Same]    "test2"
[Removed] "test3"
[Added]   "test4"
[Added]   "test5"

(with respect to the Bar_New array)

(关于Bar_New数组)

[Same]    "test1"
[Removed] "test2"
[Removed] "test4"
[Added]   "test3"

4 个解决方案

#1


17  

You can use Except and Intersect ...

你可以使用Except和Intersect…

var Foo_Old = new[] { "test1", "test2", "test3" }; 
var Foo_New = new[] { "test1", "test2", "test4", "test5" };

var diff = Foo_New.Except( Foo_Old );
var inter = Foo_New.Intersect( Foo_Old );
var rem = Foo_Old.Except(Foo_New);

foreach (var s in diff)
{
    Console.WriteLine("Added " + s);
}

foreach (var s in inter)
{
    Console.WriteLine("Same " + s);
}

foreach (var s in rem)
{
    Console.WriteLine("Removed " + s);
}

#2


3  

I went ahead and hand-coded one and use the example in the accepted answer, and the hand-coded one performs a little better. I handled outputting my strings a little differently. Other factors to consider include whether the Except make a sorted copy of the array (since it cannot assume it's sorted) or whether it makes some kind of hash or a linear search (it's actually restricted to IEnumerable - for very large arrays which are already sorted, this could be a problem). You could change mine to compare IEnumerable (which is more general) instead of IComparable[].

我继续手工编码了一个,并在已接受的答案中使用了这个例子,而手工编码的那个表现更好一些。我处理输出字符串的方式有点不同。其他考虑因素包括是否除了做一个排序数组的副本(因为它不能假设它是排序)还是让一些散列或一个线性搜索(实际上是局限于IEnumerable,对于非常大的数组已经排序,这可能是一个问题)。您可以将我的改为比较IEnumerable(更一般),而不是icom亦庄文[]。

static void ArrayCompare(IComparable[] Old, IComparable[] New)
{
    int lpOld = 0;
    int lpNew = 0;
    int OldLength = Old.Length;
    int NewLength = New.Length;
    while (lpOld < OldLength || lpNew < NewLength)
    {
        int compare;

        if (lpOld >= OldLength) compare = 1;
        else if (lpNew >= NewLength) compare = -1;
        else compare = Old[lpOld].CompareTo(New[lpNew]);

        if (compare < 0)
        {
            Debug.WriteLine(string.Format("[Removed] {0}", Old[lpOld].ToString()));
            lpOld++;
        }
        else if (compare > 0)
        {
            Debug.WriteLine(string.Format("[Added] {0}", New[lpNew].ToString()));
            lpNew++;
        }
        else
        {
            Debug.WriteLine(string.Format("[Same] {0}", Old[lpOld].ToString()));
            lpOld++;
            lpNew++;
        }
    }
}

static void ArrayCompare2(IComparable[] Old, IComparable[] New) {
    var diff = New.Except( Old );
    var inter = New.Intersect( Old );
    var rem = Old.Except(New);

    foreach (var s in diff)
    {
        Debug.WriteLine("Added " + s);
    }

    foreach (var s in inter)
    {
        Debug.WriteLine("Same " + s);
    }

    foreach (var s in rem)
    {
        Debug.WriteLine("Removed " + s);
    }
}

static void Main(string[] args)
{
    String[] Foo_Old = {"test1", "test2", "test3"};
    String[] Foo_New = {"test1", "test2", "test4", "test5"};
    String[] Bar_Old = {"test1", "test2", "test4"};
    String[] Bar_New = {"test1", "test3"};

    Stopwatch w1 = new Stopwatch();
    w1.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare(Foo_Old, Foo_New);
        ArrayCompare(Bar_Old, Bar_New);
    }
    w1.Stop();

    Stopwatch w2 = new Stopwatch();
    w2.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare2(Foo_Old, Foo_New);
        ArrayCompare2(Bar_Old, Bar_New);
    }
    w2.Stop();

    Debug.WriteLine(w1.Elapsed.ToString());
    Debug.WriteLine(w2.Elapsed.ToString());
}

#3


1  

I wrote this a while back:

我之前写过:

Usage:

用法:

foreach (var diff in Foo_Old.Diff(Foo_New)){
   Console.WriteLine ("{0} action performed on {1}",diff.DiffAction,diff.Value);
}

Implementation:

实现:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqExtensions {

    enum DiffAction {
       Added,
       Removed,
       Same
    }

    class DiffPair<T> {
        public T Value { get; set; }
        public DiffAction DiffAction { get; set; }
    }

    static class DiffExtension {
        public static IEnumerable<DiffPair<T>> Diff<T>
                 (
                     this IEnumerable<T> original,
                     IEnumerable<T> target 
                 ) {

            Dictionary<T, DiffAction> results = new Dictionary<T, DiffAction>();

            foreach (var item in original) {
                results[item] = DiffAction.Removed;
            }

            foreach (var item in target) {
                if (results.ContainsKey(item)) {
                    results[item] = DiffAction.Same;
                } else {
                    results[item] = DiffAction.Added;
                }
            }
            return results.Select(
                pair => new DiffPair<T> {
                    Value=pair.Key, 
                    DiffAction = pair.Value
                });
        }
    }

}

#4


1  

Since your arrays are sorted, you should be able to just go through the arrays simultaneously, and in one pass and determine if each element is in the other array. (Similar to the merge step in merge sort.) You can see a sample of that below:

由于您的数组是有序的,您应该能够同时通过数组,并在一个传递中确定每个元素是否在另一个数组中。(类似于合并排序中的合并步骤)你可以看到以下的一个例子:

string[] oldVersion = { "test1", "test2", "test3" };
string[] newVersion = { "test1", "test2", "test4", "test5" };

int oldIndex = 0, newIndex = 0;

while ((oldIndex < oldVersion.Length) && (newIndex < newVersion.Length)) {
   int comparison = oldVersion[oldIndex].CompareTo(newVersion[newIndex]);

   if (comparison < 0)
      Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);
   else if (comparison > 0)
      Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);
   else {
      Console.WriteLine("[Same]\t\t" + oldVersion[oldIndex++]);
      newIndex++;
   }
}

while (oldIndex < oldVersion.Length)
   Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);

while (newIndex < newVersion.Length)
   Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);

Alternatively you'd need to go through one array, and for each element in this array, do a single pass of the other array looking for a match.

或者,您需要遍历一个数组,对于这个数组中的每个元素,对另一个数组执行一次遍历,以寻找匹配。

Edit: JP has a good suggestion on how to do this using the framework. Although, assuming the arrays are sorted, the benefit of my approach is that you only have to do one pass to find all the results. You would not have to do three passes.

编辑:JP有一个很好的建议,关于如何使用这个框架。虽然,假设数组是排序的,但是我的方法的好处是您只需要做一次遍历就可以找到所有的结果。你不需要做三次。

#1


17  

You can use Except and Intersect ...

你可以使用Except和Intersect…

var Foo_Old = new[] { "test1", "test2", "test3" }; 
var Foo_New = new[] { "test1", "test2", "test4", "test5" };

var diff = Foo_New.Except( Foo_Old );
var inter = Foo_New.Intersect( Foo_Old );
var rem = Foo_Old.Except(Foo_New);

foreach (var s in diff)
{
    Console.WriteLine("Added " + s);
}

foreach (var s in inter)
{
    Console.WriteLine("Same " + s);
}

foreach (var s in rem)
{
    Console.WriteLine("Removed " + s);
}

#2


3  

I went ahead and hand-coded one and use the example in the accepted answer, and the hand-coded one performs a little better. I handled outputting my strings a little differently. Other factors to consider include whether the Except make a sorted copy of the array (since it cannot assume it's sorted) or whether it makes some kind of hash or a linear search (it's actually restricted to IEnumerable - for very large arrays which are already sorted, this could be a problem). You could change mine to compare IEnumerable (which is more general) instead of IComparable[].

我继续手工编码了一个,并在已接受的答案中使用了这个例子,而手工编码的那个表现更好一些。我处理输出字符串的方式有点不同。其他考虑因素包括是否除了做一个排序数组的副本(因为它不能假设它是排序)还是让一些散列或一个线性搜索(实际上是局限于IEnumerable,对于非常大的数组已经排序,这可能是一个问题)。您可以将我的改为比较IEnumerable(更一般),而不是icom亦庄文[]。

static void ArrayCompare(IComparable[] Old, IComparable[] New)
{
    int lpOld = 0;
    int lpNew = 0;
    int OldLength = Old.Length;
    int NewLength = New.Length;
    while (lpOld < OldLength || lpNew < NewLength)
    {
        int compare;

        if (lpOld >= OldLength) compare = 1;
        else if (lpNew >= NewLength) compare = -1;
        else compare = Old[lpOld].CompareTo(New[lpNew]);

        if (compare < 0)
        {
            Debug.WriteLine(string.Format("[Removed] {0}", Old[lpOld].ToString()));
            lpOld++;
        }
        else if (compare > 0)
        {
            Debug.WriteLine(string.Format("[Added] {0}", New[lpNew].ToString()));
            lpNew++;
        }
        else
        {
            Debug.WriteLine(string.Format("[Same] {0}", Old[lpOld].ToString()));
            lpOld++;
            lpNew++;
        }
    }
}

static void ArrayCompare2(IComparable[] Old, IComparable[] New) {
    var diff = New.Except( Old );
    var inter = New.Intersect( Old );
    var rem = Old.Except(New);

    foreach (var s in diff)
    {
        Debug.WriteLine("Added " + s);
    }

    foreach (var s in inter)
    {
        Debug.WriteLine("Same " + s);
    }

    foreach (var s in rem)
    {
        Debug.WriteLine("Removed " + s);
    }
}

static void Main(string[] args)
{
    String[] Foo_Old = {"test1", "test2", "test3"};
    String[] Foo_New = {"test1", "test2", "test4", "test5"};
    String[] Bar_Old = {"test1", "test2", "test4"};
    String[] Bar_New = {"test1", "test3"};

    Stopwatch w1 = new Stopwatch();
    w1.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare(Foo_Old, Foo_New);
        ArrayCompare(Bar_Old, Bar_New);
    }
    w1.Stop();

    Stopwatch w2 = new Stopwatch();
    w2.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare2(Foo_Old, Foo_New);
        ArrayCompare2(Bar_Old, Bar_New);
    }
    w2.Stop();

    Debug.WriteLine(w1.Elapsed.ToString());
    Debug.WriteLine(w2.Elapsed.ToString());
}

#3


1  

I wrote this a while back:

我之前写过:

Usage:

用法:

foreach (var diff in Foo_Old.Diff(Foo_New)){
   Console.WriteLine ("{0} action performed on {1}",diff.DiffAction,diff.Value);
}

Implementation:

实现:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqExtensions {

    enum DiffAction {
       Added,
       Removed,
       Same
    }

    class DiffPair<T> {
        public T Value { get; set; }
        public DiffAction DiffAction { get; set; }
    }

    static class DiffExtension {
        public static IEnumerable<DiffPair<T>> Diff<T>
                 (
                     this IEnumerable<T> original,
                     IEnumerable<T> target 
                 ) {

            Dictionary<T, DiffAction> results = new Dictionary<T, DiffAction>();

            foreach (var item in original) {
                results[item] = DiffAction.Removed;
            }

            foreach (var item in target) {
                if (results.ContainsKey(item)) {
                    results[item] = DiffAction.Same;
                } else {
                    results[item] = DiffAction.Added;
                }
            }
            return results.Select(
                pair => new DiffPair<T> {
                    Value=pair.Key, 
                    DiffAction = pair.Value
                });
        }
    }

}

#4


1  

Since your arrays are sorted, you should be able to just go through the arrays simultaneously, and in one pass and determine if each element is in the other array. (Similar to the merge step in merge sort.) You can see a sample of that below:

由于您的数组是有序的,您应该能够同时通过数组,并在一个传递中确定每个元素是否在另一个数组中。(类似于合并排序中的合并步骤)你可以看到以下的一个例子:

string[] oldVersion = { "test1", "test2", "test3" };
string[] newVersion = { "test1", "test2", "test4", "test5" };

int oldIndex = 0, newIndex = 0;

while ((oldIndex < oldVersion.Length) && (newIndex < newVersion.Length)) {
   int comparison = oldVersion[oldIndex].CompareTo(newVersion[newIndex]);

   if (comparison < 0)
      Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);
   else if (comparison > 0)
      Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);
   else {
      Console.WriteLine("[Same]\t\t" + oldVersion[oldIndex++]);
      newIndex++;
   }
}

while (oldIndex < oldVersion.Length)
   Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);

while (newIndex < newVersion.Length)
   Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);

Alternatively you'd need to go through one array, and for each element in this array, do a single pass of the other array looking for a match.

或者,您需要遍历一个数组,对于这个数组中的每个元素,对另一个数组执行一次遍历,以寻找匹配。

Edit: JP has a good suggestion on how to do this using the framework. Although, assuming the arrays are sorted, the benefit of my approach is that you only have to do one pass to find all the results. You would not have to do three passes.

编辑:JP有一个很好的建议,关于如何使用这个框架。虽然,假设数组是排序的,但是我的方法的好处是您只需要做一次遍历就可以找到所有的结果。你不需要做三次。