从列表中删除重复列表>中。

时间:2022-08-31 07:42:55

I'm having trouble coming up with the most efficient algorithm to remove duplicates from List<List<int>>, for example (I know this looks like a list of int[], but just doing it that way for visual purposes:

我很难想出最有效的算法来从列表中删除重复项(我知道这看起来像一个int[]列表,但只是为了视觉目的而这样做:

my_list[0]= {1, 2, 3};
my_list[1]= {1, 2, 3};
my_list[2]= {9, 10, 11};
my_list[3]= {1, 2, 3};

So the output would just be

所以输出就是。

new_list[0]= {1, 2, 3};
new_list[1]= {9, 10, 11};

Let me know if you have any ideas. I would really appreciate it.

如果你有什么想法,请告诉我。我真的很感激。

6 个解决方案

#1


12  

Build custom of EqualityComparer<List<int>>:

构建定制的EqualityComparer < < int > >列表:

public class CusComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> x, List<int> y)
    {
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<int> obj)
    {
        int hashCode = 0;

        for (var index = 0; index < obj.Count; index++)
        {
            hashCode ^= new {Index = index, Item = obj[index]}.GetHashCode();
        }

        return hashCode;
    }
}

Then you can get the result by using Distinct with custom comparer method:

然后你可以通过使用自定义比较方法来得到结果:

var result = my_list.Distinct(new CusComparer());

Edit:

编辑:

Include the index into method GetHashCode to make sure different orders will not be equal

将索引包含到方法GetHashCode中,以确保不同的顺序不相等。

#2


8  

This simple program does what you want:

这个简单的程序做你想做的:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            List<List<int>> lists = new List<List<int>>();

            lists.Add(new List<int> { 1, 2, 3 });
            lists.Add(new List<int> { 1, 2, 3 });
            lists.Add(new List<int> { 9, 10, 11 });
            lists.Add(new List<int> { 1, 2, 3 });

            var distinct = lists.Select(x => new HashSet<int>(x))
                    .Distinct(HashSet<int>.CreateSetComparer());

            foreach (var list in distinct)
            {
                foreach (var v in list)
                {
                    Console.Write(v + " ");
                }

                Console.WriteLine();
            }
        }
    }
}

#3


8  

    var finalList = lists.GroupBy(x => String.Join(",", x))
                         .Select(x => x.First().ToList())
                         .ToList();

#4


6  

You can use the LINQ Distinct overload that takes a comparer. The comparer should see if the lists are equal. Note that the default equals operations of lists won't do what you're really looking for, so the comparer will need to loop through each for you. Here's an example of such a comparer:

您可以使用比较器的LINQ不同的重载。比较者应该看看列表是否相等。注意,默认情况下,列表的操作不会做你真正想做的事情,所以比较器需要为你循环。这里有一个这样的比较者的例子:

public class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    IEqualityComparer<T> itemComparer;
    public SequenceComparer()
    {
        this.itemComparer = EqualityComparer<T>.Default;
    }

    public SequenceComparer(IEqualityComparer<T> itemComparer)
    {
        this.itemComparer = itemComparer;
    }

    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (object.Equals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        return x.SequenceEqual(y, itemComparer);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        if (obj == null)
            return -1;
        int i = 0;
        return obj.Aggregate(0, (x, y) => x ^ new { Index = i++, ItemHash = itemComparer.GetHashCode(y) }.GetHashCode());
    }
}

Update: I got the idea of using an anonymous type to make a better hash from Cuong Le's answer, and I LINQ-ified it and made it work in my class.

更新:我的想法是使用匿名类型来更好地从Cuong Le的答案中得到更好的哈希,我把它做了linq - ify,并让它在我的类中工作。

#5


1  

For small sets of data, a comparer could be useful, but if you have 1000 or more List> then trying to compare them all could begin to take a long amount of time.

对于小的数据集,比较器可能是有用的,但是如果您有1000个或更多的列表>,那么尝试比较它们都可能需要很长的时间。

I suggest that you instead use your data to build a distinct tree. The building of the tree will be much faster and when you are done you can always bring your data back into your old data structure.

我建议您使用您的数据构建一个不同的树。树的构建将会更快,当您完成时,您总是可以将您的数据恢复到旧的数据结构中。

#6


0  

I wanted to compare the performance of the answers of @Leniel Macaferi and @L.B as I wasn't sure which would be more performant, or if the difference would be significant. It turns out that the difference is very significant:

我想比较一下@Leniel Macaferi和@L的答案。B因为我不确定哪一种会更有效果,或者两者的差别是否显著。结果是差异非常显著:

Method 1: 00:00:00.0976649 @Leniel Macaferi
Method 2: 00:00:32.0961650 @L.B

Here is the code I used to benchmark them:

下面是我用来评测它们的代码:

public static void Main(string[] args)
        {
            var list = new List<List<int>> {new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11, 1}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6, 7}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}};

            var sw1 = new Stopwatch();
            sw1.Start();

            for (var i = 0; i < 1_000_000; i++)
            {
                var distinct = list.Select(x => new HashSet<int>(x)).Distinct(HashSet<int>.CreateSetComparer());
            }

            sw1.Stop();
            Console.WriteLine($"Method 1: {sw1.Elapsed}");

            var sw2 = new Stopwatch();
            sw2.Start();
            for (var i = 0; i < 1_000_000; i++)
            {
                var distinct = list.GroupBy(a => string.Join(",", a)).Select(a => a.First()).ToList();

            }
            sw2.Stop();
            Console.WriteLine($"Method 2: {sw2.Elapsed}");

            Console.ReadKey();
        }

#1


12  

Build custom of EqualityComparer<List<int>>:

构建定制的EqualityComparer < < int > >列表:

public class CusComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> x, List<int> y)
    {
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<int> obj)
    {
        int hashCode = 0;

        for (var index = 0; index < obj.Count; index++)
        {
            hashCode ^= new {Index = index, Item = obj[index]}.GetHashCode();
        }

        return hashCode;
    }
}

Then you can get the result by using Distinct with custom comparer method:

然后你可以通过使用自定义比较方法来得到结果:

var result = my_list.Distinct(new CusComparer());

Edit:

编辑:

Include the index into method GetHashCode to make sure different orders will not be equal

将索引包含到方法GetHashCode中,以确保不同的顺序不相等。

#2


8  

This simple program does what you want:

这个简单的程序做你想做的:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            List<List<int>> lists = new List<List<int>>();

            lists.Add(new List<int> { 1, 2, 3 });
            lists.Add(new List<int> { 1, 2, 3 });
            lists.Add(new List<int> { 9, 10, 11 });
            lists.Add(new List<int> { 1, 2, 3 });

            var distinct = lists.Select(x => new HashSet<int>(x))
                    .Distinct(HashSet<int>.CreateSetComparer());

            foreach (var list in distinct)
            {
                foreach (var v in list)
                {
                    Console.Write(v + " ");
                }

                Console.WriteLine();
            }
        }
    }
}

#3


8  

    var finalList = lists.GroupBy(x => String.Join(",", x))
                         .Select(x => x.First().ToList())
                         .ToList();

#4


6  

You can use the LINQ Distinct overload that takes a comparer. The comparer should see if the lists are equal. Note that the default equals operations of lists won't do what you're really looking for, so the comparer will need to loop through each for you. Here's an example of such a comparer:

您可以使用比较器的LINQ不同的重载。比较者应该看看列表是否相等。注意,默认情况下,列表的操作不会做你真正想做的事情,所以比较器需要为你循环。这里有一个这样的比较者的例子:

public class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    IEqualityComparer<T> itemComparer;
    public SequenceComparer()
    {
        this.itemComparer = EqualityComparer<T>.Default;
    }

    public SequenceComparer(IEqualityComparer<T> itemComparer)
    {
        this.itemComparer = itemComparer;
    }

    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (object.Equals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        return x.SequenceEqual(y, itemComparer);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        if (obj == null)
            return -1;
        int i = 0;
        return obj.Aggregate(0, (x, y) => x ^ new { Index = i++, ItemHash = itemComparer.GetHashCode(y) }.GetHashCode());
    }
}

Update: I got the idea of using an anonymous type to make a better hash from Cuong Le's answer, and I LINQ-ified it and made it work in my class.

更新:我的想法是使用匿名类型来更好地从Cuong Le的答案中得到更好的哈希,我把它做了linq - ify,并让它在我的类中工作。

#5


1  

For small sets of data, a comparer could be useful, but if you have 1000 or more List> then trying to compare them all could begin to take a long amount of time.

对于小的数据集,比较器可能是有用的,但是如果您有1000个或更多的列表>,那么尝试比较它们都可能需要很长的时间。

I suggest that you instead use your data to build a distinct tree. The building of the tree will be much faster and when you are done you can always bring your data back into your old data structure.

我建议您使用您的数据构建一个不同的树。树的构建将会更快,当您完成时,您总是可以将您的数据恢复到旧的数据结构中。

#6


0  

I wanted to compare the performance of the answers of @Leniel Macaferi and @L.B as I wasn't sure which would be more performant, or if the difference would be significant. It turns out that the difference is very significant:

我想比较一下@Leniel Macaferi和@L的答案。B因为我不确定哪一种会更有效果,或者两者的差别是否显著。结果是差异非常显著:

Method 1: 00:00:00.0976649 @Leniel Macaferi
Method 2: 00:00:32.0961650 @L.B

Here is the code I used to benchmark them:

下面是我用来评测它们的代码:

public static void Main(string[] args)
        {
            var list = new List<List<int>> {new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11, 1}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6, 7}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}};

            var sw1 = new Stopwatch();
            sw1.Start();

            for (var i = 0; i < 1_000_000; i++)
            {
                var distinct = list.Select(x => new HashSet<int>(x)).Distinct(HashSet<int>.CreateSetComparer());
            }

            sw1.Stop();
            Console.WriteLine($"Method 1: {sw1.Elapsed}");

            var sw2 = new Stopwatch();
            sw2.Start();
            for (var i = 0; i < 1_000_000; i++)
            {
                var distinct = list.GroupBy(a => string.Join(",", a)).Select(a => a.First()).ToList();

            }
            sw2.Stop();
            Console.WriteLine($"Method 2: {sw2.Elapsed}");

            Console.ReadKey();
        }