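Preface
Array deduplication is a very common interview question, and it also comes up in practice: in data analysis, for example, records are sometimes duplicated for one reason or another and need to be deduplicated. If the duplicated rows are identical in every column, it is enough to delete the extra rows and keep just one of each.
Deduplication can be done on either the front end or the back end, but data processing usually happens on the back end, so this article summarizes the ways to deduplicate a List of objects and compares their performance.
I. Summary of List deduplication methods
1. Removing duplicates with a loop
To deduplicate with a loop, instantiate a new List<T>, then iterate over the source and check whether the new list already contains an equal object. If it does not, add the object to the new list; otherwise skip it.
1) Model class code: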
public class Customer
{
public int id { get; set; }
public string name { get; set; }
public string email { get; set; }
public int age { get; set; }
}
2) Main program code:
using ConsoleTest;
using System.Diagnostics;
Stopwatch swatch = new Stopwatch();
swatch.Start(); //start timing
var rel = new List<Customer>(); //new list that will hold the unique items
foreach (var item in GetCustomer())
{ //add the item only if an equal one is not already in the list
if (!rel.Any(w => w.id == item.id && w.name == item.name && w.age == item.age && w.email == item.email))
rel.Add(item); //not present yet, so add it
}
foreach (var item in rel)
{
Console.WriteLine(item.name);
}
swatch.Stop(); //stop timing
string time = swatch.ElapsedMilliseconds.ToString(); //elapsed time of the measured code, in milliseconds
Console.WriteLine($"=========Execution time: {time} ms=============");
swatch.Reset(); //reset before timing a second time
//Get the data
static List<Customer> GetCustomer()
{
List<Customer> list = new List<Customer>();
list.Add(new Customer { id = 1, name = "刘德华", age = 56, email = "ldh@net.cn" });
list.Add(new Customer { id = 2, name = "张学友", age = 52, email = "zxy@net.cn" });
list.Add(new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" });
list.Add(new Customer { id = 4, name = "郭富城", age = 60, email = "gfc@net.cn" });
list.Add(new Customer { id = 4, name = "古天乐", age = 55, email = "gtl@net.cn" });
list.Add(new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" });
return list;
}
3) Run the program:
The measured execution time is 34 ms.
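The loop above rescans the result list for every item, so its cost grows quadratically with the size of the input. As a minimal sketch of the same loop idea, assuming it is acceptable to track the rows already seen in a HashSet keyed by a value tuple, the variant below keeps the first occurrence of each row in a single pass. The LoopDedup class and its Run method are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Sample data: the first and third entries are exact duplicates.
var customers = new List<Customer>
{
    new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" },
    new Customer { id = 4, name = "郭富城", age = 60, email = "gfc@net.cn" },
    new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" },
};

foreach (var c in LoopDedup.Run(customers))
{
    Console.WriteLine(c.name);
}

// Hypothetical helper, not part of the original article.
public static class LoopDedup
{
    // Returns the items of source in order, keeping the first occurrence of each distinct row.
    public static List<Customer> Run(IEnumerable<Customer> source)
    {
        // Value tuples compare field by field, so they can serve as the set key.
        var seen = new HashSet<(int, string, int, string)>();
        var result = new List<Customer>();
        foreach (var item in source)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add((item.id, item.name, item.age, item.email)))
                result.Add(item);
        }
        return result;
    }
}

public class Customer
{
    public int id { get; set; }
    public string name { get; set; }
    public string email { get; set; }
    public int age { get; set; }
}
```

The result keeps the input order, just like the original loop; only the membership check changes, from a scan of the list to a hash lookup.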
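2. Removing duplicates with LINQ GroupBy
LINQ makes working with objects convenient, and GroupBy is similar to GROUP BY in a database. This example needs no explicit loop to remove the duplicates; a single statement does the job, which is much simpler. Each resulting group exposes its distinct combination of values through its Key.
1) Model class code: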
public class Customer
{
public int id { get; set; }
public string name { get; set; }
public string email { get; set; }
public int age { get; set; }
}
2) Main program code:
using ConsoleTest;
using System.Diagnostics;
Stopwatch swatch = new Stopwatch();
swatch.Start(); //start timing
var rel = GetCustomer().GroupBy(g => new { g.id, g.name, g.age, g.email });
foreach (var item in rel)
{
Console.WriteLine(item.Key.name);
}
swatch.Stop(); //stop timing
string time = swatch.ElapsedMilliseconds.ToString(); //elapsed time of the measured code, in milliseconds
Console.WriteLine($"=========Execution time: {time} ms=============");
swatch.Reset(); //reset before timing a second time
//Get the data
static List<Customer> GetCustomer()
{
List<Customer> list = new List<Customer>();
list.Add(new Customer { id = 1, name = "刘德华", age = 56, email = "ldh@net.cn" });
list.Add(new Customer { id = 2, name = "张学友", age = 52, email = "zxy@net.cn" });
list.Add(new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" });
list.Add(new Customer { id = 4, name = "郭富城", age = 60, email = "gfc@net.cn" });
list.Add(new Customer { id = 4, name = "古天乐", age = 55, email = "gtl@net.cn" });
list.Add(new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" });
return list;
}
3) Run the program:
The measured execution time is 35 ms.
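The GroupBy call above yields groups rather than Customer objects, which is why the listing reads item.Key.name. If the deduplicated result is needed as a List<Customer> again, one possible approach (a sketch, not necessarily what the author intended) is to take the first element of each group:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data: the first and third entries are exact duplicates.
var customers = new List<Customer>
{
    new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" },
    new Customer { id = 4, name = "郭富城", age = 60, email = "gfc@net.cn" },
    new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" },
};

// Group on all four properties, then keep the first element of each group.
List<Customer> unique = customers
    .GroupBy(c => new { c.id, c.name, c.age, c.email })
    .Select(g => g.First())
    .ToList();

foreach (var c in unique)
{
    Console.WriteLine($"{c.id} {c.name} {c.email}");
}

public class Customer
{
    public int id { get; set; }
    public string name { get; set; }
    public string email { get; set; }
    public int age { get; set; }
}
```

The anonymous type used as the key compares its properties by value, so two rows fall into the same group only when all four properties match.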
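3. Removing duplicates with LINQ Distinct
Distinct in LINQ differs somewhat from DISTINCT in a database: without a comparer it uses the default equality of the element type, which for a class like Customer is reference equality. To remove duplicates from a collection of objects by value, you need to supply a custom comparer (an IEqualityComparer<T>) for the Customer object.
1) Model class code: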
public class Customer
{
public int id { get; set; }
public string name { get; set; }
public string email { get; set; }
public int age { get; set; }
}
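2) Comparer:
//Implements IEqualityComparer<Customer>. GetHashCode (based on the first field, id) is checked first; Equals compares all fields only when two hash codes match.
public class CustomerComparer : IEqualityComparer<Customer>
{
public bool Equals(Customer x, Customer y)
{
if (ReferenceEquals(x, y))
return true; //same instance, or both null
if (x == null || y == null)
return false; //exactly one is null; avoids a NullReferenceException below
return x.id == y.id && x.name == y.name && x.age == y.age && x.email == y.email;
}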
public int GetHashCode(Customer obj)
{
if (obj == null)
return 0;
return obj.id.GetHashCode();
}
}
3) Main program code:
using ConsoleTest;
using System.Diagnostics;
Stopwatch swatch = new Stopwatch();
swatch.Start(); //start timing
var rel = GetCustomer().Distinct(new CustomerComparer());
foreach (var item in rel)
{
Console.WriteLine(item.name);
}
swatch.Stop(); //stop timing
string time = swatch.ElapsedMilliseconds.ToString(); //elapsed time of the measured code, in milliseconds
Console.WriteLine($"=========Execution time: {time} ms=============");
swatch.Reset(); //reset before timing a second time
//Get the data
static List<Customer> GetCustomer()
{
List<Customer> list = new List<Customer>();
list.Add(new Customer { id = 1, name = "刘德华", age = 56, email = "ldh@net.cn" });
list.Add(new Customer { id = 2, name = "张学友", age = 52, email = "zxy@net.cn" });
list.Add(new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" });
list.Add(new Customer { id = 4, name = "郭富城", age = 60, email = "gfc@net.cn" });
list.Add(new Customer { id = 4, name = "古天乐", age = 55, email = "gtl@net.cn" });
list.Add(new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" });
return list;
}
4) Run the program:
The measured execution time is 25 ms.
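The comparer in this section hashes on id alone, so rows that share an id but differ elsewhere (such as the two customers with id 4) land in the same hash bucket and must be separated by Equals. As a sketch of one alternative, assuming a runtime where System.HashCode is available (.NET Core 2.1 or later), the hash can be built from every field that Equals compares. FullCustomerComparer is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data: two different customers share id 4, and 黎明 appears twice.
var customers = new List<Customer>
{
    new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" },
    new Customer { id = 4, name = "郭富城", age = 60, email = "gfc@net.cn" },
    new Customer { id = 4, name = "古天乐", age = 55, email = "gtl@net.cn" },
    new Customer { id = 3, name = "黎明", age = 58, email = "lm@net.cn" },
};

var unique = customers.Distinct(new FullCustomerComparer()).ToList();
Console.WriteLine(unique.Count); // 3 with the sample data above

// Hypothetical variant of the article's comparer, not part of the original code.
public class FullCustomerComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x is null || y is null) return false;  // exactly one is null
        return x.id == y.id && x.name == y.name && x.age == y.age && x.email == y.email;
    }

    public int GetHashCode(Customer obj)
    {
        if (obj is null) return 0;
        // Combine the same fields that Equals compares, so equal objects get equal hashes.
        return HashCode.Combine(obj.id, obj.name, obj.age, obj.email);
    }
}

public class Customer
{
    public int id { get; set; }
    public string name { get; set; }
    public string email { get; set; }
    public int age { get; set; }
}
```

Hashing on id alone is still correct, since equal objects always share an id; combining all fields only reduces how often Equals has to run.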
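Summary
In the tests above, Distinct was the fastest (25 ms, versus 34 ms for the loop and 35 ms for GroupBy). Each figure comes from a single run over six records and includes the console output, so the differences are only a rough indication.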
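To compare the three methods on more than six records, a rough timing harness along the following lines could be used. This is only a sketch under stated assumptions: the data set (5,000 generated rows in which every row appears twice), the Measure and LoopDedup helpers, and the iteration count are all hypothetical choices made for this illustration, and no results are claimed here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical test data: 5,000 rows, each distinct row appears exactly twice.
var data = Enumerable.Range(0, 5_000)
    .Select(i => new Customer
    {
        id = i % 2_500,
        name = "name" + (i % 2_500),
        age = 30,
        email = (i % 2_500) + "@net.cn"
    })
    .ToList();

Measure("Loop", () => LoopDedup(data).Count);
Measure("GroupBy", () => data
    .GroupBy(c => new { c.id, c.name, c.age, c.email })
    .Select(g => g.First())
    .Count());
Measure("Distinct", () => data.Distinct(new CustomerComparer()).Count());

// Runs the delegate once as a warm-up, then reports the average time over several runs.
static void Measure(string label, Func<int> run)
{
    run(); // warm-up so that JIT compilation is not included in the timing
    const int iterations = 5;
    var sw = Stopwatch.StartNew();
    int count = 0;
    for (int i = 0; i < iterations; i++)
        count = run();
    sw.Stop();
    Console.WriteLine($"{label}: {count} unique rows, {sw.Elapsed.TotalMilliseconds / iterations:F2} ms per run");
}

// The loop method from section 1, without console output inside the timed code.
static List<Customer> LoopDedup(List<Customer> source)
{
    var rel = new List<Customer>();
    foreach (var item in source)
    {
        if (!rel.Any(w => w.id == item.id && w.name == item.name && w.age == item.age && w.email == item.email))
            rel.Add(item);
    }
    return rel;
}

public class CustomerComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.id == y.id && x.name == y.name && x.age == y.age && x.email == y.email;
    }

    public int GetHashCode(Customer obj) => obj is null ? 0 : obj.id.GetHashCode();
}

public class Customer
{
    public int id { get; set; }
    public string name { get; set; }
    public string email { get; set; }
    public int age { get; set; }
}
```

Keeping Console.WriteLine outside the timed region and averaging over several runs makes the numbers reflect the deduplication itself rather than console I/O or start-up cost.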