Duplicate elements in java.util.Set

Time: 2022-03-17 19:37:00

java.util.Set implementations remove duplicate elements.

How are duplicate elements removed internally in a java.util.Set?

4 solutions

#1


14  

Actually, AFAIK from the sources, most Set implementations in Java don't even check whether the element is already contained.

They just always execute the add() on their internal structure which holds the set elements and let that object handle the duplication case.

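e.g. HashSet.add() calls put(K,V) on the internal HashMap. If an equal key is already present, the map keeps the existing key object and only overwrites the value (a shared dummy object), so the set still holds the original element and add() returns false.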
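To make the delegation visible, here is a minimal sketch, assuming a hypothetical Key class (not from the question) whose equality depends only on id. It adds two equal keys with different labels and checks which one the HashSet ends up holding.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetAddDemo {
    // Hypothetical key type: equality is based on id only, label is ignored.
    static final class Key {
        final int id;
        final String label;

        Key(int id, String label) {
            this.id = id;
            this.label = label;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Adds two equal keys and returns the label of the element the set kept.
    static String retainedLabel() {
        Set<Key> set = new HashSet<>();
        boolean first = set.add(new Key(1, "first"));   // true: element was absent
        boolean second = set.add(new Key(1, "second")); // false: an equal element exists
        if (!first || second) {
            throw new IllegalStateException("unexpected add() result");
        }
        return set.iterator().next().label;
    }

    public static void main(String[] args) {
        System.out.println(retainedLabel()); // prints "first"
    }
}
```

The second add() is a no-op as far as the set's contents are concerned: the element that was inserted first stays in place.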

#2


10  

Reading a little into your question I'm guessing that you're seeing strange behaviour with a java.util.HashSet (typically what everyone uses by default).

Contrary to the contract of java.util.Set, it is possible to get the same object into a java.util.HashSet twice, like this:

import java.util.HashSet;
import java.util.Set;

public class SetTest 
{
  public static void main(String[] args) 
  {
    MyClass myObject = new MyClass(1, "testing 1 2 3");

    Set<MyClass> set = new HashSet<MyClass>();
    set.add(myObject);           // stored under hash 1

    // Changing the hash while the object is in the set breaks the lookup.
    myObject.setHashCode(2);
    set.add(myObject);           // not found under hash 2, so it is stored again

    System.out.println(set.size());  // this will print 2.
  }

  private static class MyClass 
  {
    private int hashCode;        // mutable on purpose, to demonstrate the problem
    private String otherField;

    public MyClass(int hashCode, String otherField) 
    {    
      this.hashCode = hashCode;
      this.otherField = otherField;
    }

    public void setHashCode(int hashCode) 
    {
      this.hashCode = hashCode;
    }

    // Equality depends only on otherField, not on the hashCode field.
    @Override
    public boolean equals(Object obj) 
    {    
      return obj != null && obj.getClass().equals(getClass()) && ((MyClass)obj).otherField.equals(otherField);
    }

    @Override
    public int hashCode() 
    {
      return hashCode;
    }
  }
}

After the pointer from @jitter and a look at the source you can see why this would happen.

As @jitter says, java.util.HashSet uses a java.util.HashMap internally. When the hash changes between the first and second add(), a different bucket is used in the java.util.HashMap, and the object ends up in the set twice.

The code sample may look a little contrived, but I've seen this happen in the wild with domain classes where the hash is created from mutable fields and the equals method hasn't been kept in sync with those fields.
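A closer-to-real-life version of the same problem might look like the sketch below. The Customer class is hypothetical and only illustrates a domain class whose hashCode() is derived from a mutable field. It shows that an element mutated in place can no longer be found, and that removing it before the change and re-adding it afterwards keeps the set consistent.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableKeyDemo {
    // Hypothetical domain class: hash and equality both depend on a mutable field.
    static final class Customer {
        private String name;

        Customer(String name) {
            this.name = name;
        }

        void setName(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Customer && Objects.equals(((Customer) o).name, name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    // Mutating an element while it is in the set: the lookup uses the new hash
    // and misses the entry that was stored under the old one.
    static boolean foundAfterInPlaceMutation() {
        Set<Customer> set = new HashSet<>();
        Customer c = new Customer("alice");
        set.add(c);
        c.setName("bob");
        return set.contains(c);
    }

    // Safer pattern: remove, mutate, then re-add so the element is re-hashed.
    static boolean consistentAfterRemoveAndReAdd() {
        Set<Customer> set = new HashSet<>();
        Customer c = new Customer("alice");
        set.add(c);
        set.remove(c);
        c.setName("bob");
        set.add(c);
        return set.contains(c) && set.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(foundAfterInPlaceMutation());     // prints "false"
        System.out.println(consistentAfterRemoveAndReAdd()); // prints "true"
    }
}
```

Where possible, basing hashCode() and equals() on immutable fields avoids the problem altogether.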

#3


2  

An easy way to find this out is to look in the source for the code you are interested in.

Each JDK includes a src.zip containing the source code for the public classes, so you can just locate the source for HashSet and have a look :) I often use Eclipse for this. Start it, create a new Java project, set the JVM to be an installed JDK (otherwise you are using the system default JRE, which doesn't have src.zip), and press Ctrl-Shift-T to go to HashSet.

#4


0  

Having read your question in more detail:

You can't add duplicates. From the Javadoc for Set.add() (or do you mean addAll()?):

Adds the specified element to this set if it is not already present (optional operation). More formally, adds the specified element e to this set if the set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements.
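The contract quoted above can be observed through the return values. The following is a small sketch, assuming a HashSet (which permits null); the class and method names are made up for illustration. It covers both add() and addAll(), including the null case from the formal definition.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetAddContractDemo {
    // Shared set so the final size can be inspected after the calls below.
    static final Set<String> SET = new HashSet<>();

    // Runs a fixed sequence of add()/addAll() calls and returns their results.
    static String addResults() {
        SET.clear();
        boolean[] results = {
            SET.add("a"),                          // absent, so it is added
            SET.add("a"),                          // already present, set unchanged
            SET.add(null),                         // null is absent, so it is added
            SET.add(null),                         // null already present
            SET.addAll(Arrays.asList("a", "b", "b")) // only "b" is new, set changed
        };
        return Arrays.toString(results);
    }

    public static void main(String[] args) {
        System.out.println(addResults()); // prints "[true, false, true, false, true]"
        System.out.println(SET.size());   // prints "3"
    }
}
```

addAll() returns true if the set changed at all, so it does not tell you which of the supplied elements were duplicates.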
