关于set想说的(一)之Set的实现类及必要的方法

时间:2021-04-01 16:47:06

最近看到了《Thinking in Java》的第17章 容器深入探究,17.6 Set和存储顺序。自己写了写测试代码,加深下理解。主要设计toString()方法(主要是为了方便打印),equals()方法,hashCode()方法,compareTo()方法。

结论

首先明确Set接口有三种不同的实现,HashSet()、TreeSet()、LinkedHashSet()。
HashSet() :快速的定位、读取,会根据hash值来存放,因此读取出来的顺序未必就是插入的顺序。
TreeSet():存入set容器中的内容是按照一定的顺序排列的。
LinkedHashSet():既可以实现快速的定位读取,又可以满足读取出来的顺序就是插入顺序。
除了Integer、String等内部就已经实现了toString()、equals()、hashCode()、compareTo()方法,都实现了Comparable接口,用户自己定义的类的实例(对象),如果要放在set,则要实现equals(),hashCode()、compareTo()方法,实现Comparable接口。

在《Thinking in java》中说:“如果不为你的键覆盖hashCode()和equals()方法,那么使用散列的数据结构(HashSet, HashMap, LinkedHashSet, LinkedHashMap)就无法正确处理你的键”。

因为equals()用来确保你的键是唯一的,而如果你想只覆盖equals()方法,而不覆盖hashCode()方法(你认为可以使用父类Object#hashCode()方法,按照对象的地址来进行hash),但是在Object#hashCode()的源码注释中,有说:如果equals()方法返回true,则每个对象调用hashCode()方法也必须返回相同的hash值

1.equals()方法
用来实现Set中元素的不重复性,如果不覆盖(override)equals()方法,默认使用父类Object的equals方法,则只是比较对象的引用是否相同。下面是Object#equals()方法的实现。

    /**
* Indicates whether some other object is "equal to" this one.
* <p>
* The <code>equals</code> method implements an equivalence relation
* on non-null object references:
* <ul>
* <li>It is <i>reflexive</i>: for any non-null reference value
* <code>x</code>, <code>x.equals(x)</code> should return
* <code>true</code>.
* <li>It is <i>symmetric</i>: for any non-null reference values
* <code>x</code> and <code>y</code>, <code>x.equals(y)</code>
* should return <code>true</code> if and only if
* <code>y.equals(x)</code> returns <code>true</code>.
* <li>It is <i>transitive</i>: for any non-null reference values
* <code>x</code>, <code>y</code>, and <code>z</code>, if
* <code>x.equals(y)</code> returns <code>true</code> and
* <code>y.equals(z)</code> returns <code>true</code>, then
* <code>x.equals(z)</code> should return <code>true</code>.
* <li>It is <i>consistent</i>: for any non-null reference values
* <code>x</code> and <code>y</code>, multiple invocations of
* <tt>x.equals(y)</tt> consistently return <code>true</code>
* or consistently return <code>false</code>, provided no
* information used in <code>equals</code> comparisons on the
* objects is modified.
* <li>For any non-null reference value <code>x</code>,
* <code>x.equals(null)</code> should return <code>false</code>.
* </ul>
* <p>
* The <tt>equals</tt> method for class <code>Object</code> implements
* the most discriminating possible equivalence relation on objects;
* that is, for any non-null reference values <code>x</code> and
* <code>y</code>, this method returns <code>true</code> if and only
* if <code>x</code> and <code>y</code> refer to the same object
* (<code>x == y</code> has the value <code>true</code>).
* <p>
* Note that it is generally necessary to override the <tt>hashCode</tt>
* method whenever this method is overridden, so as to maintain the
* general contract for the <tt>hashCode</tt> method, which states
* that equal objects must have equal hash codes.
*
* @param obj the reference object with which to compare.
* @return <code>true</code> if this object is the same as the obj
* argument; <code>false</code> otherwise.
* @see #hashCode()
* @see java.util.Hashtable
*/
public boolean equals(Object obj) {
return (this == obj);
}

代码里的注释也挺重要的,所以也复制过来了。注释主要讲的是:
(1)对于非空Object对象x,y,x.equals(y)返回true,当且 仅当(if and only if) x,y指向了相同的对象。
(2)对于非空Object对象及null,x.equals(null)返回false。
(3)equals方法具有自反性(reflexive),即对于非空Object对象x,x.equals(x)返回true。
(4)equals方法具有对称性(symmetric),即对于非空Object对象x,y,x.equals(y)与y.equals(x)的返回值是一样的。
(5)equals方法具有传递性(transitive),即对于非空Object对象x,y,z,如果x.equals(y) y.equals(z) 均返回true,则x.equals(z)也应该返回true。
(6)equals方法具有一致性(consistent),即对于非空Object对象x,y,x.equals(y)的返回值经过多次调用之后应与第一次的返回值保持一致。
(7)一个不成文的规定:当你覆盖(override)equals()方法时,最好也覆盖(override)hashCode()方法,因为,“相等的对象,其hash code也相等”。 当然了这不是强制规定,只是一个建议而已。覆盖了equals()方法而不覆盖hashCode()方法也可以。
使用IDEA的alt+insert组合键,添加代码时,会发现equals()方法和hashCode()方法时在一起的,如下所示:
关于set想说的(一)之Set的实现类及必要的方法
2.hashCode()
hashCode()方法时为了实现HashSet和LinkedHashSet而实现的。只有知道对象的hash值,才能根据这个hash值确定 存放在散列表的槽的index。同样,如果不覆盖(override)hashCode()方法,默认使用父类Object的hashCode()方法。下面是Object#hashCode()的实现源码:

    /**
* Returns a hash code value for the object. This method is
* supported for the benefit of hashtables such as those provided by
* <code>java.util.Hashtable</code>.
* <p>
* The general contract of <code>hashCode</code> is:
* <ul>
* <li>Whenever it is invoked on the same object more than once during
* an execution of a Java application, the <tt>hashCode</tt> method
* must consistently return the same integer, provided no information
* used in <tt>equals</tt> comparisons on the object is modified.
* This integer need not remain consistent from one execution of an
* application to another execution of the same application.
* <li>If two objects are equal according to the <tt>equals(Object)</tt>
* method, then calling the <code>hashCode</code> method on each of
* the two objects must produce the same integer result.
* <li>It is <em>not</em> required that if two objects are unequal
* according to the {@link java.lang.Object#equals(java.lang.Object)}
* method, then calling the <tt>hashCode</tt> method on each of the
* two objects must produce distinct integer results. However, the
* programmer should be aware that producing distinct integer results
* for unequal objects may improve the performance of hashtables.
* </ul>
* <p>
* As much as is reasonably practical, the hashCode method defined by
* class <tt>Object</tt> does return distinct integers for distinct
* objects. (This is typically implemented by converting the internal
* address of the object into an integer, but this implementation
* technique is not required by the
* Java<font size="-2"><sup>TM</sup></font> programming language.)
*
* @return a hash code value for this object.
* @see java.lang.Object#equals(java.lang.Object)
* @see java.util.Hashtable
*/
public native int hashCode();

(1)hashCode()方法是个native方法。
(2)对相同对象多次调用hashCode()方法,返回的整型值应该是一样的。
(3) 规定,如果两个对象调用equals()方法,返回值为true,则这两个对象各自调用hashCode()返回的整型值必须相等。(好像也没这么绝对,至少代码运行不报错,但是执行的结果可能并没有达到预期)
如:
我在Person类中覆盖了equals()方法,而没有覆盖hashCode()方法,看代码如下:

public class Person {
private String name;

public Person(String name) {
this.name = name;
}

public String getName() {
return name;
}

public void setName(String name) {
this.name = name;
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;

Person person = (Person) o;

return name.equals(person.name);

}

// @Override
// public int hashCode() {
// return name.hashCode();
// }

@Override
public String toString() {
return "Person{" +
"name='" + name + '\'' +
'}';
}
}
        Person p1 = new Person("fan");
Person p2 = new Person("hehe");
Person p3 = new Person("fan");
Person p4 = new Person("axyz");

Set<Person> personHashSet = new HashSet<Person>();
Collections.addAll(personHashSet, p1, p2, p3, p4);
System.out.println(personHashSet);

执行结果如下:

[Person{name='fan'}, Person{name='axyz'}, Person{name='hehe'}, Person{name='fan'}]

结果是不对的。
结果之所以不对主要原因是:
放进去的p1,p2,p3,p4他们的地址是不同的,又没有覆盖hashCode方法,使用默认的Object对象的hashCode,所以hash值不同。
当hash值相同了之后,按照hash表处理冲突的策略,在list或者树结构上按照equals方法来查找需要update(update用于更新值,也就是说已经存在这个值)或者insert(insert用于插入新值,在冲突列表中还不存在这个值)的位置。
HashSet的实现其实是基于HashMap的,其add操作调用的是HashMap的put方法。
IDEA自动生成的equals()方法不是很好,Thinking in java中给的比较好。

    @Override
public boolean equals(Object o) {
if (this == o) return true;
//if (o == null || getClass() != o.getClass()) return false;
/*
if (!(o instanceof Person)) return false;

Person person = (Person) o;

return name.equals(person.name);
*/

return o instanceof Person &&
(name.equals(((Person) o).name));
}

看上去这个equals方法没有判断其参数o 是否为空,其实 instanceof已经悄悄的检查了参数o是否为null,如果o为null,则返回false。当equals的参数不为null且类型正确,则按照String的equals方法进行比较。
(4)两个不相等(equals方法返回false)的对象,产生的hashCode**不必必须** 相同。但是建议产生不同的hashCode,说是有效率的提升。
3.toString()方法
toString()方法在打印对象时会调用。如果不覆盖(override)toString()方法,默认使用父类Object的toString()方法,这时会打印该类的名字以及该对象散列码的无符号十六进制表示(通过hashCode生成的)。下面是Object#toString()的实现源码:

    /**
* Returns a string representation of the object. In general, the
* {@code toString} method returns a string that
* "textually represents" this object. The result should
* be a concise but informative representation that is easy for a
* person to read.
* It is recommended that all subclasses override this method.
* <p>
* The {@code toString} method for class {@code Object}
* returns a string consisting of the name of the class of which the
* object is an instance, the at-sign character `{@code @}', and
* the unsigned hexadecimal representation of the hash code of the
* object. In other words, this method returns a string equal to the
* value of:
* <blockquote>
* <pre>
* getClass().getName() + '@' + Integer.toHexString(hashCode())
* </pre></blockquote>
*
* @return a string representation of the object.
*/
public String toString() {
return getClass().getName() + "@" + Integer.toHexString(hashCode());
}

从上面的注释可以看到,推荐所有子类覆盖toString()方法
4.compareTo()方法
用户类要实现Comparable接口。这个方法主要用于将对象存放在TreeSet()时保证顺序的。由于是接口,所以用户类必须要实现这个方法。
下面是Comparable接口的源码:

public interface Comparable<T> {
/**
* Compares this object with the specified object for order. Returns a
* negative integer, zero, or a positive integer as this object is less
* than, equal to, or greater than the specified object.
*
* <p>The implementor must ensure <tt>sgn(x.compareTo(y)) ==
* -sgn(y.compareTo(x))</tt> for all <tt>x</tt> and <tt>y</tt>. (This
* implies that <tt>x.compareTo(y)</tt> must throw an exception iff
* <tt>y.compareTo(x)</tt> throws an exception.)
*
* <p>The implementor must also ensure that the relation is transitive:
* <tt>(x.compareTo(y)&gt;0 &amp;&amp; y.compareTo(z)&gt;0)</tt> implies
* <tt>x.compareTo(z)&gt;0</tt>.
*
* <p>Finally, the implementor must ensure that <tt>x.compareTo(y)==0</tt>
* implies that <tt>sgn(x.compareTo(z)) == sgn(y.compareTo(z))</tt>, for
* all <tt>z</tt>.
*
* <p>It is strongly recommended, but <i>not</i> strictly required that
* <tt>(x.compareTo(y)==0) == (x.equals(y))</tt>. Generally speaking, any
* class that implements the <tt>Comparable</tt> interface and violates
* this condition should clearly indicate this fact. The recommended
* language is "Note: this class has a natural ordering that is
* inconsistent with equals."

*
* <p>In the foregoing description, the notation
* <tt>sgn(</tt><i>expression</i><tt>)</tt> designates the mathematical
* <i>signum</i> function, which is defined to return one of <tt>-1</tt>,
* <tt>0</tt>, or <tt>1</tt> according to whether the value of
* <i>expression</i> is negative, zero or positive.
*
* @param o the object to be compared.
* @return a negative integer, zero, or a positive integer as this object
* is less than, equal to, or greater than the specified object.
*
* @throws NullPointerException if the specified object is null
* @throws ClassCastException if the specified object's type prevents it
* from being compared to this object.
*/
public int compareTo(T o);
}

(1)x.compareTo(y),如果x小于y,则返回负数;如果x=y,则返回0;如果x>y,则返回正数。
(2)注释中的sgn是一个数学函数,当参数小于0时返回-1;当参数等于0时返回0;当参数大于0时返回1。
(3)要求sgn(x.compareTo(y)) ==-sgn(y.compareTo(x))
(4)要求传递性 ,即:如果x.compareTo(y)与y.compareTo(z)均返回0,则y.compareTo(z)也必须返回0。
(5)必须要求,如果x.compareTo(y)返回0,则sgn(x.compareTo(z))=sgn(y.compareTo(z))。
(6)强烈推荐,但是不是严格要求,(x.compareTo(y)==0) == (x.equals(y))。也就是说,最好满足这个条件。

结束语

这篇博文总结了一下Set的实现类的特点,以及实现类需要的方法。下篇博文 关于set想说的(二)之Set Demo 写一个Demo。