Java Generics Study Notes

Date: 2022-09-20 09:00:44

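I have been studying Java in depth lately and finally made time for the official tutorial. When I reached the Generics chapter, I found it full of pitfalls waiting for me. In fact I had already stepped on several of these mines; now I finally have the chance to work backwards from them and try to understand the underlying principles.

So I will take a different angle in discussing generics, this troublesome feature.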

 

1. Why use generics

The official documentation explains it as follows:

1)Stronger type checks at compile time.

2)Elimination of casts.

3)Enabling programmers to implement generic algorithms.

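In other words, generics were not introduced only to add flexibility, such as making generic algorithms possible. Just as importantly, they strengthen type checking in Java, a statically compiled language: compile-time checks become stricter.

So the essence of generics is to loosen in one place and tighten in another, not to pursue generalization alone. A large share of the puzzling errors and warnings you run into come from these stronger type checks.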
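The first two points can be seen side by side in a small sketch. The class and method names below are made up for illustration; the raw list accepts any element and fails only at runtime, while the generic list is checked by the compiler and needs no cast.

```java
import java.util.ArrayList;
import java.util.List;

class WhyGenericsDemo {
    // Pre-generics style: a raw List accepts anything, so the mistake
    // only surfaces at runtime, when the cast fails.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRuntime() {
        List raw = new ArrayList();
        raw.add("hello");
        raw.add(42); // compiles: the element type is not checked
        try {
            for (Object o : raw) {
                String s = (String) o; // explicit cast, fails on 42
                s.length();
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Generic style: the element type is checked at compile time
    // and no cast is needed when reading.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("hello");
        // words.add(42); // would not compile: incompatible types
        System.out.println(rawListFailsAtRuntime());
        System.out.println(totalLength(words));
    }
}
```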

 

2. Getting the basic terms straight

The basic terms here are mainly type parameter and type argument.

Type Parameter and Type Argument Terminology: Many developers use the terms "type parameter" and "type argument" interchangeably, but these terms are not the same. When coding, one provides type arguments in order to create a parameterized type. Therefore, the T in Foo<T> is a type parameter and the String in Foo<String> f is a type argument. This lesson observes this definition when using these terms.

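This states the difference very clearly: the T in Foo<T> is a type parameter, while the String in Foo<String> is a type argument.

Be sure to keep the two terms apart, because the rest of these notes follows this definition strictly.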

 

3. Bounded type parameters

Besides a plain single type parameter, generics also allow upper bounded type parameters, for example:

public class NaturalNumber<T extends Integer>
public static <T extends Comparable<T>> int countGreaterThan(T[] anArray, T elem) 

Note that a type parameter cannot have a lower bound. For the reasons, see this discussion:

http://*.com/questions/4902723/why-cant-a-java-type-parameter-have-a-lower-bound
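To show what the upper bound buys, here is a sketch that fills in a body for the countGreaterThan signature above. The body and the enclosing class name are illustrative assumptions; the point is that compareTo can be called on T only because T is bounded by Comparable<T>.

```java
class BoundedTypeDemo {
    // T must implement Comparable<T>, so compareTo is available on every element.
    static <T extends Comparable<T>> int countGreaterThan(T[] anArray, T elem) {
        int count = 0;
        for (T e : anArray) {
            if (e.compareTo(elem) > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Integer[] numbers = {1, 5, 8, 3};
        System.out.println(countGreaterThan(numbers, 4)); // prints 2
    }
}
```

Without the bound, `e.compareTo(elem)` would not compile, because an unbounded T only offers the methods of Object.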

 

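4. Inheritance and generics

As everyone knows, a variable of a supertype can refer to an object of a subtype. A method can therefore declare a parameter of the supertype and receive a subtype object as the actual argument. This is one of the key forms of polymorphism.

Subtyping among generic types is more interesting and follows its own rules. The two figures below illustrate the relationships.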

[Figures: subtyping relationships between generic types]
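The rules can also be checked directly in code. This is a small sketch with made-up names: generic types that share the same type argument follow the usual class hierarchy, while the same generic type with related type arguments does not.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class GenericSubtypingDemo {
    static int size(Collection<String> items) {
        return items.size();
    }

    public static void main(String[] args) {
        ArrayList<String> names = new ArrayList<>();
        names.add("a");

        // Same type argument, related generic types: these are subtypes.
        List<String> asList = names;
        Collection<String> asCollection = asList;
        System.out.println(size(names));
        System.out.println(asCollection.size());

        // Related type arguments, same generic type: NOT subtypes.
        List<Integer> ints = new ArrayList<>();
        // List<Number> nums = ints; // would not compile: incompatible types
        System.out.println(ints.size());
    }
}
```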

 

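5. The birth of the wildcard

The inheritance rules above do not give a good answer to the question of a common supertype for Box<Number> and Box<Integer>. Using Object as the supertype is far too broad and provides none of the stricter compile-time type checking. Wildcards were introduced for this reason. There are three kinds: the upper bounded wildcard (? extends someClass), the lower bounded wildcard (? super someClass), and the unbounded wildcard (?). Their relationships are shown in the figures below:

[Figures: subtyping relationships between wildcard types]

The exact definition also deserves careful reading, especially its last sentence.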

In generic code, the question mark (?), called the wildcard, represents an unknown type. The wildcard can be used in a variety of situations: as the type of a parameter, field, or local variable; sometimes as a return type (though it is better programming practice to be more specific). The wildcard is never used as a type argument for a generic method invocation, a generic class instance creation, or a supertype.

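In other words, a wildcard cannot be written as the type argument in a generic method invocation, a generic class instance creation, or a supertype declaration.

So can a wildcard never serve as a type argument? It can: in the declared type of a variable, the wildcard sits in the type argument position. The following code is fine.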

List<? extends Integer> intList = new ArrayList<>();
List<? extends Number>  numList = intList; 

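Let us set wildcards aside for now. Most of the pitfalls later on involve them, and we will come back to them then.

Since we are here, it is worth understanding why List<Integer> is a subtype of List<? extends Number>. Some will say that it is simply defined that way. Fine, but let us think again about what kind of relationship makes a supertype-subtype pair. I have summed up two reference rules:

1)A supertype is the broader concept; a subtype is a kind of its supertype.

2)A subtype has at least as many members as its supertype, and usually more.

In recent technical discussions with colleagues I often hear the word "subset". On reflection, "subset" is not an accurate word for the supertype-subtype relationship, because:

1)In terms of the number of fields and methods, the supertype is a "subset" of the subtype.

2)In terms of conceptual precision, the subtype is a "subset" of the supertype.

So, to avoid this ambiguity, do not describe inheritance in terms of subsets and supersets. Always use parent class and child class, or subtype and supertype.

Back to the question: List<Integer> is definitely not a subtype of List<Number>, even though Integer is a subtype of Number.

To express a parent-child relationship that follows the element types, Java had to take another route and introduce the wildcard ?, which provides subtyping at a different conceptual level.

However it is phrased, the two rules above still hold. List<? extends Number> stands for a broader range: it may be a List<Number> or a List<Integer>. The converse does not hold: a List<? extends Number> cannot be assumed to be a List<Number>.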
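A short sketch of that "broader range" in practice, with illustrative names: one method that takes List<? extends Number> accepts both a List<Integer> and a List<Double>, which a List<Number> parameter would not.

```java
import java.util.Arrays;
import java.util.List;

class UpperBoundedWildcardDemo {
    // Accepts a list of Number or of any subtype of Number.
    static double sum(List<? extends Number> numbers) {
        double total = 0.0;
        for (Number n : numbers) { // every element is at least a Number
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Double> doubles = Arrays.asList(1.5, 2.5);
        System.out.println(sum(ints));    // prints 6.0
        System.out.println(sum(doubles)); // prints 4.0
    }
}
```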

 

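6. Type erasure

During compilation the Java compiler erases generic type information and substitutes other types according to fixed rules. At runtime the JVM does not see the type arguments: it cannot tell a List<String> from a List<Number>.

How is type erasure carried out? The documentation says: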

Generics were introduced to the Java language to provide tighter type checks at compile time and to support generic programming. To implement generics, the Java compiler applies type erasure to:

  • Replace all type parameters in generic types with their bounds or Object if the type parameters are unbounded. The produced bytecode, therefore, contains only ordinary classes, interfaces, and methods.
  • Insert type casts if necessary to preserve type safety.
  • Generate bridge methods to preserve polymorphism in extended generic types.

Type erasure ensures that no new classes are created for parameterized types; consequently, generics incur no runtime overhead.

Here are some examples of the resulting conversions (all from the official documentation):

public static <T> int count(T[] anArray, T elem)    ----->    public static int count(Object[] anArray, Object elem)
public static <T extends Shape> void draw(T shape) { /* ... */ } ------>     public static void draw(Shape shape) { /* ... */ }
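One observable consequence of erasure can be checked at runtime. The sketch below uses made-up names and shows that two lists with different type arguments share the same runtime class, because no separate class exists per parameterized type.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Compares the runtime classes of two lists, ignoring their type arguments.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Number> numbers = new ArrayList<>();
        // Both objects are plain java.util.ArrayList instances at runtime.
        System.out.println(sameRuntimeClass(strings, numbers)); // prints true
        System.out.println(strings.getClass().getName());
    }
}
```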

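At this point, type erasure seems to cover only plain type parameters such as T and bounded type parameters; wildcard types have not been mentioned.

So how do wildcard types differ from named type parameters? What should we watch for when using them? Is type erasure handled the same way for both?

①Class definition

class A <T extends B>{}   // OK
class A <? extends B>{}  // error

②Object initialization

A<? extends B> a = new A<>(); // OK
A<? extends B> a = new A<? extends B>(); // error
A<T extends B> a = new A<>(); // error
A<T extends B> a = new A<T extends B>(); // error

③Rules for using wildcards

The fine details of the rules are known best by the compiler authors. Rather than digging into how the compiler handles strange-looking code, we should try not to write such code in the first place.

Writing concise, sensible code is what we should be doing. Here are the points to note when applying wildcards.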

An "In" Variable

An "in" variable serves up data to the code. Imagine a copy method with two arguments: copy(src, dest). The src argument provides the data to be copied, so it is the "in" parameter.

An "Out" Variable

An "out" variable holds data for use elsewhere. In the copy example, copy(src, dest), the dest argument accepts data, so it is the "out" parameter.

Wildcard Guidelines: 

  • An "in" variable is defined with an upper bounded wildcard, using the extends keyword.
  • An "out" variable is defined with a lower bounded wildcard, using the super keyword.
  • In the case where the "in" variable can be accessed using methods defined in the Object class, use an unbounded wildcard.
  • In the case where the code needs to access the variable as both an "in" and an "out" variable, do not use a wildcard.
  • These guidelines do not apply to a method's return type. Using a wildcard as a return type should be avoided because it forces programmers using the code to deal with wildcards.

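From these guidelines we can draw a plain rule of thumb: a wildcard type is generally used to declare a reference variable (A<? extends B> a = new A<>()) or a method parameter, not to define or instantiate a class. Define the class with a named type parameter T (class A <T extends B>{}). When instantiating, either use the diamond and let the compiler infer the type argument (A<? extends B> a = new A<>()) or name the class explicitly (A<? extends Number> a = new A<Integer>()).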
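The rule of thumb can be put together in one compilable sketch. Holder and the method names are hypothetical stand-ins for the A and B used above: the class is defined with a named type parameter, and the wildcard appears only in a variable declaration and a method parameter.

```java
class WildcardPlacementDemo {
    // Method parameter: the wildcard says "a Holder of some Number subtype".
    static double asDouble(Holder<? extends Number> holder) {
        return holder.get().doubleValue();
    }

    public static void main(String[] args) {
        // Variable declarations: wildcard on the left, concrete or inferred type on the right.
        Holder<? extends Number> explicit = new Holder<Integer>(7);
        Holder<? extends Number> inferred = new Holder<>(2.5); // diamond
        // new Holder<? extends Number>(7) would not compile.
        System.out.println(asDouble(explicit)); // prints 7.0
        System.out.println(asDouble(inferred)); // prints 2.5
    }
}

// Class definition: a named, bounded type parameter, never a wildcard.
class Holder<T extends Number> {
    private final T value;

    Holder(T value) {
        this.value = value;
    }

    T get() {
        return value;
    }
}
```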

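④How does type erasure handle this?

We focus on two cases: types with a wildcard, such as List<? extends Number> and List<?>, and types without one.

The case without wildcards was covered above and is not repeated here. What about types with a wildcard? For example, does List<?> turn into List<Object>?

Before answering, here are two concepts: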

A reifiable type is a type whose type information is fully available at runtime. This includes primitives, non-generic types, raw types, and invocations of unbound wildcards.

For example String, Integer, etc. A reifiable type essentially has the same type information at compile-time as it has at run-time.

Non-reifiable types are types where information has been removed at compile-time by type erasure — invocations of generic types that are not defined as unbounded wildcards. A non-reifiable type does not have all of its information available at runtime. Examples of non-reifiable types are List<String> and List<Number>; the JVM cannot tell the difference between these types at runtime. As shown in Restrictions on Generics, there are certain situations where non-reifiable types cannot be used: in an instanceof expression, for example, or as an element in an array. 

For example List<String>, List<T>, and T. Non-reifiable types have less type information at run-time than at compile time. In fact, the run-time types of the above are List, List, and Object. During compilation, the generic type information is erased.
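The instanceof and array restrictions mentioned above can be sketched as follows. The names are illustrative, and the rejected forms are kept as comments so the block still compiles.

```java
import java.util.ArrayList;
import java.util.List;

class ReifiableDemo {
    static boolean isAnyList(Object o) {
        // Allowed: List<?> is reifiable, so the check can be done at runtime.
        // Writing `o instanceof List<String>` here is rejected by the compiler.
        return o instanceof List<?>;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        System.out.println(isAnyList(strings));      // prints true
        System.out.println(isAnyList("not a list")); // prints false

        // An array whose element type is reifiable is allowed ...
        List<?>[] lists = new List<?>[2];
        lists[0] = strings;
        // ... but `new List<String>[2]` would not compile (generic array creation).
        System.out.println(lists.length);
    }
}
```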

And this from the official documentation:

It's important to note that List<Object> and List<?> are not the same. You can insert an Object, or any subtype of Object, into a List<Object>. But you can only insert null into a List<?>
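A small sketch of that difference, with made-up names: a List<Object> parameter lets the method insert anything but only accepts a List<Object> argument, while a List<?> parameter accepts any list but only allows null to be inserted.

```java
import java.util.ArrayList;
import java.util.List;

class UnboundedVsObjectDemo {
    // List<Object>: any object may be inserted.
    static void fill(List<Object> list) {
        list.add("text");
        list.add(42);
    }

    // List<?>: the element type is unknown, so only null may be inserted.
    static void addNull(List<?> list) {
        list.add(null);
        // list.add("text"); // would not compile
    }

    static int countNulls(List<?> list) {
        int nulls = 0;
        for (Object o : list) { // reading as Object is always safe
            if (o == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        List<Object> objects = new ArrayList<>();
        fill(objects);

        List<String> strings = new ArrayList<>();
        // fill(strings); // would not compile: List<String> is not a List<Object>
        addNull(strings);  // fine: List<String> is a List<?>
        System.out.println(objects.size() + " " + countNulls(strings));
    }
}
```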

Here is one answer:

http://*.com/questions/31583697/how-type-erasure-works-for-wildcard-in-java

The only difference between ? extends Number and T extends Number is that in the second case, if T is encountered again, it should denote the same type. So void add(List<? extends Number> first, List<? extends Number> second) can be called with add(new ArrayList<Double>(), new ArrayList<Long>()) but <T extends Number> void add(List<T> first, List<T> second) can not.

 

Wildcards only differ from named type parameters at compile time as the compiler will try to enforce that types using the same named parameter are indeed the same.

 

? is replaced with Object (since it has no bound) - not that useful.
At build time the compiler will check you are only calling Object's behaviours.

? extends Number is replaced with its bound Number.
At build time the compiler will check you are only passing a Number or any of its subtypes as an argument.

? and T have different uses.
Think T for generic Type (Classes, Interfaces) creation - which can then be referred to anywhere in the type.

Think ? as a way of limiting what types you can legally invoke a method with at Compile time.
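The contrast described in the answer can be tried out with a sketch like the one below. The method names are made up; each wildcard stands for its own unknown type, while a named T must be the same type everywhere it appears.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardVsNamedDemo {
    // Two independent wildcards: the lists may have different element types.
    static double sumBoth(List<? extends Number> first, List<? extends Number> second) {
        double total = 0.0;
        for (Number n : first) {
            total += n.doubleValue();
        }
        for (Number n : second) {
            total += n.doubleValue();
        }
        return total;
    }

    // One named parameter: both lists must share the same element type T.
    static <T extends Number> void appendAll(List<T> target, List<T> source) {
        target.addAll(source);
    }

    public static void main(String[] args) {
        List<Double> doubles = new ArrayList<>(Arrays.asList(1.5, 2.5));
        List<Long> longs = Arrays.asList(3L);

        System.out.println(sumBoth(doubles, longs)); // prints 7.0
        // appendAll(doubles, longs); // would not compile: no single T fits both lists
        appendAll(doubles, Arrays.asList(4.0));
        System.out.println(doubles.size()); // prints 3
    }
}
```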

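The answer is now fairly clear. Wildcard or not, there is no difference at runtime in the end; the real difference lies in the checking rules applied at compile time.

Back to the earlier question: does List<?> turn into List<Object>?

The answer: as far as the result goes, erasure happens at compile time and List<?> ends up as the raw type List with its elements treated as Object, which is exactly what List<Object> erases to as well.

But to make sure no type conversion errors occur at runtime, the compiler checks the generic type arguments (wildcards included) during compilation. If the check fails, there is no erased result at all.

Put simply, one is the result and the other is the process.

The example in the official documentation looks like this: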

import java.util.List;

public class WildcardError {

    void foo(List<?> i) {
        i.set(0, i.get(0));
    }
}

//In this example, the compiler processes the i input parameter as being of type Object. When the foo method invokes List.set(int, E), the compiler is not able to confirm the type of object that is being inserted into the list, and an error is produced. 

The official explanation is not very detailed; this answer does a better job: http://*.com/questions/12043874/java-generics-wildcard-capture-misunderstanding

The compiler doesn't know anything about the type of elements in List<?> i, by definition of ?. Wildcard does not mean "any type;" it means "some unknown type."

the compiler can only know – at compile time, remember – that i.get(0) returns an Object, which is the upper bound of ?. But there's no guarantee that ? is at runtime Object, so there is no way for the compiler to know that i.set(0, i.get(0)) is a safe call. It's like writing this:

List<Foo> fooz = /* init */;
Object foo = fooz.get(0);
fooz.set(0, foo); // won't compile because foo is an object, not a Foo

--------------------------------------------------------------------------------------------------------------------------------------------

Put differently, why does the compiler not know that the two usages of the wildcard type List<?> in

i.set(0, i.get(0));

refer to the same actual type?

Well, that would require the compiler to know that i contains the same instance for both evaluations of the expression. Since i isn't even final, the compiler would have to check whether i could possibly have been assigned in between evaluating the two expressions. Such an analysis is only simple for local variables (for who knows whether an invoked method will update a particular field of a particular object?). This is quite a bit of additional complexity in the compiler for rarely manifesting benefits. I suppose that's why the designers of the Java programming language kept things simple by specifying that different uses of the same wildcard type have different captures.

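With List<?>, when get is called the compiler does not know what exactly comes out, so it treats the result as Object for the time being.

Note that '? extends A' does not stand for any type; it stands for one particular, unknown type. For example, even if B and C are both subclasses of A, you cannot add a B to a List<? extends A> and then add a C to mix the two; in fact the compiler accepts neither call.

A List<? extends A> may turn out to be a List<B> or a List<C>, but only one of them, never both at once.

So at the later set call the compiler must strictly check the type of the element being stored. The list's element type is ?, so it may hold B or it may hold C. Even if the compiler temporarily uses Object to refer to either, at runtime a C could "possibly" be stored into an ArrayList<B>, which is clearly not acceptable. To prevent such runtime errors the compiler takes the pessimistic view and reports an error at compile time.

To solve this, use the following approach: name a type parameter T explicitly so that the types before and after are forced to match, and the compile-time check passes.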

import java.util.List;

public class WildcardFixed {
    void foo(List<?> i) {
        fooHelper(i);
    }

    // Helper method created so that the wildcard can be captured
    // through type inference.
    private <T> void fooHelper(List<T> l) {
        l.set(0, l.get(0));
    }
}

So T and ? really do differ: as noted earlier, a named T can be used again later in the same scope, which guarantees consistency (the same type).
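The same helper technique applies to other operations that read and write the same list. As a sketch under made-up names, swapping the first two elements of a List<?> needs a private generic helper so that the unknown element type gets a name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardCaptureDemo {
    // Public API keeps the wildcard; callers can pass any list.
    static void swapFirstTwo(List<?> list) {
        swapHelper(list); // the wildcard is captured as T here
    }

    // Inside the helper, T names the element type, so get and set agree.
    private static <T> void swapHelper(List<T> list) {
        T first = list.get(0);
        list.set(0, list.get(1));
        list.set(1, first);
    }

    public static void main(String[] args) {
        List<String> letters = new ArrayList<>(Arrays.asList("a", "b", "c"));
        swapFirstTwo(letters);
        System.out.println(letters); // prints [b, a, c]
    }
}
```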

 

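7. Why a type parameter cannot be instantiated, for example E e = new E()

Because the compiler has no idea what type E actually is, so how could it initialize one? But, you may say, isn't everything erased to Object anyway?

Wrong. Mind the definition: type erasure applies to type parameters. What is a type parameter? Only something like the T in Foo<T>.

In other words, erasure rewrites the types used in declarations and references; it does not turn new E() into something that can be instantiated.

If the code cannot pass the compile-time rules, nothing gets erased to Object. Erasure is a result, not a process.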
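A common way around this restriction is to let the caller say how to create the object. The sketch below is one possible workaround, not the only one: it assumes a java.util.function.Supplier is passed in, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class InstantiationDemo {
    // static <E> void append(List<E> list) { list.add(new E()); } // would not compile

    // The caller supplies a factory, so the method never needs to know E's constructor.
    static <E> void append(List<E> list, Supplier<E> factory) {
        list.add(factory.get());
    }

    public static void main(String[] args) {
        List<StringBuilder> builders = new ArrayList<>();
        append(builders, StringBuilder::new); // constructor reference as the factory
        System.out.println(builders.size()); // prints 1
    }
}
```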

 

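After all this digging, my point is this: rather than investigating how the compiler handles strange-looking code, we should try not to write such code in the first place.

The guidelines for using wildcards are long settled in the community. An unbounded wildcard is a useful approach in two scenarios:

  • If you are writing a method that can be implemented using functionality provided in the Object class.
  • When the code is using methods in the generic class that don't depend on the type parameter. For example, List.size or List.clear. In fact, Class<?> is so often used because most of the methods in Class<T> do not depend on T.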

 

This, in Java, implies a read-only nature, namely, we are allowed to read items from the generic structure, but we are not allowed to put anything back in it, because we cannot be certain of the actual type of the elements in it.
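Both scenarios fit in one short sketch. The names are illustrative: the method only relies on Object behaviour (string conversion) and on List.size, neither of which depends on the element type, so an unbounded wildcard is enough.

```java
import java.util.Arrays;
import java.util.List;

class UnboundedWildcardDemo {
    // Works for a list of any element type: elements are only read as Object.
    static String describe(List<?> list) {
        StringBuilder sb = new StringBuilder();
        for (Object elem : list) {
            sb.append(elem).append(' ');
        }
        return sb.toString().trim() + " (size " + list.size() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(Arrays.asList(1, 2, 3)));  // prints 1 2 3 (size 3)
        System.out.println(describe(Arrays.asList("a", "b"))); // prints a b (size 2)
    }
}
```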

 


 

The guidelines above are also discussed in Effective Java as the PECS rule, short for Producer Extends, Consumer Super.

The discussion below overlaps with the wildcard usage described above.

Wildcards

As we've seen in the previous post, the subtyping relation of generic types is invariant. Sometimes, though, we'd like to use generic types in the same way we can use ordinary types:

  • Narrowing a reference (covariance).
  • Widening a reference (contravariance).

 

Covariance

Let's suppose, for example, that we've got a set of boxes, each one of a different kind of fruit. We'd like to be able to write methods that could accept any of them. More formally, given a subtype A of a type B, we'd like to find a way to use a reference (or a method parameter) of type C<B> that could accept instances of C<A>.

To accomplish this task we can use a wildcard with extends, such as in the following example:

List<Apple> apples = new ArrayList<Apple>();
List<? extends Fruit> fruits = apples;

? extends reintroduces covariant subtyping for generic types: Apple is a subtype of Fruit and List<Apple> is a subtype of List<? extends Fruit>.

Contravariance

Let's now introduce another wildcard: ? super. Given a supertype B of a type A, then C<B> is a subtype of C<? super A>:

List<Fruit> fruits = new ArrayList<Fruit>();
List<? super Apple> apples = fruits;

 

How Can Wildcards Be Used?

Enough theory for now: how can we take advantage of these new constructs?

? extends

Let's go back to the example we used in Part II when introducing Java array covariance:

Apple[] apples = new Apple[1];
Fruit[] fruits = apples;
fruits[0] = new Strawberry(); 

As we saw, this code compiles but results in a runtime exception when trying to add a Strawberry to an Apple array through a reference to a Fruit array.
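A runnable version of the array example, as a sketch: the Fruit, Apple and Strawberry classes are empty placeholders defined here so the block is self-contained, and the store is wrapped in a try block to show that the failure is an ArrayStoreException at runtime.

```java
class ArrayCovarianceDemo {
    static class Fruit {}
    static class Apple extends Fruit {}
    static class Strawberry extends Fruit {}

    // Returns true if storing a Strawberry through the Fruit[] reference fails.
    static boolean storeFails() {
        Apple[] apples = new Apple[1];
        Fruit[] fruits = apples; // allowed: arrays are covariant
        try {
            fruits[0] = new Strawberry(); // compiles, but the array is really an Apple[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails()); // prints true
    }
}
```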

Now we can use wildcards to translate this code to its generic counterpart: since Apple is a subtype of Fruit, we will use the ? extends wildcard to be able to assign a reference of a List<Apple> to a reference of a List<? extends Fruit>:

List<Apple> apples = new ArrayList<Apple>();
List<? extends Fruit> fruits = apples;
fruits.add(new Strawberry());


This time, the code won't compile! The Java compiler now prevents us from adding a strawberry to a list of fruits. We will detect the error at compile time and we won't even need any runtime check (such as in the case of array stores) to ensure that we're adding to the list a compatible type. The code won't compile even if we try to add a Fruit instance into the list:

fruits.add(new Fruit());


No way. It turns out that, indeed, you can't put anything into a structure whose type uses the ? extends wildcard.

The reason is pretty simple, if we think about it: the ? extends T wildcard tells the compiler that we're dealing with a subtype of the type T, but we cannot know which one. Since there's no way to tell, and we need to guarantee type safety, you won't be allowed to put anything inside such a structure. On the other hand, since we know that whichever type it might be, it will be a subtype of T, we can get data out of the structure with the guarantee that it will be a T instance:

Fruit get = fruits.get(0);

? super

What's the behavior of a type that's using the ? super wildcard? Let's start with this:

List<Apple> apples = new ArrayList<Apple>();
List<? super Apple> fruits = apples;


We know that fruits is a reference to a List of something that is a supertype of Apple. Again, we cannot know which supertype it is, but we know that Apple and any of its subtypes will be assignment compatible with it. Indeed, since such an unknown type will be both an Apple and a GreenApple supertype, we can write:

fruits.add(new Apple());

fruits.add(new GreenApple());

If we try to add whichever Apple supertype, the compiler will complain:

fruits.add(new Fruit());
fruits.add(new Object());


Since we cannot know which supertype it is, we aren't allowed to add instances of any.

What about getting data out of such a type? It turns out that the only thing you can get out of it will be Object instances: since we cannot know which supertype it is, the compiler can only guarantee that it will be a reference to an Object, since Object is the supertype of any Java type.
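The ? super behaviour can be collected into one compilable sketch. The fruit classes are empty placeholders defined here for illustration; the rejected calls are kept as comments.

```java
import java.util.ArrayList;
import java.util.List;

class LowerBoundedWildcardDemo {
    static class Fruit {}
    static class Apple extends Fruit {}
    static class GreenApple extends Apple {}

    // Writing: Apple and its subtypes are always accepted.
    static void addApples(List<? super Apple> basket) {
        basket.add(new Apple());
        basket.add(new GreenApple());
        // basket.add(new Fruit());  // would not compile
        // basket.add(new Object()); // would not compile
    }

    // Reading: only Object is guaranteed.
    static Object first(List<? super Apple> basket) {
        return basket.get(0);
    }

    public static void main(String[] args) {
        List<Fruit> fruits = new ArrayList<>();
        addApples(fruits);
        System.out.println(fruits.size());                  // prints 2
        System.out.println(first(fruits) instanceof Apple); // prints true
    }
}
```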

 

The Get and Put Principle or the PECS Rule

Summarizing the behavior of the ? extends and the ? super wildcards, we draw the following conclusion:


Use the ? extends wildcard if you need to retrieve objects from a data structure.
Use the ? super wildcard if you need to put objects in a data structure.
If you need to do both things, don't use any wildcard.

This is what Maurice Naftalin calls The Get and Put Principle in his Java Generics and Collections and what Joshua Bloch calls The PECS Rule in his Effective Java.

Bloch's mnemonic, PECS, comes from "Producer Extends, Consumer Super" and is probably easier to remember and use.
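The copy(src, dest) method mentioned in the wildcard guidelines is the usual illustration of PECS. Here is a minimal sketch with a made-up class name: the source produces elements, so it uses ? extends, and the destination consumes them, so it uses ? super.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PecsDemo {
    // src is a producer ("in" variable), dest is a consumer ("out" variable).
    static <T> void copy(List<? extends T> src, List<? super T> dest) {
        for (T item : src) {
            dest.add(item);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Number> numbers = new ArrayList<>();
        copy(ints, numbers); // Integer elements go into a list of Number
        System.out.println(numbers); // prints [1, 2, 3]
    }
}
```

With plain List<T> on both parameters, the call above would not compile, because List<Integer> and List<Number> share no single T.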