Java垃圾收集如何与循环引用一起工作?

时间:2021-05-07 03:49:42

From my understanding, garbage collection in Java cleans up some object if nothing else is 'pointing' to that object.

根据我的理解,Java中的垃圾收集清理了一些对象,如果没有其他东西指向该对象的话。

My question is, what happens if we have something like this:

我的问题是,如果我们有这样的东西会怎么样:

class Node {
    public object value;
    public Node next;
    public Node(object o, Node n) { value = 0; next = n;}
}

//...some code
{
    Node a = new Node("a", null), 
         b = new Node("b", a), 
         c = new Node("c", b);
    a.next = c;
} //end of scope
//...other code

a, b, and c should be garbage collected, but they are all being referenced by other objects.

a、b和c应该被垃圾收集,但是它们都被其他对象引用。

How does the Java garbage collection deal with this? (or is it simply a memory drain?)

Java垃圾收集如何处理这个问题?(还是仅仅是内存耗尽?)

8 个解决方案

#1


128  

Java's GC considers objects "garbage" if they aren't reachable through a chain starting at a garbage collection root, so these objects will be collected. Even though objects may point to each other to form a cycle, they're still garbage if they're cut off from the root.

如果对象不能通过从垃圾收集根开始的链来访问,Java的GC将其视为“垃圾”,因此将收集这些对象。即使对象可能指向彼此形成一个循环,但如果从根上切断它们,它们仍然是垃圾。

See the section on unreachable objects in Appendix A: The Truth About Garbage Collection in Java Platform Performance: Strategies and Tactics for the gory details.

关于Java平台性能中的垃圾收集的真相:关于gory细节的策略和策略。

#2


106  

yes Java Garbage collector handles circular-reference!

是的,Java垃圾收集器处理循环引用!

How?

There are special objects called called garbage-collection roots (GC roots). These are always reachable and so is any object that has them at its own root.

有一种特殊的对象叫做垃圾收集根(GC根)。它们总是可达的,任何具有它们自身根的对象也是如此。

A simple Java application has the following GC roots:

一个简单的Java应用程序有以下GC根:

  1. Local variables in the main method
  2. 主方法中的局部变量。
  3. The main thread
  4. 主线程
  5. Static variables of the main class
  6. 主类的静态变量

Java垃圾收集如何与循环引用一起工作?

To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows

为了确定哪些对象不再被使用,JVM间歇性地运行所谓的“标记-清除算法”。它的工作原理如下

  1. The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
  2. 该算法遍历所有对象引用,从GC根开始,并将发现的每个对象标记为活动的。
  3. All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
  4. 所有没有被标记对象占用的堆内存都被回收。它被简单地标记为*,基本上清除了未使用的对象。

So if any object is not reachable from the GC roots(even if it is self-referenced or cyclic-referenced) it will be subjected to garbage collection.

因此,如果任何对象不能从GC根中访问(即使它是自引用的或循环引用的),那么它将受到垃圾收集的影响。

Ofcourse sometimes this may led to memory leak if programmer forgets to dereference an object.

当然,有时这可能导致内存泄漏,如果程序员忘记删除一个对象。

Java垃圾收集如何与循环引用一起工作?

Source : Java Memory Management

源代码:Java内存管理

#3


10  

A garbage collector starts from some "root" set of places that are always considered "reachable", such as the CPU registers, stack, and global variables. It works by finding any pointers in those areas, and recursively finding everything they point at. Once it's found all that, everything else is garbage.

垃圾收集器从一些总是被认为是“可到达”的“根”位置开始,例如CPU寄存器、堆栈和全局变量。它通过查找这些区域中的任何指针来工作,并递归地查找它们指向的所有内容。一旦发现了所有这些,其他的一切都是垃圾。

There are, of course, quite a few variations, mostly for the sake of speed. For example, most modern garbage collectors are "generational", meaning that they divide objects into generations, and as an object gets older, the garbage collector goes longer and longer between times that it tries to figure out whether that object is still valid or not -- it just starts to assume that if it has lived a long time, chances are pretty good that it'll continue to live even longer.

当然,有相当多的变化,主要是为了速度。例如,大多数现代垃圾收集器“世代”,这意味着他们将对象划分为几代人,当一个对象变得老了,垃圾收集器就会越来越长倍之间,它试图确定对象是否仍然有效,它就开始假设,如果住了很长一段时间,机会是很好的,它会继续活的更久。

Nonetheless, the basic idea remains the same: it's all based on starting from some root set of things that it takes for granted could still be used, and then chasing all the pointers to find what else could be in use.

尽管如此,基本的思想仍然是一样的:它都是基于从一些它认为理所当然仍然可以使用的东西的根集开始,然后追踪所有的指针以找到其他可以使用的东西。

Interesting aside: may people are often surprised by the degree of similarity between this part of a garbage collector and code for marshaling objects for things like remote procedure calls. In each case, you're starting from some root set of objects, and chasing pointers to find all the other objects those refer to...

有趣的是:人们可能经常会惊讶于垃圾收集器的这一部分和用于封送对象的代码之间的相似性,比如远程过程调用。在每一种情况下,您都是从一些根对象集合开始,然后追踪指针,以找到那些引用的所有其他对象……

#4


8  

You are correct. The specific form of garbage collection you describe is called "reference counting". The way it works (conceptually, at least, most modern implementations of reference counting are actually implemented quite differently) in the simplest case, looks like this:

你是正确的。您描述的垃圾收集的具体形式称为“引用计数”。在最简单的情况下,它的工作方式(至少在概念上,大多数现代的引用计数实现实际上是完全不同的)是这样的:

  • whenever a reference to an object is added (e.g. it is assigned to a variable or a field, passed to method, and so on), its reference count is increased by 1
  • 每当添加对对象的引用(例如,将对象分配给变量或字段,传递给方法,等等),其引用计数就增加1
  • whenever a reference to an object is removed (the method returns, the variable goes out of scope, the field is re-assigned to a different object or the object which contains the field gets itself garbage collected), the reference count is decreased by 1
  • 当对对象的引用被删除(方法返回时,变量超出范围,字段被重新分配给一个不同的对象或包含该字段的对象本身被垃圾收集),引用计数减少1。
  • as soon as the reference count hits 0, there is no more reference to the object, which means nobody can use it anymore, therefore it is garbage and can be collected
  • 当引用计数为0时,不再引用该对象,这意味着没有人可以使用它,因此它是垃圾,可以收集。

And this simple strategy has exactly the problem you decribe: if A references B and B references A, then both of their reference counts can never be less than 1, which means they will never get collected.

这个简单的策略有一个问题,就是如果A引用B和B引用A,那么它们的引用数量永远不会小于1,这意味着它们永远不会被收集。

There are four ways to deal with this problem:

解决这个问题有四种方法:

  1. Ignore it. If you have enough memory, your cycles are small and infrequent and your runtime is short, maybe you can get away with simply not collecting cycles. Think of a shell script interpreter: shell scripts typically only run for a few seconds and don't allocate much memory.
  2. 忽略它。如果您有足够的内存,那么您的周期很小且不频繁,并且您的运行时很短,也许您可以不收集周期。考虑shell脚本解释器:shell脚本通常只运行几秒钟,并且不会分配太多内存。
  3. Combine your reference counting garbage collector with another garbage collector which doesn't have problems with cycles. CPython does this, for example: the main garbage collector in CPython is a reference counting collector, but from time to time a tracing garbage collector is run to collect the cycles.
  4. 将引用计数垃圾收集器与另一个没有循环问题的垃圾收集器组合在一起。例如,CPython实现了这一点:CPython中的主要垃圾收集器是引用计数收集器,但是不时运行跟踪垃圾收集器来收集循环。
  5. Detect the cycles. Unfortunately, detecting cycles in a graph is a rather expensive operation. In particular, it requires pretty much the same overhead that a tracing collector would, so you could just as well use one of those.
  6. 检测周期。不幸的是,在图中检测周期是一个相当昂贵的操作。特别是,它需要的开销与跟踪收集器几乎相同,因此您也可以使用其中之一。
  7. Don't implement the algorithm in the naive way you and I would: since the 1970s, there have been multiple quite interesting algorithms developed that combine cycle detection and reference counting in a single operation in a clever way that is significantly cheaper than either doing them both seperately or doing a tracing collector.
  8. 不天真的方式实现算法你我:自1970年代以来,已经有多个相当有趣的算法开发结合周期的检测和引用计数在单个操作在一个聪明的方式显著低于做他们两个是分开的或做一个跟踪收集器。

By the way, the other major way to implement a garbage collector (and I have already hinted at that a couple of times above), is tracing. A tracing collector is based on the concept of reachability. You start out with some root set that you know is always reachable (global constants, for example, or the Object class, the current lexical scope, the current stack frame) and from there you trace all objects that are reachable from the root set, then all objects that are reachable from the objects reachable from the root set and so on, until you have the transitive closure. Everything that is not in that closure is garbage.

顺便说一下,实现垃圾收集器的另一种主要方法是跟踪(我已经多次暗示过这一点)。跟踪收集器基于可达性的概念。你从一些根集开始总是可及(全局变量,例如,或对象类,当前词法作用域,当前堆栈帧),从那里你可以跟踪所有对象从根集,那么所有对象可及的对象可以从根集等,直到你传递闭包。没有闭包的所有东西都是垃圾。

Since a cycle is only reachable within itself, but not reachable from the root set, it will be collected.

由于一个循环本身只能到达,而不能从根集到达,因此将收集它。

#5


5  

The Java GCs don't actually behave as you describe. It's more accurate to say that they start from a base set of objects, frequently called "GC roots", and will collect any object that can not be reached from a root.
GC roots include things like:

Java GCs实际上并不像您描述的那样。更准确的说法是,它们从一组基本对象(通常称为“GC root”)开始,并收集从根中无法到达的任何对象。GC根源包括以下内容:

  • static variables
  • 静态变量
  • local variables (including all applicable 'this' references) currently in the stack of a running thread
  • 本地变量(包括所有适用的“this”引用)当前在一个正在运行的线程的堆栈中。

So, in your case, once the local variables a, b, and c go out of scope at the end of your method, there are no more GC roots that contain, directly or indirectly, a reference to any of your three nodes, and they'll be eligible for garbage collection.

因此,在您的情况下,一旦本地变量a、b和c在方法的末尾超出范围,就不再有包含、直接或间接地包含对您的三个节点的任何引用的GC根,并且它们将有资格进行垃圾收集。

TofuBeer's link has more detail if you want it.

如果你愿意,TofuBeer的链接有更多的细节。

#6


4  

This article (no longer available) goes into depth about the garbage collector (conceptually... there are several implementations). The relevant part to your post is "A.3.4 Unreachable":

本文(不再可用)深入介绍了垃圾收集器(概念上……)有几种实现)。你的职位的相关部分是“A.3.4不可及”:

A.3.4 Unreachable An object enters an unreachable state when no more strong references to it exist. When an object is unreachable, it is a candidate for collection. Note the wording: Just because an object is a candidate for collection doesn't mean it will be immediately collected. The JVM is free to delay collection until there is an immediate need for the memory being consumed by the object.

A.3.4当不再存在对对象的强引用时,对象进入不可达状态。当对象不可到达时,它是集合的候选对象。请注意这样的措辞:仅仅因为对象是集合的候选对象并不意味着它将立即被收集。JVM可以*地延迟收集,直到对象立即需要使用内存。

#7


0  

Garbage collection doesn't usually mean "clean some object iff nothing else is 'pointing' to that object" (that's reference counting). Garbage collection roughly means finding objects that can't be reached from the program.

垃圾收集通常并不意味着“清除一些对象iff,没有其他东西指向该对象”(这是引用计数)。垃圾收集大致是指查找程序无法访问的对象。

So in your example, after a,b, and c go out of scope, they can be collected by the GC, since you can't access these objects anymore.

在你的例子中,在a b c超出范围后,它们可以被GC收集,因为你不能再访问这些对象了。

#8


0  

Bill answered your question directly. As Amnon said, your definition of garbage collection is just reference counting. I just wanted to add that even very simple algorithms like mark and sweep and copy collection easily handle circular references. So, nothing magic about it!

比尔直接回答了你的问题。正如Amnon所说,垃圾收集的定义只是引用计数。我只是想补充一点,即使是非常简单的算法,比如标记、清除和复制集合,也可以轻松地处理循环引用。所以,这没什么神奇的!

#1


128  

Java's GC considers objects "garbage" if they aren't reachable through a chain starting at a garbage collection root, so these objects will be collected. Even though objects may point to each other to form a cycle, they're still garbage if they're cut off from the root.

如果对象不能通过从垃圾收集根开始的链来访问,Java的GC将其视为“垃圾”,因此将收集这些对象。即使对象可能指向彼此形成一个循环,但如果从根上切断它们,它们仍然是垃圾。

See the section on unreachable objects in Appendix A: The Truth About Garbage Collection in Java Platform Performance: Strategies and Tactics for the gory details.

关于Java平台性能中的垃圾收集的真相:关于gory细节的策略和策略。

#2


106  

yes Java Garbage collector handles circular-reference!

是的,Java垃圾收集器处理循环引用!

How?

There are special objects called called garbage-collection roots (GC roots). These are always reachable and so is any object that has them at its own root.

有一种特殊的对象叫做垃圾收集根(GC根)。它们总是可达的,任何具有它们自身根的对象也是如此。

A simple Java application has the following GC roots:

一个简单的Java应用程序有以下GC根:

  1. Local variables in the main method
  2. 主方法中的局部变量。
  3. The main thread
  4. 主线程
  5. Static variables of the main class
  6. 主类的静态变量

Java垃圾收集如何与循环引用一起工作?

To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows

为了确定哪些对象不再被使用,JVM间歇性地运行所谓的“标记-清除算法”。它的工作原理如下

  1. The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
  2. 该算法遍历所有对象引用,从GC根开始,并将发现的每个对象标记为活动的。
  3. All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
  4. 所有没有被标记对象占用的堆内存都被回收。它被简单地标记为*,基本上清除了未使用的对象。

So if any object is not reachable from the GC roots(even if it is self-referenced or cyclic-referenced) it will be subjected to garbage collection.

因此,如果任何对象不能从GC根中访问(即使它是自引用的或循环引用的),那么它将受到垃圾收集的影响。

Ofcourse sometimes this may led to memory leak if programmer forgets to dereference an object.

当然,有时这可能导致内存泄漏,如果程序员忘记删除一个对象。

Java垃圾收集如何与循环引用一起工作?

Source : Java Memory Management

源代码:Java内存管理

#3


10  

A garbage collector starts from some "root" set of places that are always considered "reachable", such as the CPU registers, stack, and global variables. It works by finding any pointers in those areas, and recursively finding everything they point at. Once it's found all that, everything else is garbage.

垃圾收集器从一些总是被认为是“可到达”的“根”位置开始,例如CPU寄存器、堆栈和全局变量。它通过查找这些区域中的任何指针来工作,并递归地查找它们指向的所有内容。一旦发现了所有这些,其他的一切都是垃圾。

There are, of course, quite a few variations, mostly for the sake of speed. For example, most modern garbage collectors are "generational", meaning that they divide objects into generations, and as an object gets older, the garbage collector goes longer and longer between times that it tries to figure out whether that object is still valid or not -- it just starts to assume that if it has lived a long time, chances are pretty good that it'll continue to live even longer.

当然,有相当多的变化,主要是为了速度。例如,大多数现代垃圾收集器“世代”,这意味着他们将对象划分为几代人,当一个对象变得老了,垃圾收集器就会越来越长倍之间,它试图确定对象是否仍然有效,它就开始假设,如果住了很长一段时间,机会是很好的,它会继续活的更久。

Nonetheless, the basic idea remains the same: it's all based on starting from some root set of things that it takes for granted could still be used, and then chasing all the pointers to find what else could be in use.

尽管如此,基本的思想仍然是一样的:它都是基于从一些它认为理所当然仍然可以使用的东西的根集开始,然后追踪所有的指针以找到其他可以使用的东西。

Interesting aside: may people are often surprised by the degree of similarity between this part of a garbage collector and code for marshaling objects for things like remote procedure calls. In each case, you're starting from some root set of objects, and chasing pointers to find all the other objects those refer to...

有趣的是:人们可能经常会惊讶于垃圾收集器的这一部分和用于封送对象的代码之间的相似性,比如远程过程调用。在每一种情况下,您都是从一些根对象集合开始,然后追踪指针,以找到那些引用的所有其他对象……

#4


8  

You are correct. The specific form of garbage collection you describe is called "reference counting". The way it works (conceptually, at least, most modern implementations of reference counting are actually implemented quite differently) in the simplest case, looks like this:

你是正确的。您描述的垃圾收集的具体形式称为“引用计数”。在最简单的情况下,它的工作方式(至少在概念上,大多数现代的引用计数实现实际上是完全不同的)是这样的:

  • whenever a reference to an object is added (e.g. it is assigned to a variable or a field, passed to method, and so on), its reference count is increased by 1
  • 每当添加对对象的引用(例如,将对象分配给变量或字段,传递给方法,等等),其引用计数就增加1
  • whenever a reference to an object is removed (the method returns, the variable goes out of scope, the field is re-assigned to a different object or the object which contains the field gets itself garbage collected), the reference count is decreased by 1
  • 当对对象的引用被删除(方法返回时,变量超出范围,字段被重新分配给一个不同的对象或包含该字段的对象本身被垃圾收集),引用计数减少1。
  • as soon as the reference count hits 0, there is no more reference to the object, which means nobody can use it anymore, therefore it is garbage and can be collected
  • 当引用计数为0时,不再引用该对象,这意味着没有人可以使用它,因此它是垃圾,可以收集。

And this simple strategy has exactly the problem you decribe: if A references B and B references A, then both of their reference counts can never be less than 1, which means they will never get collected.

这个简单的策略有一个问题,就是如果A引用B和B引用A,那么它们的引用数量永远不会小于1,这意味着它们永远不会被收集。

There are four ways to deal with this problem:

解决这个问题有四种方法:

  1. Ignore it. If you have enough memory, your cycles are small and infrequent and your runtime is short, maybe you can get away with simply not collecting cycles. Think of a shell script interpreter: shell scripts typically only run for a few seconds and don't allocate much memory.
  2. 忽略它。如果您有足够的内存,那么您的周期很小且不频繁,并且您的运行时很短,也许您可以不收集周期。考虑shell脚本解释器:shell脚本通常只运行几秒钟,并且不会分配太多内存。
  3. Combine your reference counting garbage collector with another garbage collector which doesn't have problems with cycles. CPython does this, for example: the main garbage collector in CPython is a reference counting collector, but from time to time a tracing garbage collector is run to collect the cycles.
  4. 将引用计数垃圾收集器与另一个没有循环问题的垃圾收集器组合在一起。例如,CPython实现了这一点:CPython中的主要垃圾收集器是引用计数收集器,但是不时运行跟踪垃圾收集器来收集循环。
  5. Detect the cycles. Unfortunately, detecting cycles in a graph is a rather expensive operation. In particular, it requires pretty much the same overhead that a tracing collector would, so you could just as well use one of those.
  6. 检测周期。不幸的是,在图中检测周期是一个相当昂贵的操作。特别是,它需要的开销与跟踪收集器几乎相同,因此您也可以使用其中之一。
  7. Don't implement the algorithm in the naive way you and I would: since the 1970s, there have been multiple quite interesting algorithms developed that combine cycle detection and reference counting in a single operation in a clever way that is significantly cheaper than either doing them both seperately or doing a tracing collector.
  8. 不天真的方式实现算法你我:自1970年代以来,已经有多个相当有趣的算法开发结合周期的检测和引用计数在单个操作在一个聪明的方式显著低于做他们两个是分开的或做一个跟踪收集器。

By the way, the other major way to implement a garbage collector (and I have already hinted at that a couple of times above), is tracing. A tracing collector is based on the concept of reachability. You start out with some root set that you know is always reachable (global constants, for example, or the Object class, the current lexical scope, the current stack frame) and from there you trace all objects that are reachable from the root set, then all objects that are reachable from the objects reachable from the root set and so on, until you have the transitive closure. Everything that is not in that closure is garbage.

顺便说一下,实现垃圾收集器的另一种主要方法是跟踪(我已经多次暗示过这一点)。跟踪收集器基于可达性的概念。你从一些根集开始总是可及(全局变量,例如,或对象类,当前词法作用域,当前堆栈帧),从那里你可以跟踪所有对象从根集,那么所有对象可及的对象可以从根集等,直到你传递闭包。没有闭包的所有东西都是垃圾。

Since a cycle is only reachable within itself, but not reachable from the root set, it will be collected.

由于一个循环本身只能到达,而不能从根集到达,因此将收集它。

#5


5  

The Java GCs don't actually behave as you describe. It's more accurate to say that they start from a base set of objects, frequently called "GC roots", and will collect any object that can not be reached from a root.
GC roots include things like:

Java GCs实际上并不像您描述的那样。更准确的说法是,它们从一组基本对象(通常称为“GC root”)开始,并收集从根中无法到达的任何对象。GC根源包括以下内容:

  • static variables
  • 静态变量
  • local variables (including all applicable 'this' references) currently in the stack of a running thread
  • 本地变量(包括所有适用的“this”引用)当前在一个正在运行的线程的堆栈中。

So, in your case, once the local variables a, b, and c go out of scope at the end of your method, there are no more GC roots that contain, directly or indirectly, a reference to any of your three nodes, and they'll be eligible for garbage collection.

因此,在您的情况下,一旦本地变量a、b和c在方法的末尾超出范围,就不再有包含、直接或间接地包含对您的三个节点的任何引用的GC根,并且它们将有资格进行垃圾收集。

TofuBeer's link has more detail if you want it.

如果你愿意,TofuBeer的链接有更多的细节。

#6


4  

This article (no longer available) goes into depth about the garbage collector (conceptually... there are several implementations). The relevant part to your post is "A.3.4 Unreachable":

本文(不再可用)深入介绍了垃圾收集器(概念上……)有几种实现)。你的职位的相关部分是“A.3.4不可及”:

A.3.4 Unreachable An object enters an unreachable state when no more strong references to it exist. When an object is unreachable, it is a candidate for collection. Note the wording: Just because an object is a candidate for collection doesn't mean it will be immediately collected. The JVM is free to delay collection until there is an immediate need for the memory being consumed by the object.

A.3.4当不再存在对对象的强引用时,对象进入不可达状态。当对象不可到达时,它是集合的候选对象。请注意这样的措辞:仅仅因为对象是集合的候选对象并不意味着它将立即被收集。JVM可以*地延迟收集,直到对象立即需要使用内存。

#7


0  

Garbage collection doesn't usually mean "clean some object iff nothing else is 'pointing' to that object" (that's reference counting). Garbage collection roughly means finding objects that can't be reached from the program.

垃圾收集通常并不意味着“清除一些对象iff,没有其他东西指向该对象”(这是引用计数)。垃圾收集大致是指查找程序无法访问的对象。

So in your example, after a,b, and c go out of scope, they can be collected by the GC, since you can't access these objects anymore.

在你的例子中,在a b c超出范围后,它们可以被GC收集,因为你不能再访问这些对象了。

#8


0  

Bill answered your question directly. As Amnon said, your definition of garbage collection is just reference counting. I just wanted to add that even very simple algorithms like mark and sweep and copy collection easily handle circular references. So, nothing magic about it!

比尔直接回答了你的问题。正如Amnon所说,垃圾收集的定义只是引用计数。我只是想补充一点,即使是非常简单的算法,比如标记、清除和复制集合,也可以轻松地处理循环引用。所以,这没什么神奇的!