Java 7是否对方法array .Sort使用Tim排序?

时间:2021-06-29 07:44:20

I can't find the documentation of Java 7, I can only find about the Java 6, which is still quick or merge. Does anyone know how to find the documentation of the method Arrays.sort in Java 7?

我找不到Java 7的文档,只能找到Java 6的文档,它仍然是快速的或合并的。有人知道如何查找方法数组的文档吗?在Java 7 ?

3 个解决方案

#1


73  

Java 7 uses Dual-Pivot Quicksort for primitives and TimSort for objects.

Java 7对原语使用双主快速排序,对对象使用TimSort。

According to the Java 7 API doc for primitives:

根据Java 7 API doc对原语:

Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.

实现说明:排序算法是弗拉基米尔·雅罗斯拉夫斯基(Vladimir Yaroslavskiy)、乔恩·本特利(Jon Bentley)和约书亚·布洛赫(Joshua Bloch)设计的双轴快速排序算法。该算法在许多数据集上提供O(n log(n)性能,这些数据集导致其他快速排序退化为二次性能,并且通常比传统的(单支点)快速排序实现要快。

According to the Java 7 API doc for objects:

根据Java 7 API doc对象:

The implementation was adapted from Tim Peters's list sort for Python ( TimSort). It uses techiques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.

该实现改编自Tim Peters的Python列表排序(TimSort)。它使用了Peter McIlroy的“乐观排序和信息理论的复杂性”的技术,在第四届ACM-SIAM离散算法研讨会的会议上,第467-474页,1993年1月。

Not sure if this is much different from what it was in Java 6:

不确定这是否与Java 6有很大的不同:

a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993)

由Jon L. Bentley和M. Douglas McIlroy的“工程排序功能”,软件实践和经验,Vol. 23(11) P. 1249-1265(1993年11月)

#2


17  

Yes, Java 7 will use Timsort for Arrays.sort. Here is the commit: http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/bfd7abda8f79

是的,Java 7将对array .sort使用Timsort。下面是提交:http://hg.openjdk.java.net/jdk7/jdk/rev/bfd7abda8f79

#3


12  

Yes! ... and also no.

是的!…也没有。

Summary

In current Open JDK0 implementations Tim Sort is generally used for sorting arrays of objects (i.e., Arrays.sort(Object[]) and friends) - but for primitive arrays (the remainder of the Arrays.sort methods) a variety of methods, including but not limited to Tim Sort, are used. In that case, heuristics are used to chose which sort method to use.

在当前开放的JDK0实现中,Tim Sort通常用于对对象数组进行排序(即。sort(对象[])和朋友)-但对于原始数组(数组的其余部分)。排序方法)使用各种方法,包括但不限于Tim排序。在这种情况下,启发式用于选择要使用的排序方法。

For primitives, the heuristics choose among a variety of sorting methods depending on the data being sorted. Most of these decisions are simply made up-front based on the type and size of the array being sorted, but for int and long elements the decision is actually adaptive based on the measured sortedness of the array. So you have adaptation/introspection (the heuristics to pick an algorithm) on top of adaptation/introspection (TimSort) in many cases!

对于原语,启发式会根据被排序的数据在各种排序方法中进行选择。大多数决策都是基于排序的数组的类型和大小而预先做出的,但是对于int和long元素,决策实际上是基于数组的可测排序性进行的。所以在很多情况下,你都有自适应/内省(选择算法的启发式)和自适应/内省(TimSort) !

Details

Tim Sort is used for most sorts of objects, such as Arrays.sort(Object[] a), unless the user has specifically requested the legacy behavior by setting system property java.util.Arrays.useLegacyMergeSort to true.

Tim Sort用于大多数类型的对象,如数组。排序(对象[]a),除非用户通过设置system property java.util. array专门请求遗留行为。useLegacyMergeSort为true。

For primitives, the situation is more complex. As of JDK 8 (version 1.8.0_111) a variety of heurstics are used depending on the size of the arrays being sorted, the primitive type and the measured "sortedness" of the array. Here's an overview:

对于原语来说,情况更加复杂。从JDK 8(版本1.8.0_111)开始,根据被排序的数组的大小、基元类型和被测量的数组的“排序性”,将使用各种不同的解释学。概述:

  • For all primitive types other than bytes1, arrays of less than 47 elements are simply sorted using insertion sort (see DualPivotQuicksort.INSERTION_SORT_THRESHOLD). This threshold is also used when sorting sub-arrays that arise when merge or quicksort are used and the size of the subarray falls below the threshold. So some form of insertion sort will be used in all sorts, and for small arrays it is the only algorithm used.
  • 对于除了bytes1之外的所有基本类型,少于47个元素的数组仅使用插入排序(参见DualPivotQuicksort.INSERTION_SORT_THRESHOLD)。当使用merge或quicksort时产生的子数组排序,并且子数组的大小低于阈值时,也使用这个阈值。所以某种形式的插入排序将在所有类型中使用,对于小数组,它是唯一使用的算法。
  • For primitive types byte, short and char, a counting sort is used for largish arrays. This is a simple sort that takes O(n + range) time, where range is the total number of byte (256) or short/char (65536) values. The sort involves allocating an underlying array of range values, so it is only used when the number of elements to sort is a significant fraction of the total range. In particular, it is used for byte arrays greater than 29 elements (i.e. ~11% of the range), and short/char arrays greater than 3200 elements (~5% of the range).
  • 对于原始类型字节、短字符和字符,较大数组使用计数排序。这是一个简单的排序,需要O(n + range)时间,其中range是字节(256)或短/字符(65536)值的总数。排序涉及分配范围值的基础数组,因此只有在要排序的元素数量占整个范围的很大一部分时才使用排序。特别是,它用于大于29个元素(即范围的11%)的字节数组和大于3200个元素(范围的5%)的短/字符数组。
  • For byte arrays, one of the two approaches above is always used.
  • 对于字节数组,总是使用上述两种方法之一。
  • For int and long arrays above the insertion sort threshold and for short/char arrays both above the insertion sort threshold and below the counting sort threshold, one of two algorithms may be used: dual pivot quicksort, or merge sort. Which one is used depends on a measure of the sortedness of the array: the input is divided into runs of ascending or descending elements. If the number of such runs is greater than 66, then the array is considered mostly unsorted and is sorted with dual-pivot quicksort. Otherwise, the array is considered mostly sorted, and mergesort is used (using the already enumerated runs as a starting point).
  • 对于插入排序阈值以上的int和长数组,对于插入排序阈值以上和计数排序阈值以下的短/字符数组,可以使用两种算法中的一种:双主快速排序或合并排序。哪一个被使用取决于对数组的排序的度量:输入被划分为上升或下降的元素的运行。如果这样的运行次数大于66次,那么数组就被认为是大部分未排序的,并使用双主快速排序进行排序。否则,数组被认为是排序的,并使用归并排序(使用已经枚举的运行作为起点)。

The idea of finding runs and then using mergesort to sort them is in fact very similar to timsort, although there are some differences. So at least for some parameters, the JDK is using a run-aware mergesort, but for many other combinations of parameters it is using a different algorithm, and at least 4 distinct algorithms are used in total!

寻找运行并使用归并排序的想法实际上与timsort非常相似,尽管存在一些差异。因此,至少对于某些参数,JDK使用了一个运行感知的归并排序,但是对于许多其他参数的组合,它使用了不同的算法,总共使用了至少4种不同的算法!

Rationale

The reasoning behind the different sort behavior of Object[] versus primitive is probably at least two-fold:

对象[]与原语的不同排序行为背后的推理可能至少有两方面:

1) Sorts of Object[] are required to be stable: objects which sort equally will appear in the same order as the input. For primitive arrays, no such concept exists: primitives are fully defined by their value, so there is no distinction between a stable and an unstable sort. This allows primitive sorts to dispense with the need for stable algorithms in favor of speed.

1)对象的种类[]必须是稳定的:排序的对象将以与输入相同的顺序出现。对于原始数组,不存在这样的概念:原语完全由它们的值定义,因此稳定排序和不稳定排序之间没有区别。这使得原始排序不再需要稳定的算法,而需要速度。

2) Sorts of Object[] need to involve the Object.compare() method, which may be arbitrarily complex and expensive. Even if the compare() method is simple, there will generally be method call overhead unless the entire sort method can be inlined2. So sorts of Object[] will generally be biased towards minimizing total comparisons, even at the cost of some additional algorithmic complexity.

2)对象的种类[]需要涉及objec .compare()方法,该方法可能是任意复杂和昂贵的。即使compare()方法很简单,通常也会有方法调用开销,除非整个sort方法可以inlined2。因此,各种对象[]一般都倾向于最小化总比较,即使以某种额外的算法复杂度为代价。

Sorts of primitive arrays, on the other hand, just directly compare primitive values which typically takes on the order of a cycle or two. In this case, the algorithm should be optimized considering both the cost of comparisons and the surrounding algorithm, since the are likely to be of the same magnitude.

另一方面,原始数组的类型直接比较原始值,这些值通常是一个或两个周期的顺序。在这种情况下,算法应该考虑到比较的代价和周围的算法,因为它们的大小可能是相同的。


0 At least for versions between Java 7 and Java 9, and of course this also includes Oracle's JDK as it is based on Open JDK. It is likely that other implementations use a similar approach, but I haven't checked.

0至少适用于Java 7和Java 9之间的版本,当然也包括Oracle的JDK,因为它是基于Open JDK的。其他实现可能使用类似的方法,但我没有检查。

1 For byte arrays, the insertion sort threshold is effectively 29 elements since that's the lower cutoff above which counting sort is used.

对于字节数组,插入排序阈值实际上是29个元素,因为这是使用计数排序的下限。

2 This seems unlikely, since it is quite large.

这似乎不太可能,因为它相当大。

#1


73  

Java 7 uses Dual-Pivot Quicksort for primitives and TimSort for objects.

Java 7对原语使用双主快速排序,对对象使用TimSort。

According to the Java 7 API doc for primitives:

根据Java 7 API doc对原语:

Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.

实现说明:排序算法是弗拉基米尔·雅罗斯拉夫斯基(Vladimir Yaroslavskiy)、乔恩·本特利(Jon Bentley)和约书亚·布洛赫(Joshua Bloch)设计的双轴快速排序算法。该算法在许多数据集上提供O(n log(n)性能,这些数据集导致其他快速排序退化为二次性能,并且通常比传统的(单支点)快速排序实现要快。

According to the Java 7 API doc for objects:

根据Java 7 API doc对象:

The implementation was adapted from Tim Peters's list sort for Python ( TimSort). It uses techiques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.

该实现改编自Tim Peters的Python列表排序(TimSort)。它使用了Peter McIlroy的“乐观排序和信息理论的复杂性”的技术,在第四届ACM-SIAM离散算法研讨会的会议上,第467-474页,1993年1月。

Not sure if this is much different from what it was in Java 6:

不确定这是否与Java 6有很大的不同:

a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993)

由Jon L. Bentley和M. Douglas McIlroy的“工程排序功能”,软件实践和经验,Vol. 23(11) P. 1249-1265(1993年11月)

#2


17  

Yes, Java 7 will use Timsort for Arrays.sort. Here is the commit: http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/bfd7abda8f79

是的,Java 7将对array .sort使用Timsort。下面是提交:http://hg.openjdk.java.net/jdk7/jdk/rev/bfd7abda8f79

#3


12  

Yes! ... and also no.

是的!…也没有。

Summary

In current Open JDK0 implementations Tim Sort is generally used for sorting arrays of objects (i.e., Arrays.sort(Object[]) and friends) - but for primitive arrays (the remainder of the Arrays.sort methods) a variety of methods, including but not limited to Tim Sort, are used. In that case, heuristics are used to chose which sort method to use.

在当前开放的JDK0实现中,Tim Sort通常用于对对象数组进行排序(即。sort(对象[])和朋友)-但对于原始数组(数组的其余部分)。排序方法)使用各种方法,包括但不限于Tim排序。在这种情况下,启发式用于选择要使用的排序方法。

For primitives, the heuristics choose among a variety of sorting methods depending on the data being sorted. Most of these decisions are simply made up-front based on the type and size of the array being sorted, but for int and long elements the decision is actually adaptive based on the measured sortedness of the array. So you have adaptation/introspection (the heuristics to pick an algorithm) on top of adaptation/introspection (TimSort) in many cases!

对于原语,启发式会根据被排序的数据在各种排序方法中进行选择。大多数决策都是基于排序的数组的类型和大小而预先做出的,但是对于int和long元素,决策实际上是基于数组的可测排序性进行的。所以在很多情况下,你都有自适应/内省(选择算法的启发式)和自适应/内省(TimSort) !

Details

Tim Sort is used for most sorts of objects, such as Arrays.sort(Object[] a), unless the user has specifically requested the legacy behavior by setting system property java.util.Arrays.useLegacyMergeSort to true.

Tim Sort用于大多数类型的对象,如数组。排序(对象[]a),除非用户通过设置system property java.util. array专门请求遗留行为。useLegacyMergeSort为true。

For primitives, the situation is more complex. As of JDK 8 (version 1.8.0_111) a variety of heurstics are used depending on the size of the arrays being sorted, the primitive type and the measured "sortedness" of the array. Here's an overview:

对于原语来说,情况更加复杂。从JDK 8(版本1.8.0_111)开始,根据被排序的数组的大小、基元类型和被测量的数组的“排序性”,将使用各种不同的解释学。概述:

  • For all primitive types other than bytes1, arrays of less than 47 elements are simply sorted using insertion sort (see DualPivotQuicksort.INSERTION_SORT_THRESHOLD). This threshold is also used when sorting sub-arrays that arise when merge or quicksort are used and the size of the subarray falls below the threshold. So some form of insertion sort will be used in all sorts, and for small arrays it is the only algorithm used.
  • 对于除了bytes1之外的所有基本类型,少于47个元素的数组仅使用插入排序(参见DualPivotQuicksort.INSERTION_SORT_THRESHOLD)。当使用merge或quicksort时产生的子数组排序,并且子数组的大小低于阈值时,也使用这个阈值。所以某种形式的插入排序将在所有类型中使用,对于小数组,它是唯一使用的算法。
  • For primitive types byte, short and char, a counting sort is used for largish arrays. This is a simple sort that takes O(n + range) time, where range is the total number of byte (256) or short/char (65536) values. The sort involves allocating an underlying array of range values, so it is only used when the number of elements to sort is a significant fraction of the total range. In particular, it is used for byte arrays greater than 29 elements (i.e. ~11% of the range), and short/char arrays greater than 3200 elements (~5% of the range).
  • 对于原始类型字节、短字符和字符,较大数组使用计数排序。这是一个简单的排序,需要O(n + range)时间,其中range是字节(256)或短/字符(65536)值的总数。排序涉及分配范围值的基础数组,因此只有在要排序的元素数量占整个范围的很大一部分时才使用排序。特别是,它用于大于29个元素(即范围的11%)的字节数组和大于3200个元素(范围的5%)的短/字符数组。
  • For byte arrays, one of the two approaches above is always used.
  • 对于字节数组,总是使用上述两种方法之一。
  • For int and long arrays above the insertion sort threshold and for short/char arrays both above the insertion sort threshold and below the counting sort threshold, one of two algorithms may be used: dual pivot quicksort, or merge sort. Which one is used depends on a measure of the sortedness of the array: the input is divided into runs of ascending or descending elements. If the number of such runs is greater than 66, then the array is considered mostly unsorted and is sorted with dual-pivot quicksort. Otherwise, the array is considered mostly sorted, and mergesort is used (using the already enumerated runs as a starting point).
  • 对于插入排序阈值以上的int和长数组,对于插入排序阈值以上和计数排序阈值以下的短/字符数组,可以使用两种算法中的一种:双主快速排序或合并排序。哪一个被使用取决于对数组的排序的度量:输入被划分为上升或下降的元素的运行。如果这样的运行次数大于66次,那么数组就被认为是大部分未排序的,并使用双主快速排序进行排序。否则,数组被认为是排序的,并使用归并排序(使用已经枚举的运行作为起点)。

The idea of finding runs and then using mergesort to sort them is in fact very similar to timsort, although there are some differences. So at least for some parameters, the JDK is using a run-aware mergesort, but for many other combinations of parameters it is using a different algorithm, and at least 4 distinct algorithms are used in total!

寻找运行并使用归并排序的想法实际上与timsort非常相似,尽管存在一些差异。因此,至少对于某些参数,JDK使用了一个运行感知的归并排序,但是对于许多其他参数的组合,它使用了不同的算法,总共使用了至少4种不同的算法!

Rationale

The reasoning behind the different sort behavior of Object[] versus primitive is probably at least two-fold:

对象[]与原语的不同排序行为背后的推理可能至少有两方面:

1) Sorts of Object[] are required to be stable: objects which sort equally will appear in the same order as the input. For primitive arrays, no such concept exists: primitives are fully defined by their value, so there is no distinction between a stable and an unstable sort. This allows primitive sorts to dispense with the need for stable algorithms in favor of speed.

1)对象的种类[]必须是稳定的:排序的对象将以与输入相同的顺序出现。对于原始数组,不存在这样的概念:原语完全由它们的值定义,因此稳定排序和不稳定排序之间没有区别。这使得原始排序不再需要稳定的算法,而需要速度。

2) Sorts of Object[] need to involve the Object.compare() method, which may be arbitrarily complex and expensive. Even if the compare() method is simple, there will generally be method call overhead unless the entire sort method can be inlined2. So sorts of Object[] will generally be biased towards minimizing total comparisons, even at the cost of some additional algorithmic complexity.

2)对象的种类[]需要涉及objec .compare()方法,该方法可能是任意复杂和昂贵的。即使compare()方法很简单,通常也会有方法调用开销,除非整个sort方法可以inlined2。因此,各种对象[]一般都倾向于最小化总比较,即使以某种额外的算法复杂度为代价。

Sorts of primitive arrays, on the other hand, just directly compare primitive values which typically takes on the order of a cycle or two. In this case, the algorithm should be optimized considering both the cost of comparisons and the surrounding algorithm, since the are likely to be of the same magnitude.

另一方面,原始数组的类型直接比较原始值,这些值通常是一个或两个周期的顺序。在这种情况下,算法应该考虑到比较的代价和周围的算法,因为它们的大小可能是相同的。


0 At least for versions between Java 7 and Java 9, and of course this also includes Oracle's JDK as it is based on Open JDK. It is likely that other implementations use a similar approach, but I haven't checked.

0至少适用于Java 7和Java 9之间的版本,当然也包括Oracle的JDK,因为它是基于Open JDK的。其他实现可能使用类似的方法,但我没有检查。

1 For byte arrays, the insertion sort threshold is effectively 29 elements since that's the lower cutoff above which counting sort is used.

对于字节数组,插入排序阈值实际上是29个元素,因为这是使用计数排序的下限。

2 This seems unlikely, since it is quite large.

这似乎不太可能,因为它相当大。