Is there any way to create a primitive array without initialization?

Date: 2021-06-30 15:53:37

As we know, Java always initializes arrays upon creation; i.e., new int[1000000] always returns an array with all elements set to 0. I understand that this is a must for Object arrays, but for primitive arrays (except maybe boolean) in most cases we don't care about the initial values.

Does anybody know a way to avoid this initialization?

2 answers

#1


16  

I've done some investigation. There is no legal way to create an uninitialized array in Java. Even the JNI NewXxxArray functions create initialized arrays. So it is impossible to know exactly the cost of array zeroing. Nevertheless, I've done some measurements:

1) Creating 1000 byte arrays with different array sizes

        long t0 = System.currentTimeMillis();
        for(int i = 0; i < 1000; i++) {
//          byte[] a1 = new byte[1];
            byte[] a1 = new byte[1000000];
        }
        System.out.println(System.currentTimeMillis() - t0);

On my PC it takes < 1 ms for byte[1] and ~500 ms for byte[1000000]. Sounds impressive to me.

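A single currentTimeMillis reading like the one above is sensitive to JIT warm-up, and the JIT may drop an allocation whose result is never used. As a rough sketch only (the class AllocTiming, the method timeAllocations and the sink field are names made up here, not part of the original test), the same measurement can be repeated with System.nanoTime, a warm-up pass, and a field that keeps each array reachable. The absolute numbers will differ by machine and JVM.

```java
// Sketch only: AllocTiming, timeAllocations and sink are hypothetical names.
final class AllocTiming {
    // Holds the most recent array so the allocation cannot be trivially optimized away.
    static volatile byte[] sink;

    // Allocates `count` byte arrays of `size` elements and returns the elapsed nanoseconds.
    static long timeAllocations(int count, int size) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink = new byte[size];
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm-up passes so the timed passes are more likely to run compiled code.
        timeAllocations(1000, 1);
        timeAllocations(100, 1_000_000);

        System.out.println("byte[1]:       " + timeAllocations(1000, 1) / 1_000_000.0 + " ms");
        System.out.println("byte[1000000]: " + timeAllocations(1000, 1_000_000) / 1_000_000.0 + " ms");
    }
}
```

For anything beyond a quick check, a benchmark harness is usually a better choice than a hand-written loop.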

2) We don't have a fast (native) method in the JDK for filling arrays, and Arrays.fill is too slow, so let's at least see how long 1000 copies of a 1,000,000-element array take with the native System.arraycopy

    byte[] a1 = new byte[1000000];
    byte[] a2 = new byte[1000000];
    for(int i = 0; i < 1000; i++) {
        System.arraycopy(a1, 0, a2, 0, 1000000);
    }

It takes 700 ms.

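The copy loop above is shown without its timing code, and Arrays.fill is called slow without a number. A minimal sketch that times both side by side might look like the following; FillVsCopy, timeFill and timeCopy are names invented for this illustration. Results depend heavily on the JVM, which may compile either call down to an optimized routine.

```java
import java.util.Arrays;

// Sketch only: FillVsCopy, timeFill and timeCopy are hypothetical names.
final class FillVsCopy {
    // Fills `target` with `value` `rounds` times and returns the elapsed nanoseconds.
    static long timeFill(byte[] target, byte value, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            Arrays.fill(target, value);
        }
        return System.nanoTime() - start;
    }

    // Copies all of `src` into `dst` `rounds` times and returns the elapsed nanoseconds.
    static long timeCopy(byte[] src, byte[] dst, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            System.arraycopy(src, 0, dst, 0, src.length);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] a1 = new byte[1_000_000];
        byte[] a2 = new byte[1_000_000];

        // Warm-up passes before the timed passes.
        timeFill(a2, (byte) 0, 1000);
        timeCopy(a1, a2, 1000);

        System.out.println("Arrays.fill:      " + timeFill(a2, (byte) 0, 1000) / 1_000_000.0 + " ms");
        System.out.println("System.arraycopy: " + timeCopy(a1, a2, 1000) / 1_000_000.0 + " ms");
    }
}
```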

This gives me reason to believe that a) creating long arrays is expensive, and b) it seems to be expensive because of useless initialization.

3) Let's take sun.misc.Unsafe (http://www.javasourcecode.org/html/open-source/jdk/jdk-6u23/sun/misc/Unsafe.html). It is protected from external usage, but not too strongly

    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe)f.get(null);
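
The three lines above need imports and handling of the checked reflection exceptions before they compile. A self-contained sketch, with UnsafeAccess and getUnsafe as names chosen here for illustration, could look like this. It assumes a JVM that still ships sun.misc.Unsafe and allows reflective access to the theUnsafe field, and the compiler will typically warn that this is an internal API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch only: UnsafeAccess and getUnsafe are hypothetical names.
final class UnsafeAccess {
    // Reads the private static singleton field "theUnsafe" via reflection.
    static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible on this JVM", e);
        }
    }

    public static void main(String[] args) {
        Unsafe unsafe = getUnsafe();
        // addressSize() reports the size of a native pointer in bytes.
        System.out.println("native address size: " + unsafe.addressSize() + " bytes");
    }
}
```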

Here is the memory allocation cost test

    for(int i = 0; i < 1000; i++) {
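        // blocks are never freed in this test; outside a benchmark, pair each with unsafe.freeMemory(m)
        long m = unsafe.allocateMemory(1000000);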
    }

It takes < 1 ms; if you remember, new byte[1000000] took 500 ms.

4) Unsafe has no direct methods to work with arrays. It needs to know class fields, but reflection shows no fields in an array. There is not much information about array internals; I guess it is JVM/platform specific. Nevertheless, an array is, like any other Java object, header + fields. On my PC/JVM it looks like

header - 8 bytes
int length - 4 bytes
long bufferAddress - 8 bytes
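
The layout above is the author's reading of one particular JVM. As a hedged cross-check, sun.misc.Unsafe provides arrayBaseOffset and arrayIndexScale, which report where element 0 of an array starts and how far apart elements are on the running JVM. The sketch below uses made-up names (ArrayLayout, loadUnsafe, byteArrayBaseOffset, byteArrayIndexScale) and assumes Unsafe is reachable by reflection; the values it prints vary with JVM version, bitness and flags.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch only: ArrayLayout and its method names are hypothetical.
final class ArrayLayout {
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible on this JVM", e);
        }
    }

    // Offset in bytes from the start of a byte[] object to its element 0.
    static int byteArrayBaseOffset() {
        return loadUnsafe().arrayBaseOffset(byte[].class);
    }

    // Distance in bytes between consecutive elements of a byte[].
    static int byteArrayIndexScale() {
        return loadUnsafe().arrayIndexScale(byte[].class);
    }

    public static void main(String[] args) {
        System.out.println("byte[] base offset: " + byteArrayBaseOffset());
        System.out.println("byte[] index scale: " + byteArrayIndexScale());
    }
}
```

If the reported base offset matches an offset you write to with putLong, that write lands on the array's own first elements rather than on a separate pointer field, so it is worth comparing the two on your own JVM before relying on the layout.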

Now, using Unsafe, I will create a byte[10], allocate a 10-byte memory buffer, and use it as my array's elements:

    byte[] a = new byte[10];
    System.out.println(Arrays.toString(a));
    long mem = unsafe.allocateMemory(10);
    unsafe.putLong(a, 12, mem);
    System.out.println(Arrays.toString(a));

It prints

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[8, 15, -114, 24, 0, 0, 0, 0, 0, 0]

You can see that the array's data are not initialized.

Now I'll change our array length (though it still points to 10 bytes of memory)

    unsafe.putInt(a, 8, 1000000);
    System.out.println(a.length);

It shows 1000000. It was just to prove that the idea works.

Now the performance test. I will create an empty byte array a1, allocate a buffer of 1000000 bytes, assign this buffer to a1, and set a1.length = 1000000

    long t0 = System.currentTimeMillis();
    for(int i = 0; i < 1000; i++) {
        byte[] a1 = new byte[0];
        long mem1 = unsafe.allocateMemory(1000000);
        unsafe.putLong(a1, 12, mem1);
        unsafe.putInt(a1, 8, 1000000);
    }
    System.out.println(System.currentTimeMillis() - t0);

It takes 10 ms.

5) There are malloc and calloc in C++: malloc just allocates a memory block, while calloc also initializes it with zeroes.

cpp


...
JNIEXPORT void JNICALL Java_Test_malloc(JNIEnv *env, jobject obj, jint n) {
     malloc(n);
} 

java


private native static void malloc(int n);

for (int i = 0; i < 500; i++) {
    malloc(1000000);
}

Results: malloc - 78 ms; calloc - 468 ms

Conclusions


  1. It seems that Java array creation is slow because of useless element zeroing.
  2. We cannot change it, but Oracle can. There is no need to change anything in the JLS; just add native methods to java.lang.reflect.Array like

    public static native xxx[] newUninitializedXxxArray(int size);

for all primitive numeric types (byte - double) and the char type. It could be used all over the JDK, like in java.util.Arrays

    public static int[] copyOf(int[] original, int newLength) {
        int[] copy = Array.newUninitializedIntArray(newLength);
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        ...

or java.lang.String


   public String concat(String str) {
        ...   
        char[] buf = Array.newUninitializedCharArray(count + otherLen);
        getChars(0, count, buf, 0);
        ...

#2


3  

I'm going to move this to an answer because it probably should be.


An "Array" in Java is not what you think it is. It's not just a pointer to a chunk of contiguous memory on the stack or heap.

An Array in Java is an Object just like everything else (except primitives) and is on the heap. When you call new int[100000] you're creating a new object just like every other object, and it gets initialized, etc.

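A small sketch makes the point concrete: an array is assignable to Object, and its elements start at the default value for their type. The class name ArrayDefaults and the helper allZero are invented for this illustration.

```java
// Sketch only: ArrayDefaults and allZero are hypothetical names.
final class ArrayDefaults {
    // Returns true when every element still holds the default value 0.
    static boolean allZero(int[] values) {
        for (int v : values) {
            if (v != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] numbers = new int[100000];
        Object asObject = numbers;          // an array reference is assignable to Object
        boolean[] flags = new boolean[1];
        String[] names = new String[1];

        System.out.println(asObject.getClass().isArray()); // prints true
        System.out.println(allZero(numbers));              // prints true
        System.out.println(flags[0]);                      // prints false
        System.out.println(names[0]);                      // prints null
    }
}
```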

The JLS provides all the specific info about this:


http://docs.oracle.com/javase/specs/jls/se5.0/html/arrays.html


So, no. You can't avoid "initializing" an array. That's just not how Java works. There's simply no such thing as uninitialized heap memory; many people call that a "feature" as it prevents you from accessing uninitialized memory.

