Fastest way to write an array of integers to a file in Java?

Time: 2021-03-03 04:06:40

As the title says, I'm looking for the fastest possible way to write integer arrays to files. The arrays will vary in size, and will realistically contain anywhere between 2500 and 25 000 000 ints.

Here's the code I'm presently using:

DataOutputStream writer = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(filename)));

for (int d : data)
  writer.writeInt(d);

Given that DataOutputStream has a method for writing arrays of bytes, I've tried converting the int array to a byte array like this:

private static byte[] integersToBytes(int[] values) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream dos = new DataOutputStream(baos);
    for (int i = 0; i < values.length; ++i) {
        dos.writeInt(values[i]);
    }

    return baos.toByteArray();
}

and like this:

private static byte[] integersToBytes2(int[] src) {
    int srcLength = src.length;
    byte[] dst = new byte[srcLength << 2];

    for (int i = 0; i < srcLength; i++) {
        int x = src[i];
        int j = i << 2;
        dst[j++] = (byte) ((x >>> 0) & 0xff);
        dst[j++] = (byte) ((x >>> 8) & 0xff);
        dst[j++] = (byte) ((x >>> 16) & 0xff);
        dst[j++] = (byte) ((x >>> 24) & 0xff);
    }
    return dst;
}

Both seem to give a minor speed increase, about 5%. I've not tested them rigorously enough to confirm that.

Are there any techniques that will speed up this file write operation, or relevant guides to best practice for Java IO write performance?

5 Solutions

#1


21  

I had a look at three options:

  1. Using DataOutputStream;
  2. Using ObjectOutputStream (for Serializable objects, which int[] is); and
  3. Using FileChannel.

The results are:

DataOutputStream wrote 1,000,000 ints in 3,159.716 ms
ObjectOutputStream wrote 1,000,000 ints in 295.602 ms
FileChannel wrote 1,000,000 ints in 110.094 ms

So the NIO version is the fastest. It also has the advantage of allowing edits, meaning you can easily change one int whereas the ObjectOutputStream would require reading the entire array, modifying it and writing it out to file.

Code follows:

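import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;

private static final int NUM_INTS = 1000000;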

interface IntWriter {
  void write(int[] ints);
}

public static void main(String[] args) {
  int[] ints = new int[NUM_INTS];
  Random r = new Random();
  for (int i=0; i<NUM_INTS; i++) {
    ints[i] = r.nextInt();
  }
  time("DataOutputStream", new IntWriter() {
    public void write(int[] ints) {
      storeDO(ints);
    }
  }, ints);
  time("ObjectOutputStream", new IntWriter() {
    public void write(int[] ints) {
      storeOO(ints);
    }
  }, ints);
  time("FileChannel", new IntWriter() {
    public void write(int[] ints) {
      storeFC(ints);
    }
  }, ints);
}

private static void time(String name, IntWriter writer, int[] ints) {
  long start = System.nanoTime();
  writer.write(ints);
  long end = System.nanoTime();
  double ms = (end - start) / 1000000d;
  System.out.printf("%s wrote %,d ints in %,.3f ms%n", name, ints.length, ms);
}

private static void storeOO(int[] ints) {
  ObjectOutputStream out = null;
  try {
    out = new ObjectOutputStream(new FileOutputStream("object.out"));
    out.writeObject(ints);
  } catch (IOException e) {
    throw new RuntimeException(e);
  } finally {
    safeClose(out);
  }
}

private static void storeDO(int[] ints) {
  DataOutputStream out = null;
  try {
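    // No BufferedOutputStream here, so every writeInt goes straight to the FileOutputStream
    out = new DataOutputStream(new FileOutputStream("data.out"));
    for (int anInt : ints) {
      // writeInt emits all four bytes; write(int) would emit only the low byte
      out.writeInt(anInt);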
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  } finally {
    safeClose(out);
  }
}

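private static void storeFC(int[] ints) {
  // map(READ_WRITE) needs a channel opened for both reading and writing,
  // so the file is opened with RandomAccessFile rather than FileOutputStream.
  try (RandomAccessFile raf = new RandomAccessFile("fc.out", "rw");
       FileChannel file = raf.getChannel()) {
    long size = 4L * ints.length; // long arithmetic avoids overflow for very large arrays
    raf.setLength(size);          // drop any stale bytes from a previous, longer file
    ByteBuffer buf = file.map(FileChannel.MapMode.READ_WRITE, 0, size);
    for (int i : ints) {
      buf.putInt(i);
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}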

private static void safeClose(OutputStream out) {
  try {
    if (out != null) {
      out.close();
    }
  } catch (IOException e) {
    // do nothing
  }
}
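
The point above about being able to change a single int in place can be sketched roughly as follows. This is only an illustration, assuming the file holds nothing but raw 4-byte big-endian ints (which is what ByteBuffer.putInt produces by default); the class name IntFilePatcher and its methods are hypothetical, not part of the benchmark.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads or overwrites one int in a file of raw big-endian ints.
final class IntFilePatcher {

  /** Overwrites the int at the given index without rewriting the rest of the file. */
  static void setInt(Path file, long index, int value) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
      ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES); // big-endian by default
      buf.putInt(value);
      buf.flip();
      long pos = index * Integer.BYTES;
      while (buf.hasRemaining()) {
        pos += ch.write(buf, pos); // positional write, channel position is untouched
      }
    }
  }

  /** Reads the int stored at the given index. */
  static int getInt(Path file, long index) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
      long pos = index * Integer.BYTES;
      while (buf.hasRemaining()) {
        int n = ch.read(buf, pos);
        if (n < 0) {
          throw new EOFException("index is past the end of the file");
        }
        pos += n;
      }
      buf.flip();
      return buf.getInt();
    }
  }
}
```

With a serialized int[] (the ObjectOutputStream variant) there is no such shortcut, because the stream header and array framing are not meant to be patched by hand.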

#2


6  

I would use FileChannel from the nio package together with ByteBuffer. This approach seems (on my computer) to give 2 to 4 times better write performance:


Output from program:

normal time: 2555
faster time: 765

This is the program:

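import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;

public class Test {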

    public static void main(String[] args) throws IOException {

        // create a test buffer
        ByteBuffer buffer = createBuffer();

        long start = System.currentTimeMillis();
        {
            // do the first test (the normal way of writing files)
            normalToFile(new File("first"), buffer.asIntBuffer());
        }
        long middle = System.currentTimeMillis(); 
        {
            // use the faster nio stuff
            fasterToFile(new File("second"), buffer);
        }
        long done = System.currentTimeMillis();

        // print the result
        System.out.println("normal time: " + (middle - start));
        System.out.println("faster time: " + (done - middle));
    }

    private static void fasterToFile(File file, ByteBuffer buffer) 
    throws IOException {

        FileChannel fc = null;

        try {

            fc = new FileOutputStream(file).getChannel();
            // write() is not guaranteed to drain the buffer in one call
            while (buffer.hasRemaining())
                fc.write(buffer);

        } finally {

            if (fc != null)
                fc.close();

            buffer.rewind();
        }
    }

    private static void normalToFile(File file, IntBuffer buffer) 
    throws IOException {

        DataOutputStream writer = null;

        try {
            writer = 
                new DataOutputStream(new BufferedOutputStream(
                        new FileOutputStream(file)));

            while (buffer.hasRemaining())
                writer.writeInt(buffer.get());

        } finally {
            if (writer != null)
                writer.close();

            buffer.rewind();
        }
    }

    private static ByteBuffer createBuffer() {
        ByteBuffer buffer = ByteBuffer.allocate(4 * 25000000);
        Random r = new Random(1);

        while (buffer.hasRemaining()) 
            buffer.putInt(r.nextInt());

        buffer.rewind();

        return buffer;
    }
}
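
The test above starts from a ByteBuffer that is already filled. If the data starts out as an int[], as in the question, one way to get it into a buffer is a bulk put through an IntBuffer view. The following is a minimal sketch under that assumption; the class name IntArrayChannelWriter and its methods are made up for illustration, and the whole array is copied into one heap buffer, so it is only suitable when that extra memory is acceptable.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: writes an int[] as raw big-endian ints and reads it back.
final class IntArrayChannelWriter {

  static void write(Path file, int[] data) throws IOException {
    // Note: data.length * 4 overflows int for arrays above roughly 536 million elements.
    ByteBuffer bytes = ByteBuffer.allocate(data.length * Integer.BYTES);
    // The view keeps its own position, so 'bytes' still starts at 0 after the bulk put.
    bytes.asIntBuffer().put(data);
    try (FileChannel ch = FileChannel.open(file,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING)) {
      while (bytes.hasRemaining()) {
        ch.write(bytes);
      }
    }
  }

  static int[] read(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer bytes = ByteBuffer.allocate((int) ch.size());
      while (bytes.hasRemaining() && ch.read(bytes) >= 0) {
        // keep reading until the buffer is full or end of file is reached
      }
      bytes.flip();
      int[] out = new int[bytes.remaining() / Integer.BYTES];
      bytes.asIntBuffer().get(out);
      return out;
    }
  }
}
```

The byte order is the ByteBuffer default (big-endian), which matches what DataOutputStream.writeInt produces, so files written either way should be interchangeable.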

#3


3  

I think you should consider using file channels (the java.nio library) instead of plain streams (java.io). A good starting point is this interesting discussion: Java NIO FileChannel versus FileOutputstream performance / usefulness, and the relevant comments below.

Cheers!

#4


3  

The main improvements you can make when writing an int[] are to either:

  • Increase the buffer size. The default size is right for most streams, but file access can be faster with a larger buffer. This could yield a 10-20% improvement.

  • Use NIO and a direct buffer. This allows you to write 32-bit values without converting to bytes. This may yield a 5% improvement.

BTW: You should be able to write at least 10 million int values per second. With disk caching you can increase this to 200 million per second.
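
As a rough sketch of the first suggestion, the buffer size can be passed as the second argument to the BufferedOutputStream constructor. The class name BufferedIntWriter and the 1 MiB value are assumptions chosen for illustration (the default in OpenJDK is 8192 bytes); the right size depends on the machine and should be measured.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: same DataOutputStream loop as in the question, with a larger buffer.
final class BufferedIntWriter {

  // Arbitrary illustrative value, not a recommendation; benchmark before settling on one.
  static final int BUFFER_SIZE = 1 << 20;

  static void write(String filename, int[] data) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(filename), BUFFER_SIZE))) {
      for (int d : data) {
        out.writeInt(d); // big-endian, four bytes per value
      }
    } // closing the outermost stream flushes the buffer and closes the file
  }
}
```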

#5


0  

An array is Serializable - can't you just use writer.writeObject(data); (with writer being an ObjectOutputStream)? That's definitely going to be faster than individual writeInt calls.

If you have other requirements on the output data format than retrieval into int[], that's a different question.
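
For reference, a minimal sketch of that round trip, assuming the file only ever needs to be read back into an int[] by Java code. The class name IntArraySerializer and its method names are invented for this example; writeObject is a method of ObjectOutputStream, so the DataOutputStream from the question has to be swapped out.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: stores an int[] using Java serialization and loads it again.
final class IntArraySerializer {

  static void write(String filename, int[] data) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(
        new BufferedOutputStream(new FileOutputStream(filename)))) {
      out.writeObject(data); // the whole array goes out in one call
    }
  }

  static int[] read(String filename) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream(filename)))) {
      return (int[]) in.readObject();
    }
  }
}
```

The resulting file carries a serialization header in addition to the int data, so it is slightly larger than 4 bytes per element and is not a plain list of ints for other tools to read.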
