Java large-file disk IO performance

Time: 2022-12-07 04:06:52

I have two files (2 GB each) on my hard disk and want to compare them with each other:

  • Copying the original files with Windows Explorer takes approx. 2-4 minutes (that is reading and writing, on the same physical and logical disk).

  • Reading with java.io.FileInputStream twice and comparing the byte arrays byte by byte takes 20+ minutes.

  • The java.io.BufferedInputStream buffer is 64 kB; the files are read in chunks and then compared.

  • Comparison is done in a tight loop like

    int n = Math.min(numRead[0], numRead[1]);
    for (int k = 0; k < n; k++)
    {
       if (buffer[1][k] != buffer[0][k])
       {
          return buffer[0][k] - buffer[1][k];
       }
    }
    

What can I do to speed this up? Is NIO supposed to be faster than plain streams? Is Java unable to use DMA/SATA technologies, making some slow OS API calls instead?

EDIT:
Thanks for the answers. I did some experiments based on them. As Andreas showed:

streams or nio approaches do not differ much.
More important is the correct buffer size.

This is confirmed by my own experiments. As the files are read in big chunks, even additional buffers (BufferedInputStream) gain nothing. Optimising the comparison is possible, and I got the best results with 32-fold unrolling, but the time spent in comparison is small compared to the disk read, so the speedup is small. Looks like there is nothing I can do ;-(

10 answers

#1


I tried out three different methods of comparing two identical 3.8 GB files with buffer sizes between 8 kB and 1 MB. The first method used just two buffered input streams.

The second approach uses a thread pool that reads in two different threads and compares in a third one. This got slightly higher throughput at the expense of high CPU utilisation. Managing the thread pool adds a lot of overhead with those short-running tasks.

The third approach uses NIO, as posted by laginimaineb.

As you can see, the general approach does not make much difference. More important is the correct buffer size.

What is strange is that I read 1 byte less using threads. I could not spot the error, though.

comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB

The code used:

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;

public class FileCompare {

    private static final int MIN_BUFFER_SIZE = 1024 * 8;
    private static final int MAX_BUFFER_SIZE = 1024 * 1024;
    private String fileName1;
    private String fileName2;
    private long start;
    private long totalbytes;

    @Before
    public void createInputStream() {
        fileName1 = "bigFile.1";
        fileName2 = "bigFile.2";
    }

    @Test
    public void compareTwoFiles() throws IOException {
        System.out.println("comparing just with two streams");
        int currentBufferSize = MIN_BUFFER_SIZE;
        while (currentBufferSize <= MAX_BUFFER_SIZE) {
            compareWithBufferSize(currentBufferSize);
            currentBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesFutures() 
            throws IOException, ExecutionException, InterruptedException {
        System.out.println("comparing with threads");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            compareWithBufferSizeFutures(myBufferSize);
            myBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesNio() throws IOException {
        System.out.println("comparing with nio");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            boolean wasEqual = isEqualsNio(myBufferSize);

            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }

            myBufferSize *= 2;
        }

    }

    private void compareWithBufferSize(int myBufferSize) throws IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName1)),
                        myBufferSize);
        byte[] buff1 = new byte[myBufferSize];
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName2)),
                        myBufferSize);
        byte[] buff2 = new byte[myBufferSize];
        int read1;

        start = System.currentTimeMillis();
        totalbytes = 0;
        while ((read1 = inputStream1.read(buff1)) != -1) {
            totalbytes += read1;
            int read2 = inputStream2.read(buff2);
            if (read1 != read2) {
                break;
            }
            if (!Arrays.equals(buff1, buff2)) {
                break;
            }
        }
        if (read1 == -1) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    private void compareWithBufferSizeFutures(int myBufferSize)
            throws ExecutionException, InterruptedException, IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName1)),
                        myBufferSize);
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName2)),
                        myBufferSize);

        final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);

        if (wasEqual) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    private boolean isEqualsParallel(int myBufferSize
            , final BufferedInputStream inputStream1
            , final BufferedInputStream inputStream2)
            throws InterruptedException, ExecutionException {
        final byte[] buff1Even = new byte[myBufferSize];
        final byte[] buff1Odd = new byte[myBufferSize];
        final byte[] buff2Even = new byte[myBufferSize];
        final byte[] buff2Odd = new byte[myBufferSize];
        final Callable<Integer> read1Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Even);
            }
        };
        final Callable<Integer> read2Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Even);
            }
        };
        final Callable<Integer> read1Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Odd);
            }
        };
        final Callable<Integer> read2Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Odd);
            }
        };
        final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Odd, buff2Odd);
            }
        };
        final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Even, buff2Even);
            }
        };

        ExecutorService executor = Executors.newCachedThreadPool();
        boolean isEven = true;
        Future<Integer> read1 = null;
        Future<Integer> read2 = null;
        Future<Boolean> isEqual = null;
        int lastSize = 0;
        while (true) {
            if (isEqual != null) {
                if (!isEqual.get()) {
                    return false;
                } else if (lastSize == -1) {
                    return true;
                }
            }
            if (read1 != null) {
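                lastSize = read1.get();
                // Note: at end of file read() returns -1, and that -1 is added
                // here as well, so the reported total ends up one byte short.
                totalbytes += lastSize;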
                final int size2 = read2.get();
                if (lastSize != size2) {
                    return false;
                }
            }
            isEven = !isEven;
            if (isEven) {
                if (read1 != null) {
                    isEqual = executor.submit(oddEqualsArray);
                }
                read1 = executor.submit(read1Even);
                read2 = executor.submit(read2Even);
            } else {
                if (read1 != null) {
                    isEqual = executor.submit(evenEqualsArray);
                }
                read1 = executor.submit(read1Odd);
                read2 = executor.submit(read2Odd);
            }
        }
    }

    private boolean isEqualsNio(int myBufferSize) throws IOException {
        FileChannel first = null, seconde = null;
        try {
            first = new FileInputStream(fileName1).getChannel();
            seconde = new FileInputStream(fileName2).getChannel();
            if (first.size() != seconde.size()) {
                return false;
            }
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
            int firstRead, secondRead;
            while (first.position() < first.size()) {
                firstRead = first.read(firstBuffer);
                totalbytes += firstRead;
                secondRead = seconde.read(secondBuffer);
                if (firstRead != secondRead) {
                    return false;
                }
                if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                    return false;
                }
            }
            return true;
        } finally {
            if (first != null) {
                first.close();
            }
            if (seconde != null) {
                seconde.close();
            }
        }
    }

    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        first.rewind();
        second.rewind();
        for (int i = 0; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

    private void printAfterEquals(int myBufferSize) {
        NumberFormat nf = new DecimalFormat("#.00");
        final long dur = System.currentTimeMillis() - start;
        double seconds = dur / 1000d;
        double megabytes = totalbytes / 1024 / 1024;
        double rate = (megabytes) / seconds;
        System.out.println("I was equal, even after " + totalbytes
                + " bytes and reading for " + dur
                + " ms (" + nf.format(rate) + "MB/sec * 2)" +
                " with a buffer size of " + myBufferSize / 1024 + " kB");
    }
}

#2


With such large files, you are going to get MUCH better performance with java.nio.

Additionally, reading single bytes with Java streams can be very slow. Using a byte array (2-6K elements in my own experience; YMMV, as it seems platform/application specific) will dramatically improve your read performance with streams.
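The byte-array suggestion can be sketched roughly as follows. This is only an illustration: the names `ChunkedRead` and `countBytes` are made up for this example, and the buffer size is left as a parameter because the best value seems to depend on the platform.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ChunkedRead {
    // Hypothetical helper: counts the bytes in a stream, reading block by block
    // instead of calling the single-byte read() once per byte.
    static long countBytes(InputStream in, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int n;
            // read(byte[]) returns the number of bytes read, or -1 at end of stream;
            // the -1 must not be added to the running total.
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

A call such as `ChunkedRead.countBytes(new FileInputStream("bigFile.1"), 4096)` would then read the file in 4 kB blocks; the stream is left for the caller to close.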

#3


Reading and writing the files with Java can be just as fast. You can use FileChannels. As for comparing the files, obviously comparing byte by byte will take a lot of time. Here's an example using FileChannels and ByteBuffers (could be further optimized):

public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
    FileChannel firstIn = null, secondIn = null;
    try {
        firstIn = new FileInputStream(firstPath).getChannel();
        secondIn = new FileInputStream(secondPath).getChannel();
        if (firstIn.size() != secondIn.size())
            return false;
        ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int firstRead, secondRead;
        while (firstIn.position() < firstIn.size()) {
            firstRead = firstIn.read(firstBuffer);
            secondRead = secondIn.read(secondBuffer);
            if (firstRead != secondRead)
                return false;
            if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                return false;
        }
        return true;
    } finally {
        if (firstIn != null) firstIn.close();
        if (secondIn != null) secondIn.close();
    }
}

private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit())
        return false;
    if (length > first.limit())
        return false;
    first.rewind(); second.rewind();
    for (int i=0; i<length; i++)
        if (first.get() != second.get())
            return false;
    return true;
}

#4


After modifying your NIO compare function, I get the following results:

I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 37 MB/s.
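For reference, the arithmetic behind these figures can be reproduced with a few lines. This is a small sketch; `Throughput` and `mbPerSecond` are names invented here, and "MB" is taken to mean 1024 * 1024 bytes, as in the benchmark's own print method.

```java
import java.util.Locale;

final class Throughput {
    // Per-file rate in MB/s for a given byte count and duration in milliseconds.
    static double mbPerSecond(long bytes, long millis) {
        double megabytes = bytes / (1024.0 * 1024.0);
        return megabytes / (millis / 1000.0);
    }

    public static void main(String[] args) {
        // Values taken from the 16384 kB line above: 4294967296 bytes in 221351 ms.
        double perFile = mbPerSecond(4294967296L, 221351L);
        // Two files are read in parallel, so the combined rate is twice the per-file rate.
        System.out.println(String.format(Locale.ROOT,
                "%.2f MB/s per file, %.2f MB/s combined", perFile, 2 * perFile));
    }
}
```

The per-file figure comes out at roughly 18.5 MB/s, and doubling it gives the 37 MB/s quoted in the note.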

Running the same thing on a faster drive:

I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 74.8 MB/s.

private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit() || length > first.limit()) {
        return false;
    }
    first.rewind();
    second.rewind();
    int i;
    for (i = 0; i < length-7; i+=8) {
        if (first.getLong() != second.getLong()) {
            return false;
        }
    }
    for (; i < length; i++) {
        if (first.get() != second.get()) {
            return false;
        }
    }
    return true;
}

#5


The following is a good article on the relative merits of the different ways to read a file in Java. It may be of some use:

How to read files quickly

#6


You can have a look at Sun's article on I/O tuning (although already a bit dated); maybe you can find similarities between the examples there and your code. Also have a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has quite a nice article on high-performance IO using java.nio.

If so, there are further examples and tuning tips available there which should help you speed up your code.

Furthermore, the Arrays class has built-in methods for comparing byte arrays; maybe these can also be used to make things faster and clean up your loop a bit.
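As one possible illustration of the Arrays suggestion, here is a minimal sketch assuming Java 9 or later, where `Arrays.mismatch` with index ranges and `InputStream.readNBytes` are available. The class and method names (`StreamCompare`, `firstMismatch`) are hypothetical, and `bufferSize` is assumed to be greater than zero.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class StreamCompare {
    // Returns the offset of the first difference between the two streams,
    // or -1 if their contents are identical. If one stream is a prefix of
    // the other, the offset where the shorter one ends is returned.
    static long firstMismatch(InputStream a, InputStream b, int bufferSize) {
        byte[] bufA = new byte[bufferSize];
        byte[] bufB = new byte[bufferSize];
        long offset = 0;
        try {
            while (true) {
                // readNBytes keeps reading until the buffer is full or the stream ends.
                int nA = a.readNBytes(bufA, 0, bufferSize);
                int nB = b.readNBytes(bufB, 0, bufferSize);
                // Only the bytes actually read are compared, never stale buffer content.
                int m = Arrays.mismatch(bufA, 0, nA, bufB, 0, nB);
                if (m >= 0) {
                    return offset + m;
                }
                if (nA < bufferSize) {
                    return -1; // both streams ended at the same point
                }
                offset += nA;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with a hand-written loop, this leaves the byte comparison to the library and also handles the final, partially filled chunk explicitly.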

#7


For a better comparison, try copying two files at once. A hard drive can read one file much more efficiently than two (as the head has to move back and forth to read). One way to reduce this is to use larger buffers, e.g. 16 MB, with ByteBuffer.

With ByteBuffer you can compare 8 bytes at a time by comparing long values with getLong().
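A rough sketch of the large-buffer idea follows. Note that it swaps the hand-written getLong() loop for `ByteBuffer.equals`, which compares the remaining bytes of two buffers, so it is a variation on the suggestion and not the exact technique described. `ChannelCompare`, `sameContent` and `fill` are names made up for this example, and `bufferSize` is assumed to be greater than zero.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelCompare {
    static boolean sameContent(Path p1, Path p2, int bufferSize) throws IOException {
        try (FileChannel c1 = FileChannel.open(p1, StandardOpenOption.READ);
             FileChannel c2 = FileChannel.open(p2, StandardOpenOption.READ)) {
            if (c1.size() != c2.size()) {
                return false;
            }
            ByteBuffer b1 = ByteBuffer.allocateDirect(bufferSize);
            ByteBuffer b2 = ByteBuffer.allocateDirect(bufferSize);
            while (true) {
                // Fill one whole buffer from each file in turn, so the disk
                // reads a long run from one file before switching to the other.
                fill(c1, b1);
                fill(c2, b2);
                b1.flip();
                b2.flip();
                if (!b1.equals(b2)) {
                    return false;
                }
                if (b1.limit() < bufferSize) {
                    return true; // a short (or empty) chunk means end of file
                }
                b1.clear();
                b2.clear();
            }
        }
    }

    // Reads until the buffer is full or the channel reports end of file.
    private static void fill(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining() && ch.read(buf) != -1) {
            // keep reading
        }
    }
}
```

With a 16 MB buffer the call would be `ChannelCompare.sameContent(Path.of("bigFile.1"), Path.of("bigFile.2"), 16 * 1024 * 1024)`; whether that size is the best choice is something only a benchmark on the target disk can tell.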

If your Java is efficient, most of the work is in the disk/OS for reading and writing, so it shouldn't be much slower than using any other language (as the disk/OS is the bottleneck).

Don't assume Java is slow until you have determined it's not a bug in your code.

#8


I found that a lot of the articles linked to in this post are really outdated (though there is some very insightful stuff too). Some of the linked articles are from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for the background and theory.

I have found that NIO versus non-NIO has very little to do with the performance. It is much more about the size of your output buffers (the read byte array, in this case). NIO is not a magic make-it-go-fast, web-scale sauce.

I was able to take Martin's examples, use the 1.0-era OutputStream and make it scream. NIO is fast too, but the biggest indicator is just the size of the output buffer, not whether or not you use NIO, unless of course you are using memory-mapped NIO; then it matters. :)

If you want up-to-date, authoritative information on this, see Martin's blog:

http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html

If you want to see how NIO does not make that much of a difference (as I was able to write examples using regular IO that were faster), see this:

http://www.dzone.com/links/fast_java_io_nio_is_always_faster_than_fileoutput.html

I have tested my assumption on a new Windows laptop with a fast hard disk, my MacBook Pro with an SSD, an EC2 xlarge, and an EC2 4xlarge with maxed-out IOPS/high-speed I/O (and soon on a large-disk NAS fibre disk array), so it works (there are some issues with it on smaller EC2 instances, but if you care about performance... are you going to use a small EC2 instance?). If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2, then it is also a clear winner. If you use underpowered EC2 instances, NIO can win.

There is no substitute for benchmarking.

Anyway, I am no expert; I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.

I took this to the next step and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found at the bottom of this documentation for Boon (https://github.com/RichardHightower/boon/wiki/Auto-Growable-Byte-Buffer-like-a-ByteBuilder). This allows me to use a traditional OutputStream but with something that works well on smaller EC2 instances. (I am the main author of Boon. But I am accepting new authors. The pay sucks. 0$ per hour. But the good news is, I can double your pay whenever you like.)

My 2 cents.

See this to see why TransferQueue is important: http://php.sabscape.com/blog/?p=557

Key learnings:

  1. If you care about performance, never, ever, ever use BufferedOutputStream.
  2. NIO does not always equal performance.
  3. Buffer size matters most.
  4. Recycling buffers for high-speed writes is critical.
  5. GC can/will/does implode your performance for high-speed writes.
  6. You have to have some mechanism to reuse spent buffers.
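The buffer-recycling point could look something like the sketch below. It is not the Boon recipe: it uses a plain `ArrayBlockingQueue` as a simpler stand-in for the TransferQueue mentioned above, and `BufferPool`, `acquire` and `release` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BufferPool {
    private final BlockingQueue<byte[]> free;
    private final int bufferSize;

    BufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    // Hands out a recycled buffer if one is available, otherwise allocates a new one.
    byte[] acquire() {
        byte[] buffer = free.poll();
        return buffer != null ? buffer : new byte[bufferSize];
    }

    // Returns a spent buffer to the pool; it is simply dropped if the pool is full
    // or if the buffer has the wrong size.
    void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize) {
            free.offer(buffer);
        }
    }
}
```

A writer thread would call `acquire()` before filling a buffer and `release()` once the data has been written out, so that in steady state no new arrays are allocated and the garbage collector has little to do.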

#9


DMA/SATA are hardware/low-level technologies and aren't visible to any programming language whatsoever.

For memory-mapped input/output you should use java.nio, I believe.
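A minimal sketch of what a memory-mapped comparison with java.nio might look like is shown below. `MappedCompare` and `sameContent` are names invented for this example, and the window size is a parameter because a single mapping cannot exceed Integer.MAX_VALUE bytes; it is assumed to be greater than zero.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedCompare {
    static boolean sameContent(Path p1, Path p2, int windowBytes) throws IOException {
        try (FileChannel c1 = FileChannel.open(p1, StandardOpenOption.READ);
             FileChannel c2 = FileChannel.open(p2, StandardOpenOption.READ)) {
            long size = c1.size();
            if (size != c2.size()) {
                return false;
            }
            // Map and compare the files window by window.
            for (long pos = 0; pos < size; pos += windowBytes) {
                long len = Math.min(windowBytes, size - pos);
                MappedByteBuffer m1 = c1.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer m2 = c2.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // ByteBuffer.equals compares the remaining bytes of both buffers.
                if (!m1.equals(m2)) {
                    return false;
                }
            }
            return true;
        }
    }
}
```

Whether this is faster than plain block reads depends on the OS and the disk, so it would need to be measured against the other approaches in this thread.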

Are you sure that you aren't reading those files one byte at a time? That would be wasteful. I'd recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.

#10


Try setting the buffer on the input stream to several megabytes.

#1


I tried out three different methods of comparing two identical 3,8 gb files with buffer sizes between 8 kb and 1 MB. the first first method used just two buffered input streams

我尝试了三种不同的方法来比较两个相同的3,8 gb文件,缓冲区大小介于8 kb和1 MB之间。第一种方法只使用两个缓冲输入流

the second approach uses a threadpool that reads in two different threads and compares in a third one. this got slightly higher throughput at the expense of a high cpu utilisation. the managing of the threadpool takes a lot of overhead with those short-running tasks.

第二种方法使用一个线程池,它读入两个不同的线程并在第三个线程中进行比较。这会以高CPU利用率为代价获得略高的吞吐量。对于那些短期运行的任务,线程池的管理需要大量的开销。

the third approach uses nio, as posted by laginimaineb

第三种方法使用nio,由laginimaineb发布

as you can see, the general approach does not differ much. more important is the correct buffer size.

正如您所看到的,一般方法没有太大差异。更重要的是正确的缓冲区大小。

what is strange that i read 1 byte less using threads. i could not spot the error tough.

奇怪的是,我使用线程读取的字节数少了1个字节。我无法发现错误。

comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB

the code used:

使用的代码:

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;

public class FileCompare {

    private static final int MIN_BUFFER_SIZE = 1024 * 8;
    private static final int MAX_BUFFER_SIZE = 1024 * 1024;
    private String fileName1;
    private String fileName2;
    private long start;
    private long totalbytes;

    @Before
    public void createInputStream() {
        fileName1 = "bigFile.1";
        fileName2 = "bigFile.2";
    }

    @Test
    public void compareTwoFiles() throws IOException {
        System.out.println("comparing just with two streams");
        int currentBufferSize = MIN_BUFFER_SIZE;
        while (currentBufferSize <= MAX_BUFFER_SIZE) {
            compareWithBufferSize(currentBufferSize);
            currentBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesFutures() 
            throws IOException, ExecutionException, InterruptedException {
        System.out.println("comparing with threads");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            compareWithBufferSizeFutures(myBufferSize);
            myBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesNio() throws IOException {
        System.out.println("comparing with nio");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            boolean wasEqual = isEqualsNio(myBufferSize);

            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }

            myBufferSize *= 2;
        }

    }

    private void compareWithBufferSize(int myBufferSize) throws IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName1)),
                        myBufferSize);
        byte[] buff1 = new byte[myBufferSize];
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName2)),
                        myBufferSize);
        byte[] buff2 = new byte[myBufferSize];
        int read1;

        start = System.currentTimeMillis();
        totalbytes = 0;
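        while ((read1 = inputStream1.read(buff1)) != -1) {
            totalbytes += read1;
            int read2 = inputStream2.read(buff2);
            // Treat differing chunk lengths as a mismatch.
            if (read1 != read2) {
                break;
            }
            // Compares whole buffers; bytes past read1 are leftovers from the
            // previous (already equal) chunk, so they cannot cause a false mismatch.
            if (!Arrays.equals(buff1, buff2)) {
                break;
            }
        }
        // read1 is -1 only if the first file was read to its end without a mismatch;
        // the second file must be exhausted too, otherwise it is longer than the first.
        if (read1 == -1 && inputStream2.read() == -1) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();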
    }

    private void compareWithBufferSizeFutures(int myBufferSize)
            throws ExecutionException, InterruptedException, IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName1)),
                        myBufferSize);
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName2)),
                        myBufferSize);

        final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);

        if (wasEqual) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    private boolean isEqualsParallel(int myBufferSize
            , final BufferedInputStream inputStream1
            , final BufferedInputStream inputStream2)
            throws InterruptedException, ExecutionException {
        final byte[] buff1Even = new byte[myBufferSize];
        final byte[] buff1Odd = new byte[myBufferSize];
        final byte[] buff2Even = new byte[myBufferSize];
        final byte[] buff2Odd = new byte[myBufferSize];
        final Callable<Integer> read1Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Even);
            }
        };
        final Callable<Integer> read2Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Even);
            }
        };
        final Callable<Integer> read1Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Odd);
            }
        };
        final Callable<Integer> read2Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Odd);
            }
        };
        final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Odd, buff2Odd);
            }
        };
        final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Even, buff2Even);
            }
        };

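        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            boolean isEven = true;
            Future<Integer> read1 = null;
            Future<Integer> read2 = null;
            Future<Boolean> isEqual = null;
            int lastSize = 0;
            while (true) {
                // Wait for the comparison of the previously read pair of buffers.
                if (isEqual != null) {
                    if (!isEqual.get()) {
                        return false;
                    } else if (lastSize == -1) {
                        return true;
                    }
                }
                // Wait for the reads that were started in the previous iteration.
                if (read1 != null) {
                    lastSize = read1.get();
                    // read() returns -1 at end of file; do not count that as data.
                    if (lastSize > 0) {
                        totalbytes += lastSize;
                    }
                    final int size2 = read2.get();
                    if (lastSize != size2) {
                        return false;
                    }
                }
                // Alternate buffers: compare the pair just filled while the other pair is read.
                isEven = !isEven;
                if (isEven) {
                    if (read1 != null) {
                        isEqual = executor.submit(oddEqualsArray);
                    }
                    read1 = executor.submit(read1Even);
                    read2 = executor.submit(read2Even);
                } else {
                    if (read1 != null) {
                        isEqual = executor.submit(evenEqualsArray);
                    }
                    read1 = executor.submit(read1Odd);
                    read2 = executor.submit(read2Odd);
                }
            }
        } finally {
            // Release the pool's worker threads once the comparison is finished.
            executor.shutdown();
        }
    }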

    private boolean isEqualsNio(int myBufferSize) throws IOException {
        FileChannel first = null, second = null;
        try {
            first = new FileInputStream(fileName1).getChannel();
            second = new FileInputStream(fileName2).getChannel();
            if (first.size() != second.size()) {
                return false;
            }
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
            int firstRead, secondRead;
            while (first.position() < first.size()) {
                // Reset position and limit so every read starts with an empty buffer.
                firstBuffer.clear();
                secondBuffer.clear();
                firstRead = first.read(firstBuffer);
                totalbytes += firstRead;
                secondRead = second.read(secondBuffer);
                if (firstRead != secondRead) {
                    return false;
                }
                if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                    return false;
                }
            }
            return true;
        } finally {
            if (first != null) {
                first.close();
            }
            if (second != null) {
                second.close();
            }
        }
    }

    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        first.rewind();
        second.rewind();
        for (int i = 0; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

    private void printAfterEquals(int myBufferSize) {
        NumberFormat nf = new DecimalFormat("#.00");
        final long dur = System.currentTimeMillis() - start;
        double seconds = dur / 1000d;
        // Divide as doubles so the fractional part of a megabyte is not truncated.
        double megabytes = totalbytes / 1024d / 1024d;
        double rate = (megabytes) / seconds;
        System.out.println("I was equal, even after " + totalbytes
                + " bytes and reading for " + dur
                + " ms (" + nf.format(rate) + "MB/sec * 2)" +
                " with a buffer size of " + myBufferSize / 1024 + " kB");
    }
}

#2


With such large files, you are going to get MUCH better performance with java.nio.

Additionally, reading single bytes with Java streams can be very slow. Reading into a byte array (2-6K elements in my own experience; YMMV, as it seems platform/application specific) will dramatically improve your read performance with streams.
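As a rough illustration of the byte-array approach, here is a minimal sketch. The class name `ChunkedReadSketch`, the method `countBytes`, and the chunk size you pass in are all made up for this example; it only counts bytes, but the read loop is the part that matters.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the original answer.
final class ChunkedReadSketch {
    /** Counts the bytes in a file by reading it in chunks of chunkSize bytes. */
    static long countBytes(String path, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        long total = 0;
        try (InputStream in = new FileInputStream(path)) {
            int n;
            // read(byte[]) may return fewer bytes than requested; -1 marks end of file.
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

Calling `ChunkedReadSketch.countBytes(path, 4096)` issues one read call per 4 KB instead of one per byte, which is where the speedup over single-byte reads on an unbuffered stream comes from.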

#3


Reading and writing the files with Java can be just as fast. You can use FileChannels. As for comparing the files, comparing them byte by byte will obviously take a lot of time. Here's an example using FileChannels and ByteBuffers (it could be further optimized):

public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
    FileChannel firstIn = null, secondIn = null;
    try {
        firstIn = new FileInputStream(firstPath).getChannel();
        secondIn = new FileInputStream(secondPath).getChannel();
        if (firstIn.size() != secondIn.size())
            return false;
        ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int firstRead, secondRead;
        while (firstIn.position() < firstIn.size()) {
            // Reset both buffers so each read fills them from the start.
            firstBuffer.clear();
            secondBuffer.clear();
            firstRead = firstIn.read(firstBuffer);
            secondRead = secondIn.read(secondBuffer);
            if (firstRead != secondRead)
                return false;
            if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                return false;
        }
        return true;
    } finally {
        if (firstIn != null) firstIn.close();
        if (secondIn != null) secondIn.close();
    }
}

private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit())
        return false;
    if (length > first.limit())
        return false;
    first.rewind(); second.rewind();
    for (int i=0; i<length; i++)
        if (first.get() != second.get())
            return false;
    return true;
}

#4


After modifying your NIO compare function I get the following results.

I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB

Note: both files are read at the rate shown, so the disk is delivering data at a combined rate of about 37 MB/s.

Running the same thing on a faster drive:

I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB

Note: both files are read at the rate shown, so the disk is delivering data at a combined rate of about 74.8 MB/s.

private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit() || length > first.limit()) {
        return false;
    }
    first.rewind();
    second.rewind();
    int i;
    for (i = 0; i < length-7; i+=8) {
        if (first.getLong() != second.getLong()) {
            return false;
        }
    }
    for (; i < length; i++) {
        if (first.get() != second.get()) {
            return false;
        }
    }
    return true;
}
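
The rates quoted in the notes above follow directly from the logged byte counts and durations. A small sketch of the arithmetic, with `ThroughputSketch` and its method names invented for this example:

```java
// Hypothetical helper that reproduces the arithmetic behind the log lines.
final class ThroughputSketch {
    /** Returns the read rate of one file in MB/s (1 MB = 1024 * 1024 bytes). */
    static double perFileRate(long bytes, long millis) {
        double megabytes = bytes / 1024d / 1024d;
        return megabytes / (millis / 1000d);
    }

    /** Two files are read side by side, so the disk delivers twice the per-file rate. */
    static double combinedRate(long bytes, long millis) {
        return 2 * perFileRate(bytes, millis);
    }
}
```

For the last run, `combinedRate(4294967296L, 109549)` gives the roughly 74.8 MB/s mentioned in the note.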

#5


The following is a good article on the relative merits of the different ways to read a file in Java. It may be of some use:

How to read files quickly

#6


You can have a look at Sun's article on I/O tuning (although it is already a bit dated); maybe you can find similarities between the examples there and your code. Also have a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has quite a nice article on high-performance I/O using java.nio.

If so, there are further examples and tuning tips available there which should be able to help you to speed up your code.

Furthermore, the Arrays class has built-in methods for comparing byte arrays; maybe these can also be used to make things faster and clean up your loop a bit.
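A minimal sketch of what that could look like, assuming Java 9 or later (for `InputStream.readNBytes` and the range overload of `Arrays.equals`). The class and method names are invented for this example and it is not the asker's code:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper; requires Java 9+.
final class StreamCompareSketch {
    /** Returns true if both files have identical content. */
    static boolean sameContent(String pathA, String pathB, int bufferSize) throws IOException {
        byte[] a = new byte[bufferSize];
        byte[] b = new byte[bufferSize];
        try (InputStream inA = new FileInputStream(pathA);
             InputStream inB = new FileInputStream(pathB)) {
            while (true) {
                // readNBytes keeps reading until the buffer is full or the file ends.
                int nA = inA.readNBytes(a, 0, a.length);
                int nB = inB.readNBytes(b, 0, b.length);
                if (nA != nB) {
                    return false; // one file ended before the other
                }
                if (nA == 0) {
                    return true; // both files ended without a mismatch
                }
                // Compare only the bytes that were actually read.
                if (!Arrays.equals(a, 0, nA, b, 0, nB)) {
                    return false;
                }
            }
        }
    }
}
```

This replaces the hand-written inner loop with a library call and leaves the choice of `bufferSize` to the caller.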

#7


For a better comparison, try copying two files at once. A hard drive can read one file much more efficiently than two (as the head has to move back and forth between them). One way to reduce this is to use larger buffers, e.g. 16 MB, with ByteBuffer.

With ByteBuffer you can compare 8 bytes at a time by comparing long values with getLong().

If your Java is efficient, most of the work is in the disk/OS for reading and writing so it shouldn't be much slower than using any other language (as the disk/OS is the bottleneck)

Don't assume Java is slow until you have determined it's not a bug in your code.

#8


I found that a lot of the articles linked to in this post are really outdated (though there is some very insightful stuff too). Some of the linked articles are from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for the background and theory.

I have found that NIO or not NIO has very little to do with the performance. It is much more about the size of your output buffers (or the read byte array, for reading). NIO is no magic make-it-go-fast web-scale sauce.

I was able to take Martin's examples, use the 1.0-era OutputStream, and make it scream. NIO is fast too, but the biggest indicator is just the size of the output buffer, not whether or not you use NIO, unless of course you are using memory-mapped NIO; then it matters. :)

If you want up-to-date, authoritative information on this, see Martin's blog:

http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html

If you want to see how NIO does not make that much of a difference (as I was able to write examples using regular IO that were faster) see this:

http://www.dzone.com/links/fast_java_io_nio_is_always_faster_than_fileoutput.html

I have tested my assumption on a new Windows laptop with a fast hard disk, my MacBook Pro with an SSD, an EC2 xlarge, and an EC2 4xlarge with maxed-out IOPS/high-speed I/O (and soon on a large NAS fibre disk array), so it works (there are some issues with it on smaller EC2 instances, but if you care about performance... are you going to use a small EC2 instance?). If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2, then this is also a clear winner. If you use underpowered EC2 instances, NIO can win.

There is no substitute for benchmarking.

Anyway, I am no expert, I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.

I took this to the next step and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found at the bottom of this documentation for Boon (https://github.com/RichardHightower/boon/wiki/Auto-Growable-Byte-Buffer-like-a-ByteBuilder). This allows me to use a traditional OutputStream, but with something that works well on smaller EC2 instances. (I am the main author of Boon. But I am accepting new authors. The pay sucks: $0 per hour. But the good news is, I can double your pay whenever you like.)

My 2 cents.

See this to see why TransferQueue is important. http://php.sabscape.com/blog/?p=557

Key learnings:

  1. If you care about performance, never, ever, ever use BufferedOutputStream.
  2. NIO does not always equal performance.
  3. Buffer size matters most.
  4. Recycling buffers for high-speed writes is critical.
  5. GC can/will/does implode your performance for high-speed writes.
  6. You have to have some mechanism to reuse spent buffers.
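
The buffer-reuse points can be sketched with a small pool built on a blocking queue. This is only an illustration under my own assumptions: `BufferPoolSketch`, `acquire` and `release` are invented names, and it is not the TransferQueue-based recipe from Boon.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fixed-size pool of reusable byte arrays.
final class BufferPoolSketch {
    private final BlockingQueue<byte[]> free;

    /** Allocates all buffers up front so nothing is allocated while writing. */
    BufferPoolSketch(int count, int size) {
        free = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            free.add(new byte[size]);
        }
    }

    /** Blocks until a spent buffer has been handed back. */
    byte[] acquire() throws InterruptedException {
        return free.take();
    }

    /** Returns a buffer for reuse; it is silently dropped if the pool is already full. */
    void release(byte[] buffer) {
        free.offer(buffer);
    }
}
```

A producer would call `acquire()`, fill the array and pass it to the writer thread, which calls `release()` once the bytes are on disk. Because the same arrays keep circulating, the garbage collector has nothing new to clean up during the write.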

#9


DMA/SATA are hardware/low-level technologies and aren't visible to any programming language whatsoever.

For memory mapped input/output you should use java.nio, I believe.

Are you sure that you aren't reading those files one byte at a time? That would be wasteful. I'd recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.
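
A minimal sketch of both ideas together: memory-mapping through java.nio and working in 64 MB blocks. The class name, method name and the `WINDOW` constant are invented for this example, and whether mapping actually beats plain reads on a given disk is something you would have to measure.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not from the original answer.
final class MappedCompareSketch {
    // Assumed block size; a single mapping cannot exceed Integer.MAX_VALUE bytes.
    private static final long WINDOW = 64L * 1024 * 1024;

    /** Returns true if both files have the same size and content. */
    static boolean sameContent(String pathA, String pathB) throws IOException {
        try (FileChannel a = FileChannel.open(Paths.get(pathA), StandardOpenOption.READ);
             FileChannel b = FileChannel.open(Paths.get(pathB), StandardOpenOption.READ)) {
            long size = a.size();
            if (size != b.size()) {
                return false;
            }
            for (long pos = 0; pos < size; pos += WINDOW) {
                long len = Math.min(WINDOW, size - pos);
                MappedByteBuffer ma = a.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer mb = b.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // ByteBuffer.equals compares the remaining bytes of both buffers.
                if (!ma.equals(mb)) {
                    return false;
                }
            }
            return true;
        }
    }
}
```

Mapping in windows keeps each mapping well below the 2 GB limit, so the same code handles files larger than that.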

#10


Try setting the buffer on the input stream up to several megabytes.

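
For reference, the buffer size is the second constructor argument of BufferedInputStream. A minimal sketch, where `LargeBufferSketch` and the 4 MB figure are my own assumptions (the answer only says "several megabytes"):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper showing where the buffer size is set.
final class LargeBufferSketch {
    // Assumed value; tune it by measuring on your own disk.
    static final int BUFFER_BYTES = 4 * 1024 * 1024;

    /** Opens the file with a multi-megabyte stream buffer. */
    static InputStream open(String path) throws IOException {
        return new BufferedInputStream(new FileInputStream(path), BUFFER_BYTES);
    }

    /** Sums all byte values; the single-byte read() calls are served from the large buffer. */
    static long sum(String path) throws IOException {
        long sum = 0;
        try (InputStream in = open(path)) {
            int b;
            while ((b = in.read()) != -1) {
                sum += b;
            }
        }
        return sum;
    }
}
```

If you already read into a large byte array yourself, as the question's edit notes, the extra stream buffer adds little.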