
Time: 2022-12-07 04:06:52

I have two (2GB each) files on my harddisk and want to compare them with each other:


  • Copying the original files with Windows Explorer takes approx. 2-4 minutes (that is reading and writing, on the same physical and logical disk).

  • Reading both files with plain streams and comparing the byte arrays byte by byte takes 20+ minutes.

  • The buffer is 64 KB; the files are read in chunks and then compared.

  • The comparison is done in a tight loop like


    // number of bytes available in both buffers
    int len = Math.min(numRead[0], numRead[1]);
    for (int k = 0; k < len; k++)
       if (buffer[1][k] != buffer[0][k])
          return buffer[0][k] - buffer[1][k];

What can I do to speed this up? Is NIO supposed to be faster than plain streams? Is Java unable to use DMA/SATA technologies, falling back to slow OS API calls instead?

Thanks for the answers. I did some experiments based on them. As Andreas showed:

Streams or NIO approaches do not differ much.
More important is the correct buffer size.

This is confirmed by my own experiments. As the files are read in big chunks, even additional buffers (BufferedInputStream) do not gain anything. Optimising the comparison is possible, and I got the best results with 32-fold unrolling, but the time spent in comparison is small compared to the disk read, so the speedup is small. Looks like there is nothing I can do ;-(


10 answers


I tried out three different methods of comparing two identical 3.8 GB files with buffer sizes between 8 kB and 1 MB. The first method used just two buffered input streams.

The second approach uses a thread pool that reads in two different threads and compares in a third one. This got slightly higher throughput at the expense of high CPU utilisation. Managing the thread pool takes a lot of overhead with those short-running tasks.

The third approach uses NIO, as posted by laginimaineb.

As you can see, the general approach does not differ much. More important is the correct buffer size.

What is strange is that I read 1 byte less using threads. I could not spot the error though.


comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB

the code used:


import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;

public class FileCompare {

    private static final int MIN_BUFFER_SIZE = 1024 * 8;
    private static final int MAX_BUFFER_SIZE = 1024 * 1024;
    private String fileName1;
    private String fileName2;
    private long start;
    private long totalbytes;

    @Before
    public void createInputStream() {
        fileName1 = "bigFile.1";
        fileName2 = "bigFile.2";
    }

    @Test
    public void compareTwoFiles() throws IOException {
        System.out.println("comparing just with two streams");
        int currentBufferSize = MIN_BUFFER_SIZE;
        while (currentBufferSize <= MAX_BUFFER_SIZE) {
            compareWithBufferSize(currentBufferSize);
            currentBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesFutures()
            throws IOException, ExecutionException, InterruptedException {
        System.out.println("comparing with threads");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            compareWithBufferSizeFutures(myBufferSize);
            myBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesNio() throws IOException {
        System.out.println("comparing with nio");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            boolean wasEqual = isEqualsNio(myBufferSize);

            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }

            myBufferSize *= 2;
        }
    }

    private void compareWithBufferSize(int myBufferSize) throws IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName1)),
                        myBufferSize); // stream buffer size (assumed to match the read buffer)
        byte[] buff1 = new byte[myBufferSize];
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName2)),
                        myBufferSize); // stream buffer size (assumed to match the read buffer)
        byte[] buff2 = new byte[myBufferSize];
        int read1;

        start = System.currentTimeMillis();
        totalbytes = 0;
        try {
            while ((read1 = inputStream1.read(buff1)) != -1) {
                totalbytes += read1;
                int read2 = inputStream2.read(buff2);
                if (read1 != read2) {
                    break;
                }
                if (!Arrays.equals(buff1, buff2)) {
                    break;
                }
            }
            // Equal only if stream 1 reached its end and stream 2 has nothing left either.
            if (read1 == -1 && inputStream2.read() == -1) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }
        } finally {
            inputStream1.close();
            inputStream2.close();
        }
    }

    private void compareWithBufferSizeFutures(int myBufferSize)
            throws ExecutionException, InterruptedException, IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName1)),
                        myBufferSize); // stream buffer size (assumed to match the read buffer)
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName2)),
                        myBufferSize); // stream buffer size (assumed to match the read buffer)

        try {
            final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);

            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }
        } finally {
            inputStream1.close();
            inputStream2.close();
        }
    }

    private boolean isEqualsParallel(int myBufferSize
            , final BufferedInputStream inputStream1
            , final BufferedInputStream inputStream2)
            throws InterruptedException, ExecutionException {
        // Two buffer pairs ("even" and "odd"): one pair is compared while the other is filled.
        final byte[] buff1Even = new byte[myBufferSize];
        final byte[] buff1Odd = new byte[myBufferSize];
        final byte[] buff2Even = new byte[myBufferSize];
        final byte[] buff2Odd = new byte[myBufferSize];
        final Callable<Integer> read1Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Even);
            }
        };
        final Callable<Integer> read2Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Even);
            }
        };
        final Callable<Integer> read1Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Odd);
            }
        };
        final Callable<Integer> read2Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Odd);
            }
        };
        final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Odd, buff2Odd);
            }
        };
        final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Even, buff2Even);
            }
        };

        ExecutorService executor = Executors.newCachedThreadPool();
        boolean isEven = true;
        Future<Integer> read1 = null;
        Future<Integer> read2 = null;
        Future<Boolean> isEqual = null;
        int lastSize = 0;
        try {
            while (true) {
                // Wait for the comparison submitted in the previous round.
                if (isEqual != null) {
                    if (!isEqual.get()) {
                        return false;
                    } else if (lastSize == -1) {
                        return true;
                    }
                }
                // Wait for the reads submitted in the previous round.
                if (read1 != null) {
                    lastSize = read1.get();
                    // read() returns -1 at end of stream; adding that unguarded
                    // would leave the total one byte short.
                    if (lastSize > 0) {
                        totalbytes += lastSize;
                    }
                    final int size2 = read2.get();
                    if (lastSize != size2) {
                        return false;
                    }
                }
                // Swap buffer pairs: compare the pair just filled, read into the other.
                isEven = !isEven;
                if (isEven) {
                    if (read1 != null) {
                        isEqual = executor.submit(oddEqualsArray);
                    }
                    read1 = executor.submit(read1Even);
                    read2 = executor.submit(read2Even);
                } else {
                    if (read1 != null) {
                        isEqual = executor.submit(evenEqualsArray);
                    }
                    read1 = executor.submit(read1Odd);
                    read2 = executor.submit(read2Odd);
                }
            }
        } finally {
            executor.shutdown();
        }
    }

    private boolean isEqualsNio(int myBufferSize) throws IOException {
        FileChannel first = null, seconde = null;
        try {
            first = new FileInputStream(fileName1).getChannel();
            seconde = new FileInputStream(fileName2).getChannel();
            if (first.size() != seconde.size()) {
                return false;
            }
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
            int firstRead, secondRead;
            while (first.position() < first.size()) {
                // Reset both buffers so every pass reads a fresh chunk.
                firstBuffer.clear();
                secondBuffer.clear();
                firstRead = first.read(firstBuffer);
                totalbytes += firstRead;
                secondRead = seconde.read(secondBuffer);
                if (firstRead != secondRead) {
                    return false;
                }
                // Flip so that position is 0 and limit is the number of bytes just read.
                firstBuffer.flip();
                secondBuffer.flip();
                if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                    return false;
                }
            }
            return true;
        } finally {
            if (first != null) {
                first.close();
            }
            if (seconde != null) {
                seconde.close();
            }
        }
    }

    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

    private void printAfterEquals(int myBufferSize) {
        NumberFormat nf = new DecimalFormat("#.00");
        final long dur = System.currentTimeMillis() - start;
        double seconds = dur / 1000d;
        // Floating-point division keeps the fractional megabytes.
        double megabytes = totalbytes / (1024d * 1024d);
        double rate = (megabytes) / seconds;
        System.out.println("I was equal, even after " + totalbytes
                + " bytes and reading for " + dur
                + " ms (" + nf.format(rate) + "MB/sec * 2)" +
                " with a buffer size of " + myBufferSize / 1024 + " kB");
    }
}


With such large files, you are going to get MUCH better performance with java.nio.


Additionally, reading single bytes with Java streams can be very slow. Using a byte array (2-6K elements in my own experience; YMMV, as it seems platform/application specific) will dramatically improve your read performance with streams.



Reading and writing the files with Java can be just as fast. You can use FileChannels. As for comparing the files, obviously this will take a lot of time comparing byte to byte. Here's an example using FileChannels and ByteBuffers (could be further optimized):


import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
    FileChannel firstIn = null, secondIn = null;
    try {
        firstIn = new FileInputStream(firstPath).getChannel();
        secondIn = new FileInputStream(secondPath).getChannel();
        if (firstIn.size() != secondIn.size())
            return false;
        ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int firstRead, secondRead;
        while (firstIn.position() < firstIn.size()) {
            // Reset both buffers so every pass reads a fresh chunk.
            firstBuffer.clear(); secondBuffer.clear();
            firstRead = firstIn.read(firstBuffer);
            secondRead = secondIn.read(secondBuffer);
            if (firstRead != secondRead)
                return false;
            // Flip so that limit marks the number of bytes just read.
            firstBuffer.flip(); secondBuffer.flip();
            if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                return false;
        }
        return true;
    } finally {
        if (firstIn != null) firstIn.close();
        if (secondIn != null) secondIn.close();
    }
}

private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit())
        return false;
    if (length > first.limit())
        return false;
    first.rewind(); second.rewind();
    for (int i=0; i<length; i++)
        if (first.get() != second.get())
            return false;
    return true;
}


After modifying your NIO compare function I get the following results.


I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 37 MB/s


Running the same thing on a faster drive


I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 74.8 MB/s


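private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit() || length > first.limit()) {
        return false;
    }
    // Compare from the start of each buffer.
    first.rewind();
    second.rewind();
    int i;
    // Compare eight bytes per step for as long as a full long is available.
    for (i = 0; i < length-7; i+=8) {
        if (first.getLong() != second.getLong()) {
            return false;
        }
    }
    // Compare the remaining tail (fewer than eight bytes) one byte at a time.
    for (; i < length; i++) {
        if (first.get() != second.get()) {
            return false;
        }
    }
    return true;
}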
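
The "MB/sec * 2" figures in the log lines, and the combined rates in the two notes, follow from simple arithmetic: bytes per file divided by elapsed time gives the per-file rate, and since two files are read in that time the disk delivers twice that. A minimal sketch of the calculation, with 1 MB taken as 1024 * 1024 bytes as in the benchmark; the class and method names are illustrative and not part of the benchmark code:

```java
public class Throughput {

    /** Per-file rate in MB/s (1 MB = 1024 * 1024 bytes) for a byte count and a duration. */
    public static double perFileRate(long bytes, long millis) {
        double megabytes = bytes / (1024.0 * 1024.0);
        double seconds = millis / 1000.0;
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        long bytes = 4294967296L;                   // size of each file in the runs above
        double slow = perFileRate(bytes, 221351);   // 16384 kB buffer, first drive
        double fast = perFileRate(bytes, 109549);   // 16384 kB buffer, faster drive
        // Two files are read in the measured time, so the disk delivers twice the per-file rate.
        System.out.printf("%.2f MB/s per file, %.1f MB/s combined%n", slow, 2 * slow);
        System.out.printf("%.2f MB/s per file, %.1f MB/s combined%n", fast, 2 * fast);
    }
}
```

For the 16384 kB runs this gives roughly 18.5 and 37.4 MB/s per file, that is about 37 and 74.8 MB/s combined.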


The following is a good article on the relative merits of the different ways to read a file in Java. It may be of some use:


How to read files quickly



You can have a look at Sun's article on I/O tuning (although already a bit dated); maybe you can find similarities between the examples there and your code. Also have a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has a quite nice article on high-performance I/O using java.nio.

If so, there are further examples and tuning tips available there which should help you speed up your code.

Furthermore, the Arrays class has methods for comparing byte arrays built in; maybe these can also be used to make things faster and clean up your loop a bit.
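
As a rough illustration of that last point, here is a minimal sketch that replaces the hand-written loop with the ranged Arrays.mismatch. It assumes Java 9 or later (for Arrays.mismatch and InputStream.readNBytes); the class and method names are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ChunkCompare {

    /**
     * Returns the offset of the first differing byte, or -1 if both streams
     * have identical content. A length difference counts as a mismatch at
     * the end of the shorter stream.
     */
    public static long firstMismatch(InputStream a, InputStream b, int bufferSize)
            throws IOException {
        byte[] bufA = new byte[bufferSize];
        byte[] bufB = new byte[bufferSize];
        long offset = 0;
        while (true) {
            // readNBytes blocks until the buffer is full or the stream ends,
            // so a short count can only mean end of stream.
            int nA = a.readNBytes(bufA, 0, bufferSize);
            int nB = b.readNBytes(bufB, 0, bufferSize);
            // The ranged mismatch looks only at the bytes actually read.
            int m = Arrays.mismatch(bufA, 0, nA, bufB, 0, nB);
            if (m >= 0) {
                return offset + m;
            }
            if (nA < bufferSize) {
                return -1; // both streams ended at the same point
            }
            offset += nA;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] y = {1, 2, 3, 4, 5, 6, 0, 8, 9};
        System.out.println(firstMismatch(
                new ByteArrayInputStream(x), new ByteArrayInputStream(x), 4)); // -1
        System.out.println(firstMismatch(
                new ByteArrayInputStream(x), new ByteArrayInputStream(y), 4)); // 6
    }
}
```

For real files the two streams would be FileInputStreams and the buffer size something much larger than 4; the tiny values here only keep the example easy to follow.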



For a better comparison, try copying two files at once. A hard drive can read one file much more efficiently than two (as the head has to move back and forth to read). One way to reduce this is to use larger buffers, e.g. 16 MB, with ByteBuffer.

With ByteBuffer you can compare 8-bytes at a time by comparing long values with getLong()


If your Java is efficient, most of the work is in the disk/OS for reading and writing, so it shouldn't be much slower than using any other language (as the disk/OS is the bottleneck).

Don't assume Java is slow until you have determined it's not a bug in your code.



I found that a lot of the articles linked to in this post are really outdated (there is also some very insightful stuff). Some of the linked articles are from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for the background and theory.

I have found that NIO or not NIO has very little to do with the performance. It is much more about the size of your output buffers (read: byte array on that one). NIO is no magic make-it-go-fast web-scale sauce.

I was able to take Martin's examples, use the 1.0-era OutputStream, and make it scream. NIO is fast too, but the biggest indicator is just the size of the output buffer, not whether or not you use NIO, unless of course you are using memory-mapped NIO; then it matters. :)

If you want up to date authoritative information on this, see Martin's blog:


If you want to see how NIO does not make that much of a difference (as I was able to write examples using regular IO that were faster) see this:


I have tested my assumption on a new Windows laptop with a fast hard disk, my MacBook Pro with SSD, an EC2 xlarge, and an EC2 4x large with maxed-out IOPS/high-speed I/O (and soon on a large-disk NAS fibre disk array), so it works (there are some issues with it on smaller EC2 instances, but if you care about performance... are you going to use a small EC2 instance?). If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2, then it is also a clear winner. If you use underpowered EC2 instances, NIO can win.

There is no substitute for benchmarking.


Anyway, I am no expert, I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.

I took this to the next step and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found at the bottom of this documentation for Boon. This allows me to use a traditional OutputStream but with something that works well on smaller EC2 instances. (I am the main author of Boon. But I am accepting new authors. The pay sucks. $0 per hour. But the good news is, I can double your pay whenever you like.)

My 2 cents.


See this to see why TransferQueue is important.


Key learnings:

  1. If you care about performance never, ever, ever use BufferedOutputStream.
  2. NIO does not always equal performance.
  3. Buffer size matters most.
  4. Recycling buffers for high-speed writes is critical.
  5. GC can/will/does implode your performance for high-speed writes.
  6. You have to have some mechanism to reuse spent buffers.
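
To make the last few points concrete, here is a minimal sketch of one possible buffer-recycling scheme. It is not the Boon recipe itself: the class name, the empty-buffer end-of-stream marker and the two-queue layout are assumptions made for this example. A reader thread fills buffers taken from a "free" queue and hands them over on a "filled" queue; the consumer gives each spent buffer back, so no buffer is allocated after start-up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TransferQueue;

public class BufferRecycling {

    // Empty buffer used as an end-of-stream marker (a convention of this sketch).
    private static final ByteBuffer EOF = ByteBuffer.allocate(0);

    /** Counts the bytes of a stream while reusing a fixed set of buffers. */
    public static long countBytes(InputStream in, int bufferCount, int bufferSize)
            throws InterruptedException {
        TransferQueue<ByteBuffer> free = new LinkedTransferQueue<>();
        TransferQueue<ByteBuffer> filled = new LinkedTransferQueue<>();
        for (int i = 0; i < bufferCount; i++) {
            free.add(ByteBuffer.allocate(bufferSize)); // allocated once, up front
        }

        Thread reader = new Thread(() -> {
            try (ReadableByteChannel ch = Channels.newChannel(in)) {
                while (true) {
                    ByteBuffer buf = free.take();  // wait for a spent buffer
                    buf.clear();
                    if (ch.read(buf) < 0) {
                        break;                     // end of stream
                    }
                    buf.flip();
                    filled.put(buf);
                }
            } catch (IOException e) {
                // Sketch only: a read error simply ends the stream early here;
                // real code would hand the exception to the consumer.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                filled.offer(EOF);                 // always wake up the consumer
            }
        });
        reader.start();

        long total = 0;
        for (ByteBuffer buf = filled.take(); buf != EOF; buf = filled.take()) {
            total += buf.remaining();              // stand-in for the real work
            free.put(buf);                         // hand the buffer back for reuse
        }
        reader.join();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        InputStream in = new ByteArrayInputStream(new byte[100_000]);
        System.out.println(countBytes(in, 4, 8192)); // 100000
    }
}
```

Because the pool is fixed, the reader can never run more than bufferCount chunks ahead of the consumer, and the garbage collector has nothing to clean up per chunk.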


DMA/SATA are hardware/low-level technologies and aren't visible to any programming language whatsoever.

For memory mapped input/output you should use java.nio, I believe.
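
A minimal sketch of what a memory-mapped comparison could look like, assuming Java 7 or later. The class name, method name and window size are illustrative. A single mapping cannot exceed Integer.MAX_VALUE bytes, so files of 2 GB and more have to be mapped in several windows:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedCompare {

    /**
     * Compares two files through read-only memory mappings, one window at a time.
     * windowSize must be between 1 and Integer.MAX_VALUE.
     */
    public static boolean sameContent(Path a, Path b, long windowSize) throws IOException {
        try (FileChannel chA = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel chB = FileChannel.open(b, StandardOpenOption.READ)) {
            long size = chA.size();
            if (size != chB.size()) {
                return false;
            }
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer mA = chA.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer mB = chB.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // ByteBuffer.equals compares the remaining bytes of both buffers.
                if (!mA.equals(mB)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("mapped", ".a");
        Path b = Files.createTempFile("mapped", ".b");
        a.toFile().deleteOnExit();
        b.toFile().deleteOnExit();
        byte[] data = new byte[10_000];
        Files.write(a, data);
        data[9_999] = 1;                 // make the very last byte differ
        Files.write(b, data);
        System.out.println(sameContent(a, a, 4096)); // true
        System.out.println(sameContent(a, b, 4096)); // false
    }
}
```

Whether this beats plain chunked reads depends on the platform and the disk, so it is worth measuring rather than assuming.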


Are you sure that you aren't reading those files one byte at a time? That would be wasteful. I'd recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.



Try setting the buffer on the input stream up to several megabytes.



I tried out three different methods of comparing two identical 3,8 gb files with buffer sizes between 8 kb and 1 MB. the first first method used just two buffered input streams

我尝试了三种不同的方法来比较两个相同的3,8 gb文件,缓冲区大小介于8 kb和1 MB之间。第一种方法只使用两个缓冲输入流

the second approach uses a threadpool that reads in two different threads and compares in a third one. this got slightly higher throughput at the expense of a high cpu utilisation. the managing of the threadpool takes a lot of overhead with those short-running tasks.


the third approach uses nio, as posted by laginimaineb


as you can see, the general approach does not differ much. more important is the correct buffer size.


what is strange that i read 1 byte less using threads. i could not spot the error tough.


comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB

the code used:


import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;

public class FileCompare {

    private static final int MIN_BUFFER_SIZE = 1024 * 8;
    private static final int MAX_BUFFER_SIZE = 1024 * 1024;
    private String fileName1;
    private String fileName2;
    private long start;
    private long totalbytes;

    public void createInputStream() {
        fileName1 = "bigFile.1";
        fileName2 = "bigFile.2";

    public void compareTwoFiles() throws IOException {
        System.out.println("comparing just with two streams");
        int currentBufferSize = MIN_BUFFER_SIZE;
        while (currentBufferSize <= MAX_BUFFER_SIZE) {
            currentBufferSize *= 2;

    public void compareTwoFilesFutures() 
            throws IOException, ExecutionException, InterruptedException {
        System.out.println("comparing with threads");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            myBufferSize *= 2;

    public void compareTwoFilesNio() throws IOException {
        System.out.println("comparing with nio");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            boolean wasEqual = isEqualsNio(myBufferSize);

            if (wasEqual) {
            } else {
      "files were not equal");

            myBufferSize *= 2;


    private void compareWithBufferSize(int myBufferSize) throws IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName1)),
        byte[] buff1 = new byte[myBufferSize];
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName2)),
        byte[] buff2 = new byte[myBufferSize];
        int read1;

        start = System.currentTimeMillis();
        totalbytes = 0;
        while ((read1 = != -1) {
            totalbytes += read1;
            int read2 =;
            if (read1 != read2) {
            if (!Arrays.equals(buff1, buff2)) {
        if (read1 == -1) {
        } else {
  "files were not equal");

    private void compareWithBufferSizeFutures(int myBufferSize)
            throws ExecutionException, InterruptedException, IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName1)),
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName2)),

        final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);

        if (wasEqual) {
        } else {
  "files were not equal");

    private boolean isEqualsParallel(int myBufferSize
            , final BufferedInputStream inputStream1
            , final BufferedInputStream inputStream2)
            throws InterruptedException, ExecutionException {
        final byte[] buff1Even = new byte[myBufferSize];
        final byte[] buff1Odd = new byte[myBufferSize];
        final byte[] buff2Even = new byte[myBufferSize];
        final byte[] buff2Odd = new byte[myBufferSize];
        final Callable<Integer> read1Even = new Callable<Integer>() {
            public Integer call() throws Exception {
        final Callable<Integer> read2Even = new Callable<Integer>() {
            public Integer call() throws Exception {
        final Callable<Integer> read1Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
        final Callable<Integer> read2Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
        final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Odd, buff2Odd);
        final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Even, buff2Even);

        ExecutorService executor = Executors.newCachedThreadPool();
        boolean isEven = true;
        Future<Integer> read1 = null;
        Future<Integer> read2 = null;
        Future<Boolean> isEqual = null;
        int lastSize = 0;
        while (true) {
            if (isEqual != null) {
                if (!isEqual.get()) {
                    return false;
                } else if (lastSize == -1) {
                    return true;
            if (read1 != null) {
                lastSize = read1.get();
                totalbytes += lastSize;
                final int size2 = read2.get();
                if (lastSize != size2) {
                    return false;
            isEven = !isEven;
            if (isEven) {
                if (read1 != null) {
                    isEqual = executor.submit(oddEqualsArray);
                read1 = executor.submit(read1Even);
                read2 = executor.submit(read2Even);
            } else {
                if (read1 != null) {
                    isEqual = executor.submit(evenEqualsArray);
                read1 = executor.submit(read1Odd);
                read2 = executor.submit(read2Odd);

    private boolean isEqualsNio(int myBufferSize) throws IOException {
        FileChannel first = null, seconde = null;
        try {
            first = new FileInputStream(fileName1).getChannel();
            seconde = new FileInputStream(fileName2).getChannel();
            if (first.size() != seconde.size()) {
                return false;
            }
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
            int firstRead, secondRead;
            while (first.position() < first.size()) {
                firstRead = first.read(firstBuffer);
                totalbytes += firstRead;
                secondRead = seconde.read(secondBuffer);
                if (firstRead != secondRead) {
                    return false;
                }
                if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                    return false;
                }
            }
            return true;
        } finally {
            if (first != null) {
                first.close();
            }
            if (seconde != null) {
                seconde.close();
            }
        }
    }

    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        // start reading both buffers from position 0
        first.rewind();
        second.rewind();
        for (int i = 0; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

    private void printAfterEquals(int myBufferSize) {
        NumberFormat nf = new DecimalFormat("#.00");
        final long dur = System.currentTimeMillis() - start;
        double seconds = dur / 1000d;
        // divide as doubles so fractional megabytes are not truncated
        double megabytes = totalbytes / 1024d / 1024d;
        double rate = (megabytes) / seconds;
        System.out.println("I was equal, even after " + totalbytes
                + " bytes and reading for " + dur
                + " ms (" + nf.format(rate) + "MB/sec * 2)" +
                " with a buffer size of " + myBufferSize / 1024 + " kB");
    }


With such large files, you are going to get MUCH better performance with java.nio.


Additionally, reading single bytes with Java streams can be very slow. Reading into a byte array (2-6K elements in my own experience; YMMV, as it seems to be platform/application specific) will dramatically improve your read performance with streams.
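A minimal sketch of the byte-array approach with plain streams, assuming Java 9+ (for `InputStream.readNBytes` and the ranged `Arrays.equals`). The class and method names (`ChunkedStreamCompare`, `sameContent`) are made up for illustration, and the buffer size is left as a parameter so you can benchmark it yourself:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class ChunkedStreamCompare {
    /** Returns true if both files have identical content, reading bufferSize bytes at a time. */
    static boolean sameContent(Path a, Path b, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        if (Files.size(a) != Files.size(b)) {
            return false; // different lengths can never match
        }
        byte[] bufA = new byte[bufferSize];
        byte[] bufB = new byte[bufferSize];
        try (InputStream inA = Files.newInputStream(a);
             InputStream inB = Files.newInputStream(b)) {
            while (true) {
                // readNBytes keeps reading until the array is full or the stream ends
                int nA = inA.readNBytes(bufA, 0, bufferSize);
                int nB = inB.readNBytes(bufB, 0, bufferSize);
                if (nA != nB || !Arrays.equals(bufA, 0, nA, bufB, 0, nB)) {
                    return false;
                }
                if (nA < bufferSize) {
                    return true; // a short (or empty) read means both streams ended
                }
            }
        }
    }
}
```

Something like `ChunkedStreamCompare.sameContent(Path.of("a.bin"), Path.of("b.bin"), 64 * 1024)` would match the 64 KB buffer from the question; the file names here are placeholders.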



Reading and writing the files with Java can be just as fast. You can use FileChannels. As for comparing the files, comparing byte by byte will obviously take a lot of time. Here's an example using FileChannels and ByteBuffers (it could be further optimized):


import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
    FileChannel firstIn = null, secondIn = null;
    try {
        firstIn = new FileInputStream(firstPath).getChannel();
        secondIn = new FileInputStream(secondPath).getChannel();
        if (firstIn.size() != secondIn.size())
            return false;
        ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int firstRead, secondRead;
        while (firstIn.position() < firstIn.size()) {
            firstRead = firstIn.read(firstBuffer);
            secondRead = secondIn.read(secondBuffer);
            if (firstRead != secondRead)
                return false;
            if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                return false;
        }
        return true;
    } finally {
        if (firstIn != null) firstIn.close();
        if (secondIn != null) secondIn.close();
    }
}

private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit())
        return false;
    if (length > first.limit())
        return false;
    first.rewind(); second.rewind();
    for (int i=0; i<length; i++)
        if (first.get() != second.get())
            return false;
    return true;
}


After modifying your NIO compare function I get the following results.


I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 37 MB/s


Running the same thing on a faster drive


I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 74.8 MB/s


private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit() || length > first.limit()) {
        return false;
    }
    // start reading both buffers from position 0
    first.rewind();
    second.rewind();
    int i;
    // compare 8 bytes at a time
    for (i = 0; i < length-7; i+=8) {
        if (first.getLong() != second.getLong()) {
            return false;
        }
    }
    // compare the remaining (fewer than 8) bytes one by one
    for (; i < length; i++) {
        if (first.get() != second.get()) {
            return false;
        }
    }
    return true;
}


The following is a good article on the relative merits of the different ways to read a file in java. May be of some use:


How to read files quickly



You can have a look at Sun's article on I/O tuning (although already a bit dated); maybe you can find similarities between the examples there and your code. Also have a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has quite a nice article on high-performance I/O using java.nio.


If so, there are further examples and tuning tips available there which should be able to help you to speed up your code.


Furthermore, the Arrays class has methods for comparing byte arrays built in; maybe these can also be used to make things faster and clean up your loop a bit.
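As a rough sketch of what that could look like: on Java 9+ `Arrays.mismatch` returns the index of the first differing byte, which could stand in for the hand-written loop from the question. The wrapper class and method name (`ChunkCompare`, `firstDifference`) are my own, not from any library:

```java
import java.util.Arrays;

final class ChunkCompare {
    /**
     * Compares the first `length` bytes of two chunks.
     * Returns the index of the first differing byte, or -1 if the ranges match.
     */
    static int firstDifference(byte[] a, byte[] b, int length) {
        // Arrays.mismatch (Java 9+) returns -1 when the two ranges are equal
        return Arrays.mismatch(a, 0, length, b, 0, length);
    }
}
```

If you only need a yes/no answer for whole arrays, the older `Arrays.equals(byte[], byte[])` works on any Java version.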



For a better comparison, try copying two files at once. A hard drive can read one file much more efficiently than two (as the head has to move back and forth to read). One way to reduce this is to use larger buffers, e.g. 16 MB, with ByteBuffer.


With ByteBuffer you can compare 8 bytes at a time by comparing long values with getLong().


If your Java is efficient, most of the work is in the disk/OS for reading and writing so it shouldn't be much slower than using any other language (as the disk/OS is the bottleneck)


Don't assume Java is slow until you have determined it's not a bug in your code.



I found that a lot of the articles linked to in this post are really outdated (there is also some very insightful stuff). Some of the linked articles are from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for the background and theory.


I have found that whether or not you use NIO has very little to do with performance. It is much more about the size of your output buffers (read: the byte array). NIO is no magic make-it-go-fast web-scale sauce.


I was able to take Martin's examples, use the 1.0-era OutputStream, and make it scream. NIO is fast too, but the biggest indicator is the size of the output buffer, not whether or not you use NIO. The exception, of course, is memory-mapped NIO; then it matters. :)


If you want up to date authoritative information on this, see Martin's blog:


If you want to see how NIO does not make that much of a difference (as I was able to write examples using regular IO that were faster) see this:


I have tested my assumption on a new Windows laptop with a fast hard disk, my MacBook Pro with SSD, an EC2 xlarge, and an EC2 4xlarge with maxed-out IOPS/high-speed I/O (and soon on a large NAS fibre disk array), so it works (there are some issues with it on smaller EC2 instances, but if you care about performance... are you going to use a small EC2 instance?). If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2, then it is also a clear winner. If you use underpowered EC2 instances, NIO can win.


There is no substitute for benchmarking.


Anyway, I am no expert, I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.


I took this to the next step and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found at the bottom of the documentation for Boon. This allows me to use a traditional OutputStream but with something that works well on smaller EC2 instances. (I am the main author of Boon. But I am accepting new authors. The pay sucks: $0 per hour. But the good news is, I can double your pay whenever you like.)


My 2 cents.


See this to see why TransferQueue is important.


Key learnings:

  1. If you care about performance, never, ever, ever use BufferedOutputStream.
  2. NIO does not always equal performance.
  3. Buffer size matters most.
  4. Recycling buffers for high-speed writes is critical.
  5. GC can/will/does implode your performance for high-speed writes.
  6. You have to have some mechanism to reuse spent buffers.
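To make the "reuse spent buffers" point a bit more concrete, here is one possible shape for such a mechanism: a fixed set of byte arrays handed out and returned through a blocking queue, so no new arrays are allocated (and later garbage collected) on the hot path. This is only a sketch under my own assumptions; `BufferPool`, `acquire` and `release` are hypothetical names, and it is not the recipe from the Boon documentation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BufferPool {
    private final BlockingQueue<byte[]> free;

    /** Pre-allocates `count` buffers of `bufferSize` bytes each. */
    BufferPool(int count, int bufferSize) {
        free = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    /** Blocks until a buffer is available, which also throttles a fast producer. */
    byte[] acquire() throws InterruptedException {
        return free.take();
    }

    /** Returns a spent buffer to the pool; extra buffers beyond the capacity are dropped. */
    void release(byte[] buffer) {
        free.offer(buffer);
    }
}
```

A reader thread would `acquire()` a buffer, fill it and hand it to the writer, and the writer would `release()` it once the bytes are on disk.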


DMA/SATA are hardware/low-level technologies and aren't visible to any programming language whatsoever.


For memory mapped input/output you should use java.nio, I believe.
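For what it's worth, a memory-mapped comparison could look roughly like the sketch below. A single mapping cannot exceed `Integer.MAX_VALUE` bytes, so for 2 GB files it maps one window at a time. The names (`MappedCompare`, `sameContent`, `windowSize`) are made up, and I have not benchmarked this against the stream or channel versions:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedCompare {
    /** Compares two files by mapping them window by window; windowSize must be positive. */
    static boolean sameContent(Path a, Path b, int windowSize) throws IOException {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        try (FileChannel chA = FileChannel.open(a, StandardOpenOption.READ);
             FileChannel chB = FileChannel.open(b, StandardOpenOption.READ)) {
            long size = chA.size();
            if (size != chB.size()) {
                return false;
            }
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer mA = chA.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer mB = chB.map(FileChannel.MapMode.READ_ONLY, pos, len);
                // ByteBuffer.equals compares the remaining bytes of both buffers
                if (!mA.equals(mB)) {
                    return false;
                }
            }
            return true;
        }
    }
}
```

Note that mapped regions are only released when the buffers are garbage collected, so on Windows the files may stay locked for a while after the call returns.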


Are you sure that you aren't reading those files one byte at a time? That would be wasteful. I'd recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.



Try setting the buffer on the input stream up to several megabytes.
