Comparing two files in C# [duplicate]

Time: 2021-10-06 16:07:22

This question already has an answer here:

I want to compare two files in C# and see whether they are different. They have the same file names, and even when their contents differ they are exactly the same size. I was wondering whether there is a fast way to do this without having to manually read the files.

Thanks

7 answers

#1


26  

Depending on how far you want to take it, you could look at Diff.NET.

Here's a simple file comparison function:

// This method accepts two strings that represent the paths of the two
// files to compare. It returns true if the contents of the files are
// the same and false otherwise.
// Requires: using System.IO;
private bool FileCompare(string file1, string file2)
{
     int file1byte;
     int file2byte;

     // Determine if the same file was referenced two times.
     // Path.GetFullPath normalizes relative paths such as "./a.txt".
     if (Path.GetFullPath(file1) == Path.GetFullPath(file2))
     {
          // Return true to indicate that the files are the same.
          return true;
     }

     // Open the two files. The using statements close both streams,
     // even if an exception is thrown.
     using (FileStream fs1 = new FileStream(file1, FileMode.Open, FileAccess.Read))
     using (FileStream fs2 = new FileStream(file2, FileMode.Open, FileAccess.Read))
     {
          // Check the file sizes. If they are not the same, the files
          // are not the same.
          if (fs1.Length != fs2.Length)
          {
               // Return false to indicate files are different.
               return false;
          }

          // Read and compare a byte from each file until either a
          // non-matching pair of bytes is found or the end of
          // file1 is reached. FileStream buffers its reads, so this
          // does not go to the disk once per byte.
          do
          {
               // Read one byte from each file.
               file1byte = fs1.ReadByte();
               file2byte = fs2.ReadByte();
          }
          while ((file1byte == file2byte) && (file1byte != -1));

          // "file1byte" equals "file2byte" at this point only if both
          // streams reached end of file (-1) together, i.e. the files
          // are the same.
          return file1byte == file2byte;
     }
}

#2


18  

I was just wondering if there is a fast way to do this without having to manually go in and read the file.

Not really.

If the files came with precomputed hashes, you could compare those: if the hashes differ, you can conclude that the files are different. (Equal hashes, however, do not prove that the files are the same, so you would still have to do a byte-by-byte comparison.)

However, a hash is computed over every byte in the file, so one way or another you have to read both files in full. In fact, a straight byte-by-byte comparison will be faster than computing hashes: hashing reads all the bytes just as a byte-by-byte comparison does, but it also performs extra computation on them. In addition, a byte-by-byte comparison can stop early at the first pair of unequal bytes.

Finally, you cannot avoid reading the files byte by byte: equal hashes do not prove that the files are equal, so in that case you still have to compare them byte by byte.
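
To illustrate the early-exit point in this answer, here is a minimal sketch that compares two files in 64 KB chunks rather than one byte at a time. The class and method names (`ChunkedFileCompare`, `FilesAreEqual`, `ReadFully`) and the default buffer size are illustrative choices, not part of the original answer, and the span-based comparison assumes .NET Core 2.1 or later.

```csharp
using System;
using System.IO;

// Hypothetical helper, not from the original answer.
public static class ChunkedFileCompare
{
    // Compares two files chunk by chunk and returns false as soon as a
    // differing chunk is found. Assumes .NET Core 2.1+ for ReadOnlySpan<T>.
    public static bool FilesAreEqual(string path1, string path2, int bufferSize = 64 * 1024)
    {
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var info1 = new FileInfo(path1);
        var info2 = new FileInfo(path2);

        // Different lengths: no need to read any content.
        if (info1.Length != info2.Length)
            return false;

        using (FileStream fs1 = info1.OpenRead())
        using (FileStream fs2 = info2.OpenRead())
        {
            var buf1 = new byte[bufferSize];
            var buf2 = new byte[bufferSize];

            while (true)
            {
                int read1 = ReadFully(fs1, buf1);
                int read2 = ReadFully(fs2, buf2);

                if (read1 != read2)
                    return false;   // one file ended before the other
                if (read1 == 0)
                    return true;    // both ended together; every chunk matched

                // Early exit on the first chunk that differs.
                if (!new ReadOnlySpan<byte>(buf1, 0, read1)
                        .SequenceEqual(new ReadOnlySpan<byte>(buf2, 0, read2)))
                    return false;
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the end of the stream is reached.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0)
                break;
            total += n;
        }
        return total;
    }
}
```

This still reads every byte of two identical files; the gain over `ReadByte` is fewer calls per byte, and like the byte-by-byte loop it stops at the first mismatch.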

#3


4  

Well, I'm not sure whether you can rely on the files' write timestamps. If not, your only alternative is to compare the contents of the files.
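
If the timestamps can be trusted (for example, copies made by a tool that preserves them), a cheap first check is to compare length and last-write time. The sketch below is a heuristic only: it never reads the contents, so matching metadata does not prove the files are identical. `TimestampCheck` and `LooksUnchanged` are hypothetical names.

```csharp
using System.IO;

// Hypothetical helper illustrating a metadata-only check.
public static class TimestampCheck
{
    // Returns true when both files have the same length and the same
    // last-write time (UTC). This never reads the contents, so it cannot
    // prove that the contents are identical.
    public static bool LooksUnchanged(string path1, string path2)
    {
        var f1 = new FileInfo(path1);
        var f2 = new FileInfo(path2);
        return f1.Length == f2.Length
            && f1.LastWriteTimeUtc == f2.LastWriteTimeUtc;
    }
}
```

A false result does not prove the files differ either (identical copies can have different timestamps), so either way you would fall back to comparing contents when you need certainty.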

A simple approach is to compare the files byte by byte, but if you're going to compare a file with others several times, you can compute a hash of each file and compare the hashes.

The following code snippet shows how you can do it:

    // Requires: using System; using System.IO; using System.Security.Cryptography;
    public static string CalcHashCode(string filename)
    {
        // FileShare.ReadWrite lets other processes keep the file open for
        // reading and writing while it is being hashed.
        using (FileStream stream = new FileStream(
            filename,
            FileMode.Open,
            FileAccess.Read,
            FileShare.ReadWrite))
        {
            return CalcHashCode(stream);
        }
    }

    public static string CalcHashCode(FileStream file)
    {
        // MD5.Create() is used instead of MD5CryptoServiceProvider,
        // which is marked obsolete in .NET 6 and later.
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(file);
            return Convert.ToBase64String(hash);
        }
    }

If you're going to compare a file with others more than once, you can save the file's hash and compare that. For a single comparison, a byte-by-byte comparison is better. You also need to recompute the hash whenever the file changes, but if you're going to do many comparisons, I recommend the hash approach.
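
A minimal sketch of the "save the hash" idea, assuming SHA-256 rather than MD5 and an in-memory dictionary keyed by full path. `FileHashCache`, `GetHash`, `Invalidate` and `ProbablyEqual` are hypothetical names, and the caller is responsible for calling `Invalidate` when a file changes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: caches one SHA-256 hash per file path so that a file
// compared against many others is only read once.
public sealed class FileHashCache
{
    // Ordinal comparison: paths are case-sensitive on some file systems.
    private readonly Dictionary<string, string> _hashes =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public string GetHash(string path)
    {
        string fullPath = Path.GetFullPath(path);
        if (!_hashes.TryGetValue(fullPath, out string hash))
        {
            // Not cached yet: read the file once and remember its hash.
            using (SHA256 sha = SHA256.Create())
            using (FileStream stream = File.OpenRead(fullPath))
            {
                hash = Convert.ToBase64String(sha.ComputeHash(stream));
            }
            _hashes[fullPath] = hash;
        }
        return hash;
    }

    // Call this after a file changes so its hash is recomputed next time.
    public void Invalidate(string path) => _hashes.Remove(Path.GetFullPath(path));

    // Different hashes prove the files differ; equal hashes mean they are
    // equal with overwhelming probability, but not with certainty.
    public bool ProbablyEqual(string path1, string path2) =>
        GetHash(path1) == GetHash(path2);
}
```

Each file is hashed at most once until it is invalidated, which is where the savings over repeated byte-by-byte comparisons come from.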

#4


3  

If the filenames are the same, and the file sizes are the same, then, no, there is no way to know if they have different content without examining the content.

#5


0  

Open each file as a stream, hash the stream, and compare the two hashes. That should give you a reliable result.

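// Requires: using System.Linq; using System.Security.Cryptography;
// streamforfile1 and streamforfile2 are the open streams for the two
// files, for example File.OpenRead(path1) and File.OpenRead(path2).
byte[] fileHash1, fileHash2;

// SHA256.Create() replaces SHA256Managed, which is obsolete in .NET 6+.
using (SHA256 sha = SHA256.Create())
{
    fileHash1 = sha.ComputeHash(streamforfile1);
    fileHash2 = sha.ComputeHash(streamforfile2);
}

// Compare the two hashes; SequenceEqual also checks that the lengths match.
if (!fileHash1.SequenceEqual(fileHash2))
{
    // files are not the same
}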

#6


-2  

If they are not compiled (binary) files, use a diff tool such as KDiff3 or WinMerge. It will highlight where they differ.

http://kdiff3.sourceforge.net/

http://winmerge.org/

#7


-2  

Pass each file stream through an MD5 hasher and compare the hashes.
