Is there a concise way in Python to check whether two text files have the same contents?

Time: 2021-12-13 22:51:16

I don't care what the differences are. I just want to know whether the contents are different.


8 solutions

#1


53  

The low level way:


from __future__ import with_statement
with open(filename1) as f1:
   with open(filename2) as f2:
      if f1.read() == f2.read():
         ...

The high level way:


import filecmp
if filecmp.cmp(filename1, filename2, shallow=False):
   ...
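The `shallow=False` flag in the high-level version matters: with the default `shallow=True`, `filecmp.cmp` treats two files as equal as soon as their `os.stat()` signatures (file type, size, modification time) match, without reading them. A minimal sketch that wraps the call and runs it on throwaway temporary files; the wrapper name `same_contents` and the file contents are made up for illustration:

```python
import filecmp
import os
import tempfile

def same_contents(path1, path2):
    """True if the two files are byte-for-byte identical.

    shallow=False forces filecmp to compare the bytes instead of accepting
    two files as equal just because their os.stat() signatures
    (file type, size, modification time) match.
    """
    return filecmp.cmp(path1, path2, shallow=False)

# Demo on throwaway files (contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, text in (("a.txt", "hello\n"), ("b.txt", "hello\n"), ("c.txt", "world\n")):
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "w") as fh:
            fh.write(text)
    a_equals_b = same_contents(paths["a.txt"], paths["b.txt"])
    a_equals_c = same_contents(paths["a.txt"], paths["c.txt"])

print(a_equals_b, a_equals_c)  # True False
```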

#2


23  

If you're going for even basic efficiency, you probably want to check the file size first:


import os
if os.path.getsize(filename1) == os.path.getsize(filename2):
    with open(filename1, 'rb') as f1, open(filename2, 'rb') as f2:
        if f1.read() == f2.read():
            pass  # Files are the same.

This saves you from reading two files in full when they aren't even the same size and thus can't be identical.


(Even further than that, you could call out to a fast MD5sum of each file and compare those, but that's not "in Python", so I'll stop here.)
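The size check can be combined with a comparison that never holds either file fully in memory. A minimal sketch, assuming a hypothetical helper name `files_equal` and an arbitrary 64 KiB chunk size: it compares sizes first, then reads both files in binary mode chunk by chunk and stops at the first mismatch.

```python
import os
import tempfile

def files_equal(path1, path2, chunk_size=64 * 1024):
    """Return True if both files contain exactly the same bytes.

    The size check is a cheap metadata lookup; only files of equal size
    are actually read, chunk by chunk, stopping at the first mismatch.
    """
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            b1 = f1.read(chunk_size)
            b2 = f2.read(chunk_size)
            if b1 != b2:
                return False
            if not b1:  # both files hit end of file together
                return True

# Demo on throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    data = {
        "p1": b"x" * 100_000,
        "p2": b"x" * 100_000,
        "p3": b"x" * 99_999 + b"y",  # same size, last byte differs
        "p4": b"x" * 10,             # different size
    }
    paths = {}
    for name, content in data.items():
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "wb") as fh:
            fh.write(content)
    same = files_equal(paths["p1"], paths["p2"])
    differ = files_equal(paths["p1"], paths["p3"])
    short = files_equal(paths["p1"], paths["p4"])

print(same, differ, short)  # True False False
```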


#3


8  

This is a functional-style file comparison function. It returns False immediately if the files have different sizes; otherwise, it reads both files in 4 KiB blocks and returns False as soon as it finds the first difference:


import os
import itertools, functools, operator

def filecmp(filename1, filename2):
    "Do the two files have exactly the same contents?"
    with open(filename1, "rb") as fp1, open(filename2, "rb") as fp2:
        if os.fstat(fp1.fileno()).st_size != os.fstat(fp2.fileno()).st_size:
            return False  # different sizes ∴ not equal
        fp1_reader = functools.partial(fp1.read, 4096)
        fp2_reader = functools.partial(fp2.read, 4096)
        # Binary reads return bytes, so the end-of-file sentinel must be b''.
        # Python 3's zip is lazy, like Python 2's itertools.izip.
        cmp_pairs = zip(iter(fp1_reader, b''), iter(fp2_reader, b''))
        inequalities = itertools.starmap(operator.ne, cmp_pairs)
        return not any(inequalities)

if __name__ == "__main__":
    import sys
    print(filecmp(sys.argv[1], sys.argv[2]))

Just a different take :)
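The answer above builds its chunk streams with the two-argument form `iter(callable, sentinel)`, which keeps calling the callable until it returns the sentinel. A minimal sketch of that idiom on its own, using an in-memory `io.BytesIO` as a stand-in for a file; the helper name `read_chunks` and the 4-byte chunk size are illustrative choices:

```python
import functools
import io

def read_chunks(fp, size):
    """Return an iterator over successive chunks of at most `size` bytes.

    iter(callable, sentinel) keeps calling fp.read(size) and stops as soon
    as it returns the sentinel. A binary stream signals end of file with
    b"", so that is the sentinel; the str "" would never compare equal to
    bytes in Python 3, and the iterator would yield b"" forever.
    """
    return iter(functools.partial(fp.read, size), b"")

chunks = list(read_chunks(io.BytesIO(b"abcdefghij"), 4))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```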


#4


6  

Since I can't comment on other people's answers, I'll write my own.


If you use MD5, you definitely should not just call md5.update(f.read()), since that reads the whole file into memory at once.


import hashlib

def get_file_md5(f, chunk_size=8192):
    """Return the MD5 hex digest of a file opened in binary mode ("rb")."""
    h = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()
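To turn the chunked digest into a file comparison, hash each file and compare the results. A minimal sketch; the names `file_hexdigest` and `same_by_hash` are hypothetical, and `hashlib.new` lets the caller pick an algorithm other than MD5:

```python
import hashlib
import os
import tempfile

def file_hexdigest(path, algorithm="md5", chunk_size=8192):
    """Hash a file incrementally, holding at most chunk_size bytes in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode: hash objects accept only bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_by_hash(path1, path2, algorithm="md5"):
    """Compare two files by digest rather than byte by byte."""
    return file_hexdigest(path1, algorithm) == file_hexdigest(path2, algorithm)

# Demo on two throwaway files with identical made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "one.txt")
    p2 = os.path.join(tmp, "two.txt")
    for path in (p1, p2):
        with open(path, "wb") as fh:
            fh.write(b"same bytes\n")
    digest_match = same_by_hash(p1, p2)
    md5_of_p1 = file_hexdigest(p1)

print(digest_match)  # True
```

Hashing still reads both files in full, so for a single pair it is no faster than comparing the bytes directly; it pays off when a digest can be computed once and reused. Matching MD5 digests strongly suggest, but do not prove, identical contents, since MD5 collisions can be constructed deliberately.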

#5


4  


with open(filename1, "r") as f1, open(filename2, "r") as f2:
    print(f1.read() == f2.read())
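A caveat for this answer and the other text-mode examples: in Python 3, `open(..., "r")` uses universal newlines by default, so `"\r\n"` is read as `"\n"` and two files that differ only in line endings compare equal. Opening in binary mode (`"rb"`) compares the raw bytes instead. A small sketch showing the difference; the helper names are made up for this illustration:

```python
import os
import tempfile

def equal_as_text(p1, p2):
    # Text mode with the default newline=None: "\r\n" and "\r" become "\n".
    with open(p1, "r") as a, open(p2, "r") as b:
        return a.read() == b.read()

def equal_as_bytes(p1, p2):
    # Binary mode: raw bytes, line endings included.
    with open(p1, "rb") as a, open(p2, "rb") as b:
        return a.read() == b.read()

with tempfile.TemporaryDirectory() as tmp:
    unix_file = os.path.join(tmp, "unix.txt")
    dos_file = os.path.join(tmp, "dos.txt")
    with open(unix_file, "wb") as fh:
        fh.write(b"line one\nline two\n")
    with open(dos_file, "wb") as fh:
        fh.write(b"line one\r\nline two\r\n")
    text_result = equal_as_text(unix_file, dos_file)
    bytes_result = equal_as_bytes(unix_file, dos_file)

print(text_result, bytes_result)  # True False
```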


#6


2  

For larger files, you could compute an MD5 or SHA hash of each file and compare the hashes.
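On Python 3.11 and later, `hashlib.file_digest(fileobj, digest)` does the chunked reading itself. A minimal sketch that prefers it when present and otherwise falls back to a manual loop; the function name `sha256_hexdigest` and the 1 MiB chunk size are assumptions for illustration. Comparing two files then means comparing two such digests:

```python
import hashlib
import os
import tempfile

def sha256_hexdigest(path):
    """SHA-256 hex digest of a file, computed without loading it all at once."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # available from Python 3.11
            return hashlib.file_digest(f, "sha256").hexdigest()
        # Older Pythons: feed the hash 1 MiB at a time.
        h = hashlib.sha256()
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
        return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc.txt")
    with open(path, "wb") as fh:
        fh.write(b"abc")
    abc_digest = sha256_hexdigest(path)

print(abc_digest)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```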


#7


2  

I would compare MD5 hashes of the files' contents.


import hashlib

def checksum(f):
    md5 = hashlib.md5()
    # Binary mode: hashlib needs bytes; "with" closes the file afterwards.
    with open(f, "rb") as fp:
        md5.update(fp.read())
    return md5.hexdigest()

def is_contents_same(f1, f2):
    return checksum(f1) == checksum(f2)

if not is_contents_same('foo.txt', 'bar.txt'):
    print('The contents are not the same!')

#8


1  

filename1 = "G:\\test1.TXT"
filename2 = "G:\\test2.TXT"

with open(filename1) as f1, open(filename2) as f2:
    file1list = f1.read().splitlines()
    file2list = f2.read().splitlines()
    if len(file1list) == len(file2list):
        # Compare the files line by line and report each result.
        for line1, line2 in zip(file1list, file2list):
            if line1 == line2:
                print(line1 + "==" + line2)
            else:
                print(line1 + "!=" + line2 + " Not-Equal")
    else:
        print("difference in the size of the file and number of lines")
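Since the question only asks whether the files differ, the line-by-line idea can stop at the first mismatch and stream the files instead of reading them whole. A minimal sketch, assuming a hypothetical helper `first_difference` that returns the 1-based number of the first differing line (or `None`); `itertools.zip_longest` makes extra trailing lines in either file count as a difference:

```python
import itertools
import os
import tempfile

def first_difference(path1, path2):
    """Return the 1-based number of the first differing line, or None if
    the files match line for line. Reads one line at a time, so neither
    file has to fit in memory."""
    with open(path1) as f1, open(path2) as f2:
        # zip_longest pads the shorter file with None, so extra lines at
        # the end of either file are reported as a difference too.
        for lineno, (line1, line2) in enumerate(itertools.zip_longest(f1, f2), start=1):
            if line1 != line2:
                return lineno
    return None

# Demo on throwaway files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    a, b, c = (os.path.join(tmp, n) for n in ("a.txt", "b.txt", "c.txt"))
    for path, text in ((a, "x\ny\nz\n"), (b, "x\ny\nz\n"), (c, "x\nY\nz\nextra\n")):
        with open(path, "w") as fh:
            fh.write(text)
    same_result = first_difference(a, b)
    diff_result = first_difference(a, c)

print(same_result, diff_result)  # None 2
```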
