I know that I can use cmp, diff, etc. to compare two files, but what I am looking for is a utility that gives me the percentage difference between two files.
If there is no such utility, any algorithm would do fine too. I have read about fuzzy programming, but I have not quite understood it.
3 answers
#1
27
You can use the ratio() method of Python's difflib.SequenceMatcher.
From the documentation:
Return a measure of the sequences’ similarity as a float in the range [0, 1].
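The same documentation defines this ratio as 2M/T, where M is the number of matched elements and T is the total number of elements in both sequences. Turning it into a percentage difference is one more step; the 100 × (1 − ratio) convention below is our own choice, not something difflib provides:

```latex
\text{ratio} = \frac{2M}{T}, \qquad
\text{difference}\,(\%) = 100\,(1 - \text{ratio})

% "abcd" vs "bcde": the only matched block is "bcd", so M = 3 and T = 4 + 4 = 8
\text{ratio} = \frac{2 \cdot 3}{8} = 0.75, \qquad
\text{difference} = 100\,(1 - 0.75) = 25\%
```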
For example:
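from difflib import SequenceMatcher

file1, file2 = "a.txt", "b.txt"  # hypothetical paths; replace with your own
with open(file1) as f1, open(file2) as f2:  # closes both files afterwards
    text1, text2 = f1.read(), f2.read()

m = SequenceMatcher(None, text1, text2)
print(f"{m.ratio() * 100:.1f}% similar")  # ratio() is a float in [0, 1]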
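Because SequenceMatcher accepts any sequences of hashable items, the same ratio can also be computed over lists of lines instead of characters, which is coarser but usually much faster on large files. The sketch below is one way to do that; the helper name percent_difference is hypothetical, and "difference" here simply means 100 × (1 − ratio()):

```python
from difflib import SequenceMatcher


def percent_difference(text1: str, text2: str, by_lines: bool = True) -> float:
    """Return 100 * (1 - ratio): 0.0 for identical texts, 100.0 for nothing in common.

    With by_lines=True the texts are compared as lists of lines;
    otherwise they are compared character by character.
    """
    a = text1.splitlines() if by_lines else text1
    b = text2.splitlines() if by_lines else text2
    # autojunk=False disables the heuristic that treats very frequent items
    # as junk once the second sequence has 200 or more elements.
    m = SequenceMatcher(None, a, b, autojunk=False)
    return 100.0 * (1.0 - m.ratio())


old = "alpha\nbeta\ngamma\ndelta\n"
new = "alpha\nbeta\nGAMMA\ndelta\n"
print(f"{percent_difference(old, new):.1f}% different")  # prints "25.0% different"
```

Here 3 of the 4 lines match on each side, so ratio = 2·3/8 = 0.75. For very large inputs, SequenceMatcher also offers quick_ratio() and real_quick_ratio(), which return cheaper upper bounds on ratio().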
#2
2
It looks like Linux has a word-level diff utility called dwdiff that reports percentage statistics (words deleted, inserted, and changed) when run with the "-s" flag. See:
http://www.softpanorama.org/Utilities/diff_tools.shtml
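If dwdiff isn't available, a similar word-level summary can be approximated in Python. This is only a sketch built on difflib's get_opcodes(); it is not dwdiff's algorithm, so its numbers may differ from `dwdiff -s`. The function name word_stats and its output keys are hypothetical:

```python
from difflib import SequenceMatcher


def word_stats(old: str, new: str) -> dict:
    """% of old words common/deleted/changed and % of new words common/inserted/changed."""
    a, b = old.split(), new.split()  # whitespace-separated "words"
    common = deleted = inserted = changed_old = changed_new = 0
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if tag == "equal":
            common += i2 - i1
        elif tag == "delete":
            deleted += i2 - i1
        elif tag == "insert":
            inserted += j2 - j1
        else:  # "replace": a run of old words swapped for a run of new words
            changed_old += i2 - i1
            changed_new += j2 - j1

    def pct(n: int, total: int) -> float:
        return 100.0 * n / total if total else 0.0

    return {
        "old_common": pct(common, len(a)),
        "old_deleted": pct(deleted, len(a)),
        "old_changed": pct(changed_old, len(a)),
        "new_common": pct(common, len(b)),
        "new_inserted": pct(inserted, len(b)),
        "new_changed": pct(changed_new, len(b)),
    }


print(word_stats("the quick brown fox", "the slow brown fox jumps"))
```

For this input, "quick" → "slow" is a replacement and "jumps" is an insertion, so 75% of the old words are common and 25% changed, while 60% of the new words are common, 20% changed and 20% inserted.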
#3
0
Beyond Compare can export file-difference statistics to CSV. Differences are reported at the line level, which makes it well suited to comparing source code files.
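Beyond Compare's actual export format isn't shown here. As a rough, hypothetical stand-in, line-level statistics can be computed with difflib and written out with Python's standard csv module. The function names, column names, and the percent_changed definition (non-unchanged lines as a share of all counted lines) are all assumptions for illustration:

```python
import csv
import io
from difflib import SequenceMatcher


def line_stats(old_lines: list, new_lines: list) -> dict:
    """Count unchanged, changed, deleted and inserted lines between two line lists."""
    stats = {"unchanged": 0, "changed": 0, "deleted": 0, "inserted": 0}
    sm = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        n_old, n_new = i2 - i1, j2 - j1
        if tag == "equal":
            stats["unchanged"] += n_old
        elif tag == "delete":
            stats["deleted"] += n_old
        elif tag == "insert":
            stats["inserted"] += n_new
        else:  # "replace": pair lines up as changed; any surplus is deleted/inserted
            stats["changed"] += min(n_old, n_new)
            stats["deleted"] += max(n_old - n_new, 0)
            stats["inserted"] += max(n_new - n_old, 0)
    return stats


def write_stats_csv(pairs, out) -> None:
    """Write one CSV row per (name, old_text, new_text) tuple to the file-like `out`."""
    fields = ["name", "unchanged", "changed", "deleted", "inserted", "percent_changed"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for name, old_text, new_text in pairs:
        s = line_stats(old_text.splitlines(), new_text.splitlines())
        touched = s["changed"] + s["deleted"] + s["inserted"]
        total = s["unchanged"] + touched
        s["percent_changed"] = round(100.0 * touched / total, 1) if total else 0.0
        writer.writerow({"name": name, **s})


buf = io.StringIO()
write_stats_csv([("example.py", "a\nb\nc\n", "a\nB\nc\nd\n")], buf)
print(buf.getvalue())
```

To write a real file instead, pass a handle opened with `open("stats.csv", "w", newline="")`, as the csv module documentation recommends.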