How to programmatically compare the contents of two archive files?

Time: 2021-11-12 23:04:03

I'm doing some testing to ensure that the all-in-one zip file I created with a script produces the same output as the contents of a few zip files that I have to create manually by clicking through a web interface. Because of that, the zips will have different folder structures.

Of course, I could extract them manually and scan them with my powerful eyeball technique, or, even lazier, write a script to do that. But before I invest more time and get accused by my boss of company time robbery, I'm asking: is there a better way to do this?

I'm using a Perl LAMP stack, by the way. Thanks.

4 solutions

#1


You can use Perl's Archive::Zip or Python's zipfile to extract the file names, sizes and CRC checksums of the files in the archives. Create a file that contains the results sorted by file name (ignore the path).

For your smaller ZIPs, merge the results of the script (cat list1 list2 list3 | sort).

Now, you can use diff to compare the results.
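
A minimal Python sketch of this approach, using only the standard-library zipfile module (Archive::Zip would do the same job in Perl). The script name zipmanifest.py and the function zip_manifest are hypothetical. It reads the size and CRC that each archive already stores for every member, so nothing has to be extracted, and it drops the folder part of each name so that different layouts compare equal.

```python
import sys
import zipfile
from pathlib import PurePosixPath


def zip_manifest(*archives):
    """Return sorted 'name<TAB>size<TAB>crc' lines for every file in the given zips.

    The directory part of each entry is dropped, so archives with different
    folder layouts still produce comparable lines.
    """
    lines = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if info.is_dir():  # skip folder entries, only files matter
                    continue
                name = PurePosixPath(info.filename).name  # ignore the path
                lines.append(f"{name}\t{info.file_size}\t{info.CRC:08x}")
    return sorted(lines)


if __name__ == "__main__":
    # Hypothetical usage: python zipmanifest.py all_in_one.zip web1.zip web2.zip ...
    for line in zip_manifest(*sys.argv[1:]):
        print(line)
```

Run it once for the script-built zip and once for all the web zips together (for example python zipmanifest.py all_in_one.zip > script.txt and python zipmanifest.py web1.zip web2.zip web3.zip > web.txt), then run diff script.txt web.txt. Because the path is ignored, files that share a name in different folders are told apart only by their size and CRC.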

#2


I can wholeheartedly recommend Beyond Compare. Unless you're really underpaid, it's the biggest bang for your (boss's) buck.

[Edit] I seem to have skimmed over the different folder structure, sorry about that. Beyond Compare can compare all files in folders that share the same folder structure. It does not (I believe) have the intelligence to go searching for matching files in different folders.

Regards,
Lieven

#3


Create a CRC checksum for your files.

If the checksum is the same for the original files and the unzipped files, you can be reasonably sure the files are the same. It even works for non-text data.

A checksum can easily be created with an external program such as "SFV Checker", or programmatically (.NET and Java, for example, include libraries to do this).
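
A minimal Python sketch of the programmatic route. It assumes the standard-library zlib.crc32, which computes the same CRC-32 that ZIP stores for each member, so an original file on disk can be checked against the archive entry without unzipping it. The helper names crc32_of_file and matches_zip_entry are hypothetical.

```python
import zipfile
import zlib


def crc32_of_file(path, chunk_size=64 * 1024):
    """Compute the CRC-32 of a file, reading it in chunks to keep memory flat."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return crc & 0xFFFFFFFF


def matches_zip_entry(path, archive, member):
    """Hypothetical helper: does the file on disk match the CRC stored in the zip?"""
    with zipfile.ZipFile(archive) as zf:
        return crc32_of_file(path) == zf.getinfo(member).CRC
```

CRC-32 is only 32 bits wide, so a match is strong evidence rather than proof; if you need more certainty, hash the extracted files with something like hashlib.sha256 instead.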

#4


Taking a cue from Carra's answer... if A.zip is your single big archive and B.zip is the archive generated through the web, then use the following algorithm:

  1. Extract all files from A.zip, recursively (with respect to folders) compute the checksum of each file in the folder where the contents were extracted (using cksum, md5sum, etc.), keeping only the checksum and file name rather than the full path since the folder structures differ, then sort this information (pipe it through sort) and save it to a file (say A.txt).

  2. Do the same for B.zip and generate B.txt

  3. Compare A.txt with B.txt; they should be exactly the same.
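
The three steps above, sketched in Python rather than with cksum/md5sum and sort on the command line. This is a minimal sketch under a few assumptions: it uses hashlib's MD5 in place of the md5sum tool, the name checksum_listing is hypothetical, and it keeps only the base file name next to each checksum because the two archives lay out their folders differently. It accepts several archives so that the web-generated zips can be pooled into one listing.

```python
import hashlib
import os
import sys
import tempfile
import zipfile


def checksum_listing(*archives):
    """Extract the archives and return sorted 'md5  filename' lines.

    Only the base file name is kept, so different folder layouts do not
    cause spurious differences.
    """
    lines = []
    with tempfile.TemporaryDirectory() as workdir:
        for i, archive in enumerate(archives):
            target = os.path.join(workdir, str(i))  # one folder per archive
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
            for root, _dirs, files in os.walk(target):  # recurse into every folder
                for name in files:
                    h = hashlib.md5()
                    with open(os.path.join(root, name), "rb") as f:
                        for chunk in iter(lambda: f.read(65536), b""):
                            h.update(chunk)
                    lines.append(f"{h.hexdigest()}  {name}")
    return sorted(lines)


if __name__ == "__main__":
    # Hypothetical usage: python checksums.py A.zip > A.txt
    for line in checksum_listing(*sys.argv[1:]):
        print(line)
```

Write the output for A.zip to A.txt and the output for B.zip (plus any further web zips) to B.txt, then compare the two files with diff.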

OR

Use unzip -l to get the file/directory lists for both zip archives, then flatten the hierarchy of the user-generated zip file and compare it with the contents of your script-generated zip file using something like diff. By flattening the hierarchy, I mean you may need to do some kind of pre-processing on one or both lists before you can make a meaningful comparison with diff.
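
One way to do the flattening step, sketched in Python. It assumes the default output layout of Info-ZIP's unzip -l (a header, a dashed line, one Length/Date/Time/Name row per entry, another dashed line, then totals); other unzip builds may format the listing differently. The function names are hypothetical. Flattening here means dropping directory entries and keeping only each file's base name and size, and difflib stands in for the diff command.

```python
import difflib
import posixpath


def flatten_unzip_listing(listing):
    """Turn the text printed by `unzip -l` into sorted 'name<TAB>size' lines.

    Assumes Info-ZIP's default layout: header, a dashed line, one
    'Length Date Time Name' row per entry, another dashed line, totals.
    """
    rows = []
    in_body = False
    for line in listing.splitlines():
        if line.lstrip().startswith("---"):
            if in_body:  # second dashed line: the entry list is over
                break
            in_body = True  # first dashed line: entries start on the next line
            continue
        if not in_body:
            continue  # still in the "Archive:" / column header lines
        parts = line.split(None, 3)  # Length, Date, Time, Name (Name may hold spaces)
        if len(parts) < 4:
            continue
        size, name = parts[0], parts[3]
        if name.endswith("/"):  # directory entry, nothing to compare
            continue
        rows.append(f"{posixpath.basename(name)}\t{size}")  # flatten: drop folders
    return sorted(rows)


def diff_listings(script_listing, web_listings):
    """Diff the script zip's listing against the pooled web zip listings.

    Returns the unified diff as a list of lines; an empty list means they match.
    """
    web_rows = sorted(row for text in web_listings
                      for row in flatten_unzip_listing(text))
    return list(difflib.unified_diff(
        flatten_unzip_listing(script_listing), web_rows,
        fromfile="script", tofile="web", lineterm=""))
```

The listing text can be captured with subprocess.run(["unzip", "-l", "all_in_one.zip"], capture_output=True, text=True).stdout. Keep in mind that a listing only compares names and sizes, not file contents.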
