How to diff two files and report the section in which each difference occurs?

Date: 2022-09-04 11:08:18

I have two text files with several sections in them. Each section has a header with the section name (grep can extract all the section names without extracting anything else from the file). How can I report the differences between the two files AND report the section that each difference occurs in? I also need to be able to report added/missing sections. Ideally, identical sections would not be mentioned in the report at all.

2 solutions

#1

Use the --show-function-line option of GNU diff:

diff -U 0 --show-function-line='^HEAD ' old-file new-file
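For illustration, here is that command run against two made-up files in which only the second section changes. It is a sketch that assumes GNU diff and headers starting with "HEAD "; the file names and contents are hypothetical.

```shell
# Hypothetical sample files: only section "HEAD b" differs.
tmp=$(mktemp -d)
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'y' 'z' > "$tmp/old-file"
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'y' 'Z' > "$tmp/new-file"

# -U 0 drops context lines; --show-function-line appends the most recent
# line before the hunk that matches '^HEAD ' to the @@ hunk header.
out=$(diff -U 0 --show-function-line='^HEAD ' "$tmp/old-file" "$tmp/new-file")
rm -r "$tmp"
printf '%s\n' "$out"
```

The hunk should be reported as @@ -5 +5 @@ HEAD b, followed by -z and +Z, so each change carries the name of its section.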

It won't report the correct section for lines in a section that exists only in the new file (for example, if you add a new section at the end of the file, the added lines will appear to belong to the last section of the old file).
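Since grep can already extract the section names, added and missing sections can be reported separately by comparing the two sorted lists of header lines with comm. This is a minimal sketch, assuming headers are the lines starting with "HEAD " and that section names are unique within a file; the sample files are made up.

```shell
# Sketch: report sections that exist in only one of the two files.
LC_ALL=C; export LC_ALL   # sort and comm must use the same collation order

tmp=$(mktemp -d)
# Hypothetical sample files: section b was removed, section c was added.
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'y' > "$tmp/old-file"
printf '%s\n' 'HEAD a' 'x' 'HEAD c' 'w' > "$tmp/new-file"

# comm needs sorted input.
grep '^HEAD ' "$tmp/old-file" | sort -u > "$tmp/old-sections"
grep '^HEAD ' "$tmp/new-file" | sort -u > "$tmp/new-sections"

# comm -23: lines only in the first list; comm -13: lines only in the second.
out=$(
    comm -23 "$tmp/old-sections" "$tmp/new-sections" | sed 's/^/missing: /'
    comm -13 "$tmp/old-sections" "$tmp/new-sections" | sed 's/^/added: /'
)
rm -r "$tmp"
printf '%s\n' "$out"
```

For these files it prints "missing: HEAD b" and "added: HEAD c"; sections present in both files are not mentioned.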

The following script might help, though it's far from being a one-liner. It prints:

  • headers of sections from the old file that have deleted lines, prefixed with " -"
  • headers of sections from the new file that have inserted lines, prefixed with " +"
  • deleted lines (including deleted section headers), prefixed with "-"
  • inserted lines (including new section headers), prefixed with "+"

Here is the script:

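#!/bin/bash
# Usage: ./script old-file new-file
# Section headers are assumed to be lines starting with "HEAD ".

# Stage 1: diff prints only the numbers of changed lines:
# "-N" for line N of the old file, "+N" for line N of the new file.
diff \
    --new-line-format='+%dn'$'\n' \
    --old-line-format='-%dn'$'\n' \
    --unchanged-line-format='' \
    "$1" \
    "$2" \
    | \
(
    # Stage 2: walk both files in step (old file on fd 3, new file on fd 4)
    # to fetch each changed line and the last header seen before it.
    lnumOld=0
    lnumNew=0
    header='NO HEADER'
    printheader=1
    while read -r lprint; do
        n=$((lprint))    # "+4" -> 4, "-4" -> -4
        if [ "$n" -gt 0 ]; then
            sep='+'
            while [ "$lnumNew" -lt "$n" ]; do
                # IFS= and -r keep leading blanks and backslashes intact.
                IFS= read -r line <&4
                if [ "${line#HEAD }" != "$line" ]; then
                    header="$sep$line"
                    printheader=1
                fi
                ((lnumNew++))
            done
        else
            sep='-'
            while [ "$lnumOld" -lt "$((-n))" ]; do
                IFS= read -r line <&3
                if [ "${line#HEAD }" != "$line" ]; then
                    header="$sep$line"
                    printheader=1
                fi
                ((lnumOld++))
            done
        fi
        if [ "$printheader" = 1 ]; then
            printf ' %s\n' "$header"
            printheader=0
        fi
        # printf, not echo: echo would treat output such as "-n" or "-e"
        # as an option instead of printing it.
        printf '%s\n' "$sep$line"
    done
) 3<"$1" 4<"$2"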
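To see what the first stage of the script feeds into the loop, here is the same diff call run on its own against two made-up files. It is a sketch that assumes GNU diff, since the line-format options are GNU extensions.

```shell
# Hypothetical sample files: line 4 changes and a new section is appended.
tmp=$(mktemp -d)
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'y' > "$tmp/old-file"
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'Y' 'HEAD c' 'w' > "$tmp/new-file"

nl='
'
# Old lines print as "-N", new lines as "+N", unchanged lines not at all.
out=$(diff \
    --old-line-format="-%dn$nl" \
    --new-line-format="+%dn$nl" \
    --unchanged-line-format='' \
    "$tmp/old-file" "$tmp/new-file")
rm -r "$tmp"
printf '%s\n' "$out"
```

This prints -4, +4, +5 and +6 on separate lines. The loop then uses each number to advance through the matching file (old file on descriptor 3, new file on descriptor 4), which is how it recovers the text of the line and the last HEAD line seen before it.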

#2

If you introduce an artificial change in the headers, that will force them to show up in the diff. Not exactly what you want, but maybe that will give you an idea.

Assuming your regex for finding headers is ^HEAD:

sed -e 's/^HEAD/>HEAD/' file1.txt | diff -u - file2.txt
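A small demo of that command, using two hypothetical files in which only section b changes (headers assumed to start with "HEAD").

```shell
# Hypothetical sample files: only section "HEAD b" differs.
tmp=$(mktemp -d)
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'y' > "$tmp/file1.txt"
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'Y' > "$tmp/file2.txt"

# Prefix every header of file1 with ">", so no header matches file2 and
# each one shows up in the diff as a -/+ pair.
out=$(sed -e 's/^HEAD/>HEAD/' "$tmp/file1.txt" | diff -u - "$tmp/file2.txt")
rm -r "$tmp"
printf '%s\n' "$out"
```

The changed line -y/+Y now appears right after the ->HEAD b/+HEAD b pair, but the header of the unchanged section a shows up as well, which is why this is not exactly what the question asks for.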

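Edit: If you want the result to be a real diff, with the headers shown as unchanged context lines rather than as changes, you can use sed again to turn the ->HEAD lines back into context lines and delete the +HEAD lines: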

sed -e 's/^HEAD/>HEAD/' file1.txt | diff -u - file2.txt | sed -e 's/^->HEAD/ HEAD/; /^+HEAD/D'
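Running the full pipeline on the same kind of hypothetical files (headers assumed to start with "HEAD", only section b changed) shows the headers turned back into context lines.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'y' > "$tmp/file1.txt"
printf '%s\n' 'HEAD a' 'x' 'HEAD b' 'Y' > "$tmp/file2.txt"

# Second sed: "->HEAD ..." lines become context lines (" HEAD ..."),
# and the "+HEAD ..." lines are dropped.
out=$(sed -e 's/^HEAD/>HEAD/' "$tmp/file1.txt" |
    diff -u - "$tmp/file2.txt" |
    sed -e 's/^->HEAD/ HEAD/; /^+HEAD/D')
rm -r "$tmp"
printf '%s\n' "$out"
```

The hunk comes out as @@ -1,4 +1,4 @@ with " HEAD a", " x", " HEAD b", "-y", "+Y", and its line counts still add up because each converted header counts once on each side. The cleanup cannot tell the artificial header changes from real ones, though: the header of a section that exists only in file2 is deleted, the header of a section that exists only in file1 is turned into a context line, and in both cases the hunk's line counts no longer add up.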
