Git diff of the same files in two directories always results in "renamed"

Date: 2022-09-04 11:08:36

git diff --no-index --no-prefix --summary -U4000 directory1 directory2

This works as expected in that it returns a diff of all the files between the two directories. Files that are added and files that are deleted both produce the expected diff output.

However, because the diff takes the file path into account as part of the file name, files with the same name in the two different directories produce diff output with the renamed flag instead of changed.

  1. Is there a way to tell git to not take into account the full file path in the diff and only look at the file name, as if the files were originating from the same directory?

  2. Is there a way for git to actually know if a copy of the same file in a different directory was actually renamed? I don't see how, unless it has a way of comparing the files md5s somehow or something (probably a bad guess lol).

  3. Would using branches instead of directories resolve this issue easily and if so what is the branch version of the command listed above?

1 Answer

#1


There are multiple questions here, whose answers intertwine. Let's start with rename and copy detection, then move on to branches.

Rename detection

However because the diff takes into account the file path as part of the file name, files with the same name, in the two different directories, result in a diff output with the renamed flag instead of changed.

This is not quite right. (The text below is meant to address both your items 1 and 2.)

Although you are using --no-index (presumably, to make Git work on directories outside the repository), Git's diff code behaves the same way in all cases. In order to diff (compare) two files in two trees, Git must first determine file identity. That is, there are two sets of files: those in the "left side" or source tree (the first directory name), and those in the "right side" or destination tree (the second directory name). Some files on the left are the same file as some files on the right. Some files on the left are different files that have no corresponding right-side file, i.e., they have been deleted. Finally, some files on the right side are new, i.e., they have been created.

Files that are "the same file" need not have the same path name. In this case, those files have been renamed.

Here's how it works in detail. Note that "full path name" is modified somewhat when using git diff --no-index dir1 dir2: the "full path name" is what is left after stripping off the dir1 and dir2 prefixes.

When comparing the left and right side trees, files that have the same full path names are normally automatically considered "the same file". We place all these files into a queue of "files to be diffed", and none will show up as being renamed. Note the word "normally" here—we'll come back to this in a moment.

This leaves us with two remaining lists of files:

  • paths that exist on the left, but not the right: source without destination
  • paths that exist on the right, but not the left: destination without source
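
The same-path pairing step and the two leftover lists can be sketched in a few lines of Python. This is only an illustrative model, not Git's code: the function name partition_paths and the sample trees (plain dictionaries mapping path to content) are assumptions made up for the example.

```python
def partition_paths(left, right):
    """Split two {path: content} trees into same-path pairs and leftovers."""
    paired = sorted(left.keys() & right.keys())        # same path on both sides
    source_only = sorted(left.keys() - right.keys())   # source without destination
    dest_only = sorted(right.keys() - left.keys())     # destination without source
    return paired, source_only, dest_only


# Hypothetical trees, with the dir1/dir2 prefixes already stripped.
left = {"a/b/c.txt": "hello\n", "README": "v1\n", "old.txt": "gone\n"}
right = {"a/doc/c.txt": "hello\n", "README": "v2\n", "new.txt": "fresh\n"}

paired, source_only, dest_only = partition_paths(left, right)
print(paired)       # ['README']
print(source_only)  # ['a/b/c.txt', 'old.txt']
print(dest_only)    # ['a/doc/c.txt', 'new.txt']
```

With rename detection disabled, everything in source_only would simply be reported as deleted and everything in dest_only as created.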

Naïvely, we can simply declare that all of these source-side files have been deleted, and all of these destination files have been created. You can instruct git diff to behave this way: set the --no-renames flag to disable rename detection.

Or, Git can go on to use a smarter algorithm: set the --find-renames and/or -M <threshold> flag to do this. In Git versions 2.9 and later, rename detection is on by default.

Now, how shall Git decide that a source file has the same identity as a destination file? They have different paths; which right-side file does a/b/c.txt on the left correspond to? It might be d/e/f.bin, or d/e/f.txt, or a/b/renamed.txt, and so on. The actual algorithm is relatively simple, and in the past did not take the final name component into account (I'm not sure if it does now; Git is constantly evolving):

  • If there are source and destination files whose contents match exactly, pair them. Because Git hashes contents, this comparison is very fast. We can compare left-side a/b/c.txt by its hash ID to every file on the right, simply by looking at all of their hash IDs. Therefore, we run through all source files first, finding destination files that match, putting the new pairs into the diff queue and pulling them out of the two lists.

  • For all remaining source and destination files, run an algorithm that is efficient, though unsuitable for producing git diff output, to compute "file similarity". A source file that is at least <threshold> similar to some destination file causes a pairing, and that file pair is removed from the lists. The default threshold is 50%: if you have enabled rename detection without choosing a particular threshold, two files that are still in the lists at this point and are at least 50% similar get paired.

  • Any remaining files are either deleted or created.
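
The three steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not Git's implementation: detect_renames is a made-up name, hashlib's SHA-1 stands in for Git's object hash, and difflib's SequenceMatcher stands in for Git's own similarity score, which is computed differently.

```python
import hashlib
from difflib import SequenceMatcher


def detect_renames(sources, dests, threshold=0.5):
    """Pair leftover {path: content} entries; return (renames, deleted, created)."""
    sources, dests = dict(sources), dict(dests)   # work on copies
    renames = []

    # Step 1: exact matches, found by comparing content hashes only.
    by_hash = {}
    for path in sorted(dests):
        digest = hashlib.sha1(dests[path].encode()).hexdigest()
        by_hash.setdefault(digest, []).append(path)
    for src in sorted(sources):
        digest = hashlib.sha1(sources[src].encode()).hexdigest()
        if by_hash.get(digest):
            dst = by_hash[digest].pop(0)
            renames.append((src, dst, 1.0))
            del sources[src], dests[dst]

    # Step 2: inexact matches, taking the best score at or above the threshold.
    for src in sorted(sources):
        scored = [
            (SequenceMatcher(None, sources[src], dests[d]).ratio(), d)
            for d in sorted(dests)
        ]
        if scored:
            score, dst = max(scored)
            if score >= threshold:
                renames.append((src, dst, score))
                del sources[src], dests[dst]

    # Step 3: whatever is left is deleted (sources) or created (dests).
    return renames, sorted(sources), sorted(dests)


# Hypothetical leftover lists after same-path pairing.
sources = {"a/b/c.txt": "hello\nworld\n", "old.txt": "alpha\nbeta\ngamma\n", "gone.txt": "xxxxxxxx\n"}
dests = {"a/doc/c.txt": "hello\nworld\n", "new.txt": "alpha\nbeta\ndelta\n", "fresh.txt": "12345\n"}

renames, deleted, created = detect_renames(sources, dests)
print([(s, d) for s, d, _ in renames])  # [('a/b/c.txt', 'a/doc/c.txt'), ('old.txt', 'new.txt')]
print(deleted)                          # ['gone.txt']
print(created)                          # ['fresh.txt']
```

Note that the second loop compares every remaining source against every remaining destination, which is why this step gets expensive as the lists grow.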

Now that we have found all pairings, git diff proceeds to diff the paired, same-identity files, and tells us that deleted files are deleted, and newly-created files are created. If the two path names for same-identity files differ, git diff says the file is renamed.

The arbitrary-file-pairing code is expensive (even though the same-name-gives-a-pair code is very cheap), so Git limits how many names go into these pairing source and destination lists. That limit is configured through git config diff.renameLimit. The default has climbed over the years, so check the git config documentation for the value in your Git version. You can set it to 0 (zero) to make Git use its own internal maximum at all times.

Breaking pairs

Above, I said that normally, files with the same name are paired automatically. This is usually the right thing to do, so it is Git's default. In some cases, however, the left-side file that is named a/b/c.txt is actually not related to the right-side file named a/b/c.txt; it is really related to the right-side a/doc/c.txt, for instance. We can tell Git to break pairings of files that are "too different".

We saw the "similarity index" used above to form pairings of files. This same similarity index can be used to split files: -B20%/60%, for instance. The two numbers need not add up to 100% and you can actually omit either one, or both: there's a default value for each if you set -B mode.

The first number is the point at which a default-already-paired file can be put into the rename detection lists. With -B20%, if the files are 20% dis-similar (i.e., only 80% similar), the file goes into the "source for renames" list. If it never gets taken as a rename, it can re-pair with its automatic destination—but at this point, the second number, the one after the slash, takes effect.

The second number sets the point at which a pairing is definitely broken. With -B/70%, for instance, if the files are 70% dis-similar (i.e., only 30% similar), the pairing is broken. (Of course, if the file was taken away as a rename source, the pairing is already broken.)
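
As a rough illustration of the idea, the sketch below models only the first number: same-path pairs that are too dissimilar are pulled apart and handed to rename detection as an extra source and an extra destination. The name break_pairs, the sample trees, and the use of difflib's SequenceMatcher as the similarity measure are all assumptions for the example; the second number and Git's real scoring are not modelled.

```python
from difflib import SequenceMatcher


def break_pairs(left, right, break_score=0.5):
    """Return (kept_pairs, extra_sources, extra_dests) for same-path files."""
    kept, extra_sources, extra_dests = [], [], []
    for path in sorted(left.keys() & right.keys()):
        similarity = SequenceMatcher(None, left[path], right[path]).ratio()
        if 1.0 - similarity >= break_score:
            # Too different: treat the two sides as unrelated files for now.
            extra_sources.append(path)
            extra_dests.append(path)
        else:
            kept.append(path)
    return kept, extra_sources, extra_dests


# Hypothetical trees: README barely changes, a/b/c.txt is rewritten entirely.
left = {"a/b/c.txt": "one\ntwo\nthree\n", "README": "intro\n"}
right = {"a/b/c.txt": "zzzzzzzzzzzz", "README": "intro!\n"}

print(break_pairs(left, right, break_score=0.2))
# (['README'], ['a/b/c.txt'], ['a/b/c.txt'])
```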

Copy detection

Besides the usual pairing and rename detection, you can ask Git to find copies of source files. After running all the usual pairing code, including finding renames and breaking pairs, if you have specified -C, Git will look for "new" (i.e., unpaired) destination files that are actually copied from existing sources. There are two modes for this, depending on whether you specify -C twice or add --find-copies-harder: one considers only source files that are modified (that's the single -C case), and the other considers every source file (that's the two -C or --find-copies-harder case). Note that "modified" means, in this case, that the source file is already in the paired queue (if not, it's not "modified" by definition) and its corresponding destination file has a different hash ID. Again, this is a very low-cost test, which helps keep a single -C option cheap.
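
The difference between the two modes can be sketched as follows. This is an illustrative model only: find_copies, its harder parameter, and the sample trees are made up for the example, and difflib's SequenceMatcher stands in for Git's similarity score.

```python
from difflib import SequenceMatcher


def find_copies(left, right, created, harder=False, threshold=0.5):
    """Map each created path to the left-side path it looks copied from."""
    if harder:
        candidates = sorted(left)                  # every source file
    else:
        candidates = sorted(                       # only modified sources
            p for p in left if p in right and left[p] != right[p]
        )
    copies = {}
    for dst in created:
        scored = [
            (SequenceMatcher(None, left[src], right[dst]).ratio(), src)
            for src in candidates
        ]
        if scored:
            score, src = max(scored)
            if score >= threshold:
                copies[dst] = src
    return copies


# Hypothetical trees: lib.py is unchanged, util.py is modified,
# and lib_copy.py is a new file with the same content as lib.py.
left = {"lib.py": "alpha\nbeta\ngamma\n", "util.py": "12345\n67890\n"}
right = {
    "lib.py": "alpha\nbeta\ngamma\n",
    "util.py": "12345\n67899\n",
    "lib_copy.py": "alpha\nbeta\ngamma\n",
}

print(find_copies(left, right, ["lib_copy.py"]))               # {}
print(find_copies(left, right, ["lib_copy.py"], harder=True))  # {'lib_copy.py': 'lib.py'}
```

In this model the cheaper mode misses the copy because lib.py itself was not modified, so it never becomes a candidate source.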

Branches don't matter

Would using branches instead of directories resolve this issue easily and if so what is the branch version of the command listed above?

Branches make no difference here.

In Git, the term branch is ambiguous. See What exactly do we mean by "branch"? For git diff, though, a branch name simply resolves to a single commit, namely the tip commit of that branch.

I like to draw Git's branches like this:

...--o--o--o   <-- branch1
         \
          o--o--o   <-- branch2

The small round o's each represent a commit. The two branch names are simply pointers, in Git: they point to one specific commit. The name branch1 points to the rightmost commit on the top line, and the name branch2 points to the rightmost commit on the bottom line.

Each commit, in Git, points back to its parent or parents (most commits have just one parent, while a merge commit is simply a commit with two or more parents). This is what forms the chain of commits that we also call "a branch". The branch name points directly to the tip of a chain.[1]

When you run:

$ git diff branch1 branch2

all that Git does is resolve each name to its corresponding commit. For instance, if branch1 names commit 1234567... and branch2 names commit 89abcde..., this just does the same thing as:

$ git diff 1234567 89abcde
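
The name-to-commit-to-tree resolution can be modelled with a toy in-memory structure. Everything here is hypothetical: the branches and commits dictionaries, the helper names, and the file contents are invented for illustration and say nothing about how Git stores objects.

```python
# Hypothetical in-memory model: names, IDs and contents are made up.
branches = {"branch1": "1234567", "branch2": "89abcde"}
commits = {
    "1234567": {"tree": {"README": "v1\n"}},
    "89abcde": {"tree": {"README": "v2\n", "new.txt": "fresh\n"}},
}


def resolve_tree(name_or_id):
    """Turn a branch name or commit ID into the tree (path -> content) it names."""
    commit_id = branches.get(name_or_id, name_or_id)
    return commits[commit_id]["tree"]


def changed_paths(left_name, right_name):
    """List paths whose content differs between the two resolved trees."""
    left, right = resolve_tree(left_name), resolve_tree(right_name)
    return sorted(p for p in left.keys() | right.keys() if left.get(p) != right.get(p))


print(changed_paths("branch1", "branch2"))  # ['README', 'new.txt']
print(changed_paths("1234567", "89abcde"))  # ['README', 'new.txt']
```

Both calls compare the same two trees, so using branch names instead of commit IDs changes nothing about how files are paired.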

Git's diff takes two trees

Git does not even care that these are commits, really. Git just needs a left side or source tree, and a right side or destination tree. These two trees can come from a commit, because a commit names a tree: the tree of any commit is the source snapshot taken when you made that commit. They can come from a branch, because a branch-name names a commit, which names a tree. One of the trees can come from Git's "index" (aka "staging area" aka "cache"), as the index is basically a flattened tree.[2] One of the trees can be your work-tree. One or both trees can even be outside of Git's control (hence the --no-index flag).

Of course, Git can just diff two files

If you run git diff --no-index /path/to/file1 /path/to/file2, Git will simply diff the two files, i.e., treat them as a pair. This bypasses all the pairing and rename-detecting code entirely. If no amount of fiddling with the --no-renames, --find-renames, -M <threshold>, and similar options does the trick, you can explicitly diff file paths, rather than directory (tree) paths. For a large set of files, this will, of course, be painful.


[1] There can be more commits past that point, but it's still the tip of its chain. Moreover, multiple names can point to a single commit. I draw this situation as:

...--o--o   <-- tip1
         \
          o--o   <-- tip2, tip3

Note that commits that are "behind" more than one branch name are, in fact, on all of those branches. So both bottom-row commits are on both tip2 and tip3 branches, while both top-row commits are on all three branches. Nonetheless, each branch name resolves to one, and only one, commit.

[2] In fact, to make a new commit, Git simply converts the index, just as it stands right now, into a tree using git write-tree, and then makes a commit that names that tree (and that uses the current commit as its parent, and has an author and committer, and a commit message). The fact that Git uses the existing index is why you must git add your updated work-tree files into the index before committing.

There are some convenience short-cuts that let you tell git commit to add files to the index, e.g., git commit -a or git commit <path>. These can be a bit tricky as they don't always produce the index you might expect. See the --include vs --only options to git commit <path>, for instance. They also work by copying the main index to a new, temporary index; and this can have surprising results, because if the commit succeeds, the temporary index is copied back over the regular index.
