My organisation is preparing to release an open-source version of our software on GitHub, but I'm not sure of the best way to approach this:
We have two branches, master and release. master contains some proprietary components that we have decided not to release, and release contains the cleaned-up version that we want to distribute. The problem is that if we just push the release branch to GitHub, the proprietary components can be retrieved by looking through the revision history.
I was considering creating a separate repository, copying the HEAD of release into it, doing a git init, and pushing that repository to GitHub. However, we want to retain the ability to cherry-pick certain patches from master into release in the future, and push those changes up to GitHub.
Is there a way to do this without maintaining two separate repositories?
Thanks!
Update:
To be a little more specific, this is sort-of what our commit history looks like at the moment:
--- o - o - o - o - f - o - o - f - master
             \
               c - c - c - c - c - c - c - REL - f - f
Here 'o' are commits in the proprietary master branch, 'c' are commits that remove things that should not be published (often not removing entire files, but reworking existing ones so they no longer rely on proprietary components), and 'f' are fixes in master that apply to release as well, and so have been cherry-picked. REL is a tagged version of the code that we deem safe to publish, but only with no history whatsoever (not even previous versions of the release branch, since not all of the proprietary material had been removed before the REL tag).
2 Answers
#1
17
Ben Jackson's answer already covers the general idea, but I'd like to add a few notes (more than a comment's worth) about the ultimate goal here.
You can quite easily have two branches, one with an entirely clean (no private files) history, and one complete (with the private files), and share content appropriately. The key is to be careful about how you merge. An oversimplified history might look something like this:
o - o - o - o - o - o - o (public)
 \       \           \       \
  x ----- x ----x---- x - x (private)
The o commits are the "clean" ones, and the x commits are the ones containing some private information. As long as you only merge from public into private, both branches can have all the desired shared content without ever leaking anything. As Ben said, you do need to be careful about this: you can't ever merge the other way. Still, that's quite possible to avoid, and you don't have to limit yourself to cherry-picking. You can use your normal merge workflow.
In reality, your workflow could end up a little more complex, of course. You could develop a topic (feature/bugfix) on its own branch, then merge it into both the public and the private versions. You could even cherry-pick now and then. Really, anything goes, with the key exception of merging private into public.
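To make the one-way rule concrete, here is a minimal sketch in a throwaway repository. The directory name proprietary/, the file shared.txt and the demo identity are made up for illustration; they are not from the question. It develops on public, merges public into private, and then counts how many commits reachable from each branch touch proprietary/.

```shell
#!/bin/sh
# Sketch only: a scratch repo showing that merging public -> private
# never puts private history on the public branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/public   # start on a branch named "public"
git config user.name "Demo"               # hypothetical identity for the scratch repo
git config user.email "demo@example.com"

echo "shared code" > shared.txt
git add shared.txt
git commit -q -m "public: initial shared code"

# private branches off public and adds the (hypothetical) proprietary part
git checkout -q -b private
mkdir proprietary
echo "secret" > proprietary/secret.txt
git add proprietary
git commit -q -m "private: add proprietary component"

# new shared work lands on public first ...
git checkout -q public
echo "bug fix" >> shared.txt
git commit -q -am "public: fix a bug"

# ... and is then merged into private, never the other way round
git checkout -q private
git merge -q --no-edit public

# count commits reachable from each branch that touch proprietary/
public_leaks=$(git rev-list public -- proprietary | wc -l | tr -d ' ')
private_hits=$(git rev-list private -- proprietary | wc -l | tr -d ' ')
echo "public commits touching proprietary/:  $public_leaks"
echo "private commits touching proprietary/: $private_hits"
```

The same rev-list check is a cheap thing to run before every push of the public branch.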
filter-branch
So, your problem right now is simply getting your repository into this state. Unfortunately, this can be pretty tricky. Assuming that some commits exist which touch both private and public files, I believe that the simplest method is to use filter-branch to create the public (clean) version:
git branch public master # create the public branch from current master
git filter-branch --tree-filter ... -- public # filter it (remove private files with a tree filter)
Then create a temporary private-only branch, containing only the private content:
git branch private-temp master
git filter-branch --tree-filter ... -- private-temp # remove public files
And finally, create the private branch. If you're okay with only having one complete version, you can simply merge once:
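git checkout -b private private-temp # create the private branch and switch to it, so the merge lands there
git merge public # on Git 2.9+ add --allow-unrelated-histories, since the two branches have separate roots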
That'll get you a history with only one merge:
o - o - o - o - o - o - o - o - o - o (public)
                                     \
  x -- x -- x -- x -- x -- x -- x --- x (private)
Note: there are two separate root commits here. That's a little weird; if you want to avoid it, you can use git rebase --root --onto <SHA1> to transplant the entire private-temp branch onto some ancestor of the public branch.
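The tree filters in the commands above are elided, so here is one possible way to fill them in, as a sketch in a throwaway repository. It assumes the private content lives in a single directory called proprietary/ and the public content is a single file shared.txt; both names are hypothetical. The -f flag and the FILTER_BRANCH_SQUELCH_WARNING variable are additions for running filter-branch twice on a recent Git, not part of the answer's commands.

```shell
#!/bin/sh
# Sketch only: split one mixed history into a public and a private-only branch.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1    # skip the warning newer Git versions print
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"               # hypothetical identity for the scratch repo
git config user.email "demo@example.com"

mkdir proprietary
echo "shared" > shared.txt
echo "secret" > proprietary/secret.txt
git add .
git commit -q -m "mixed commit: public and private files"
echo "more" >> shared.txt
git commit -q -am "public-only change"

# public: the same history with proprietary/ removed from every commit
git branch public master
git filter-branch -f --tree-filter 'rm -rf proprietary' -- public >/dev/null 2>&1

# private-temp: the same history with the public file removed from every commit
# (-f is needed because the first run left a backup under refs/original/)
git branch private-temp master
git filter-branch -f --tree-filter 'rm -f shared.txt' -- private-temp >/dev/null 2>&1

public_files=$(git ls-tree -r --name-only public)
private_files=$(git ls-tree -r --name-only private-temp)
public_leaks=$(git rev-list public -- proprietary | wc -l | tr -d ' ')
echo "files on public:       $public_files"
echo "files on private-temp: $private_files"
echo "public commits touching proprietary/: $public_leaks"
```

master is left untouched by both runs, so the original mixed history is still there if the filters turn out to be wrong.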
If you'd like to have some intermediate complete versions, you can do the exact same thing, just stopping here and there to merge and rebase:
git checkout -b private <private-SHA1>  # use the SHA1 of the first ancestor of private-temp
                                        # you want to merge something from public into
git merge <public-SHA1>                 # merge a corresponding commit of the public branch
git rebase private private-temp         # rebase private-temp to include the merge
git checkout private
git merge <private-SHA1>                # use the next SHA1 on private-temp you want to merge into
                                        # this is a fast-forward merge
git merge <public-SHA1>                 # merge something from public
git rebase private private-temp         # and so on and so on...
This will get you a history something like this:
o - o - o - o - o - o - o - o - o - o (public)
     \              \                \
  x -- x -- x -- x -- x -- x -- x --- x (private)
Again, if you want them to have a common ancestor, you can do an initial git rebase --root --onto ... to get started.
Note: if you have merges in your history already, you'll want to use the -p option on any rebases to preserve the merges. On current Git versions, use --rebase-merges (-r) instead, since -p (--preserve-merges) has been removed.
fake it
Edit: If reworking the history really turns out to be intractable, you can always totally fudge it: squash the entire history down to one commit, on top of the same root commit you already have. Something like this:
git checkout public
git reset --soft <root SHA1>
git commit
So you'll end up with this:
o - A'                      (public)
 \
  o - x - o - x - X - A     (public@{1}, the previous position of public)
               \
                 x - x      (private)
where A and A' contain exactly the same content, and X is the commit in which you removed all private content from the public branch.
At this point, you can do a single merge of public into private, and from then on, follow the workflow that I described at the top of the answer:
git checkout private
git merge -s ours public
The -s ours tells git to use the "ours" merge strategy. This means it keeps all content exactly as it is in the private branch, and simply records a merge commit showing that you merged the public branch into it. This prevents git from ever applying those "remove private" changes from commit X to the private branch.
If the root commit has private information in it, then you'll probably want to create a new root commit, instead of committing once on top of the current one.
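Here is the whole "fake it" route end to end, as a sketch in a throwaway repository. File and directory names (README, shared.txt, proprietary/) and the demo identity are hypothetical, and the root commit is assumed to contain nothing private. After the one-off -s ours merge, a later ordinary merge from public brings the new fix into private without deleting the private files.

```shell
#!/bin/sh
# Sketch only: squash public onto the root, then tie it to private with "-s ours".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/public
git config user.name "Demo"               # hypothetical identity for the scratch repo
git config user.email "demo@example.com"

echo "readme" > README
git add README
git commit -q -m "root (contains nothing private)"
root=$(git rev-parse HEAD)

mkdir proprietary
echo "secret" > proprietary/secret.txt
echo "shared" > shared.txt
git add .
git commit -q -m "x: mixed public and private work"

git branch private                        # private keeps the full history

git rm -q -r proprietary                  # X: strip private content from public
git commit -q -m "X: remove private content"

git reset -q --soft "$root"               # squash everything after the root ...
git commit -q -m "A': squashed public release"   # ... into one commit

git checkout -q private
git merge -q -s ours --no-edit public     # record the merge, keep private's content

# from now on: normal work on public, normal merges into private
git checkout -q public
echo "fix" >> shared.txt
git commit -q -am "public: later fix"
git checkout -q private
git merge -q --no-edit public

public_commits=$(git rev-list --count public)
public_leaks=$(git rev-list public -- proprietary | wc -l | tr -d ' ')
echo "commits on public: $public_commits, touching proprietary/: $public_leaks"
```

Without the -s ours merge, the first ordinary merge of public into private would try to apply the deletion from commit X, which is exactly what the strategy is there to prevent.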
#2
5
The SHA of a commit is based on the commit object, which includes the parent SHA, the commit text and the SHA of the tree of files. The tree in turn contains the SHA of every blob in it. Thus any given commit depends on everything in that revision and on every parent revision, all the way back to an empty repository. If you have a commit derived (no matter how indirectly) from a version that includes files you don't want to release, then you don't want to release that branch.
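You can see this dependency chain directly by printing a raw commit object. The sketch below uses a throwaway repository with a made-up file name and identity; git cat-file -p shows the tree and parent lines that feed into the commit's own SHA.

```shell
#!/bin/sh
# Sketch only: inspect what a commit object actually records.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"               # hypothetical identity for the scratch repo
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" >> file.txt
git commit -q -am "second"

# the raw commit: a tree line, a parent line, author/committer lines, the message
git cat-file -p HEAD

# pull the two SHAs out of the commit object
tree_in_commit=$(git cat-file -p HEAD | sed -n 's/^tree //p')
parent_in_commit=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "tree:   $tree_in_commit"
echo "parent: $parent_in_commit"
```

Because the parent SHA is part of the hashed content, changing or removing anything in an ancestor necessarily changes the SHA of every descendant commit.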
The very first example of git filter-branch talks about removing a confidential file from a repository. It does this by creating an alternate history (rewriting all of the trees and commits). You can see why this must be true if you understand the first part of my answer.
You should be able to run the filter-branch commands to create a new commit from your "clean" commit. The history will be somewhat odd (older versions may not build because they are now incomplete or otherwise broken). This won't destroy any of your existing branches or blobs in your repository. It will create all-new (parallel) ones which share the file blobs but not the trees or commits. You should be able to safely push that branch without exposing any of the objects that it does not refer to (when you push a branch, only the SHA named by that branch and its dependencies are pushed). However, this would be somewhat risky, because a single git merge into the "clean" branch could end up dragging in "private" branches and objects. You may want to use a hook (commit or push trigger) to double check that private files are not escaping.
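As one possible shape for such a hook, here is a sketch of a pre-push hook that refuses to push any ref whose history touches a private path. The path proprietary/ and the branch names are assumptions for illustration; a real setup would list its own private paths. The script installs the hook in a throwaway repository and then feeds it the same kind of stdin line Git would send, rather than pushing to a real remote.

```shell
#!/bin/sh
# Sketch only: a pre-push hook that blocks refs whose history touches proprietary/.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/public
git config user.name "Demo"               # hypothetical identity for the scratch repo
git config user.email "demo@example.com"

mkdir -p .git/hooks
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
# Git feeds one line per ref on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
while read -r local_ref local_sha remote_ref remote_sha; do
    # an all-zero local sha means the ref is being deleted: nothing to check
    case "$local_sha" in *[!0]*) ;; *) continue ;; esac
    # any commit reachable from the pushed sha that touches proprietary/ is a leak
    if [ -n "$(git rev-list -n 1 "$local_sha" -- proprietary)" ]; then
        echo "refusing to push $local_ref: history touches proprietary/" >&2
        exit 1
    fi
done
exit 0
EOF
chmod +x .git/hooks/pre-push

echo "shared" > shared.txt
git add shared.txt
git commit -q -m "public: shared code"
git checkout -q -b private
mkdir proprietary
echo "secret" > proprietary/secret.txt
git add proprietary
git commit -q -m "private: proprietary code"

# simulate what Git would pass to the hook for each branch
zero=0000000000000000000000000000000000000000
clean_status=0
echo "refs/heads/public $(git rev-parse public) refs/heads/public $zero" \
    | .git/hooks/pre-push || clean_status=$?
leak_status=0
echo "refs/heads/private $(git rev-parse private) refs/heads/private $zero" \
    | .git/hooks/pre-push 2>/dev/null || leak_status=$?
echo "public push status: $clean_status, private push status: $leak_status"
```

Note that this checks the full history of every pushed ref, so it would also block deliberate pushes of the private branch to an internal remote; restricting the check to a particular remote name (the hook's first argument) is one way around that.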