Searching for my source code in a binary-only product

Time: 2022-08-01 20:34:11

Let's say I have a project that I have released under the GPL, with the sources available to anyone. Later I find a very similar product, but closed source, distributed binary-only by someone else.

Is there a good way to find out whether they are using my source code in their product?

If the solution is to somehow reverse-engineer the binary, is it possible to automate that?

EDIT: Clarification. The bug hunt is one option, but it is not definitive, especially if the project is a library and the binary has added its own GUI, for example. The situation I'm interested in is when it's not blatantly obvious that the code was lifted.

8 answers

#1


2  

Look for software birthmarks. This method tries to establish links between programs based on their binary code or dynamic behavior. Christian Collberg is an expert on software watermarks, from which birthmarks were derived. This is all still in research land.

#2


5  

Bugs.

If the closed source release shares most of its bugs with your project, it's probably 'lifted'.

You could also try decompiling your own binary and comparing the result with a decompiled version of the closed source binary... though this would probably not be reliable.

#3


3  

Obviously, if the suspected binary is not stripped, you can just look for any symbols that share the same names as the symbols in your code.

#4


2  

There's a large body of work on decompiling and reverse-engineering binary code. The world expert is probably Cristina Cifuentes. She's done a lot with decompilation. It would also be interesting to write to Alex Aiken and ask if his MOSS tool (Measure of Software Similarity) could be adapted to binary code.

#5


2  

An obvious method is to search for strings. Run the Unix strings tool and see if the binary contains any of the literal strings from your code, mainly things like error messages and the text in message boxes.
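
The string search described here could be automated roughly as follows. This is only a sketch, assuming the original project is C-like source with double-quoted literals; the function names, the minimum length of 6, and the crude regular expressions are illustrative choices, not part of the strings tool itself.

```python
import re
from typing import Set


def printable_strings(data: bytes, min_len: int = 6) -> Set[bytes]:
    """Return runs of printable ASCII at least min_len long, similar to `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return set(re.findall(pattern, data))


def c_string_literals(source: str, min_len: int = 6) -> Set[bytes]:
    """Crudely pull double-quoted literals out of C-like source text.

    Literals containing escape sequences are skipped, because their bytes
    in the binary differ from their spelling in the source.
    """
    literals = re.findall(r'"((?:[^"\\\n]|\\.)*)"', source)
    return {
        s.encode("ascii", "ignore")
        for s in literals
        if len(s) >= min_len and "\\" not in s
    }


def shared_strings(binary: bytes, source: str, min_len: int = 6) -> Set[bytes]:
    """Source literals that also appear somewhere in the binary."""
    found = printable_strings(binary, min_len)
    wanted = c_string_literals(source, min_len)
    # A literal may sit inside a longer printable run, so match by substring.
    return {w for w in wanted if any(w in f for f in found)}
```

Typical use would be `shared_strings(open("suspect.bin", "rb").read(), open("mylib.c").read())`, where both file names are placeholders. Long, distinctive messages are much stronger evidence than short generic ones.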

#6


1  

You could try to disassemble both programs and compare the assembly, but if they used a different compiler then their program could have minor differences. There are a few free disassemblers, or you could step through the code in a debugger's assembly view.

Other than that, there really isn't an easy way to find out that kind of thing.
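
One way to make the disassembly comparison less sensitive to those minor differences is to compare only the sequence of instruction mnemonics and ignore addresses and operands. The sketch below assumes the listings are text in the tab-separated layout that `objdump -d` prints; the mnemonic-only comparison and the function names are illustrative choices, and a high score is a hint, not proof.

```python
import difflib
from typing import List


def mnemonics(objdump_text: str) -> List[str]:
    """Extract instruction mnemonics from `objdump -d` style text."""
    ops = []
    for line in objdump_text.splitlines():
        fields = line.split("\t")
        # Instruction lines look like: "  401136:<TAB>55 <TAB>push   %rbp"
        if len(fields) >= 3 and fields[0].strip().endswith(":") and fields[2].strip():
            ops.append(fields[2].split()[0])
    return ops


def similarity(listing_a: str, listing_b: str) -> float:
    """Ratio in [0, 1] of matching mnemonic sequences; 1.0 means identical."""
    a, b = mnemonics(listing_a), mnemonics(listing_b)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()
```

Comparing function by function rather than whole programs would likely be more informative, since link order and unrelated code dilute a whole-file score.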

#7


0  

The most surefire way I can think of is similar to the word 'esquivalience' in the New Oxford American Dictionary, a made-up entry planted to catch copiers.
Simply add a binary array with unique content somewhere in the code, and don't forget to make some simple use of it so the compiler or linker won't optimize it away. You should probably obfuscate it somewhat so that it will not be obvious to the casual reader that it's redundant.
Then open the compiled binary with a hex editor and look for it.
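
The final lookup step does not have to be done by hand in a hex editor. A minimal sketch, assuming the planted array is the 16 bytes below (a made-up value for illustration; use whatever bytes were actually embedded):

```python
from typing import List

# Hypothetical watermark bytes; replace with the array planted in the source.
MARKER = bytes.fromhex("9f3a c41e 77b2 05d8 e16c 4a90 2bf3 8d57")


def find_marker(binary: bytes, marker: bytes = MARKER) -> List[int]:
    """Return every byte offset at which the marker occurs in the binary."""
    offsets = []
    start = binary.find(marker)
    while start != -1:
        offsets.append(start)
        start = binary.find(marker, start + 1)
    return offsets
```

An empty result only shows that the exact bytes are absent. If the marker is stored in a transformed form in the source, the search has to apply the same transformation.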

#8


0  

Why don't you look at the symbol table using nm?

$ nm a.out
...
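
To compare symbol tables automatically, the nm output of both binaries can be parsed and intersected. This is a sketch under the assumption that nm prints its default three-column format (address, one-letter type, name); the `COMMON` ignore list and the function names are illustrative, and the approach only works if the suspect binary has not been stripped.

```python
import subprocess
from typing import Set

# Names too generic to count as evidence (illustrative, not exhaustive).
COMMON = {"main", "_init", "_fini", "_start"}


def parse_nm_output(text: str) -> Set[str]:
    """Collect names of defined symbols from default-format nm output."""
    symbols = set()
    for line in text.splitlines():
        parts = line.split()
        # Lines are "<address> <type> <name>", or "<type> <name>" if undefined.
        if len(parts) >= 2 and len(parts[-2]) == 1:
            kind, name = parts[-2], parts[-1]
            if kind.upper() != "U":  # skip undefined (imported) symbols
                symbols.add(name)
    return symbols


def run_nm(path: str) -> str:
    """Run nm on a file and return its text output."""
    result = subprocess.run(["nm", path], capture_output=True, text=True, check=True)
    return result.stdout


def shared_symbols(nm_text_a: str, nm_text_b: str) -> Set[str]:
    """Defined symbol names present in both listings, minus generic ones."""
    return (parse_nm_output(nm_text_a) & parse_nm_output(nm_text_b)) - COMMON
```

For example, `shared_symbols(run_nm("mine.out"), run_nm("theirs.out"))`, with both paths as placeholders.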
