I'm storing papers in SQL Server 2005 and am looking for a way to paste in the text of a paper and then search for potential plagiarism (copied content) in the database.
What's the best way to go about this? Is there a way to use full-text indexing to gauge how similar several paragraphs of content are to text already in the database?
2 solutions
#1
Why don't you install Google Desktop and have it index only that one directory?
Then you can have Google do the indexing for you.
#2
This is not really the sort of problem that full-text indexing in SQL Server is designed to solve. There's nothing built in to SQL Server that you can really use to help with this.
There are a number of specialised plagiarism detection tools, which a Google search will turn up for you. That's probably your best bet.
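To give a rough idea of how similarity between two passages can be gauged outside SQL Server, here is a minimal sketch, assuming word-level shingles (overlapping runs of k words) compared with Jaccard overlap. This is one common technique, not necessarily what any particular plagiarism tool does. The function names, the default shingle size of 5, and the `papers` dictionary are all illustrative assumptions.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=5):
    """Jaccard overlap of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_candidates(submitted, papers, k=5):
    """Score each stored paper against the submitted text, highest first.

    `papers` is assumed to be a dict mapping a paper id to its full text,
    e.g. rows already fetched from the database.
    """
    scores = [(pid, jaccard_similarity(submitted, text, k))
              for pid, text in papers.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

You would pull the stored paper text out of the database, call `rank_candidates(pasted_text, papers)`, and review the highest-scoring papers by hand. Comparing whole documents pairwise like this gets slow on a large collection, which is part of why a dedicated tool is probably still the better bet.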