A Copy Detection Mechanism for Digital Documents

Time: 2012-04-06 14:03:49
[File properties]:

File name: A Copy Detection Mechanism for Digital Documents

File size: 212 KB

File format: PDF

Updated: 2012-04-06 14:03:49

Copy Detection Mechanism Digital

Abstract: Copy detection in Digital Libraries may provide the necessary guarantees for publishers and newsfeed services to offer valuable on-line data. We consider the case for a registration server that maintains registered documents against which new documents can be checked for overlap. In this paper we present a new scheme for detecting copies based on comparing the word frequency occurrences of the new document against those of registered documents. We also report on an experimental comparison between our proposed scheme and COPS [6], a detection scheme based on sentence overlap. The tests involve over a million comparisons of netnews articles and show that in general the new scheme performs better in detecting documents that have partial overlap.
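
The idea of checking a new document against registered documents by word frequency can be illustrated with a minimal sketch. The abstract does not state which similarity measure or threshold the paper uses, so the plain cosine similarity over raw word counts below, the 0.8 threshold, and the names word_frequencies, cosine_similarity and check_against_registry are all assumptions made for illustration, not the paper's actual scheme.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Lower-case the text and count occurrences of each word."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0).

    Assumption: plain cosine similarity stands in for whatever measure
    the paper defines; the abstract does not specify one.
    """
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_against_registry(new_doc, registry, threshold=0.8):
    """Return (name, score) pairs for registered documents scoring at or
    above the threshold, highest score first.

    The threshold value is hypothetical and would need tuning on real data.
    """
    new_freq = word_frequencies(new_doc)
    hits = []
    for name, text in registry.items():
        score = cosine_similarity(new_freq, word_frequencies(text))
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A registry here is just a dictionary mapping a document name to its text. Calling check_against_registry(new_text, registry) returns the registered documents whose word-count profile is close to that of the new text. A real registration server would likely precompute and store the frequency vectors instead of recounting words on every check.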

