Stanford InfoLab Publication Server

SCAM: A Copy Detection Mechanism for Digital Documents

Shivakumar, N. and Garcia-Molina, H. (1995) SCAM: A Copy Detection Mechanism for Digital Documents. In: 2nd International Conference in Theory and Practice of Digital Libraries (DL 1995), June 11-13, 1995, Austin, Texas.




Copy detection in Digital Libraries may provide the necessary guarantees for publishers and newsfeed services to offer valuable on-line data. We consider the case for a registration server that maintains registered documents against which new documents can be checked for overlap. In this paper we present a new scheme for detecting copies based on comparing the word frequency occurrences of the new document against those of registered documents. We also report on an experimental comparison between our proposed scheme and COPS [6], a detection scheme based on sentence overlap. The tests involve over a million comparisons of netnews articles and show that in general the new scheme performs better in detecting documents that have partial overlap.

Keywords: Copy Detection, Plagiarism, Registration Server, Databases
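
The idea of checking a new document against registered documents by word frequency can be illustrated with a minimal sketch. The abstract does not state which similarity measure the paper uses, so this example assumes plain cosine similarity over raw word counts as a stand-in; it should not be read as the authors' method. The function names (`word_frequencies`, `cosine_similarity`, `check_against_registry`), the tokenization rule, and the `threshold` value of 0.8 are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Lowercase the text and count how often each word occurs."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_against_registry(
    new_doc: str, registry: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs for registered documents whose
    similarity to new_doc is at or above threshold, best match first."""
    new_freq = word_frequencies(new_doc)
    hits = []
    for doc_id, text in registry.items():
        score = cosine_similarity(new_freq, word_frequencies(text))
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    registry = {
        "article-1": "the quick brown fox jumps over the lazy dog",
        "article-2": "stock prices fell sharply today",
    }
    candidate = "the quick brown fox jumps over the lazy cat"
    for doc_id, score in check_against_registry(candidate, registry):
        print(f"{doc_id}: {score:.3f}")
```

Because the comparison works on word counts rather than whole sentences, a document that reuses most of a registered text's vocabulary can still score highly when its sentences have been reordered or lightly edited. A real registration server would likely precompute and index the frequency vectors rather than recount every registered document on each check.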

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: SCAM, Copy detection, Plagiarism, Copyright
Subjects: Computer Science > Digital Libraries
Projects: Digital Libraries
Related URLs: Project Homepage
ID Code: 95
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 05 Feb 2009 15:05
