Stanford InfoLab Publication Server

Building a Scalable and Accurate Copy Detection Mechanism

Shivakumar, N. and Garcia-Molina, H. (1996) Building a Scalable and Accurate Copy Detection Mechanism. In: Proceedings of the 1st ACM International Conference on Digital Libraries (DL'96), March 1996, Bethesda, Maryland.




Publishers are often reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including their disk storage requirements, main memory requirements, and response times for registration and for querying. We also contrast performance with the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.
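The register-then-query workflow described in the abstract can be illustrated with a minimal sketch. This is not SCAM's method, which the abstract does not specify. It assumes a simple scheme in which each document is split into overlapping word chunks and a query reports the fraction of a registered document's chunks that reappear in the queried text. The names `chunks`, `CopyDetector`, `register`, `query`, and the default `threshold` are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def chunks(text, size=3):
    """Return the set of overlapping word n-grams (chunks) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


class CopyDetector:
    """Toy in-memory registry (hypothetical; not the SCAM implementation)."""

    def __init__(self, chunk_size=3):
        self.chunk_size = chunk_size
        self.index = defaultdict(set)  # chunk -> ids of registered docs containing it
        self.sizes = {}                # doc id -> number of distinct chunks

    def register(self, doc_id, text):
        """Index every distinct chunk of a registered document."""
        doc_chunks = chunks(text, self.chunk_size)
        self.sizes[doc_id] = len(doc_chunks)
        for c in doc_chunks:
            self.index[c].add(doc_id)

    def query(self, text, threshold=0.5):
        """Map doc id -> fraction of that document's chunks found in text.

        Only documents whose fraction reaches the (assumed) threshold are returned.
        """
        hits = defaultdict(int)
        for c in chunks(text, self.chunk_size):
            for doc_id in self.index.get(c, ()):
                hits[doc_id] += 1
        result = {}
        for doc_id, n in hits.items():
            overlap = n / self.sizes[doc_id]
            if overlap >= threshold:
                result[doc_id] = overlap
        return result
```

In this sketch an exact copy scores 1.0, while a partial copy scores in proportion to how many of the registered chunks it reuses. Smaller chunks tend to catch more partial overlap at the cost of a larger index and more false matches, which is one form of the performance versus accuracy trade-off the paper sets out to measure.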

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: SCAM, Copy detection, Plagiarism, Copyright
Subjects: Computer Science > Digital Libraries
Projects: Digital Libraries
Related URLs: Project Homepage
ID Code: 180
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 09 Dec 2008 09:36
