Stanford InfoLab Publication Server

GlOSS: Text-Source Discovery over the Internet

Gravano, L. and Garcia-Molina, H. and Tomasic, A. (2000) GlOSS: Text-Source Discovery over the Internet. Technical Report. Stanford InfoLab. (Publication Note: ACM Transactions on Database Systems, 24(2), June 1999.)




GlOSS: Text-Source Discovery over the Internet
Luis Gravano (Columbia University), Héctor García-Molina (Stanford University), and Anthony Tomasic (INRIA Rocquencourt)

The dramatic growth of the Internet has created a new problem for users: the location of relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Then, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS (Glossary of Servers Server), with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective at determining promising text sources for a given query.

Categories and Subject Descriptors: H.3 [Information Systems]: Information Storage and Retrieval
General Terms: Performance, Measurement
Additional Key Words and Phrases: Internet search and retrieval, digital libraries, text databases, distributed information retrieval

Authors' addresses: Luis Gravano, Computer Science Department, Columbia University, 1214 Amsterdam Avenue, New York, NY 10027, USA; Héctor García-Molina, Computer Science Department, Stanford University, USA; Anthony Tomasic, INRIA Rocquencourt, France.
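The two-phase approach in the abstract (sources export a summary of their contents, then the service ranks sources for a query) can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each source exports only its document count and per-term document frequencies, and that the service scores a conjunctive Boolean query by treating terms as independent. The names `estimate_matches`, `rank_sources`, and the sample summaries are hypothetical and chosen for illustration only.

```python
from math import prod


def estimate_matches(summary, query_terms):
    """Estimate how many documents in one source contain ALL query terms.

    Assumption (not taken from the abstract): terms occur independently,
    so the estimate is size * product(doc_freq[t] / size).
    """
    size = summary["size"]
    if size == 0:
        return 0.0
    freqs = summary["doc_freq"]
    return size * prod(freqs.get(term, 0) / size for term in query_terms)


def rank_sources(summaries, query_terms):
    """Return (source, estimate) pairs, most promising first.

    Sources whose estimate is zero are dropped from the ranking.
    """
    scored = [
        (name, estimate_matches(summary, query_terms))
        for name, summary in summaries.items()
    ]
    scored = [(name, est) for name, est in scored if est > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Phase 1: hypothetical summaries exported by three text sources.
summaries = {
    "db_a": {"size": 100, "doc_freq": {"text": 50, "retrieval": 20}},
    "db_b": {"size": 1000, "doc_freq": {"text": 100, "retrieval": 50}},
    "db_c": {"size": 200, "doc_freq": {"text": 10}},
}

# Phase 2: a user query is answered with an ordered list of sources.
ranking = rank_sources(summaries, ["text", "retrieval"])
for name, estimate in ranking:
    print(f"{name}: about {estimate:.1f} matching documents")
```

In this toy data, `db_c` never mentions "retrieval", so it is excluded, and the smaller `db_a` outranks `db_b` because a larger share of its documents is estimated to contain both terms.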

Item Type: Techreport (Technical Report)
Subjects: Computer Science > Digital Libraries
Projects: Digital Libraries
Related URLs: Project Homepage
ID Code: 432
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 27 Dec 2008 14:29
