Stanford InfoLab Publication Server

SONIA: A Service for Organizing Networked Information Autonomously

Sahami, Mehran and Yusufali, Salim and Baldonado, Michelle Q. W. (2000) SONIA: A Service for Organizing Networked Information Autonomously. Technical Report. Stanford InfoLab. (Publication Note: Third ACM Conference on Digital Libraries, Pittsburgh, Pennsylvania, June 23-26, 1998)


Abstract

The recent explosion of on-line information in Digital Libraries and on the World Wide Web has given rise to a number of query-based search engines and manually constructed topical hierarchies. However, these tools are quickly becoming inadequate as query results grow incomprehensibly large and manual classification in topic hierarchies creates an immense bottleneck. We address these problems with a system for topical information space navigation that combines query-based and taxonomic systems. We employ machine learning techniques to create dynamic document categorizations based on the full text of articles that are retrieved by users' queries. Our system, named SONIA (Service for Organizing Networked Information Autonomously), has been implemented as part of the Stanford Digital Libraries Testbed. It employs a combination of technologies that take the results of queries to networked information sources and, in real time, automatically retrieve, parse and organize these documents into coherent categories for presentation to the user. Moreover, the system can save such document organizations in user profiles, which can then be used to help classify future query results by the same user. SONIA uses a multi-tier approach to extracting relevant terms from documents as well as statistical clustering methods to determine potential topics within a document collection. It also makes use of Bayesian classification techniques to classify new documents within an existing categorization scheme. In this way, it allows users to navigate the results of a query at a more topical level than having to examine each document text separately.
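
As a rough illustration of the last step the abstract describes (classifying new documents within an existing categorization scheme), the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not SONIA's implementation: the whitespace tokenizer, the function names, and the `saved_profile` data are all hypothetical stand-ins chosen for the example, and the report's multi-tier term extraction and clustering stages are not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real term extraction."""
    return text.lower().split()


def train(labeled_docs):
    """Collect per-category term counts from (category, text) pairs."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, text in labeled_docs:
        doc_counts[category] += 1
        term_counts[category].update(tokenize(text))
    vocabulary = {t for counts in term_counts.values() for t in counts}
    return term_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the category with the highest posterior log-probability."""
    term_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        counts = term_counts[category]
        total_terms = sum(counts.values())
        # Start from the log prior of the category.
        score = math.log(doc_counts[category] / total_docs)
        for term in tokenize(text):
            if term not in vocabulary:
                continue  # ignore terms never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[term] + 1) / (total_terms + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical saved categorization: (category, document text) pairs.
saved_profile = [
    ("databases", "query optimization relational index join"),
    ("databases", "transaction index storage query"),
    ("learning", "bayesian classifier training clustering"),
    ("learning", "clustering features classifier model"),
]
model = train(saved_profile)
print(classify("index structures for query processing", model))
```

In this sketch the saved category assignments play the role of a user profile: a new query result is scored against each stored category and placed in the most probable one.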

Item Type: Techreport (Technical Report)
Additional Information: Previous number = SIDL-WP-1998-0086
Subjects: Computer Science > Digital Libraries
Projects: Digital Libraries
Related URLs: Project Homepage http://www-diglib.stanford.edu/diglib/pub/
ID Code: 469
Deposited By: Import Account
Deposited On: 29 Oct 2001 16:00
Last Modified: 27 Dec 2008 15:37
