Digital Library Project
Stanford University
Quarterly Report. Feb 1, 1998
Reporting Period: Nov 1, 1997-Jan 31, 1998

Contents:

Administrative matters
InfoBus Architecture and Testbed
Economics
User Interfaces
Searching
Miscellaneous Activities
References to Papers Produced During Reporting Period

1. Administrative

The all-projects meeting fell within this reporting period; this time it was held on the Berkeley campus. As always, the preparations for the meeting were intense and fruitful, and we were able to demonstrate a variety of new system components.

We strengthened our ties to NPACI. Our InfoBus architecture was presented at a meeting there and generated considerable interest. Our metadata architecture has likewise been attracting attention.

Intel has generously awarded us server machines, which we immediately put to use for the project. Similarly, IBM has kindly supplied us with servers and disk space, which are being used in our Google project (see Section 5).

Rebecca Wesley hosted the Digital Library Metrics group here at Stanford. This group is working toward effective metrics for digital library results.

2. InfoBus Architecture and Testbed

Much of our current InfoBus work is dedicated to making the InfoBus accessible to a broader community. In part, we do this by testing and ensuring inter-ORB communication capabilities between our ILU-based system and a commercial ORB. We have selected Visigenic's VisiBroker as our ORB of choice because it is now bundled into Netscape 4. Once all the parts are tested and modified appropriately, this feature will allow our InfoBus applets to rely on the presence of an ORB whenever they are loaded into a Netscape browser.

Also to make InfoBus technology easy to distribute, we are building a set of example 'template' servers and clients that will serve as the core of a simple tutorial for InfoBus developers. The search service proxy we are creating as an example is an interface to ACM's publication server. As an example of a non-search proxy, we are building a Java-based proxy to MapQuest, a Web service that computes street routes between street addresses. We have also developed a presentation on how to build a non-search proxy.

To achieve inter-ORB compatibility, we have had to switch all our operations to IIOP, a protocol designed by the Object Management Group to enable ORBs from different vendors to communicate. Switching to IIOP has in turn required us to upgrade some of our system software.

Another related effort has led to a new release of our proxy generator. The proxy generator is a software facility that allows users to create InfoBus proxies to Web search engines interactively. It allows us to bring new and interesting Web resources online quickly and efficiently. The new version adds several important features to the previous system:

Forms support
Multiple statements executed in cascade, each one tried only when the previous one fails
Support for page redirection
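The cascade feature can be illustrated with a small sketch. It assumes, hypothetically, that each "statement" is an extraction rule that either returns the items it found on a result page or signals failure; the helper names (regex_statement, run_cascade) and the HTML patterns are illustrative only and are not the proxy generator's actual interface.

```python
import re
from typing import Callable, List, Optional

# Hypothetical: a statement extracts items from a page, or returns None on failure.
Statement = Callable[[str], Optional[list]]

def regex_statement(pattern: str) -> Statement:
    """Build a statement that returns all matches of `pattern`, or None if nothing matches."""
    compiled = re.compile(pattern)
    def run(page: str) -> Optional[list]:
        matches = compiled.findall(page)
        return matches or None
    return run

def run_cascade(statements: List[Statement], page: str) -> Optional[list]:
    """Try each statement in order, falling through only when the previous one fails."""
    for statement in statements:
        result = statement(page)
        if result is not None:
            return result
    return None

# Usage: try a newer page layout first, then fall back to an older one.
cascade = [
    regex_statement(r'<a class="hit" href="([^"]+)"'),
    regex_statement(r'<li><a href="([^"]+)"'),
]
page = '<ul><li><a href="http://example.org/a">A</a></li></ul>'
print(run_cascade(cascade, page))
```

Ordering the statements from most to least specific lets a proxy keep working when a search engine changes its result layout, as long as one of the fallbacks still matches.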

As part of our continued enhancements to the InfoBus, we made our collection objects more versatile. They now have the capability to store information items of different attribute models. They also support search over items coded with mixed models.
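One way a collection can search over items coded with mixed attribute models is to map each model's field names onto a shared search attribute. The sketch below assumes that approach; the model names, the mapping table, and the Item/Collection classes are hypothetical and do not reflect the InfoBus collection objects' real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping from each attribute model's field names to shared search attributes.
MODEL_MAPPINGS: Dict[str, Dict[str, str]] = {
    "dublin_core": {"creator": "author", "title": "title"},
    "bib1": {"Author": "author", "Title": "title"},
}

@dataclass
class Item:
    model: str                   # attribute model this item is coded in
    attributes: Dict[str, str]   # field name -> value, in that model's vocabulary

@dataclass
class Collection:
    items: List[Item] = field(default_factory=list)

    def add(self, item: Item) -> None:
        self.items.append(item)

    def search(self, attribute: str, term: str) -> List[Item]:
        """Return items whose field, mapped to `attribute`, contains `term` (case-insensitive)."""
        hits = []
        for item in self.items:
            mapping = MODEL_MAPPINGS.get(item.model, {})
            for name, value in item.attributes.items():
                if mapping.get(name) == attribute and term.lower() in value.lower():
                    hits.append(item)
                    break
        return hits

# Usage: one collection holding items from two different attribute models.
c = Collection()
c.add(Item("dublin_core", {"creator": "Winograd", "title": "Understanding Computers"}))
c.add(Item("bib1", {"Author": "Garcia-Molina", "Title": "Database Systems"}))
print(len(c.search("author", "winograd")))
```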

3. Economics

We made progress on our shopping model approach. In particular, we accomplished the following during the reporting period:

Definition of the U-DEL Delivery API
Creation and evaluation of heuristics for two competitive sourcing domains
Completion of the treatment of "re-use of trusted intermediaries" for the distributed transaction framework

We have also begun an effort to unify several existing notions of electronic wallets.

4. User Interfaces

We developed an entirely new interface for our Web agents. It gives users much more control over the agents' news recommendation behavior. The interface was formally tested, and the feedback is now being integrated into the design.

In our DLITE user interface for digital libraries, we added a facility for submitting queries to multiple sources simultaneously. Although this capability was always present in our InfoBus, it was not accessible at the DLITE user interface level. The user first drags icons representing sources onto a new multisearch widget and then drags a query onto the same widget. The result sets of all sources are then animated out and populated with document icons.
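Behind a multisearch widget, the core operation is a fan-out: the same query goes to every selected source concurrently, and one result set comes back per source. The sketch below shows that pattern with Python's standard thread pool; modeling a source as a plain callable is an assumption made for illustration and is not how DLITE or the InfoBus actually invoke sources.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical: a source is any callable taking a query and returning document titles.
Source = Callable[[str], List[str]]

def multisearch(sources: Dict[str, Source], query: str) -> Dict[str, List[str]]:
    """Send `query` to every source concurrently and collect one result set per source."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        return {name: future.result() for name, future in futures.items()}

# Usage with two stand-in sources.
results = multisearch(
    {"catalog": lambda q: [q + " (catalog hit)"], "web": lambda q: [q.upper()]},
    "digital libraries",
)
print(results)
```

Keeping one result set per source, rather than merging them, mirrors the interface described above, where each source's results are animated out separately.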

In our work on audio-supported Web interfaces, we analyzed results from a user study related to the AHA (Audio HTML Access) framework. The study tested three audio browsers to determine how appropriate certain types of audio markings are for various HTML structures. The results added another dimension to the AHA framework, so that its principles for choosing sounds in an audio presentation of HTML are now: (1) Vocal Source Identity (when to use speaker changes to mark structures), (2) Recognizability, and (3) Distraction (new).

We then turned to applying the AHA principles to the actual choice of sounds in scenario interfaces. Working through these scenarios shows that other user-related factors (such as musical ability, culture, and reading style) must be combined with the AHA principles to select specific sounds.

Our work on automated construction of specialized query interfaces progressed well. We have a prototype, which we demonstrated at the Berkeley DLI meeting. The facility uses our distributed metadata architecture to help query interface designers understand which information attributes are supported at different information sources. It then generates an appropriate interface.
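The following sketch illustrates the basic idea of deriving a query form from source metadata. It assumes, hypothetically, that each source's metadata reduces to a set of supported attribute names. The generator offers every attribute supported by any selected source and marks those not supported everywhere. The function names and HTML output are illustrative and are not the prototype's actual design.

```python
from typing import Dict, List, Set

def design_fields(source_attrs: Dict[str, Set[str]], selected: List[str]) -> Dict[str, List[str]]:
    """For each attribute offered by any selected source, list the selected sources supporting it."""
    support: Dict[str, List[str]] = {}
    for name in selected:
        for attr in sorted(source_attrs[name]):
            support.setdefault(attr, []).append(name)
    return support

def render_form(support: Dict[str, List[str]], n_sources: int) -> str:
    """Emit a plain HTML form; attributes not supported by every source get a comment."""
    lines = ["<form>"]
    for attr in sorted(support):
        note = "" if len(support[attr]) == n_sources else f' <!-- only: {", ".join(support[attr])} -->'
        lines.append(f'  <label>{attr} <input name="{attr}"></label>{note}')
    lines.append("</form>")
    return "\n".join(lines)

# Usage with hypothetical metadata for two sources.
source_attrs = {"dialog": {"author", "title", "date"}, "web": {"title"}}
support = design_fields(source_attrs, ["dialog", "web"])
form = render_form(support, 2)
print(form)
```

A designer could instead choose the intersection of supported attributes to guarantee that every field works at every source. Showing the union with annotations makes that trade-off visible.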

5. Searching

Our Google search engine underwent a major upgrade. Google is our Web search engine; its ranking algorithm (PageRank [19]) is based on the links pointing to each page, with each incoming link weighted by the rank of the page it comes from. We now store the full text of 24 million pages, and an additional 400 GB of disk prepares us to scale to about 100 million pages. Google performs well as a search engine: the service receives many hits from outside users and was very well received at the Berkeley meeting.
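Link-based ranking of this kind can be illustrated with the textbook power-iteration formulation of PageRank [19]. In this formulation, a page's rank is the damped sum of the ranks of the pages linking to it, each divided by that page's number of outgoing links. The sketch below is a small in-memory version under stated assumptions. The damping factor 0.85 and the fixed iteration count are conventional choices, not Google's production settings. Pages with no outgoing links spread their rank uniformly, which is one common way to handle dangling pages.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Usage: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))
```

Because each iteration redistributes exactly the total rank, the scores always sum to 1. A page with no incoming links, such as b here, keeps only the random-jump share.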

We are also working on an API that will give us access to Google's internal data, such as link anchors and page ranks. This will let us experiment with novel data analysis and search techniques.

We completed a framework for both query and data translation. For data translation, we model information as a set of conjunctive constraints that are satisfied by real-world objects (e.g., documents). By applying semantic rules and value transformation functions, constraints are mapped into ones that are understood and supported in another context. The machinery can also handle hierarchically structured information.
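The sketch below illustrates the flavor of constraint mapping in a simplified form; it does not reproduce the framework in [10]. A conjunction of (attribute, operator, value) constraints is translated conjunct by conjunct. A semantic rule splits an author name into last-name and first-name constraints, and a value transformation function converts language names into codes. Both rules, the target attribute names (ln, fn, lang), and the rule table are hypothetical. Conjuncts with no rule are returned separately. Dropping a conjunct only broadens the target query, so results can be post-filtered locally against the dropped conjuncts.

```python
from typing import Callable, Dict, List, Tuple

# A constraint is (attribute, operator, value); a query is a conjunction (list) of them.
Constraint = Tuple[str, str, object]
Rule = Callable[[str, object], List[Constraint]]

def split_author(op: str, value: object) -> List[Constraint]:
    """Hypothetical semantic rule: 'Last, First' becomes last-name and first-name constraints."""
    last, first = [part.strip() for part in str(value).split(",", 1)]  # assumes a comma is present
    return [("ln", op, last), ("fn", op, first)]

LANGUAGE_CODES = {"English": "en", "Japanese": "ja"}

def to_language_code(op: str, value: object) -> List[Constraint]:
    """Hypothetical value transformation: language names become two-letter codes."""
    return [("lang", op, LANGUAGE_CODES[str(value)])]

# Hypothetical rule table for one target source.
RULES: Dict[str, Rule] = {"author": split_author, "language": to_language_code}

def map_conjunction(constraints: List[Constraint]) -> Tuple[List[Constraint], List[Constraint]]:
    """Translate conjuncts the target supports; return unsupported ones for post-filtering."""
    mapped: List[Constraint] = []
    residue: List[Constraint] = []
    for attr, op, value in constraints:
        rule = RULES.get(attr)
        if rule is None:
            residue.append((attr, op, value))
        else:
            mapped.extend(rule(op, value))
    return mapped, residue

# Usage
mapped, residue = map_conjunction([
    ("author", "=", "Garcia-Molina, Hector"),
    ("language", "=", "Japanese"),
    ("subject", "=", "databases"),
])
print(mapped, residue)
```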

Our KSS server work has resulted in a poster accepted to the 7th WWW conference in Brisbane, Australia: "Collaborative value filtering on the Web". KSS is a Web proxy that monitors a large group's access patterns. The information gathered can in turn be used to help users navigate the Web and to rank result documents. KSS is a distributed system in which multiple proxies consult one another.
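As a rough illustration of how group access patterns might feed into ranking, the sketch below merges per-proxy URL access counts and reorders a result list by group-wide popularity. The KSS protocol and scoring are not described in this report. Plain access counts, Counter-based logs, and the function names here are all assumptions.

```python
from collections import Counter
from typing import Iterable, List

def merge_counts(proxy_logs: Iterable[Counter]) -> Counter:
    """Combine per-proxy URL access counts into one group-wide tally."""
    total: Counter = Counter()
    for log in proxy_logs:
        total.update(log)  # Counter.update adds counts rather than replacing them
    return total

def rerank(results: List[str], counts: Counter) -> List[str]:
    """Order results by group access count, highest first; ties keep the original order."""
    return sorted(results, key=lambda url: -counts[url])  # sorted() is stable

# Usage: two proxies, each having observed part of the group's traffic.
group = merge_counts([Counter({"a": 2, "b": 1}), Counter({"b": 3})])
print(rerank(["c", "a", "b"], group))
```

Consulting several proxies matters because no single proxy sees enough of a large group's traffic for its counts to be reliable alone.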

In the area of applying AI techniques to query result analysis, we finished implementing the new SONIA system, which provides an interactive user interface for building hierarchically organized document collections and classifying documents into them. The system queries multiple information sources via the InfoBus and then provides AI tools that let the user quickly organize the retrieved results. We created a videotape demonstrating the use of the system.

6. Miscellaneous Activities

6.1 Visitors and Industry Contacts

Demo for visitors from NEC including Dr. Hiroyuki Tarumi
Carl Lagoze (Cornell)
Meeting with Genichiro Kikui (NTT Japan, currently visiting Stanford CSLI) and Leong Mun Kew (National University of Singapore). Discussed how to use STARTS in a Singapore-Japan joint project on multilingual information retrieval.
Dr. Tsuji of Hitachi, Japan
Mike Lesk of Bellcore
Allan Kuchinsky of Hewlett-Packard Laboratories
David Loy, Dialog Corporation
Steve Griffin, NSF

6.2 Public Presentations and Meetings Attended

Ed Chang: Presented a paper titled "BubbleUp: Low Latency Fast-Scan for Media Servers" at the ACM International Conference on Multimedia, held in Seattle on November 14, 1997
Presented "A Media Data Delivery Architecture" at NEC Research Labs, and at Hewlett-Packard Laboratories.
Arturo Crespo: Awareness Services for Digital Libraries (DLI Meeting)
Frankie James: Presentation to CSLI IAP meeting, November 12.
ICAD 97 (International Conference on Auditory Display), 2-5 November, including a workshop on "Audio and the WWW".
Steve Ketchpel: UN E-commerce and the creation of a Stanford CA: Carlos Moreira, head of the United Nations Trade Point Development Center
Delivered a presentation at Intel, Hillsboro, OR: Information Commerce (incl. DLEcon activities)
Gerard Rodriguez: NPACI all hands meeting (National Partnership for Advanced Computational Infrastructure). San Diego, California
Rebecca Wesley: Attended the annual meeting of ASIS
Hosted the DLIB Metrics Group

6.3 Regular Meetings/Seminars

Weekly Digital Library seminar to which we invite prominent contributors to digital library research as speakers
Executive committee meetings when required
Weekly coordination meetings for all faculty, staff, and students
Several weekly technical design meetings of smaller subgroups

7. References

During the reporting period we produced the following papers.

[1] Marko Balabanovic. Exploring versus Exploiting when Learning User Models for Text Recommendation. User Modeling and User-Adapted Interaction (to appear), 8(1), 1998.

[2] Marko Balabanovic. The "Slider" Interface. IBM interVisions, 11, February 1998.

[3] Marko Balabanovic. An Interface for Learning Multi-topic User Profiles from Implicit Feedback. Submitted to the 21st International ACM/SIGIR Conference on Research and Development in Information Retrieval, 1998.

[4] Sergey Brin and Lawrence Page. Dynamic Data Mining: A New Architecture for Data with High Dimensionality. Submitted to VLDB98, 1998.

[5] Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Submitted to the World-Wide Web Conference WWW7, 1998.

[6] Chen-Chuan K. Chang, Héctor García-Molina, and Andreas Paepcke. Predicate Rewriting for Translating Boolean Queries in a Heterogeneous Information System. Number SIDL-WP-1996-0028. Stanford University, 1996. Accessible at http://www-diglib.stanford.edu.

[7] Edward Chang and Hector Garcia-Molina. MEDIC: A Memory & Disk Cache for Multimedia Clients. Number SIDL-WP-1997-0076. Stanford University, October 1997.

[8] Edward Chang and Hector Garcia-Molina. Cost-Based Media Server Design. To appear in Proceedings of the 8th Research Issues in Data Engineering, February 1998.

[9] Edward Chang. An Image Coding and Reconstruction Scheme for Mobile Computing. Submitted to IDMS '98, 1998.

[10] Chen-Chuan K. Chang and Hector Garcia-Molina. Conjunctive Constraint Mapping for Data Translation. Number SIDL-WP-1998-0083. Stanford University, January 1998. Accessible at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1998-0083.

[11] Junghoo Cho, Hector Garcia-Molina, and Lawrence Page. Efficient Crawling Through URL Ordering. Submitted to the World-Wide Web Conference WWW7, 1998.

[12] Arturo Crespo and Hector Garcia-Molina. Archival Storage for Digital Libraries. Submitted to Digital Libraries DL98, 1998. Accessible at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1998-0082.

[13] Moises Goldszmidt and Mehran Sahami. A Probabilistic Approach to Full-Text Document Clustering. Submitted to the National Conference on Artificial Intelligence (AAAI), 1998.

[14] Frankie James. Distinguishability vs. Distraction in Audio HTML Interfaces. Submitted to International Journal on Digital Libraries, 1997.

[15] Frankie James. Lessons from Developing Audio HTML Interfaces. Number SIDL-WP-1997-0079. Stanford University, November 1997.

[16] Michelle Baldonado, Seth Katz, Andreas Paepcke, Chen-Chuan K. Chang, Hector Garcia-Molina, and Terry Winograd. An Extensible Constructor Tool for the Rapid, Interactive Design of Query Synthesizers. Submitted to Digital Libraries DL98, 1998. Accessible at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1998-0085.

[17] Shiva Narayanan and Hector Garcia-Molina. Computing Iceberg Queries Efficiently. Submitted to VLDB '98, 1998.

[18] Andreas Paepcke, Chen-Chuan K. Chang, Hector Garcia-Molina, and Terry Winograd. Interoperability for Digital Libraries Worldwide. To appear in Communications of the ACM, 41(4), April 1998. Accessible at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1998-0087.

[19] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Submitted to SIGIR '98, 1998.

[20] Mehran Sahami, Salim Yusufali, and Michelle Baldonado. SONIA: A Service for Organizing Networked Information Autonomously. Submitted to Digital Libraries DL98, 1998.