Digital Library Project
Stanford University
Quarterly Report. May 1, 1997
Reporting Period: Feb 1, 1997-April 30, 1997

Content:

Administrative matters
InfoBus Architecture and Testbed
Economics
User Interfaces
Searching
Miscellaneous Activities
References to Papers Produced During Reporting Period

1. Administrative

On April 29 we had our annual site visit. Our presentations are online at our Web site. They provide an excellent snapshot of our project.

Our annotated Digital Library bibliography has been mirrored as part of a large set of CS bibliographies in Germany.

The Math/CS Library received a Windows 95 PC to begin Phase II testing of the InfoBus and our DLITE interface. Several library staff members have accounts on the system and will be introducing it to patrons. The NASA Ames and Xerox PARC libraries will be the other two Phase II deployment sites.

2. InfoBus Architecture and Testbed

Work on the InfoBus infrastructure focused mainly on moving us to threaded operations. This is turning out to be an 'interesting' problem because threading can interact with the many TCP/IP connections maintained by the underlying CORBA mechanisms. Nevertheless, we have decided to go ahead with this move because of efficiency and engineering considerations.

We have also completed an exploratory implementation of pass-by-value for our ILU implementation. CORBA currently provides only pass-by-reference for parameters and return values. While this is often the right mechanism, in our Digital Library applications we frequently want objects to migrate easily, with natural syntax for expressing the desired behavior.
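The difference between the two passing styles can be illustrated with a minimal in-process Python sketch. This is an analogy only, not the ILU or CORBA API: the classes Document and Server and both method names are hypothetical, and copy.deepcopy merely stands in for marshalling an object's state across the network.

```python
import copy


class Document:
    """Hypothetical object that a Digital Library service might hand out."""

    def __init__(self, title):
        self.title = title
        self.annotations = []


class Server:
    """Hypothetical service holding one document."""

    def __init__(self):
        self.doc = Document("report")

    def get_by_reference(self):
        # Stand-in for returning an object reference: the caller operates
        # on the server's object, so changes are visible at the server.
        # (A real ORB would return a remote stub, not the object itself.)
        return self.doc

    def get_by_value(self):
        # Stand-in for pass-by-value: the object's state is copied, so the
        # caller gets an independent object that has "migrated" to it.
        return copy.deepcopy(self.doc)


server = Server()

ref = server.get_by_reference()
ref.annotations.append("note")      # changes the server's document

val = server.get_by_value()
val.annotations.append("local")     # changes only the caller's copy
```

After these calls the server's document carries only the annotation made through the reference, while the copy carries both.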

3. Economics

Our work on shopping models progressed significantly, leading to a paper in the upcoming Digital Library conference (DL97). Shopping models separate the details of ordering, delivery, and payment from the sequence in which these activities occur. Our architecture and implementation provide for one CORBA object each to handle the merchant- and customer-specific ordering, delivery, and payment details. A new proxy, the shopping model, acts as a 'traffic cop' for messages between customer and merchant. This allows customers and merchants to participate in a variety of interaction models without changing their code. For example, we are able to model subscriptions, pay-per-view, and auctions without changes in the customer and merchant code.
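The separation of activity details from activity sequence can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's CORBA implementation: the classes Party and ShoppingModel are hypothetical, and the step orderings shown for each model are guesses chosen only to show that the sequence can change while the party code stays the same.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    """Hypothetical customer or merchant with one handler per activity."""
    name: str
    log: List[str] = field(default_factory=list)

    def order(self, item: str) -> None:
        self.log.append(f"order:{item}")

    def deliver(self, item: str) -> None:
        self.log.append(f"deliver:{item}")

    def pay(self, item: str) -> None:
        self.log.append(f"pay:{item}")


class ShoppingModel:
    """Proxy that decides only the order of activities, not their details."""

    def __init__(self, sequence: List[str]) -> None:
        self.sequence = sequence

    def run(self, customer: Party, merchant: Party, item: str) -> None:
        for step in self.sequence:
            # Forward each step to both sides; each handles its own details.
            getattr(customer, step)(item)
            getattr(merchant, step)(item)


# Two interaction models over the same, unchanged Party code.
# The orderings are illustrative assumptions.
PAY_BEFORE_DELIVERY = ShoppingModel(["order", "pay", "deliver"])
PAY_AFTER_DELIVERY = ShoppingModel(["order", "deliver", "pay"])

customer, merchant = Party("customer"), Party("merchant")
PAY_BEFORE_DELIVERY.run(customer, merchant, "article-1")
PAY_AFTER_DELIVERY.run(customer, merchant, "article-2")
```

Adding another interaction model in this sketch means constructing another ShoppingModel with a different sequence; Party is never edited.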

4. User Interfaces

We have completed our Phase I user interface tests. Based on the results, we have been redesigning several aspects of our system.

Our SenseMaker interface has been redesigned to use 'Hi-cites'. These solve the problem of letting users scan a set of citations quickly, while being able to compare selected aspects of those citations. For example, a user may want to scan a large number of citations to find similar titles, identical authors, or the same publication year.

A tabular presentation of the citation components makes this easy. But when the citations come from different sources, that tabular representation frequently contains many empty fields. This may happen, for example, when some citations refer to a bibliographic source, while others point to Web sites: for the latter, the author field will usually be empty.

On the other hand, one can arrange citations in the form of a traditional bibliography. This is much more efficient with respect to screen real-estate, but it does not allow scanning of attributes as easily as does the tabular representation.

Hi-cites combine the best of both worlds. They show citations in the form of a bibliography, but when the cursor is placed over one attribute within a citation, all corresponding attributes are highlighted in the other citations. If, for example, the cursor is placed over the title of one citation, the titles of all other citations are highlighted as well.
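The highlighting behavior can be sketched as follows. This is a text-only Python illustration, not the SenseMaker code: the function render_hicites, the fixed attribute order, the use of asterisks to mark a highlight, and the sample citations are all assumptions made for the example.

```python
from typing import Dict, List

Citation = Dict[str, str]

# Assumed display order of attributes within one bibliography entry.
ATTRIBUTE_ORDER = ("author", "title", "year")


def render_hicites(citations: List[Citation], hovered_attr: str) -> List[str]:
    """Render bibliography-style lines; mark the hovered attribute in
    every citation that has it by wrapping the value in asterisks."""
    lines = []
    for citation in citations:
        parts = []
        for attr in ATTRIBUTE_ORDER:
            value = citation.get(attr)
            if not value:
                continue  # missing fields take no screen space
            parts.append(f"*{value}*" if attr == hovered_attr else value)
        lines.append(". ".join(parts) + ".")
    return lines


# Placeholder citations; the second has no author, as a Web page might.
citations = [
    {"author": "Author A", "title": "Title One", "year": "1997"},
    {"title": "Title Two", "year": "1997"},
]

lines = render_hicites(citations, "title")
```

Unlike a table, the entry without an author leaves no empty cell, yet hovering over one title still marks the titles of all entries.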

Our SenseMaker interface was also completely re-written in Java to facilitate distribution over the Web.

5. Searching

We continued work on our SONIA clustering service. This service takes a set of documents and clusters them through various algorithms. SONIA is now tightly integrated with our SenseMaker system. We also integrated the AutoClass clustering algorithm into SONIA and added a number of algorithmic extensions to the SONIA implementation.

We began developing the future interaction model for SONIA as both a document clustering and classification (user profiling) system. This sets the stage for integrating our work on hierarchical document classification into the testbed. The vision is for users to build their own small Yahoo! hierarchy. SONIA will then place incoming new documents into that hierarchy.

In the area of attribute model translation, we investigated how attribute mappings between models can be specified. Specifically, we extended our original work so that the correspondence of attributes between models can be specified as one-to-many mappings under multiple relationships. For example, we are able to express that attribute 'author' in the Bib1 model maps to attribute 'creator' in Dublin Core under the relationship 'generalizes-to'.

Using this more powerful scheme, we have implemented attribute translators for Bib1<->StanfordFront, Bib1<->DublinCore, and Refer<->BibTex. We have also done preliminary work in attribute value translation to cast field values as called for by one model into values acceptable in another.
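One way to picture one-to-many mappings under named relationships is the small Python sketch below. It is an assumption-laden illustration, not the translators described above: the class AttributeMapper and its methods are hypothetical, and the second rule (mapping 'author' to 'contributor' under 'related-to') is invented solely to show a one-to-many case. Only the 'author' to 'creator' rule under 'generalizes-to' comes from the text.

```python
from typing import Dict, List, Optional, Tuple

# (source model, source attribute) -> [(target model, target attribute, relationship)]
Rules = Dict[Tuple[str, str], List[Tuple[str, str, str]]]


class AttributeMapper:
    """Hypothetical registry of attribute correspondences between models."""

    def __init__(self) -> None:
        self._rules: Rules = {}

    def add(self, src_model: str, src_attr: str,
            dst_model: str, dst_attr: str, relationship: str) -> None:
        key = (src_model, src_attr)
        self._rules.setdefault(key, []).append((dst_model, dst_attr, relationship))

    def targets(self, src_model: str, src_attr: str, dst_model: str,
                relationship: Optional[str] = None) -> List[Tuple[str, str]]:
        """Return (target attribute, relationship) pairs, optionally
        restricted to a single relationship."""
        return [
            (attr, rel)
            for model, attr, rel in self._rules.get((src_model, src_attr), [])
            if model == dst_model and (relationship is None or rel == relationship)
        ]

    def translate(self, record: Dict[str, str],
                  src_model: str, dst_model: str) -> Dict[str, str]:
        """Copy each value to every target attribute it maps to."""
        out: Dict[str, str] = {}
        for attr, value in record.items():
            for dst_attr, _rel in self.targets(src_model, attr, dst_model):
                out[dst_attr] = value
        return out


mapper = AttributeMapper()
mapper.add("Bib1", "author", "DublinCore", "creator", "generalizes-to")
# Hypothetical second rule, included only to show a one-to-many mapping.
mapper.add("Bib1", "author", "DublinCore", "contributor", "related-to")

translated = mapper.translate({"author": "A. Author"}, "Bib1", "DublinCore")
```

This sketch copies field values unchanged; casting values into the form another model expects would be a separate step.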

6. Miscellaneous Activities

We participated in the NSF museum exhibit on Digital Libraries, providing demonstrations and posters.

6.1 Visitors and Industry Contacts

Marko Balabanovic: Met with Joerg Mueller (Mitsubishi digital libraries research spinoff, called Zuno).
Michelle Baldonado: Demo for visitors from Rockwell who are interested in SenseMaker.
Steve Ketchpel: Hosted a group of 12 students from Denmark.
Steve Ketchpel: Discussions with Mike Wellman, University of Michigan, leading to further discussions with him and his group about making the UMDL AuctionBot a shopping model.
Rebecca Lasher: Bruce Antelmen and the InfoExpress staff met with Diglib staff.
Rebecca Lasher: Met with Carl Lagoze to discuss NCSTRL and DL collaboration issues.
Rebecca Lasher: Met with David Ely from CNRI and Melissa Dadant from the Library of Congress to discuss the Library of Congress copyright registration system. 16 technical reports were registered.
Rebecca Lasher: Participated in the Dublin Core conference in Canberra, Australia in March.
Rebecca Lasher: Hosted five visitors from Norway and Finland at Stanford.
Andreas Paepcke: Met with Thomas Sand of Hochschul-Informations-System GMBH to talk about the DL project.
Andreas Paepcke: Was interviewed by John Adam for a story in NY Times Magazine.

6.2 Public Presentations and Meetings Attended

Marko Balabanovic: Presented a paper and gave a live demonstration of Fab at the First International Conference on Autonomous Agents, Marina del Rey, CA.
Michelle Baldonado: SenseMaker at Apple Research.
Michelle Baldonado: SenseMaker at the Stanford Computer Forum, a Stanford industrial affiliates program.
Michelle Baldonado: Presented a paper at CHI '97.
Michelle Baldonado: Presentation at the Seminar on People, Computers, and Design, a televised series of technical talks aimed at industry in Silicon Valley.
Hector Garcia-Molina: Next-Generation Digital Library System Research and Development Project Conference, Invited Keynote Speaker, Tokyo, Japan, March 18, 1997.
Hector Garcia-Molina: COMPCON Conference, Invited Keynote Speaker, San Jose, February 26, 1997.
Steve Ketchpel: Gave a talk at the Nobots research group at Stanford.
Andreas Paepcke: Attended WWW6.
Martin Roscheisen: The Stanford FIRM Framework for Interoperable Rights Management. Forum on Technologies for Intellectual Property Protection, Washington, DC. Interactive Media Association, White House Office of Science and Technology, White House Economic Council.
Mehran Sahami: Presented "Hierarchically Classifying Documents Using Very Few Words," summarizing our results on feature selection, Bayesian model building, and hierarchical classification techniques using text data, at the Seminar on Computational Learning and Adaptation at Stanford University. The audience was primarily local researchers.
Mehran Sahami: Presented "Creating personalized Yahoo!'s: Automated Hierarchical Clustering and Classification of Documents," covering current results with the SONIA service as well as plans for future extensions of it, at the 29th Annual Meeting of the Stanford Computer Forum at Stanford University. The audience was predominantly industrial affiliates.

6.3 Regular Meetings/Seminars

Weekly Digital Library seminar
Executive committee meetings when required
Weekly technical design meetings

7. References

[1] Michelle Q Wang Baldonado and Terry Winograd. SenseMaker: An Information-Exploration Interface Supporting the Contextual Evolution of a User's Interests. In Proceedings of the Conference on Human Factors in Computing Systems, 1997.

[2] Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for Digital Libraries: Architecture and Design Rationale. Number SIDL-WP-1997-0055. Stanford University, 1997. Accessible at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0055.

[3] Chen-Chuan K. Chang and Hector Garcia-Molina. Evaluating the Cost of Boolean Query Mapping. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0053.

[4] Steve B. Cousins, Andreas Paepcke, Terry Winograd, Eric A. Bier, and Ken Pier. The Digital Library Integrated Task Environment (DLITE). In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. Accessible at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1996-0049.

[5] Luis Gravano and Hector Garcia-Molina. Merging Ranks from Heterogeneous Internet Sources. Submitted to the 23rd International Conference on Very Large Data Bases (VLDB'97), 1997.

[6] Steven Ketchpel. Distributed Commerce Transactions with Timing Deadlines and Direct Trust. 1997. Poster at International Joint Conference on AI'97.

[7] D. Koller and M. Sahami. Hierarchically Classifying Documents Using Very Few Words. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97), 1997.

[8] Martin Röscheisen and Terry Winograd. A Network-Centric Design for Rights Management. Journal of Computer Security, 1997.