Digital Library Project
Stanford University
Quarterly Report, May 1, 1997
Reporting Period: Feb 1, 1997-April 30, 1997


Administrative matters
InfoBus Architecture and Testbed
Economics
User Interfaces
Searching
Miscellaneous Activities
References to Papers Produced During Reporting Period

1. Administrative

On April 29 we had our annual site visit. Our presentations are online at our Web site. They provide an excellent snapshot of our project.

Our annotated Digital Library bibliography has been mirrored in Germany as part of a large collection of computer science bibliographies.

The Math/CS Library received a Windows 95 PC to begin Phase II testing of the InfoBus and our DLITE interface. Several library staff members have accounts on the system and will be introducing it to patrons. The NASA Ames and Xerox PARC libraries will be the other two Phase II deployment sites.

2. InfoBus Architecture and Testbed

Work on the InfoBus infrastructure focused mainly on moving to multi-threaded operation. This is turning out to be an 'interesting' problem, because threading can interact with the many TCP/IP connections maintained by the underlying CORBA mechanisms. Nevertheless, we have decided to proceed with the move for reasons of efficiency and engineering. We have also completed an exploratory implementation of pass-by-value for our ILU implementation. CORBA currently passes objects, both as parameters and as return values, only by reference. While this is often the right mechanism, our Digital Library applications frequently need objects to migrate easily, with a natural syntax for expressing the desired behavior.
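To make the distinction concrete, here is a minimal Python sketch. It is not ILU or CORBA code: the names `CallCounter`, `RemoteRef` and `pass_by_value` are hypothetical, a counter of simulated round trips stands in for the network, and `copy.deepcopy` stands in for marshalling an object's state. With pass-by-reference every field read on the client goes back to the server; with pass-by-value a single transfer ships a copy that the client then reads locally.

```python
import copy


class CallCounter:
    """Counts simulated network round trips (hypothetical stand-in for an ORB)."""

    def __init__(self):
        self.round_trips = 0


class Citation:
    """A small server-side object that a client might want to inspect."""

    def __init__(self, title, author):
        self.title = title
        self.author = author


class RemoteRef:
    """Pass-by-reference: every field read is forwarded to the server object."""

    def __init__(self, target, counter):
        self._target = target
        self._counter = counter

    def __getattr__(self, name):
        # Invoked only for names not found on the proxy itself,
        # i.e. the fields of the remote object; each read costs a round trip.
        self._counter.round_trips += 1
        return getattr(self._target, name)


def pass_by_value(obj, counter):
    """Pass-by-value: one round trip ships an independent copy to the client."""
    counter.round_trips += 1
    return copy.deepcopy(obj)  # stands in for marshalling the object's state


if __name__ == "__main__":
    server_obj = Citation("SenseMaker", "Baldonado")

    by_ref = CallCounter()
    ref = RemoteRef(server_obj, by_ref)
    for _ in range(10):
        _ = (ref.title, ref.author)
    print("by reference:", by_ref.round_trips)  # 20

    by_val = CallCounter()
    local = pass_by_value(server_obj, by_val)
    for _ in range(10):
        _ = (local.title, local.author)
    print("by value:", by_val.round_trips)  # 1
```

The sketch also shows the price of migration: the local copy saves round trips, but it no longer reflects later changes made to the server-side object.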

3. Economics

Our work on shopping models progressed significantly, leading to a paper in the upcoming Digital Library conference (DL97). Shopping models separate the details of ordering, delivery, and payment from the sequence in which these activities occur. Our architecture and implementation provide one CORBA object each to handle the merchant- and customer-specific ordering, delivery, and payment details. A new proxy, the shopping model, acts as a 'traffic cop' for messages between customer and merchant. This allows customers and merchants to participate in a variety of interaction models without changing their code. For example, subscriptions, pay-per-view, and auctions can all be modeled with the same customer and merchant code.
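The separation can be sketched in a few lines of Python. This is not the project's CORBA interface: `Customer`, `Merchant`, `PayPerView`, `Subscription` and all method names are hypothetical, and plain method calls stand in for remote invocations. The customer and merchant objects implement only their own ordering, payment and delivery steps, while each shopping-model object decides the sequence in which those steps run. An auction model would follow the same pattern but is omitted here for brevity.

```python
class Customer:
    """Customer-specific ordering, payment and delivery details (hypothetical)."""

    def __init__(self, log):
        self.log = log

    def place_order(self, item):
        self.log.append(("customer", "order", item))
        return item

    def send_payment(self, amount):
        self.log.append(("customer", "pay", amount))
        return amount

    def accept_delivery(self, goods):
        self.log.append(("customer", "receive", goods))


class Merchant:
    """Merchant-specific details; knows nothing about the interaction sequence."""

    def __init__(self, log, price):
        self.log = log
        self.price = price

    def accept_order(self, item):
        self.log.append(("merchant", "accept-order", item))

    def accept_payment(self, amount):
        self.log.append(("merchant", "accept-payment", amount))

    def deliver(self, item):
        self.log.append(("merchant", "deliver", item))
        return item


def _order_and_pay(customer, merchant, item):
    """Order-then-pay step shared by the two models below."""
    order = customer.place_order(item)
    merchant.accept_order(order)
    merchant.accept_payment(customer.send_payment(merchant.price))
    return order


class PayPerView:
    """Every viewing is ordered and paid for separately."""

    def run(self, customer, merchant, item):
        order = _order_and_pay(customer, merchant, item)
        customer.accept_delivery(merchant.deliver(order))


class Subscription:
    """A single order and payment, followed by several deliveries."""

    def __init__(self, issues):
        self.issues = issues

    def run(self, customer, merchant, item):
        order = _order_and_pay(customer, merchant, item)
        for n in range(1, self.issues + 1):
            customer.accept_delivery(merchant.deliver(f"{order}, issue {n}"))


if __name__ == "__main__":
    for model in (PayPerView(), Subscription(issues=3)):
        log = []
        model.run(Customer(log), Merchant(log, price=20), "journal")
        print(type(model).__name__, [step for _, step, _ in log])
```

Swapping `PayPerView()` for `Subscription(issues=3)` changes the message sequence without touching either `Customer` or `Merchant`, which is the property the shopping-model proxy is meant to provide.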

4. User Interfaces

We have completed our Phase I user interface tests. Based on the results, we have been redesigning several aspects of our system.

Our SenseMaker interface has been redesigned to use 'Hi-cites'. These let users scan a set of citations quickly while still comparing selected aspects of those citations. For example, a user may want to scan a large number of citations to find similar titles, identical authors, or the same publication year.

A tabular presentation of the citation components makes this easy. But when the citations come from different sources, that tabular representation frequently contains many empty fields. This may happen, for example, when some citations refer to a bibliographic source while others point to Web sites: for the latter, the author field will usually be empty.

On the other hand, one can arrange citations in the form of a traditional bibliography. This uses screen real estate much more efficiently, but it does not let users scan attributes as easily as the tabular representation does.

Hi-cites combine the best of both worlds. They show citations in the form of a bibliography, but when the cursor is placed over one attribute within a citation, all corresponding attributes are highlighted in the other citations. If, for example, the cursor is placed over the title of one citation, the titles of all other citations are highlighted as well.
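The core of the Hi-cites behavior is a mapping from a cursor position within one citation to an attribute name, followed by highlighting that attribute in every citation. The Python sketch below assumes citations are dictionaries with hypothetical field names (`author`, `title`, `source`, `year`), a fixed display order, and square brackets as a text stand-in for the visual highlight; it is an illustration, not SenseMaker's code.

```python
# Display order of attributes in a bibliography-style line (an assumption).
FIELD_ORDER = ["author", "title", "source", "year"]
SEP = ". "


def parts(citation):
    """(field, value) pairs that are present; missing fields take no space."""
    return [(name, citation[name]) for name in FIELD_ORDER if citation.get(name)]


def field_at(citation, column):
    """Return the attribute under the cursor at `column` of the plain line, or None."""
    pos = 0
    for name, value in parts(citation):
        if pos <= column < pos + len(value):
            return name
        pos += len(value) + len(SEP)  # skip the value and the ". " separator
    return None


def render(citations, hovered=None):
    """Render every citation, bracketing the hovered attribute wherever it occurs."""
    lines = []
    for citation in citations:
        shown = [f"[{v}]" if n == hovered else v for n, v in parts(citation)]
        lines.append(SEP.join(shown) + ".")
    return lines


if __name__ == "__main__":
    citations = [
        {"author": "Baldonado", "title": "SenseMaker", "year": "1997"},
        {"title": "Stanford Digital Library Project", "source": "Web site"},  # no author
    ]
    hovered = field_at(citations[0], 12)  # cursor over "SenseMaker"
    for line in render(citations, hovered):
        print(line)
```

Running it prints `Baldonado. [SenseMaker]. 1997.` followed by `[Stanford Digital Library Project]. Web site.`: the Web citation has no author, yet it costs no empty cell, and its title is still highlighted alongside the first one.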

Our SenseMaker interface was also completely rewritten in Java to facilitate distribution over the Web.

5. Searching

We continued work on our SONIA clustering service. This service takes a set of documents and clusters them using any of several algorithms. SONIA is now tightly integrated with our SenseMaker system. We also integrated the AutoClass clustering algorithm into SONIA and added a number of algorithmic extensions to the SONIA implementation.
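A clustering service of this kind can sit in front of interchangeable algorithms. The sketch below is a hypothetical illustration, not SONIA's code: it registers one deliberately simple algorithm (single-pass "leader" clustering on word-set Jaccard similarity) under a name, and a Bayesian method such as AutoClass would be registered behind the same `cluster` entry point. The function names, the registry, and the 0.3 threshold are all assumptions.

```python
def tokens(text):
    """Lower-cased word set of a document (no stemming or stop-word removal)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_clustering(docs, threshold=0.3):
    """Single pass: each document joins the first cluster whose leader is similar enough."""
    clusters = []  # list of (leader word set, member indices)
    for i, doc in enumerate(docs):
        words = tokens(doc)
        for leader, members in clusters:
            if jaccard(words, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((words, [i]))  # document becomes a new leader
    return [members for _, members in clusters]


# Registry of interchangeable algorithms behind one service entry point.
ALGORITHMS = {"leader": leader_clustering}


def cluster(docs, algorithm="leader", **params):
    """Cluster `docs` with the named algorithm; returns lists of document indices."""
    return ALGORITHMS[algorithm](docs, **params)


if __name__ == "__main__":
    docs = [
        "digital library search",
        "digital library interface",
        "stock market prices",
    ]
    print(cluster(docs))  # [[0, 1], [2]]
```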

We began developing the future interaction model for SONIA as both a document clustering and a document classification (user profiling) system. This sets the stage for integrating our work on hierarchical document classification into the testbed. The vision is for users to build their own small Yahoo!-style hierarchies, into which SONIA will then place incoming new documents.
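Placing an incoming document into a user-built hierarchy can be done top-down, choosing one child category at each level. In the Python sketch below each category is described by a small hypothetical keyword set, and the choice at each level is made by simple word overlap. This is a stand-in, not the method of Koller and Sahami [7], which trains a classifier at each node over a small set of selected words; the `Category` and `place` names are likewise assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in a user-built topic hierarchy, described by a few keywords."""
    name: str
    keywords: set = field(default_factory=set)
    children: list = field(default_factory=list)


def place(document, root):
    """Descend from the root, taking the child with the largest keyword overlap.

    Stops at a leaf, or at the current node if no child shares any word with
    the document. Returns the path of category names.
    """
    words = set(document.lower().split())
    node, path = root, [root.name]
    while node.children:
        best = max(node.children, key=lambda c: len(words & c.keywords))
        if not words & best.keywords:
            break  # no child matches; file the document at the current node
        node = best
        path.append(node.name)
    return path


if __name__ == "__main__":
    root = Category("My Library", children=[
        Category("Databases", {"database", "query", "sql"}, [
            Category("Query Mapping", {"boolean", "mapping"}),
            Category("Ranking", {"rank", "ranks", "merging"}),
        ]),
        Category("Interfaces", {"interface", "user", "visualization"}),
    ])
    print(place("merging ranks from heterogeneous database sources", root))
    # ['My Library', 'Databases', 'Ranking']
```

Descending one level at a time keeps each decision small: at every node only that node's children compete, which is the same structural idea that hierarchical classification exploits.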

In the area of attribute model translation, we investigated how attribute mappings between models can be specified. Specifically, we extended our original work so that the correspondence of attributes between models can be specified as one-to-many mappings under multiple relationships. For example, we are able to express that attribute 'author' in the Bib1 model maps to attribute 'creator' in Dublin Core under the relationship 'generalizes-to'.
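One way to represent such mappings is as labeled edges from a (model, attribute) pair to any number of (model, attribute, relationship) targets. The Python sketch below does this with a dictionary; only the Bib1 'author' to Dublin Core 'creator' rule comes from the text above, while the 'contributor' and 'title' rules, the 'equivalent-to' relationship name, and all function names are hypothetical illustrations.

```python
from collections import defaultdict

# (source model, source attribute) -> [(target model, target attribute, relationship)]
MAPPINGS = defaultdict(list)


def add_mapping(src_model, src_attr, dst_model, dst_attr, relationship):
    """Register one edge of a possibly one-to-many attribute mapping."""
    MAPPINGS[(src_model, src_attr)].append((dst_model, dst_attr, relationship))


def targets(src_model, src_attr, dst_model, relationships=None):
    """Target attributes in dst_model, optionally restricted to some relationships."""
    return [
        (attr, rel)
        for model, attr, rel in MAPPINGS.get((src_model, src_attr), [])
        if model == dst_model and (relationships is None or rel in relationships)
    ]


def translate(record, src_model, dst_model, relationships=None):
    """Copy each source value to every target attribute it maps to."""
    out = defaultdict(list)
    for attr, value in record.items():
        for dst_attr, _rel in targets(src_model, attr, dst_model, relationships):
            out[dst_attr].append(value)
    return dict(out)


# The mapping described in the text:
add_mapping("Bib1", "author", "DublinCore", "creator", "generalizes-to")
# Hypothetical further rules, for illustration only:
add_mapping("Bib1", "author", "DublinCore", "contributor", "generalizes-to")
add_mapping("Bib1", "title", "DublinCore", "title", "equivalent-to")

if __name__ == "__main__":
    record = {"author": "Winograd", "title": "SenseMaker"}
    print(translate(record, "Bib1", "DublinCore"))
    print(translate(record, "Bib1", "DublinCore", {"equivalent-to"}))
```

Keeping the relationship on each edge lets a client decide which kinds of correspondence it is willing to accept, for instance exact equivalences only, instead of having a single fixed translation.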

Using this more powerful scheme, we have implemented attribute translators for Bib1<->StanfordFront, Bib1<->DublinCore, and Refer<->BibTeX. We have also done preliminary work on attribute value translation, which converts field values formatted as one model requires into values acceptable to another.
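Attribute value translation often means reshaping a field rather than just renaming it. As one illustration of the Refer->BibTeX direction, the Python sketch below joins Refer's repeated `%A` author lines with BibTeX's `and` separator and reduces a free-form `%D` date to a four-digit `year`. The field table covers only `%A`, `%T`, `%D` and `%J`, unmapped fields are dropped, and every entry is emitted as `@article`; these are simplifying assumptions, not the behavior of the project's translators, and the sample record is invented.

```python
import re

# Refer field codes mapped to BibTeX field names (a deliberately small subset).
REFER_TO_BIBTEX = {"A": "author", "T": "title", "D": "year", "J": "journal"}


def parse_refer(text):
    """Collect Refer fields ('%X value' lines); repeated codes such as %A accumulate."""
    fields = {}
    for line in text.strip().splitlines():
        m = re.match(r"%(\w)\s+(.*)", line)
        if m:
            fields.setdefault(m.group(1), []).append(m.group(2).strip())
    return fields


def refer_to_bibtex(text, key):
    """Translate one Refer record into a BibTeX @article entry."""
    entry = {}
    for code, values in parse_refer(text).items():
        name = REFER_TO_BIBTEX.get(code)
        if name is None:
            continue  # unmapped Refer fields are dropped in this sketch
        if name == "author":
            entry[name] = " and ".join(values)  # BibTeX separates authors with "and"
        elif name == "year":
            m = re.search(r"\d{4}", values[0])  # "June 1997" -> "1997"
            if m:
                entry[name] = m.group(0)
        else:
            entry[name] = values[0]
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in entry.items())
    return f"@article{{{key},\n{body}\n}}"


if __name__ == "__main__":
    record = """%A Ann Author
%A Bob Writer
%T An Example Title
%D June 1997"""
    print(refer_to_bibtex(record, "author97"))
```

For the sample record this prints an `@article{author97,` entry containing `author = {Ann Author and Bob Writer}`, `title = {An Example Title}` and `year = {1997}`.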

6. Miscellaneous Activities

We participated in the NSF museum exhibit on Digital Libraries, providing demonstrations and posters.

6.1 Visitors and Industry Contacts

6.2 Public Presentations and Meetings Attended

6.3 Regular Meetings/Seminars

7. References

[1] Michelle Q Wang Baldonado and Terry Winograd. SenseMaker: An Information-Exploration Interface Supporting the Contextual Evolution of a User's Interests. In Proceedings of the Conference on Human Factors in Computing Systems, 1997.

[2] Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for Digital Libraries: Architecture and Design Rationale. Number SIDL-WP-1997-0055. Stanford University, 1997. Accessible at

[3] Chen-Chuan K. Chang and Hector Garcia-Molina. Evaluating the Cost of Boolean Query Mapping. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. Accessible at

[4] Steve B. Cousins, Andreas Paepcke, Terry Winograd, Eric A. Bier, and Ken Pier. The Digital Library Integrated Task Environment (DLITE). In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. Accessible at

[5] Luis Gravano and Hector Garcia-Molina. Merging Ranks from Heterogeneous Internet Sources. Submitted to the 23rd International Conference on Very Large Data Bases (VLDB'97), 1997.

[6] Steven Ketchpel. Distributed Commerce Transactions with Timing Deadlines and Direct Trust. Poster at the International Joint Conference on Artificial Intelligence (IJCAI'97), 1997.

[7] D. Koller and M. Sahami. Hierarchically Classifying Documents Using Very Few Words. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97), 1997.

[8] Martin Röscheisen and Terry Winograd. A Network-Centric Design for Rights Management. Journal of Computer Security, 1997.