Digital Library Project
Stanford University
Quarterly Report. Aug 1, 1997
Reporting Period: May 1, 1997-July 31, 1997

Contents:

Administrative matters
InfoBus Architecture and Testbed
Economics
User Interfaces
Searching
Miscellaneous Activities
References to Papers Produced During Reporting Period

1. Administrative

All students who were interested found summer positions in Silicon Valley companies. We look forward to having them all back. Steve Cousins joined Xerox PARC as a full-time employee. We are still working closely with him. DLITE will continue to be developed both at Stanford and at PARC.

Scott Hassan left us to join a startup company. We miss him, but wish him the best of luck. We're sure he will propel the company to great fortunes.

Many of us attended the DL all-projects meeting in Pittsburgh, where we gave talks and demonstrations of our testbed.

The whole group held a two-day retreat at Asilomar, CA, where we coordinated this coming year's focus, but mostly explored research challenges for the coming years.

2. InfoBus Architecture and Testbed

We installed a DLITE machine at the NASA Ames library. It will be used for continued user testing.

We constructed InfoBus access through electronic mail. Queries can now be submitted via email. The queries are translated by our InfoBus translation service, and are then distributed simultaneously to all destinations via the InfoBus. The returned information is combined and emailed back to the sender.
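The email flow described above (translate the query, send it to all destinations at once, combine the answers into one reply) can be sketched roughly as follows. This is only an illustration: the two sources, the `translate` placeholder, and the `handle_email_query` function are hypothetical names, not part of the InfoBus code.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for InfoBus destinations. Each takes a translated
# query string and returns a list of result strings.
def source_a(query):
    return [f"A:{query}:1", f"A:{query}:2"]


def source_b(query):
    return [f"B:{query}:1"]


def translate(query, source_name):
    """Placeholder for the translation service; here it only lowercases."""
    return query.lower()


def handle_email_query(query, sources):
    """Send the query to every source concurrently and combine the replies."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {
            name: pool.submit(fn, translate(query, name))
            for name, fn in sources.items()
        }
        combined = []
        for name in sorted(futures):  # fixed order keeps the reply stable
            combined.extend(futures[name].result())
    # The combined text would become the body of the reply email.
    return "\n".join(combined)


SOURCES = {"a": source_a, "b": source_b}
reply_body = handle_email_query("Digital Libraries", SOURCES)
```

The sketch uses threads only to show the simultaneous distribution; how the actual service dispatches requests is not specified here.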

We used this new email interface to access the InfoBus from a Palm Pilot, a hand-held device.

The implementation of our metadata architecture has also progressed. We finished work on attribute name translation during the last quarter. This quarter we started to work on attribute value translation.
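To illustrate the difference between the two kinds of translation, the sketch below renames attributes and then converts their values for one imaginary target source. The mapping tables, the value formats, and the `translate_record` function are assumptions made for this example; they do not describe the actual metadata architecture.

```python
# Hypothetical name mapping from a common attribute set to one source's schema.
NAME_MAP = {"author": "creator", "title": "title", "year": "pubdate"}


def _last_name_first(value):
    """Turn 'First Last' into 'Last, First'; leave single words unchanged."""
    if " " not in value:
        return value
    first, last = value.split(" ", 1)
    return f"{last}, {first}"


# Hypothetical value translators, keyed by the common attribute name.
VALUE_MAP = {
    "author": _last_name_first,
    "year": lambda value: f"{value}-01-01",
}


def translate_record(record):
    """Apply attribute name translation, then attribute value translation."""
    out = {}
    for name, value in record.items():
        target_name = NAME_MAP.get(name, name)          # name translation
        convert = VALUE_MAP.get(name, lambda v: v)      # value translation
        out[target_name] = convert(value)
    return out


translated = translate_record(
    {"author": "Terry Winograd", "title": "The Design of Interaction", "year": "1997"}
)
```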

3. Economics

Our work on shopping models progressed significantly, leading to a paper in the 1997 ACM Digital Library conference (DL97). Shopping models separate the details of ordering, delivery, and payment from the sequence in which these activities occur. Our architecture and implementation provide for one CORBA object each to handle the merchant- and customer-specific ordering, delivery, and payment details. A new proxy, the shopping model, acts as a 'traffic cop' for messages between customer and merchant. This allows customers and merchants to participate in a variety of interaction models without changing their code. For example, we are able to model subscriptions, pay-per-view, and auctions without changes in the customer and merchant code.
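The separation between what each party does and the order in which it happens can be sketched as below. This is a simplified illustration in plain Python objects rather than CORBA; the `Party` and `ShoppingModel` classes and the two example models (`PREPAID`, `PAY_ON_DELIVERY`) are hypothetical and are not taken from the DL97 paper.

```python
class Party:
    """A customer or merchant. It knows how to do each activity, not when."""

    def __init__(self, role):
        self.role = role
        self.trace = []

    def order(self, item):
        self.trace.append(f"{self.role} order {item}")

    def deliver(self, item):
        self.trace.append(f"{self.role} deliver {item}")

    def pay(self, item):
        self.trace.append(f"{self.role} pay {item}")


class ShoppingModel:
    """The 'traffic cop': it alone decides the sequence of activities."""

    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, customer, merchant, item):
        for step in self.steps:
            getattr(customer, step)(item)
            getattr(merchant, step)(item)


# Two interaction models that reuse the same, unchanged Party code.
PREPAID = ShoppingModel(["order", "pay", "deliver"])
PAY_ON_DELIVERY = ShoppingModel(["order", "deliver", "pay"])

customer, merchant = Party("customer"), Party("merchant")
PREPAID.run(customer, merchant, "article-1")
PAY_ON_DELIVERY.run(customer, merchant, "article-2")
```

Adding a further interaction model in this sketch means defining a new step sequence; the `Party` class stays untouched, which is the property the paragraph above describes.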

4. User Interfaces

We have completed our Phase I user interface tests. Based on the results, we have been redesigning several aspects of our system.

Our SenseMaker interface has been redesigned to use 'Hi-cites'. Hi-cites let users scan a set of citations quickly while still being able to compare selected aspects of those citations. For example, a user may want to scan a large number of citations to find similar titles, identical authors, or the same publication year.

A tabular presentation of the citation components makes this easy. But when the citations come from different sources, that tabular representation frequently contains many empty fields. This may happen, for example, when some citations refer to a bibliographic source, while others point to Web sites: for the latter, the author field will usually be empty.

On the other hand, one can arrange citations in the form of a traditional bibliography. This is much more efficient with respect to screen real-estate, but it does not allow scanning of attributes as easily as does the tabular representation.

Hi-cites combine the best of both worlds. They show citations in the form of a bibliography, but when the cursor is placed over one attribute within a citation, all corresponding attributes are highlighted in the other citations. If, for example, the cursor is placed over the title of one citation, the titles of all other citations are highlighted as well.
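The core of the idea can be sketched as follows: each citation is rendered as a compact bibliography line, the character span of every attribute is remembered, and hovering over one attribute yields the matching span in every other citation. The sample citations, the field order, and the function names are assumptions for illustration only; the SenseMaker implementation is not shown here.

```python
CITATIONS = [
    {"author": "Cousins", "title": "DLITE", "year": "1997"},
    {"title": "Stanford Digital Library home page"},  # Web page: no author, no year
    {"author": "Winograd", "title": "The Design of Interaction", "year": "1997"},
]

FIELD_ORDER = ["author", "title", "year"]


def render(citation):
    """Render one bibliography line and record each field's character span."""
    parts, spans, pos = [], {}, 0
    for field in FIELD_ORDER:
        if field not in citation:
            continue  # a missing field takes up no screen space
        if parts:
            pos += 2  # account for the ". " separator
        text = citation[field]
        spans[field] = (pos, pos + len(text))
        parts.append(text)
        pos += len(text)
    return ". ".join(parts), spans


def highlights(citations, hovered_field):
    """Per citation, the span to highlight, or None if the field is absent."""
    return [render(c)[1].get(hovered_field) for c in citations]


title_spans = highlights(CITATIONS, "title")
author_spans = highlights(CITATIONS, "author")
```

Unlike a table, the Web-page entry in this sketch simply has no author span, so nothing is highlighted there and no empty cell is drawn.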

Our SenseMaker interface was also completely re-written in Java to facilitate distribution over the Web.

We performed a 14-person user study of hi-cites in order to compare them to tables and bibliographies for the task of comparing attributes. We completed the statistical analysis of this study, which showed that hi-cites are preferred to the other conditions, are subjectively judged to be the fastest for this task, are significantly faster than bibliographies, and are not significantly different in actual time from tables.

5. Searching

We began a new effort in our information search thrust. The result of this effort will allow us to build specialized graphical query constructors easily. For example, if we wish to make it easy for users to access a special database for information on consumer electronics, we will be able to interactively construct a graphical component in DLITE which displays the relevant input fields for the user. Our initial prototype will stress textual queries. However, the system is designed to allow the specification of more complex input widgets, such as images the user points to for specifying query inputs such as latitude/longitude on a map. The interface design for this 'query constructor constructor' (QCC) is nearing completion. Implementation of the prototype is just beginning.
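One way to picture a 'query constructor constructor' is as a function that takes a declarative description of input fields and returns a query constructor for them. The sketch below covers only textual fields; the spec format, the query syntax, and the names `make_query_constructor` and `electronics` are hypothetical and do not reflect the QCC design itself.

```python
def make_query_constructor(field_specs):
    """Build a query-constructor function from a declarative field list.

    field_specs is a list of (field_name, label) pairs (a made-up format).
    """
    names = [name for name, _label in field_specs]

    def construct(**inputs):
        unknown = set(inputs) - set(names)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        # Keep the field order from the spec; skip fields left blank.
        clauses = [f'{n}="{inputs[n]}"' for n in names if inputs.get(n)]
        return " AND ".join(clauses)

    # Labels a graphical component could display next to each input field.
    construct.labels = dict(field_specs)
    return construct


# A constructor for an imaginary consumer-electronics database.
electronics = make_query_constructor(
    [("brand", "Brand"), ("product", "Product type"), ("max_price", "Maximum price")]
)
query = electronics(product="camcorder", brand="Acme")
```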

The SONIA work has continued. We incorporated a user profiling and classification component into the SONIA service. This summer, we are working to develop the theory and practical algorithms to improve SONIA's ability to cluster and classify documents autonomously. We are making significant technical progress on new and improved algorithms for automatically generating topic hierarchies from unclassified data. This is in contrast to the learning from classified data that much of our previous work has focused on.
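As a generic illustration of building a hierarchy from unclassified documents, the sketch below uses plain agglomerative clustering with Jaccard similarity over word sets. This is a textbook technique chosen for brevity, not the algorithm under development for SONIA, and the sample documents are invented.

```python
def jaccard(a, b):
    """Similarity of two word sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


def build_hierarchy(docs):
    """Repeatedly merge the two most similar clusters until one tree remains.

    Each cluster is (tree, words); a tree is a document name or a pair of trees.
    """
    clusters = [(name, set(text.split())) for name, text in docs.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] | clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


DOCS = {
    "d1": "digital library metadata",
    "d2": "digital library search",
    "d3": "movie ratings filtering",
    "d4": "movie ratings users",
}
tree = build_hierarchy(DOCS)
```

No category labels are supplied anywhere in this sketch; the grouping comes only from word overlap, which is the contrast with learning from classified data.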

We constructed a large-scale (1000+ users) simulation of our Fab system. The last pieces are currently being debugged. Users are simulated by assuming they have preferences among topics which are represented by human-generated "editorial categories", like the ones generated by Yahoo or Reuters. We are also negotiating the use of a large movie preference dataset from Digital Equipment Corporation, to conduct collaborative filtering experiments.
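A minimal sketch of how users might be simulated from topic preferences is given below. The category list, the preference weights in [0, 1], the noise model, and the function names are all assumptions made for illustration; they are not details of the Fab simulation.

```python
import random

# Illustrative editorial categories; the real ones are human-generated.
CATEGORIES = ["business", "sports", "science", "entertainment"]


def make_user(rng):
    """A simulated user is a preference weight in [0, 1] for each category."""
    return {category: rng.random() for category in CATEGORIES}


def rate(user, doc_category, rng, noise=0.1):
    """Simulated feedback: the user's preference for the document's
    category plus uniform noise, clipped to [0, 1]."""
    score = user[doc_category] + rng.uniform(-noise, noise)
    return min(1.0, max(0.0, score))


rng = random.Random(0)  # fixed seed so runs are repeatable
users = [make_user(rng) for _ in range(1000)]
ratings = [rate(user, rng.choice(CATEGORIES), rng) for user in users]
```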

6. Miscellaneous Activities

6.1 Visitors and Industry Contacts

6.2 Public Presentations and Meetings Attended

6.3 Regular Meetings/Seminars

We created a video of our SenseMaker system. In addition, we published the following papers.

7. References

[1] Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for Digital Libraries: Architecture and Design Rationale. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0055.

[2] Chen-Chuan K. Chang and Hector Garcia-Molina. Evaluating the Cost of Boolean Query Mapping. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0053.

[3] Steve B. Cousins, Andreas Paepcke, Terry Winograd, Eric A. Bier, and Ken Pier. The Digital Library Integrated Task Environment (DLITE). In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1996-0049.

[4] Arturo Crespo and Hector Garcia-Molina. Awareness Services for Digital Libraries. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997.

[5] Daniela Florescu, Daphne Koller, and Alon Levy. Using Probabilistic Information in Data Integration. In Proceedings of the Twenty-third International Conference on Very Large Databases, 1997.

[6] Steven P. Ketchpel, Hector Garcia-Molina, and Andreas Paepcke. Shopping Models: A Flexible Architecture for Information Commerce. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1996-0052.

[7] Daphne Koller and Mehran Sahami. Hierarchically Classifying Documents Using Very Few Words. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97), 1997.

[8] Mehran Sahami. Applications of Machine Learning to Information Access. In AAAI-97, Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997.

[9] Mehran Sahami, Salim Yusufali, and Michelle Q. Wang Baldonado. Real-time Full-text Clustering of Networked Documents. In AAAI-97, Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997.

[10] Terry Winograd. The Design of Interaction. In Peter Denning and Bob Metcalfe, editors, Beyond Calculation, The Next 50 Years of Computing, pp. 149-162. Springer-Verlag, 1997.