Content:
Administrative matters
InfoBus Architecture and Testbed
Economics
User Interfaces
Searching
Miscellaneous Activities
References to Papers Produced During Reporting Period
1. Administrative
All students who were interested found summer positions in Silicon
Valley companies. We look forward to having them all back. Steve
Cousins joined Xerox PARC as a full-time employee. We are still
working closely with him. DLITE will undergo continued
development, both at Stanford and at PARC.
Scott Hassan left us to join a startup company. We miss him, but wish him the best of luck. We're sure he will propel the company to great fortunes.
Many of us attended the DL all-projects meeting in Pittsburgh, where we gave talks and demonstrated our testbed.
The whole group held a two-day retreat at Asilomar, CA, where we
coordinated this coming year's focus, but mostly explored research
challenges for the coming years.
2. InfoBus Architecture and Testbed
We installed a DLITE machine at the NASA Ames library. It will be
used for continued user testing.
We constructed InfoBus access through electronic mail. Queries can now be submitted via email. The queries are translated by our InfoBus translation service, and are then distributed simultaneously to all destinations via the InfoBus. The returned information is combined and emailed back to the sender.
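The email path described above (translate, distribute to all destinations at once, combine, reply) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the testbed code: the `Source` callables stand in for InfoBus search proxies, `translate` is a placeholder for the translation service, and every name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for an InfoBus search proxy:
# takes a translated query string and returns a list of result strings.
Source = Callable[[str], List[str]]


def translate(query: str, source_name: str) -> str:
    """Placeholder for the translation service (assumption): normalizes the
    query and tags it with the destination it was translated for."""
    return f"{source_name}:{query.strip().lower()}"


def handle_email_query(body: str, sources: Dict[str, Source]) -> str:
    """Translate the query for each source, send all requests in parallel,
    then combine the results into one reply body."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {
            name: pool.submit(search, translate(body, name))
            for name, search in sources.items()
        }
        merged = []
        # Collect results in a stable order so the reply is reproducible.
        for name in sorted(futures):
            for hit in futures[name].result():
                merged.append(f"[{name}] {hit}")
    return "\n".join(merged)


def demo_sources() -> Dict[str, Source]:
    """Two fake destinations used only for this illustration."""
    return {
        "catalog": lambda q: [f"record for {q}"],
        "web": lambda q: [f"page for {q}", f"another page for {q}"],
    }
```

Calling `handle_email_query(message_body, demo_sources())` returns the text that would be mailed back to the sender. Receiving and sending the mail itself is left out of the sketch.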
We used this new email interface to access the InfoBus from a Palm Pilot, a hand-held device.
The implementation of our metadata architecture has also progressed. We finished work on attribute name translation during the last quarter. This quarter we started to work on attribute value translation.
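The difference between the two translation steps can be illustrated with a small sketch. The attribute names, the mapping tables, and the value conventions below are invented for the example and are not taken from our metadata architecture.

```python
from typing import Callable, Dict

# Hypothetical name mapping between two attribute vocabularies.
NAME_MAP: Dict[str, str] = {"author": "creator", "year": "date"}

# Hypothetical per-attribute value converters for the target vocabulary.
VALUE_MAP: Dict[str, Callable[[str], str]] = {
    # "Jane Doe" -> "Doe, Jane"; single-word names pass through unchanged.
    "creator": lambda v: ", ".join(reversed(v.rsplit(" ", 1))),
    # Expand two-digit years (assumed to be 19xx for this example).
    "date": lambda v: v if len(v) == 4 else "19" + v,
}


def translate_names(record: Dict[str, str]) -> Dict[str, str]:
    """Attribute name translation: rename keys, leave values untouched."""
    return {NAME_MAP.get(name, name): value for name, value in record.items()}


def translate_values(record: Dict[str, str]) -> Dict[str, str]:
    """Attribute value translation: rewrite values into the target format."""
    return {
        name: VALUE_MAP.get(name, lambda v: v)(value)
        for name, value in record.items()
    }
```

Name translation alone turns `{"author": "Jane Doe"}` into `{"creator": "Jane Doe"}`. Value translation is the further step that rewrites the value into the form the target source expects.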
3. Economics
Our work on shopping models progressed significantly, leading to a paper in the 1997 ACM Digital Libraries conference (DL97). Shopping models separate the details of ordering, delivery, and payment from the sequence in which these activities occur. Our architecture and implementation provide one CORBA object each to handle the merchant-specific and customer-specific ordering, delivery, and payment details. A new proxy, the shopping model, acts as a 'traffic cop' for messages between customer and merchant. This allows customers and merchants to participate in a variety of interaction models without changing their code. For example, we are able to model subscriptions, pay-per-view, and auctions without changes to the customer and merchant code.
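The separation between activity details and activity sequence can be sketched as follows. This is a simplified illustration, not our CORBA implementation: a single hypothetical `Party` class stands in for the per-party ordering, delivery, and payment objects, and the two example models (`PREPAY`, `INVOICE`) are invented names.

```python
from typing import List, Tuple


class Party:
    """Hypothetical stand-in for one side's ordering, delivery, and payment
    handlers. It knows how to perform each step, not when to perform it."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def order(self, item: str) -> None:
        self.log.append(f"{self.name}: order {item}")

    def deliver(self, item: str) -> None:
        self.log.append(f"{self.name}: deliver {item}")

    def pay(self, item: str) -> None:
        self.log.append(f"{self.name}: pay {item}")


class ShoppingModel:
    """The 'traffic cop': owns only the sequence of (actor, action) steps."""

    def __init__(self, steps: List[Tuple[str, str]]) -> None:
        self.steps = steps

    def run(self, customer: Party, merchant: Party, item: str) -> None:
        for actor, action in self.steps:
            party = customer if actor == "customer" else merchant
            getattr(party, action)(item)  # dispatch to the party's own handler


# Two interaction models that reuse the same Party code unchanged.
PREPAY = ShoppingModel(
    [("customer", "order"), ("customer", "pay"), ("merchant", "deliver")]
)
INVOICE = ShoppingModel(
    [("customer", "order"), ("merchant", "deliver"), ("customer", "pay")]
)
```

Switching from `PREPAY` to `INVOICE` changes only the step list held by the shopping model. The `Party` objects are untouched, which is the property the architecture aims for.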
4. User Interfaces
We have completed our Phase I user interface tests. Based on the results, we have been redesigning several aspects of our system.
Our SenseMaker interface has been redesigned to use 'Hi-cites'. These solve the problem of letting users scan a set of citations quickly, while being able to compare selected aspects of those citations. For example, a user may want to scan a large number of citations to find similar titles, identical authors, or the same publication year.
A tabular presentation of the citation components makes this easy. But when the citations come from different sources, that tabular representation frequently contains many empty fields. This may happen, for example, when some citations refer to a bibliographic source while others point to Web sites: for the latter, the author field will usually be empty.
On the other hand, one can arrange citations in the form of a traditional bibliography. This is much more efficient with respect to screen real-estate, but it does not allow scanning of attributes as easily as does the tabular representation.
Hi-cites combine the best of both worlds. They show citations in the form of a bibliography, but when the cursor is placed over one attribute within a citation, all corresponding attributes are highlighted in the other citations. If, for example, the cursor is placed over the title of one citation, the titles of all other citations are highlighted as well.
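The highlighting rule can be sketched in a few lines. This is a text-only illustration, not the SenseMaker code: citations are assumed to be simple attribute dictionaries, the attribute order is fixed for the example, and asterisks stand in for on-screen highlighting.

```python
from typing import Dict, List, Optional

Citation = Dict[str, str]

# Assumed display order of attributes within one bibliography entry.
ATTRIBUTE_ORDER = ("author", "title", "year")


def render(citations: List[Citation], hovered: Optional[str] = None) -> List[str]:
    """Render citations bibliography-style; mark the hovered attribute
    in every citation that has it."""
    lines = []
    for citation in citations:
        parts = []
        for attr in ATTRIBUTE_ORDER:
            value = citation.get(attr)
            if not value:
                continue  # missing attributes take no screen space
            parts.append(f"*{value}*" if attr == hovered else value)
        lines.append(". ".join(parts) + ".")
    return lines
```

With the cursor over a title, `render(citations, "title")` marks the title of every citation, while a Web citation with no author simply omits that attribute instead of showing an empty table cell.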
Our SenseMaker interface was also completely re-written in Java to facilitate distribution over the Web.
We performed a 14-person user study of hi-cites in order to compare them to tables and bibliographies for the task of comparing attributes. We completed the statistical analysis of this study, which showed that hi-cites are preferred to the other conditions, are subjectively judged to be the fastest for this task, are significantly faster than bibliographies, and are not significantly different in actual time from tables.
5. Searching
We began a new effort in our information search thrust. The result of this effort will allow us to build specialized graphical query constructors easily. For example, if we wish to make it easy for users to access a special database for information on consumer electronics, we will be able to interactively construct a graphical component in DLITE which displays the relevant input fields for the user. Our initial prototype will stress textual queries. However, the system is designed to allow the specification of more complex input widgets, such as images the user points to for specifying query inputs such as latitude/longitude on a map. The interface design for this 'query constructor constructor' (QCC) is nearing completion. Implementation of the prototype is just beginning.
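Since the QCC prototype is only beginning, the following is just a sketch of the underlying idea for textual queries: a field specification goes in, and a specialized query constructor comes out. The function name, the field names, and the query syntax are all assumptions made for the example.

```python
from typing import Callable, List


def make_query_constructor(source: str, fields: List[str]) -> Callable[..., str]:
    """Build a constructor specialized for one source from a list of
    input field names (the 'constructor constructor' step)."""

    def construct(**values: str) -> str:
        unknown = set(values) - set(fields)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        # Emit clauses in the declared field order, skipping empty inputs.
        clauses = [
            f'{name}="{values[name]}"' for name in fields if values.get(name)
        ]
        return f"{source}: " + " AND ".join(clauses)

    return construct
```

For instance, `make_query_constructor("consumer-electronics", ["brand", "product", "max_price"])` yields a function that accepts only those three inputs. A graphical version would render one input widget per declared field instead of taking keyword arguments.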
The SONIA work has continued. We incorporated a user profiling and classification component into the SONIA service. This summer, we are working to develop the theory and practical algorithms to improve SONIA's ability to cluster and classify documents autonomously. We are making significant technical progress on new and improved algorithms for automatically generating topic hierarchies from unclassified data. This is in contrast to the learning from classified data that much of our previous work has focused on.
We constructed a large-scale (1000+ users) simulation of our Fab system. The last pieces are currently being debugged. Users are simulated by assuming they have preferences among topics which are represented by human-generated "editorial categories", like the ones generated by Yahoo or Reuters. We are also negotiating the use of a large movie preference dataset from Digital Equipment Corporation, to conduct collaborative filtering experiments.
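The idea of simulating users through topic preferences can be sketched as follows. This is not the Fab simulation itself: the category list, the way preferences are drawn, and the rating rule are assumptions chosen to keep the example small.

```python
import random
from typing import Dict, List

# Hypothetical editorial categories; a real run would use a human-built taxonomy.
CATEGORIES = ["business", "sports", "technology"]


def make_user(rng: random.Random) -> Dict[str, float]:
    """Draw one simulated user as a normalized preference over categories."""
    weights = [rng.uniform(0.1, 1.0) for _ in CATEGORIES]
    total = sum(weights)
    return {cat: w / total for cat, w in zip(CATEGORIES, weights)}


def rate(user: Dict[str, float], doc_category: str) -> float:
    """Simulated feedback: the user's preference for the document's category."""
    return user[doc_category]


def simulate(n_users: int, seed: int = 0) -> List[Dict[str, float]]:
    """Create a reproducible population of simulated users."""
    rng = random.Random(seed)
    return [make_user(rng) for _ in range(n_users)]
```

A population of the size mentioned above would be created with `simulate(1000)`. Each user's `rate` value can then play the role of the relevance feedback a human user would give to a recommended document.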
7. References to Papers Produced During Reporting Period
[1] Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for Digital Libraries: Architecture and Design Rationale. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0055.
[2] Chen-Chuan K. Chang and Hector Garcia-Molina. Evaluating the Cost of Boolean Query Mapping. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0053.
[3] Steve B. Cousins, Andreas Paepcke, Terry Winograd, Eric A. Bier, and Ken Pier. The Digital Library Integrated Task Environment (DLITE). In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1996-0049.
[4] Arturo Crespo and Hector Garcia-Molina. Awareness Services for Digital Libraries. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997.
[5] Daniela Florescu, Daphne Koller, and Alon Levy. Using Probabilistic Information in Data Integration. In Proceedings of the Twenty-third International Conference on Very Large Databases, 1997.
[6] Steven P. Ketchpel, Hector Garcia-Molina, and Andreas Paepcke. Shopping Models: A Flexible Architecture for Information Commerce. In Proceedings of the Fourth Annual Conference on the Theory and Practice of Digital Libraries, 1997. At http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1996-0052.
[7] D. Koller and M. Sahami. Hierarchically Classifying Documents Using Very Few Words. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97), 1997.
[8] Mehran Sahami. Applications of Machine Learning to Information Access. In AAAI-97, Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997.
[9] Mehran Sahami, Salim Yusufali, and Michelle Q. Wang Baldonado. Real-time Full-text Clustering of Networked Documents. In AAAI-97, Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997.
[10] Terry Winograd. The Design of Interaction. In Peter Denning and Bob Metcalfe, editors, Beyond Calculation, The Next 50 Years of Computing, pp. 149-162. Springer-Verlag, 1997.