Digital Library Project
Stanford University
Quarterly Report. November 1, 1996
Reporting Period: August 1-October 31, 1996
http://www-diglib.stanford.edu

1. Administrative
-----------------

We started preparations for Stanford's hosting of the December 16/17 '96 meeting of DLI participants. We also spent time preparing for an additional workshop day on library issues, to be held Dec. 15 (http://www-diglib.stanford.edu/dli).

Carl Lagoze and David Fielding of Cornell University, and Jim Davis of Xerox PARC joined us part time. They bring rich, relevant experience from their work on the NCSTRL project.

Several students spent their summer at surrounding companies, mostly continuing Digital Library research, though usually with their host companies' goals driving their direction. This has enriched the project by bringing us new ideas and feedback.

2. InfoBus Architecture and Testbed
-----------------------------------

We added a stand-alone collection service to the InfoBus. This allows InfoBus clients to store documents and other objects in DLIOP-compliant collections with a variety of underlying storage managers. This facility is beginning to be used throughout the system.

Search proxies can now support subcollections. This allows convenient access to external services that offer multiple collections, such as Knight-Ridder's Dialog Information Service.

A new proxy to the Xerox PARC document summarization service was developed.

A new proxy for the NCSTRL collection was added to the InfoBus by our colleagues at Cornell.

A proxy was constructed which uses a converter from NZDL to convert PostScript to approximate text.

We began building a DLITE component and proxy to TextBridge, a remotely accessible Xerox OCR service. The intent is for users to send a document image; the service then returns an OCRed copy. This work is ongoing, but will probably be limited to use within Xerox PARC.
InterBib was developed further to accept HTML documents, in addition to FrameMaker and Word. Citations embedded in the documents are resolved from BibTeX and Refer files submitted with the documents. InterBib was also fixed to properly handle submissions originating from Macintosh computers. Previously, only PC and Unix submissions were handled properly.

We had our first successes with a DLITE interface version implemented entirely in Java. The applet delivers the interface and the required CORBA capabilities. Once the applet is delivered and runs at the client host, it communicates with the InfoBus components via CORBA (ILU) remote method calls.

Work progressed on making the InfoBus testbed thread-safe. We have several proxies and parts of SenseMaker running with threads. More work is needed in this area.

3. Economics
------------

The following economics-related interface components (with their corresponding backend proxies) are now available in the DLITE interface:

- person objects, contract objects, and certificate objects
- searchable person "home provider" services for Stanford and Xerox PARC
- a searchable contract forms provider for standard contract forms
- a contract manager and an offer creation service
- "file system" manager based on persistent collections
- miscellaneous proxies for searching persons (whois)
- certification services: sample proxies for certifying simple properties such as affiliation with Stanford (Stanford lookup proxy)

DL proxies now also have an "owner" with whom users can contract for usage terms and conditions.

Authentication: Based on the person representation and public-key credentials (RSA/MD5) issued by the home provider, a "network login" facility has been added to the testbed. Both the browser (Netscape via cookies) and the DLITE task viewers are thus able to convey who is using them, and testbed services can securely identify their users.

Work progressed on InterPay II.
We completed a paper on the payment part of the new system (see U-PAI publication below). Its implementation is ongoing. We are also implementing a 'financial center' object to allow interactive payment through various payment schemes.

4. User Interfaces
------------------

We ran several user studies for our DLITE interface. Several changes have been made to the system in response to these studies.

We added another interface component to DLITE for helping users compose fielded queries. It is intended for novice users who need fielded, but keyword-only, query entry. This work was undertaken in response to our user testing.

Subcollection support was added to our source constructor. This allows users to create interface components that represent subcollections of external services. Dropping queries into these components causes searches in the corresponding subcollections.

We undertook several user tests in which users were asked to complete a bibliographic task with different versions of DLITE. These tests have caused us to change various aspects of the interface.

Our WebWriter and InterBib systems were incorporated into the testbed and into DLITE.

5. Searching
------------

Our SenseMaker search interface progressed further this quarter. Recall that SenseMaker users "make sense" out of their result collections by looking at them through multiple views. Within a view, complexity is reduced through user-directed "merging" and "bundling" of results. Now, SenseMaker users can also contextually evolve the direction of the search process once they have made sense of the current collection of results. They can expand upon, limit, or replace the current collection of results. Examples of expand actions implemented this quarter include:

1) Query-by-example. Users can point to "bundles" of related results and ask for them to serve as examples of what is to be found.
2) Query refinement.
   Users can change their queries directly and can also accept suggestions for new terms they might use in their query. These suggestions are obtained from a proxy that we built for the U. of Arizona/U. of Illinois CSQuest service.

Two user studies were conducted to test the SenseMaker interface.

In our query translation project we conducted experiments to measure the cost of our query translation approach. We compared the selectivity of front-end and translated queries to understand the post-filtering cost. Specifically, the experiment was designed to measure selectivity degeneration under the following translation schemes:

- When a query containing proximity operators must be rewritten with weaker operators such as AND.
- When a query includes stopwords that must be removed from the query.
- When a query uses the Equals operator (i.e., phrase search), which must be replaced by the Contains operator (i.e., keyword search).

In support of several subprojects we designed a set of metadata components which will work on the InfoBus, and which will satisfy several of our metadata needs. The work is documented in a paper by Baldonado/Chang/Gravano/Paepcke (see below).

6. Agents
---------

Fab system progress: Following the introduction of a collaborative filtering component, significant speedups and a redesigned user interface were added to Fab, our learning Web agent. Fab will start to take on users at the beginning of November for a new experiment to compare collaborative and content-based recommendation, and to investigate the possible uses of psychographic profiles. We hope to ramp up to 500 users by the end of the year (from the current 47). Additionally, during this reporting period an interface was built to allow use of Fab from within our DLITE interface.

7. STARTS Proposal for Meta-Search Support
------------------------------------------

As reported last quarter, we have been active in the definition of a proposal to support metasearching on the Internet. The proposal addresses three problems encountered by services that search multiple, heterogeneous search engines to satisfy a given query: finding promising collections, submitting appropriate forms of the query to the corresponding engines, and merging result rankings.

We held a one-day workshop with several major search engine providers and consumers to reach agreement on a final draft. This draft is available at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1996-0043.

The Z39.50 community is working on a Z39.50 profile based on STARTS. Our Cornell colleagues are working on a reference implementation of the protocol.

8. Miscellaneous Activities
---------------------------

8.1 Visitors and Industry Contacts

Prof. Jerry Saltzer of MIT visited for one month, meeting individually with project team members and attending the seminars and weekly technical design meetings.

- Marko Balabanovic: Gerry Andeen and John Eastling, Personal Discovery (will collaborate on psychographic profiling for web page recommendation)
- Marko Balabanovic: Paul Francis, NTT Japan (investigating relationships to their Ingrid system)
- Marko Balabanovic: Journalist from Frankfurt newspaper
- Marko Balabanovic: Thomas Bayer, Daimler Benz Research (automatically distributing email inquiries)
- Michelle Baldonado, Steve Cousins, Luis Gravano: Two visitors from India.
- Michelle Baldonado: Paul Francis from NTT
- Kevin Chang: Talked with Carl Lagoze (NCSTRL, Cornell Univ.) about suggested changes to the Dienst query language to facilitate integration of NCSTRL into our testbed.
- Talked with Doreen Cheng from Philips Palo Alto Research Lab on our work on query translation.
- Steve Cousins: Met with Roy Jones from the Stanford Business School to discuss DL issues.
- Steve Cousins: Met with Stanford Alumni to present the Digital Library project.
- Hector Garcia-Molina: Talked to John Sarborg Pedersen from Embassy of Denmark about emerging DigLib project in Denmark.
- Hector Garcia-Molina, Andreas Paepcke, Terry Winograd, Rebecca Lasher: Steve Griffin of NSF
- Steve Ketchpel: Attended AAAI as member of program committee
- Daphne Koller: Met with Dr. Iwayama and Dr. Niwa from Hitachi.
- Daphne Koller: Met with Dr. Alon Levy from AT&T Research on the topic of architectures for intelligent information gathering.
- Rebecca Lasher: Carole Alcock, librarian from Australia.
- Rebecca Lasher: Carl Lagoze, from Cornell at Stanford.
- Rebecca Lasher: Met with five visitors from Denmark, who are guiding digital library development there. The group represented the Ministry of Research and Information Technology, Ministry of Culture, Ministry of Research, Ministry of Education, and the Royal Danish Embassy.
- Rebecca Lasher: Andrew Odlyzko, from AT&T.
- Vicky Reich: Entertained visitors from Max Planck Institute.
- Vicky Reich and Martin Roscheisen: Invitational Workshop on Terms and Conditions, NY, Sept 22-24th. Organized by Jim Davis and Judith Klavans.
- Terry Winograd, Steve Cousins, Scott Hassan: Visitors from Sony (included discussion and demo):
    Toshitada Doi, President of the D21 laboratory, and member of Sony's Board of Directors (formerly, Dr. Doi ran Sony's workstation operations worldwide)
    Masao Watari, Manager of the Speech Group, D21 laboratory
    Hiroaki Ogawa, Engineer of the Speech Group, D21 Laboratory
    Mick Tanaka, Manager of Speech Recognition, Sony Research Labs (San Jose)
- Terry Winograd:
    Tomoyuki Yoshida, National Institute of Bioscience and Human Technology
    Watanabe Masayoshi, MITI
    Kazushige Suzuki, Research Institute of Human Engineering for Quality Life
    Masaki Taniguchi, Osaka National Research Institute

8.2 Public Presentations and Meetings Attended

We organized a one-day workshop where several major search engine providers and consumers discussed the STARTS proposal.

- Luis Gravano: Attended a meeting of Z39.50 implementors to help launch a new Z39.50 STARTS-based profile.
- Steve Ketchpel: Gave talk at Rutgers University during workshop on Trust Management in Networks
- Steve Ketchpel: Talk at AT&T Research to their Online Information Systems & Services group about U-PAI and Distributed Transactions.
- Daphne Koller: Attended the annual American Association for Artificial Intelligence conference in Portland, Oregon.
- Daphne Koller: Attended the annual Uncertainty in Artificial Intelligence conference in Portland, Oregon.
- Rebecca Lasher and Vicky Reich: Met with ACM publications staff as part of a librarian advisory committee on electronic publications.
- Andreas Paepcke: Presented our metadata architecture at the 2nd Delos workshop of ERCIM, a research consortium of the European Union.
- Mehran Sahami: Attended and presented at the annual Machine Learning conference in Bari, Italy.
- Mehran Sahami: Attended the annual American Association for Artificial Intelligence conference in Portland, Oregon.
- Mehran Sahami: Attended the annual Uncertainty in Artificial Intelligence conference in Portland, Oregon.
- Mehran Sahami: Attended and presented at the annual Knowledge Discovery in Databases conference in Portland, Oregon.
- Terry Winograd: Talk at Adobe, Mountain View, on the digital libraries project: Digital Libraries, Documents, and Services
- Terry Winograd: Organized and spoke at workshop on HCI Design, including talk on "Designing the Space of Interactions"
- Terry Winograd: Workshop at Stanford on copyright and new technology, organized by the Register of Copyrights, and Stanford Law School.
- Terry Winograd: Invited speaker at the Silicon Valley Chapter of the Association for Software Design

8.4 Regular Meetings/Seminars

- Weekly Digital Library seminar
- Executive committee meetings when required
- Weekly technical design meetings

9. Bibliography
---------------

The following are publications that were submitted or accepted for publication. Please see also the working papers section at our Web site for up-to-date information of a less formal nature.

Marko Balabanovic. An Adaptive Web Page Recommendation Service. To appear in the First International Conference on Autonomous Agents, February 1997, Marina del Rey, CA.

Marko Balabanovic and Yoav Shoham. Combining Content-Based and Collaborative Recommendation. To appear in the Communications of the ACM, Special Issue on Recommender Systems, March 97.

Marko Balabanovic. Table of Publicly Available On-line Recommendation Services. Prepared for Communications of the ACM, Special Issue on Recommender Systems, March 97.

Michelle Q Wang Baldonado and Terry Winograd. SenseMaker: An Information-Exploration Interface Supporting the Contextual Evolution of a User's Interests. Submitted to CHI '97.

Michelle Baldonado, Kevin Chang, Luis Gravano, Andreas Paepcke. The Stanford Digital Library Metadata Architecture. Submitted to the International Journal of Digital Libraries.

Edward Chang and Hector Garcia-Molina. Reducing Initial Latency In A Multimedia Storage System. Proceedings of the Third International Workshop on Multimedia Database Systems, August 1996.

Edward Chang and Hector Garcia-Molina. Minimizing Memory Use In Video Servers.
Submitted to SIGMOD.

K.C.C. Chang, H. Garcia-Molina, A. Paepcke. Boolean Query Mapping Across Heterogeneous Information Sources (Extended Version). 1996. This is an extended version of our paper in IEEE Transactions on Knowledge and Data Engineering (Aug. 1996), which we reported on last quarter.

Luis Gravano, Kevin Chang, Hector Garcia-Molina, Andreas Paepcke. STARTS: Stanford Proposal for Internet Meta-Searching. Submitted to SIGMOD.

Ron Kohavi and Mehran Sahami. Error-Based and Entropy-Based Discretization of Continuous Features. Second International Conference on Knowledge Discovery in Databases, 1996. Available at ftp://starry.stanford.edu/pub/sahami/papers/kdd96-disc.ps

D. Koller and Y. Shoham. Information agents: A new challenge for AI. IEEE Expert, June 1996, pages 8--10. (By invitation)

D. Koller and M. Sahami. Toward Optimal Feature Selection. In ICML-96: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 284-292, San Francisco, CA: Morgan Kaufmann. 1996.

M. Sahami, M. Hearst and E. Saund. Applying the Multiple Cause Mixture Model to Text Categorization. In ICML-96: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 435-443, San Francisco, CA: Morgan Kaufmann. 1996. This paper was based on work done at Xerox Palo Alto Research Center through a summer internship funded by the Stanford Digital Libraries Project. Available at ftp://starry.stanford.edu/pub/sahami/papers/ml96-mcmm.ps

Mehran Sahami. Learning Limited Dependence Bayesian Classifiers. Second International Conference on Knowledge Discovery in Databases, 1996. Available at ftp://starry.stanford.edu/pub/sahami/papers/kdd96-learn-bn.ps

10. Other Publications
----------------------

With the help of Xerox PARC, a new, updated video tape of the DLITE interface was produced.