Digital Library Project
Stanford University
Quarterly Report. August 1, 1995

1 Administrative
----------------

A survey of four of the six DLI Projects was produced and disseminated by Rebecca Lasher on July 20th.

On June 19 we held a meeting of the Advisory Board of our Digital Library Project, where our progress and plans were discussed. Eugene Miya, our NASA funding contact, was present.

We conducted a search for an additional systems programmer. After interviewing four candidates, we decided to reopen the search and continue interviewing others.

2 Infobus Architecture and Testbed
----------------------------------

Together with the University of Michigan and the University of Illinois we have started a joint Interoperability Experiment, whose goal is the development of a common protocol to access the collections at these and other sites. A design document has been written and circulated to the other universities. Bill Birmingham, Doug Orr, Bruce Schatz, Andreas Paepcke, and Hector Garcia-Molina met in Ft. Lauderdale (at the ARPA CSTO meeting) and discussed plans. A first implementation of the protocol has been done at Stanford, currently with three different servers.

We have significantly expanded the Annotated Bibliography of digital library items (53K on 5/1, 111K on 8/1), including entries with abstracts for all technical papers from DAGS'95 and for over two-thirds of those from DL'95.

3 Economics
-----------

We completed an evaluation of existing network payment mechanisms to select appropriate candidates for inclusion in the Stanford digital library project. We obtained merchant and user accounts for the most promising candidates, First Virtual and DigiCash, and began implementation of the InterPay objects required to support these two. We have enhanced our prototype to reflect the published version of the InterPay architecture.

We have continued our investigation of issues and mechanisms for security in the digital library, and developed a list of security/protection properties for information transactions.

We have continued our work on copy detection of digital documents. In particular, we have sought to understand the accuracy of relevance measures for detecting partial overlap of digital documents in a registration-server-based copy detection system. The SCAM (Stanford Copy Analysis Mechanism) system was developed to register documents and to compare incoming documents against registered documents for partial overlap. SCAM was implemented as a persistent database with several hooks for experimenting with different relevance measures and for evaluating the performance of the system. SCAM was successfully used in early May to find several instances of plagiarism in Technical Reports and Conference Papers (in a case that was well publicized on the Internet).

4 User Interfaces
-----------------

We have designed and prototyped an architecture that supports a generalized form of shared "annotations". It defines an enabling platform for various kinds of third-party value-added information on top of the existing World-Wide Web infrastructure. A rich variety of usages can be readily realized using this platform. These usages include shared comments, collaborative filtering, seals of approval, guided tours, usage indicators, co-presence, and Vannevar Bush-type trails.

5 Searching
-----------

We have done both theoretical and practical work on a "designator-based" framework for searching. The design is motivated by the way people use natural language to communicate information needs. An important characteristic of the framework is that it enables systems to provide query responses that have customized levels of abstraction and display properties. A prototype of this system has been built for interaction with a WWW search facility (WebCrawler).

We have also addressed the problem of translating information retrieval queries into various target systems.
Initially, we performed a feature analysis of the query languages of some typical text retrieval systems, including Dialog, WAIS, STN, BRS, and Stanford Folio. We have completed theoretical work on translating user Boolean queries into target-specific queries, together with the corresponding post-processing required to carry out filtering not supported by a source. Algorithms based on a query normal form have been proposed to solve the mapping problem. A front-end query language has been designed, and an implementation of the query mapping aimed at Dialog and Stanford Folio is under way.

6 Agents
--------

Work on designing the new architecture for adaptive information searching agents has been completed, and an implementation is under way. Several talks have been given on experiments with the first prototype of the system. The system delivers documents of interest to a group of users, adapting over time given their feedback.

6 Miscellaneous Activities
--------------------------

6.1 Interviews

None.

6.2 Visitors and Industry Contacts

- Doug Orr, University of Michigan, May 25th
- Barry Leiner, ARPA, May 31
- Steve Kirsch, Infoseek, June 21, June 30
- Mike Lanza, Oracle ConText, June 22
- Dow Chemical, Digital Libraries Team, June 28
- Silverplatter, Peter Ciuffetti, telephone discussion, July 14
- Michael Buckland, Berkeley, July 19th
- Willy Chu, Mike Blasgen, IBM, July 26
- Daniela Rus, Dartmouth, July 26
- Carl Staelin, HP, July 27
- Mr. Patrick Baxin and Mr. Guy Jerram from the Municipal Library of Lyon met with Rebecca Lasher

6.3 Public Presentations and Meetings Attended

Terry Winograd gave a keynote talk at the 1995 SIGIR Conference (July in Seattle) on "Digital vs. Libraries: Bridging the Two Cultures."

Steve Ketchpel met with a representative of the CommerceNet "Payments" working group, and agreed to join future working group meetings.

The Stanford Digital Libraries Project gave a series of talks at Interval Corporation, July 5.

Andreas Paepcke and Hector Garcia-Molina attended the ARPA CSTO PI Meeting in Ft. Lauderdale, July 10-12.

Rebecca Lasher attended the ADL 95 Conference, May 15-16.

The HPCC/IITA Workshop on Digital Libraries was attended by Terry Winograd, Hector Garcia-Molina, Steve Ketchpel, and Rebecca Lasher.

Hector Garcia-Molina, Sergey Brin, and N. Shivakumar attended the SIGMOD Conference, May 23-25, where our work on copy detection was presented.

Marko Balabanovic gave the presentation "Learning to Surf" at the AAAI 1995 Spring Symposium on Information Gathering and at Stanford's Center for the Study of Language and Information lecture series on Intelligent Agents.

Steve Ketchpel attended DAGS '95: "Electronic Publishing and the Information Superhighway", May 30 - June 3. He presented "Transaction Protection for Information Buyers and Sellers", which was selected as "Best Student Paper" for the conference.

Steve Ketchpel and Rebecca Lasher attended DL '95: "Digital Libraries '95", June 11-13. Steve Ketchpel presented "InterPay: Managing Multiple Payment Mechanisms in Digital Libraries".

Steve Ketchpel presented InterPay to the CommerceNet group at the Knowledge Systems Laboratory, Stanford University.

Martin Roscheisen presented "Beyond Browsing: Shared Comments, SOAPs, Trails, and On-line Communities" at the Third International World-Wide Web Conference in Darmstadt, Germany.

Martin Roscheisen gave an invited talk on "The Stanford Digital Library Project and Annotation Service" at the German National Research Center GMD, Bonn, Germany.

6.4 Regular Meetings/Seminars

- Weekly digital library seminar
- Weekly executive committee meetings
- Weekly technical design meetings

7 Bibliography
--------------

These papers are also available on our home page: http://www-diglib.stanford.edu

"An Adaptive Agent for Automated Web Browsing", M. Balabanovic, Y. Shoham and Y.
Yun, to appear in Journal of Visual Communication and Image Representation 6(4), December 1995 (special issue on digital libraries).

Martin Roscheisen, Terry Winograd, and Andreas Paepcke (1995). Content Rating and Other Third-Party Value-Added Applications for the World-Wide Web. CNRI Journal D-Lib, August.

Martin Roscheisen, Christian Mogensen, and Terry Winograd (1994). A Platform for Third-Party Value-Added Information Providers: Architecture, Protocols, and Usage Examples. Technical Report, Stanford Integrated Digital Library Project, Computer Science Dept., Stanford University. November 1994, updated May 1995.

Martin Roscheisen, Christian Mogensen, and Terry Winograd (1995). Interaction Design for Shared Commenting. CHI. May 1995, Denver.

The Stanford Digital Library Project, The Stanford DL Team, Communications of the ACM, Digital Libraries issue, 1995.

Transaction Protection for Information Buyers and Sellers, Steven Ketchpel, DAGS '95, Electronic Publishing and the Information Superhighway.

Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies, Luis Gravano, Hector Garcia-Molina, VLDB Conference, Zurich, September, 1995, to appear.

SCAM: A Copy Detection Mechanism for Digital Documents, N. Shivakumar and H. Garcia-Molina, Digital Libraries '95, June 1995.

InterPay: Managing Multiple Payment Mechanisms in Digital Libraries, S. Cousins, S. Ketchpel, A. Paepcke, H. Garcia-Molina, S. Hassan, and M. Roscheisen, Digital Libraries '95, June 1995.

Beyond Browsing: Shared Comments, SOAPs, Trails and On-line Communities, M. Roscheisen, C. Mogensen and T. Winograd, WWW'95, Darmstadt.

Copy Detection Mechanisms for Digital Documents, S. Brin, J. Davis, and H. Garcia-Molina, ACM SIGMOD'95, May 1995, San Jose.