Digital Library Project
Stanford University
Quarterly Report. August 1, 1996
Reporting Period: May 1-July 31, 1996

1. Administrative
-----------------

We redesigned our Web presence (http://www-diglib.stanford.edu). Many
of us attended the DLI meeting at the University of Michigan, where we
also pursued various administrative tasks.

2. InfoBus Architecture and Testbed
-----------------------------------

We developed a Web proxy generator. This facility lets users describe
the behavior of Web-based search services. From such a description, it
automatically constructs an InfoBus proxy that provides access to the
service through the InfoBus.

Work progressed on further integrating the SCAM service into the
InfoBus architecture. When this work is finished, SCAM will be
accessible via method calls. In addition, a Java interface is being
developed.

The second version of InterBib was released. It now provides support
for RTF and for bibliography search. With RTF support, users may now
submit MS Word documents together with BibTeX or Refer bibliographies
to InterBib and have reference lists added to the documents. Version
one supported only FrameMaker. The search facility allows users to
search over all the bibliographies that were previously submitted to
InterBib, were parsed correctly, and were released for public access
by the submitters. InterBib is available at
http://www-interbib.stanford.edu/~testbed/interbib.

Work progressed on allowing views to be added to objects. This
facility helps when multiple services interact with document objects
and each service would like to 'pretend' that the documents have
different attributes and behavior. With document views, each module
can attach a different view to the document objects it operates on.
This is like dynamically adding a superclass to the object's class. We
are also exploring additional uses of views, such as access control to
documents and the dynamic addition of payment capabilities.
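
The document-view idea can be illustrated with a small Python sketch.
This is not the testbed's implementation. It approximates "dynamically
adding a superclass" with a wrapper that delegates to the underlying
document, and all class and method names in it (Document, View,
BillingView, SummaryView) are hypothetical, chosen only for
illustration.

```python
class Document:
    """A plain document object shared by several services (hypothetical)."""

    def __init__(self, title, text):
        self.title = title
        self.text = text


class View:
    """Base class for views: wraps a document and falls back to it for
    any attribute the view itself does not define."""

    def __init__(self, document):
        self._document = document

    def __getattr__(self, name):
        # Invoked only when normal lookup on the view fails, so a view's
        # own attributes take precedence over the document's.
        return getattr(self._document, name)


class BillingView(View):
    """Hypothetical view that adds a payment-related method."""

    cents_per_word = 2

    def price_cents(self):
        return len(self.text.split()) * self.cents_per_word


class SummaryView(View):
    """Hypothetical view that adds a short-summary method."""

    def summary(self, max_words=3):
        return " ".join(self.text.split()[:max_words])


# Two modules attach different views to the same document object.
doc = Document("Quarterly Report", "We developed a Web proxy generator")
billing = BillingView(doc)
summary = SummaryView(doc)

print(billing.title, billing.price_cents())
print(summary.title, summary.summary())
```

Each module sees the document's own attributes plus those of its view,
while the shared document object itself is left unchanged.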

3. Economics
------------

Work continued on InterPay II. We continued to focus on a more general
payment interoperability model. See our new publication in the
bibliography section for details.

Work on relationship management for access control (R-MANAGE) moved
from design towards implementation. We have been augmenting the basic
testbed infrastructure with rights management facilities in order to
deal with access control issues such as licensing and copyrights. The
approach is based on a contract framework that adds both persons
("epers") and relationships ("commpacts") as first-class citizens to
the interface and architecture:

- An "e-person" (or "epers") is a persistent, access-controlled
  electronic representation of a person that provides a single point
  of reference for everything related to this person, including
  controlled ways to get hold of personal information, to request
  approvals, to leave behind notifications, to automatically set up
  certain standard relationships (such as accounts with content
  providers), etc. An epers is essentially a generalization of a Unix
  account.

- A "commpact" is a "communication pact" between e-persons. In many
  cases it will simply represent a legal contract, but it can also
  encapsulate less formal agreements (such as privacy-related ones).
  Technically, a commpact is a set of rights and obligations. For
  example, the obligation to pay a certain sum for a subscription to a
  newsletter might be one such obligation in a subscription commpact.
  Actions are then evaluated (authorized) with respect to the context
  of a previously established commpact. (R-MANAGE integrates with
  InterPay II at this action level. Much of its functionality is
  available to users as part of a separate "relationship manager" task
  in the task interface.)

4. User Interfaces
------------------

The SenseMaker interface is being reworked to experiment with a
metaphor that begins with a catalog of 'cards' to get users started,
but then moves quickly beyond the traditional use of card catalogs.
This redesign is being undertaken in preparation for user tests of the
SenseMaker concepts.

The DLITE interface has been demonstrated often, and continues to be
expanded.

5. Searching
------------

In the wake of the DLI meeting, we constructed first versions of
interfaces to CSQuest and CMU's Informedia project. CSQuest is a
concept space over Inspec which was constructed at the University of
Arizona (see University of Illinois's DL project for an explanation of
this term). We are planning to use this facility for our interactive
query development module. The CMU proxy searches over Informedia's
video text, title and abstract database, not the video itself. We are
planning to move this first implementation to CMU's evolving RPC
interface.

Our front-end query language was extended to enable more effective
queries to UCSB's ADL map service. Specifically, our data types now
include numeric values, and the attribute set has been extended to
include various spatial aspects.

We also extended the query translation mechanism to add missing
features, such as negation and stemming. A translation service to
AltaVista (a Boolean search engine on the Web) was added to the
testbed. The query translation implementation was ported to SunOS and
Solaris.

6. Agents
---------

A proxy for Fab, our system for personalizing Web retrieval, was added
to the testbed. The initial evaluation of Fab, involving ten users
over one month, has been completed. A more extensive test involving
several hundred users is in preparation.

7. Proposal for Machine-Machine Search Engine
---------------------------------------------

A new draft of STARTS was completed in preparation for a ratification
workshop on August 1, 1996.
STARTS is a proposed standardized interface to be supported by
commercial search engines. It will help programs that attempt to
identify sources relevant to a query, search over multiple engines,
and merge the results. The proposal facilitates three aspects of this
work: submitting queries to search engines with different native query
languages, accessing information about the collections stored within
sources, and retrieving result-set statistics to help with
rank-merging. The draft was circulated among search engine vendors,
several major commercial search engine users, and the Z39.50
community. It is available at
http://www-db.stanford.edu/~gravano/starts_home.html.

8. Miscellaneous Activities
---------------------------

8.1 Visitors and Industry Contacts

- Rebecca Lasher met with Tuija Sonkkila from Helsinki.
- Paul Allen.
- Phone interview with Bob Burkman, Editor of "Information Advisor", a
  technical newsletter.
- NTT Data of Japan.
- Rebecca Lasher and Vicky Reich met with Mark Stefik of Xerox PARC.
- Vicky Reich met with Paivi Kytomaki, Director of a library in
  Finland.
- Vicky Reich met with Kuriyama Masamitu, a reference librarian from
  Japan.
- Daphne Koller and Vicky Reich met with visitors from Hitachi.
- Vicky Reich met with visitors from Matsushita.

8.2 Public Presentations and Meetings Attended

- Group attendance at the DLI Meeting at the University of Michigan.
- Marko Balabanovic: "Adaptive Information Retrieval on the World-Wide
  Web", CSLI Interface Lab Tutorials, May 13 1996 (part of a series of
  talks for the CSLI industrial affiliates).
- Luis Gravano: Invited talk, W3C Distributed Indexing/Searching
  Workshop (Cambridge, Massachusetts, May 28-29).
- Steve Ketchpel: "Making Trust Explicit in Distributed Commerce
  Transactions".
  Presentation at the International Conference on Distributed
  Computing Systems.
- Rebecca Lasher, Andreas Paepcke, Scott Hassan: Carl Lagoze and Jim
  Davis of Cornell University
- Rebecca Lasher: Presentation on digital libraries and the economics
  of electronic journals at the annual Special Libraries conference in
  Boston, June 9, 1996.
- Rebecca Lasher: NCSTRL meeting in Washington D.C.
- Andreas Paepcke: Invited talk on digital libraries, object file
  systems, and databases in the context of the Web and distributed
  object technology. OMG/W3C workshop on distributed objects and
  mobile code in Boston.
- Vicky Reich and Rebecca Lasher: Attended a full-day class on
  copyright at UC Berkeley, May 4th.
- N. Shivakumar: Phone interview with Phil Ross, Reporter, Forbes
  Magazine.
- Terry Winograd: Talk and book signing, Stanford book store.
- Terry Winograd: Presentation at Interval Research: "User models and
  ontologies".

8.4 Regular Meetings/Seminars

- Weekly Digital Library seminar
- Executive committee meetings when required
- Weekly technical design meetings

9. Bibliography
---------------

The following are publications that were submitted or accepted for
publication. Please see also the working papers section at our Web
site for up-to-date information of a less formal nature.

Michelle Q Wang Baldonado and Steve B. Cousins. Addressing
Heterogeneity in the Networked Information Environment. Submitted to
journal "Review of Information Networking".

Edward Chang and Hector Garcia-Molina. Reducing Initial Latency in
Multimedia Storage System. IEEE Multimedia Database Conference to be
held in August, 1996.

K.C.C. Chang, H. Garcia-Molina, A. Paepcke. Boolean Query Mapping
Across Heterogeneous Information Sources. Accepted to the IEEE
Transactions on Knowledge and Data Engineering as part of a special
section of concise research papers on Digital Libraries (Aug. 1996).

Arturo Crespo, Eric A. Bier. WebWriter: A Browser-based Editor for
Constructing Web Applications.
Proceedings of the 5th WWW Conference, Paris. 1996.

H. Garcia-Molina, L. Gravano, N. Shivakumar. dSCAM: Finding Document
Copies Across Multiple Databases. Proceedings of 4th International
Conference on Parallel and Distributed Information Systems, Miami
Beach, Florida, December, 1996.

Steven Ketchpel, Hector Garcia-Molina, Andreas Paepcke, Scott Hassan,
Steve Cousins. UPAI: A Universal Payment Application Interface. USENIX
2nd e-commerce workshop.

Steven Ketchpel and Hector Garcia-Molina. Making Trust Explicit in
Distributed Commerce Transactions. International Conference on
Distributed Computing Systems, 1996.

Andreas Paepcke. Information Needs in Technical Work Settings and
their Implications for the Design of Computer Tools. CSCW Journal,
5(1), 1996.

Andreas Paepcke. Digital Libraries: Searching Is Not Enough (What We
Learned On-Site). D-Lib Magazine, May, 1996.

Tak Woon Yan, Matthew Jacobsen, Hector Garcia-Molina, and Umeshwar
Dayal. From User Access Patterns to Dynamic Hypertext Linking.
Proceedings of the 5th WWW Conference, Paris. 1996.