rmr@cs.stanford.edu
Christian Mogensen
mogensen@cs.stanford.edu
Terry Winograd
winograd@cs.stanford.edu
Computer Science Department
This paper describes a system we have implemented that enables people to share structured in-place annotations attached to material in arbitrary documents on the WWW. The basic conceptual decisions are laid out, and a prototypical example of the client-server interaction is given. We then explain the usage perspective, describe our experience with using the system, and discuss other experimental usages of our prototype implementation, such as collaborative filtering, seals of approval, and value-added trails. We show how this is a specific instantiation of a more general "virtual document" architecture in which, with the help of light-weight distributed meta information, viewed documents can incorporate material that is dynamically integrated from multiple distributed sources. Development of that architecture is part of a larger project on Digital Libraries that we are engaged in.
1 Introduction
There are many different reasons why people want to communicate to each other
about specific things they find as networked information resources. These
include comments and annotations of the kind a workgroup would share about their
common area of interest, the ability to have newsgroup-like fora associated with
specific items on the "net", value-added trails that link items together that
someone considers being connected under a particular view, systematic critique
and review information,
"Seals of Approval" (SOAPs), or filters in support of enabling people to make
sense of whatever information is presented to them.
All of these applications have in common two properties that are not associated with the standard mechanisms for hypertext (e.g., HTML):
For example, consider a set of "consumer reports" annotations provided by a review organization and attached to product catalogs on the web, or the private (within the group) comments made by our local research group as part of their joint reading of the WWW conference proceedings. In each case, the annotations themselves need to be kept separate from the annotated documents and access to them handled in a uniform way for appropriate subscribers.
We have developed a general mechanism for shared annotations, which enables people to annotate arbitrary documents at any position in-place, share comments/pointers with other people (either publicly or privately), and create shared "landmark" reference points in the information space. The framework represents a further step towards giving people a presence on the Web, laying the foundation for on-line communities.
This paper describes a general meta-information architecture ("BRIO") and an implementation of this general architecture, called "ComMentor," which we have developed for annotating pages on the WorldWideWeb. which realizes such usage scenarios in a particular instantiation; this includes a basic client-server protocol, meta-information description language ("PRDM"), a server system (currently based on an NCSA http 1.3 server with CGI scripts written in PERL), and a remodeled NCSA xMosaic 2.4 browser with interface augmentations to provide access to our extended functionality [CHI95, SLIDES].
The idea of a system enabling a multiplicity of independent individuals to create light-weight value-added "trails" through a document space was envisaged most prominently by Vannevar Bush as early as 1945 [BUSH]. The ComMentor system can be thought of as a tool which such "trail blazers" use to add value to contents, and which other people use to find guidance based on these human-created super-structures.
There are a number of existing systems that incorporate mechanisms that are related to our current architecture. These include the annotations facility in Lotus Notes (which requires making available "hooks" for annotation attachment in a given document), annotations in Acrobat (which are in-place but not shared), and various other in-place facilities which are based on a shared file system to allow multiple people to share comments (these include ForComment(TM), and some versions of MS Word--both of which are not incremental, that is, only one user can write comments at a given time, and then pass on this right.) In the World-Wide Web arena, there has been ongoing discussion about the appropriate mechanisms for group annotations, but these have generally assumed whole-document attachment (rather than attachment in place) and universal (rather than access-controlled) distribution. They have not been developed into widely used facilities. There are a few experimental systems dealing with annotations of various kinds (mentioned in the references) including Ubique [UBIQUE], which uses a proprietary architecture, geared towards synchronous user-user communication rather than value-added structures.
In the first section of this paper, we describe the overall architecture including a description of the user view and a note on how documents are synthesized. In the second section, we present sample scenarios for the client-server interaction. In the remainder of the paper, we discuss our experience using the facility and describe a variety of usage scenarios that we are currently aware of.
Further technical detail is available in the extended technical report [CSDTR95], from the Stanford Digital Library Project.
2 System Architecture
In this section, we describe the basic system structure, the meta-information
server design, the user view of the annotation system, and the current interface
design of the browser. We also include a brief technical note on how documents
are dynamically synthesized.
Users interact with a browser to retrieve documents from various document servers. In addition, there are meta-information servers from which relevant information can be retrieved according to the protocol defined in this paper. Meta-information "items" like annotations are organized into "sets" to which members of "groups" have access. For example, depending on the context set by the user, the browser can decide to retrieve annotations from certain meta information servers for every document the user looks at, and display a version of the document in which annotations to various segments are shown at their appropriate position in the text.
Conceptually, a "browser" can be understood as consisting of
The figure shows the basic entities: members, who form groups, which access annotation sets that contain annotations (from left to right).
The architecture is based on "annotation sets". Every annotation belongs to a particular set and annotates a particular page at some specific location. Every set is associated with a particular server ("annotation server") and identity (like a URL). The server holds the index (but not necessarily the contents) of all the annotations in the set, and will in general be distinct from the server which provides the annotated documents (and will not necessarily be under the control or resource allocation of the original writer). Each annotation set can be thought of as a "distributed document," stored in one place but consisting of a set of annotations that are associated with pages anywhere on the Web.
Access control is managed per annotation set. Some obvious examples are
Sets are particular information objects to which authenticated people stand in a certain authorization relation; groups are the human-organizational objects. There may well be more than one annotation set associated with a group, each used for different purposes.
The mechanism for authenticating access to meta-information sets (by virtue of being member in a group with appropriate access authorization) is an issue which is kept orthogonal in the design: any authentication mechanism (private keys, public keys, etc.) can be used. This allows us to separate out group membership (with a possibly very secure approval mechanism) and multiple sets of annotations (e.g. on different topics) which anyone in a specific group can easily access. Users can either have read or read/write access to a set by virtue of their group membership. Since there is no net-wide standard for individual identity and access control group membership, we have implemented a simple mechanism for establishing individual identities (with associated passwords) and group membership. We expect in the future that these will be generic network functions, and we will use the standard mechanisms that emerge.
For illustration, consider the equivalent in the UNIX file system: before being able to use a workstation, one needs an account ("group membership"), and once this is established, the user needs to authenticate herself. Then, once "logged in", users can create files and directories ("Sets") in whatever way is authorized by their group memberships for a particular location (e.g. write access for the group 'users' in the home directory). This distinction allows for differently strong authentication methods to be used without inhibiting the ease of creating directories/sets.
Annotations can also be examined with a light-weight viewer which pops up a small window (which looks like a PostIt(TM)) when the icon is selected with the middle button and removes it when the button is released. This tool is a generic meta viewer which can be used in a variety of contexts to get "preview" information faster than it is possible with a full document-view window. It is useful as a general interface augmentation for mosaic browsers in contexts such as examining whether or not to follow a hyperlink, which might lead to an expensive document.
Annotations can be filtered according to different selections; a typical usage is to filter review information for category labels to see only documents with a certain rating. Note that queries for particular items from the meta information server are supported by its database backend.
Since annotations are documents too, they can be recursively annotated.
2.4 Dynamically Synthesizing Documents: The Merge Library
The merge library is the set of procedures that actually synthesize the document
that is then rendered from the documents and annotations that are relevant in a
given context. It contains procedures that take a document and a PRDM list of
comment items, and return a document where the comments are in-lined and given
an appropriate rendering. The procedures are specific for the content type of
the document.
The method of attachment for HTML and plain text is currently based on string position trees (Patricia trees for positions; cf. [KNUTH]) applied to a canonical representation of the document. Each comment object has associated with it the highlighted text string as well as the position identifier string.
Position identifier strings are useful in this context because they are by definition the smallest internal identifying string, and therefore are likely to be robust against modifications of the underlying document. They also allow for changes to be detected. Beyond the minimal string, there is some redundantly stored information to give a context for reattachment in case the document was modified and the minimal string itself does not uniquely identify a position any more.
If the position cannot be recovered, the browser appends the corresponding item marked as "Unassigned" to the end of the document.
For a flexible system with maximum interactivity and generality, it is quite useful to have incremental insertion mechanisms; that is, not all relevant meta-information pertaining to a document in a given context has to be known in the beginning. An example would be that a user activates a new set, or one set is returned by a slower server, or another server has already pre-merged the meta-information from some of the sets while the browser merges the rest of the sets. Then we want to be able to merge in the additional meta-information from the relevant sets incrementally. Note that the requirement of incremental insertions determines how the merge algorithm has to look like. For example, global position identifiers such as simple position counts would not work in a straight-forward sense since they are affected by previous insertions. This raises the need for a canonical representation of a text in a certain format which embodies all the features that are significant for attachment but do still allow text transformations which are invariant with respect to the canonical form. (For example, inserting an HTML comment into an HTML file should preferably not affect the position identifiers.)
Procedures for other content types, especially images (and possibly external viewers), have not yet been examined. If the type already supports a concept of attached annotations (e.g., PDF in the Acrobat viewer), then we could make use of that directly in identifying the position of annotations which are stored on a group server.
3 Client-Server Interaction: Sample Scenarios
In this section, we give two sample scenarios of how browser and
server interact:
Retrieving the base document is the standard document browsing ("GET") interaction.
Whenever Joe accesses a new page, the browser sends out a request for annotations related to the URL just loaded to the annotation server; this request includes:
The server sends back a string of meta-information which the client uses to merge with the original document. The document is then rendered with the annotations inside it.
The merging of the document and the annotations is done by a set of procedures based on redundant information and minimal length tree descriptors which uniquely identify the position in a document and which are designed to be maximally robust against change of the underlying document. (See also the section on the "merge library" for synthesizing documents from other documents and accompanying meta information.)
Setting up an annotation server requires augmenting a conventional http server with the appropriate scripts. The groups which are available at such a server are described in conventional HTML pages which are augmented by meta-information as to who is the owner, whether the group is public, and other properties. These pages can be browsed in the usual way. We have set up an initial What's New type list which contains currently only our servers, but others are likely to come, and people can use such lists to browse for groups they want to join in the same way they also browse for other documents (which also means that such information defines the interface for search "agents" which can be looking for new groups of relevance).
The annotation server sends back a page containing meta-information about the group. The browser extracts this information, displays the page, and indicates in its user interface that the current group is eligible for a request for membership.
Example: The Join Group
button is made active.
A new user 'Joe' checks out the server and discovers a few interesting annotation sets there. Unfortunately, they are all closed to non-members, so Joe applies to join a group named Outsiders which is described as:
This group is intended for interested parties outside the lab. It will allow you to access the following annotation sets and contribute to the discussion of our research by adding your own annotations. The documents of interest tend to reside on this server. Check each annotation set for details.
The group administrator receives e-mail notification of the request and approves the request by sending a reply back to the annotation server's mail gateway. In other words, joining a group is much like joining a mailing list. However, a group is merely an access authorization unit. Groups have nothing to do with the actual creation of annotations; they are dealt with fairly infrequently (like setting up a new UNIX account for someone).
After receiving an e-mail notification of the approval, Joe decides to
go back to the annotation server and select a few annotation sets.
On viewing a page describing an annotation set, his browser detects
the embedded meta-information and enables the
"Add Annotation Set"
button in the user interface.
Joe adds the annotation set "HTML Standardization" to his list of subscribed sets. His browser pulls out the embedded meta-information and adds the information to the list of stored annotation sets. The meta-information includes the set's name and location as well as some caching data such as which documents have been annotated.
Joe then activates the set. Annotations for all active sets are always requested from annotation servers by the browser--conceptually for every browsed document (but a caching mechanism by which only those document servers are queried which are known to be annotated leads to an efficient implementation).
4 Usage Scenarios
In this section, we present some usage scenarios which follow naturally
from the described architecture.
anyone
to have read access to the annotation set. A collection of annotations are usually rendered in-line as small icons. Each icon is a link to a dynamically generated HTML page containing the annotation text. Such a page can then be annotated recursively, or a follow-up comment can be made (which is usually rendered linearly)--providing a threaded stream of comments.
Note that the actual visual presentation of comments is independent of the meta information transmitted; different browser designers can experiment with different renderings without affecting the underlying functionality.
A useful feature when using this sort of set is the ability to query the set for recent additions. If one is interested only in annotations which are less than a day old, or written by other members of a particular group, then the meta information server can perform these selections as queries over the meta information database, and make this information available as links to the relevant locations. This makes it easy to track new annotations to a given document or a particularly interesting discussion.
Annotation sets can replace the notion of a hotlist: the user simply marks interesting pages with respect to a particular set as she visits them. Marking a page adds a piece of meta-information to the chosen set. A straightforward extension is the use of multiple hotlists, like one for each topic: different sets. Such hotlists would then be retrieved by asking the server for a list of all documents in the set. An additional benefit is that the hotlist can be shared by several clients simultaneously.
Another use of personal sets is the possibility of creating very informal shared sets that exist outside the formal group structure. In other words, if someone wants to share information with his office mate, it is possible to create a set to which both have read and write access.
The tour mechanism is currently implemented simply as a (specifically typed) list of links for each comment thread and a two-pane browsing interface such that the tour context is always maintained in one view and the tour focus is rendered in the other view (e.g. the annotated document with the comment included). Each such tour link will bring the user to where the comment in the annotated page is located. Since it enables users to find annotations that are distributed over a number of documents without needing to check for changes all the pages of those documents, this greatly extends the value of the annotations. We are investigating the advantages of more sophisticated graphical visualizations, such as maps.
Trails can be applied to implement multiple guided tours through the same content without confusing the users: For illustration consider a set of paintings and their descriptions such as they can found in the "Web Louvre". It is now possible to create annotation sets "impressionist painters", "chronological tour", and others, and add trails marks (a special annotation type) to each set which form a linked tour according to the set semantics. Then, if people wish to look at the paintings in a chronological order, they can activate the corresponding set, and follow along a tour by clicking on the trail mark icons. If they wish a different tour, then they can turn on another set. In any case, they will not be confused by multiple signs on a given page, because the annotation architecture allows for underlying contents and connecting super-structures to be separated (in the static representation) and dynamically synthesized together based on a chosen user context.
As a hypothetical example, consider the case that someone has a high opinion of the quality ratings which a certain academy issues about documents, and wants to be advised by these ratings while browsing for documents. Then, such an academy could create a Guidance SOAP; this is just an annotation set where the access authorizations are chosen such that anyone can read it, but only fellows of the academy can write new comments. Then, interested users can can enable the agency's annotation set, and when they browse documents, they can quickly get an idea of how valuable it will in all likelihood be to spend time on a given document: by glancing at the seals. Note that with the tour mechanism it is moreover possible to look explicitly at those documents for which a very positive rating exists.
SOAPs can be implemented in the given system in a straight-forward way; they are just annotations whose content follows some category system. Browsers can then exploit this systematicity by associating some action model to the various categories; such actions can range from popup hints to not retrieving a particular page.
Acknowledgements
Members of the Project on People, Computers, and Design at Stanford
Unversity, in particular Michelle Baldonado and Steve Cousins, have
provided valuable feedback throughout the design and development of
the system.