Stanford InfoLab Publication Server

Correspondence and Translation for Heterogeneous Data

Abiteboul, S. and Cluet, S. and Milo, T. (1996) Correspondence and Translation for Heterogeneous Data. Technical Report. Stanford InfoLab. (Publication Note: Database Theory - ICDT '97, 6th International Conference, Delphi, Greece, January 8-10, 1997)




A primary motivation for new database technology is to provide support for the broad spectrum of multimedia data available, notably through the network. These data are stored under different formats: SQL or ODMG (in databases), SGML or LaTeX (documents), DX formats (scientific data), Step (CAD/CAM data), etc. Their integration is a very active field of research and development (see for instance, for a very small sample, [10, 6, 7, 9, 8, 12, 19, ...]). In this paper, we provide a formal foundation to facilitate the integration of such heterogeneous data and the maintenance of heterogeneous replicated data. A sound solution for a data integration task requires a clean abstraction of the different formats in which data are stored, and means for specifying the correspondences/relationships between data in different worlds and for translating data from one world to another. For that we introduce a middleware data model that serves as a basis for the integration task, and declarative rules for specifying the integration. The choice of the middleware data model is clearly essential. One common trend in data integration over heterogeneous models has always been to use an integrating model that encompasses the source models. We take the opposite approach here, i.e., our model is minimalist. The data structure we use consists of ordered labeled trees. We claim that this simple model is general enough to capture the essence of the formats we are interested in. Even though a mapping from a richer data model to this model may lose some of the original semantics, the data itself is preserved and the integration with other data models is facilitated. Our model is similar to the one used in [7] and to the OEM model for unstructured data (see, e.g., [21, ...]). This is n
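
As an illustration only (not taken from the report), the idea of a minimalist middleware model built from ordered labeled trees might be sketched as follows. The names `Node`, `from_row`, and `to_text`, and the table-to-column-to-value encoding of a relational tuple, are assumptions made for this sketch and not the authors' definitions.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """A node of an ordered labeled tree: a label plus an ordered child list."""
    label: str
    children: List["Node"] = field(default_factory=list)


def from_row(table: str, columns: Sequence[str], row: Sequence[object]) -> Node:
    """Encode one relational tuple as a tree (hypothetical encoding):
    the root carries the table name, each child a column name,
    and each grandchild the value rendered as a string."""
    return Node(
        table,
        [Node(col, [Node(str(val))]) for col, val in zip(columns, row)],
    )


def to_text(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, preserving child order."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(to_text(child, depth + 1))
    return lines


# A tuple from a hypothetical "person" table mapped into the tree model.
tree = from_row("person", ["name", "age"], ["Ann", 30])
print("\n".join(to_text(tree)))
```

In this sketch the integer `30` becomes the string label `"30"`: the type information is lost while the data itself survives, which is the kind of trade-off the abstract describes for mappings from richer models.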

Item Type: Techreport (Technical Report)
Uncontrolled Keywords: database, integration, format, heterogeneous, replication
Subjects: Computer Science > Semistructured Data
Related URLs: Project Homepage
ID Code: 147
Deposited By: Import Account
Deposited On: 25 Feb 2000 16:00
Last Modified: 08 Dec 2008 14:14
