Integrating Schemas of Heterogeneous Database Systems


Stefano Spaccapietra, Martin Andersson, Yann Dupont

Department of Computer Science,
Swiss Federal Institute of Technology,
CH-1015 Lausanne, Switzerland


Kokou Yetongnon, Christine Parent, Christophe Nicolle

IEM Department, University of Burgundy,
F-21000 Dijon, France


Introduction

The increased use of databases in recent years has lead to the situation where semantically related information is stored in separate databases at different sites. To achieve easy access to all information gathered on a subject, it is desirable to have a global view of the data distributed in the different systems. One way to achieve such a global view and at the same time preserve the autonomy of the local systems is to create a loosely coupled federated database system as described in (Sheth-90).

An obstacle to the establishment of a federated database system is the heterogeneity of the local databases. Data model heterogeneity occurs when the local systems use different languages to define and manipulate data. Another type of heterogeneity occurs when one single phenomenon in the domain of discourse is modelled differently in local systems. Conversely, identical schema items in local systems can in fact represent different phenomena. A study of different types of heterogeneity has been presented in Conflicts and Correspondence Assertions in Interoperable Databases (Spaccapietra-91).

To overcome the problems of heterogeneity, a solution in two phases is usually advocated: translation followed by integration (see, Femus : A Federated Multilingual Database System (Andersson-93)) . In the translation phase, schemas in the local database systems, expressed in different data models, are translated into schemas in one single conceptual model that is used as intermediate model. In the integration phase, correspondence assertions are specified that relate items of the local schemas to each other. The integrator then builds a virtual federated schema according to the correspondence assertions. The federated schema describes data from all the local systems and can be translated back to the data models of the members of the federation. This allows a local user to query the federated schema using his own language. Note that a federated schema does not have to be global. There may be several possibly overlapping federated schemas according to the requirements of the applications. A federated schema can be constructed partly from other federated schemas.

The following two sections give an outline of the translation and the integration phase.

Translation

An extendable meta-model is used to map concepts in different data models that are semantically equivalent. The meta-model defines a set of meta-types that represent the concepts of the involved data models. The meta-types are organized in an inheritance hierarchy that reflects the similarities between the different modelling concepts. The only thing that has to be done when a new modelling concept is added to the meta-model is to choose an existing metatype from which the new metatype can be derived, and to define the transformation rules between the existing metatype and the new one.

The translation between a set of data models is carried out in two phases. First, the inheritance hierarchy is extended with concepts of the data models that are not already defined as meta-types. Next, suppose that the schema-types Sij and Skl, expressed in models Mi and Mk respectively, are instances of the same meta-type. When translating the schema Si to a schema in the model Mk, all Sij can be directly mapped to Skl. The main advantage of using a meta-model for the translation is the possibility to reuse translation rules in several data model translations. Moreover, with this approach only one translation per model is required, independently of the number of models in the federation. A more detailed description of the mechanism can be found in TIME - Traducteur Intelligent avec Metamodele Extensible, (Nicolle-95) .

It may happen that there is a semantic gap between the intermediate model and the models used by the members of the federation. A common situation is e.g. when relational databases are members of a federated database system. The information content of a relational schema is not sufficient to construct a conceptual schema. E.g., an entity relationship schema specifies a set of entity types or object types, whereas in a relational schema, there is no explicit representation of the object types. The relation schemes that carry properties of the same object type must be identified and translated to an entity type.

A method for the construction of a conceptual schema describing a relational database is presented in Extracting an Entity Relationship Schema from a Relational Database through Reverse Engineering, (Andersson-94) . The underlying assumption is that the data semantics is reflected in the utilization of the data. The method is divided in an information acquisition phase, when the application code is used to extract knowledge about the relational database system, followed by a schema generation phase. The method is supported by an interactive tool.

Integration

In Model-Independent Assertions for Integration of Heterogeneous Schemas, (Spaccapietra-92) , we proposed a new method to integrate heterogeneous source schemas based on assertions. The method solves structural, semantic and descriptive conflicts without changing the input schemas. For this purpose, interschema correspondence assertions, stated by the DBA, may relate any element in one schema to an element of any type in another schema. An assertion defines relationships between the element ``types'' (structural descriptions) and between the related ``classes'' (sets of associated instances). It also includes a efinition of the mapping between the objects at the instance level, which is necessary for the integration of object instances in the related databases. Integration rules are generically defined on an abstract ``generic data model'', which basically supports objects, value attributes and reference attributes. Their implementation in an actual integration of two input schemas is tailorable to suit the specific features of the input data model.

Implementation

A set of interactive tools for translation and integration of heterogeneous databases is currently being developed. These tools are built in connection with the SUPER environment described in SUPER: Visual Interfaces for Object Relationship Data Models, (Dennebouy-95). .

Bibliography

(Andersson-93) M. Andersson, Y. Dupont, S. Spaccapietra, K. Yetongnon, M. Tresch, H. Ye. ``Femus : A Federated Multilingual Database System'', Advanced Database Systems, LNCS 759, N.R. Adam, B.K. Bhargava [Eds], Springer-Verlag, New York 1993.

(Andersson-94) M. Andersson, ``Extracting an Entity Relationship Schema from a Relational Database through Reverse Engineering'', in Proceedings of the 13th Int. Conf. on the Entity Relationship Approach, Manchester UK, Dec. 1994

(Dennebouy-95) Y. Dennebouy, M. Andersson, A. Auddino, Y. Dupont, E. Fontana, M. Gentile, S. Spaccapietra, ``SUPER: Visual Interfaces for Object Relationship Data Models'', in Journal of Visual Languages and Computing, Special Issue on Visual Query Languages, 1995

(Nicolle-95) C. Nicolle, ``TIME - Traducteur Intelligent avec Metamodele Extensible, Technical Report, No.95/05, University of Burgundy, May 1995.

(Sheth-90) A. Sheth, J. Larson, ``Federated Database Systems for Managing Distributed and Heterogeneous and Autonomous Databases'', ACM Computing Survey, 22(3). September 1990.

(Spaccapietra-91) S. Spaccapietra, C. Parent, ``Conflicts and Correspondence Assertions in Interoperable Databases'', in SIGMOD RECORD, 20(4), December 91.

(Spaccapietra-92) S. Spaccapietra, C. Parent, Y. Dupont, ``Model-Independent Assertions for Integration of Heterogeneous Schemas'', in Very Large Database Journal, 1(1), July 1992.