Entropy-based hierarchization of relational data structures

Daniel Vodňanský


Relational data structures are usually structured, non-hierarchical and non-redundant, whereas hierarchical structures (usually XML) may be redundant and arbitrarily structured. This incompatibility can cause problems in data transformations. A common example of such an incompatible transformation is the conversion of RDBS data to HTML, which often increases redundancy. This paper studies aspects of the conceptual (entity-relationship) model that break the hierarchical structure yet can be mapped into a relational model. The most common example of such an aspect is presented: the problem of hierarchy convergence and the use of M:N relationship cardinality. To assess the quality of the final XML document and schema, a calculation method derived from Shannon's average amount of information is proposed. This method compares the two possible directions in which the problematic data can be hierarchized. The assessment is based on specific data, so the data themselves help to validate their schema.
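As a rough illustration of the kind of comparison described above, the following sketch computes Shannon's average amount of information, H = sum of p_i * log2(1/p_i), for each side of an M:N relation. It is not the calculation method proposed in this paper, which the abstract does not specify. The function name `shannon_entropy`, the sample `pairs` data and the idea of comparing the entropies of the two columns are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Average amount of information (in bits) of the value distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    # H = sum over distinct values of p * log2(1 / p)
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical M:N relation between authors and books (illustrative data only).
pairs = [("ann", "b1"), ("ann", "b2"), ("bob", "b1"), ("cy", "b1")]

# Each hierarchization direction repeats the values of one side of the relation
# once per pair; the entropy of each column describes how those repetitions
# are distributed.
entropy_authors = shannon_entropy(a for a, _ in pairs)
entropy_books = shannon_entropy(b for _, b in pairs)

print(f"authors: {entropy_authors:.3f} bits, books: {entropy_books:.3f} bits")
```

In this toy data set the two columns yield different entropies, which is the sort of data-dependent signal that could be used to prefer one nesting direction over the other. How the paper actually weighs the two directions is defined in its later sections, not here.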

