Schema matching

The terms schema matching and mapping are often used interchangeably for a database process. For this article, we differentiate the two as follows: schema matching is the process of identifying that two objects are semantically related, while mapping refers to the transformations between the objects. For example, given the two schemas DB1.Student and DB2.Grad-Student, possible matches would be DB1.Student ≈ DB2.Grad-Student, DB1.SSN = DB2.ID, etc. A possible transformation or mapping would be DB1.Marks to DB2.Grades.
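The distinction can be made concrete with a minimal Python sketch. The match is recorded only as a table of correspondences between element names. The mapping is a function that actually transforms a value. The grade cut-offs in `marks_to_grade` are hypothetical, because the article does not give them, and the sketch assumes marks are on a 0 to 100 scale.

```python
# Matching: record only WHICH elements correspond and how (no data is transformed).
MATCHES = {
    ("DB1.Student", "DB2.Grad-Student"): "≈",  # semantically similar
    ("DB1.SSN", "DB2.ID"): "=",                # equivalent
}


def marks_to_grade(marks: float) -> str:
    """Mapping: transform a DB1.Marks value into a DB2.Grades value.

    The cut-offs are hypothetical, assuming marks on a 0-100 scale.
    """
    if not 0 <= marks <= 100:
        raise ValueError(f"marks out of range: {marks}")
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if marks >= cutoff:
            return grade
    return "F"


print(MATCHES[("DB1.SSN", "DB2.ID")])  # =
print(marks_to_grade(85))              # B
```

The match table says nothing about how values are converted. The mapping function, in turn, says nothing about which other elements correspond.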
Automating these two processes has been one of the fundamental tasks of data integration. In general, the correspondences between two schemas cannot be determined fully automatically. This is primarily because the semantics of the two schemas often differ and are frequently not made explicit or documented.

Impediments

Common challenges to automating matching and mapping have been classified in earlier work, both specifically for relational database schemas and, more broadly, in a fairly comprehensive list of heterogeneities that is not limited to the relational model and distinguishes schematic from semantic differences. Most of these heterogeneities arise for one of two reasons. Either schemas use different representations or definitions for the same information, or different expressions, units, and precision lead to conflicting representations of the same data.
Research in schema matching seeks to provide automated support for finding semantic matches between two schemas. This process is made harder by heterogeneities at the following levels:

Syntactic heterogeneity – differences in the language used for representing the elements
Structural heterogeneity – differences in the types and structures of the elements
Model / Representational heterogeneity – differences in the underlying models or their representations
Semantic heterogeneity – where the same real-world entity is represented using different terms, or vice versa

Methodology

Earlier work discusses a generic methodology for the task of schema integration and the activities it involves. According to its authors, integration can be viewed as the following sequence of activities:

Preintegration — An analysis of schemas is carried out before integration to decide upon some integration policy. This governs the choice of schemas to be integrated, the order of integration, and a possible assignment of preferences to entire schemas or portions of schemas.
Comparison of the Schemas — Schemas are analyzed and compared to determine the correspondences among concepts and detect possible conflicts. Interschema properties may be discovered while comparing schemas.
Conforming the Schemas — Once conflicts are detected, an effort is made to resolve them so that the merging of various schemas is possible.
Merging and Restructuring — Now the schemas are ready to be superimposed, giving rise to some intermediate integrated schema. The intermediate results are analyzed and, if necessary, restructured in order to achieve several desirable qualities.
Approaches

Approaches to schema integration can be broadly classified into those that exploit only schema information and those that exploit both schema-level and instance-level information.
Schema-level matchers consider only schema information, not instance data. The available information includes the usual properties of schema elements, such as name, description, data type, relationship types, constraints, and schema structure. Working at the element or structure level, matchers use these properties to identify matching elements in two schemas. Language-based or linguistic matchers use names and text to find semantically similar schema elements. Constraint-based matchers exploit the constraints that schemas often contain. Such constraints define data types and value ranges, uniqueness, optionality, relationship types and cardinalities, etc. The constraints in two input schemas are matched to determine the similarity of the schema elements.
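A minimal sketch of the two schema-level matchers described above follows. It assumes that name similarity is computed with Python's standard `difflib.SequenceMatcher`, which is one simple choice among many string measures. It also assumes that the constraint compared is the declared data type. The schemas, the `TYPE_GROUPS` table, and the function names are hypothetical and only illustrate the idea.

```python
from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    # Case-fold and drop common separators so "Emp_Name" and "empname" compare equal.
    return name.lower().replace("_", "").replace("-", "")


def name_similarity(a: str, b: str) -> float:
    """Linguistic matcher: string similarity of two element names, in [0, 1]."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


# Hypothetical groups of mutually compatible declared data types.
TYPE_GROUPS = [
    {"int", "integer", "bigint"},
    {"varchar", "char", "text"},
    {"date", "timestamp"},
]


def type_compatibility(t1: str, t2: str) -> float:
    """Constraint-based matcher: 1.0 if both types fall in the same group, else 0.0."""
    t1, t2 = t1.lower(), t2.lower()
    if t1 == t2 or any(t1 in g and t2 in g for g in TYPE_GROUPS):
        return 1.0
    return 0.0


# Hypothetical schemas: element name -> declared data type.
s1 = {"EmpName": "varchar", "DeptNo": "int", "HireDate": "date"}
s2 = {"EmployeeName": "text", "DepartmentNo": "integer", "StartDate": "timestamp"}

for a, type_a in s1.items():
    best = max(s2, key=lambda b: name_similarity(a, b))
    print(a, "->", best,
          f"name={name_similarity(a, best):.2f}",
          f"type={type_compatibility(type_a, s2[best]):.1f}")
```

Each matcher yields its own similarity score. How the scores are combined is a separate design decision, which hybrid matchers address.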
Instance-level matchers use instance-level data to gain insight into the contents and meaning of schema elements. They are typically used in addition to schema-level matchers to boost confidence in the match results, especially when the information available at the schema level is insufficient. Matchers at this level use linguistic and constraint-based characterizations of instances. For example, linguistic techniques might make it possible to examine the Dept, DeptName and EmpName instances and conclude that DeptName is a better match candidate for Dept than EmpName. Constraints such as "zip codes must be 5 digits long" or a fixed phone-number format may also allow such instance data to be matched.
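The zip-code and phone-number example can be sketched as a constraint-based instance matcher. In this sketch, each column is profiled by checking which format pattern most of its values satisfy, and two columns become match candidates when they profile to the same type. The patterns are assumptions: 5-digit zip codes as in the text, and a US-style phone format chosen for illustration. The 0.8 threshold and the function names are also hypothetical.

```python
import re

# Hypothetical recognizers: one full-match pattern per semantic type.
PATTERNS = {
    "zipcode": re.compile(r"\d{5}"),                 # zip codes are 5 digits long
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),  # assumed format "(212) 555-0100"
}


def profile_column(values, threshold=0.8):
    """Return the semantic type whose pattern fully matches at least `threshold`
    of the non-empty values, or None if no pattern qualifies."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    best_type, best_share = None, 0.0
    for name, pattern in PATTERNS.items():
        share = sum(bool(pattern.fullmatch(v)) for v in cleaned) / len(cleaned)
        if share >= threshold and share > best_share:
            best_type, best_share = name, share
    return best_type


def instance_match(col_a, col_b):
    """Two columns are match candidates if their instances profile to the same type."""
    type_a = profile_column(col_a)
    return type_a is not None and type_a == profile_column(col_b)


print(instance_match(["10001", "94105", "60614"], ["02139", "73301"]))          # True
print(instance_match(["10001", "94105"], ["(212) 555-0100", "(415) 555-0199"]))  # False
```

The threshold tolerates a few dirty values in a column. Real systems typically treat such a result as evidence that raises or lowers confidence in a schema-level match, not as a final verdict.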
Hybrid matchers directly combine several matching approaches to determine match candidates based on multiple criteria or information sources.
Most of these techniques also employ additional information such as dictionaries, thesauri, and user-provided match or mismatch information.
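A hybrid matcher that also consults a thesaurus might look like the following sketch. It combines a linguistic score and a data-type score with a weighted sum, and a thesaurus hit overrides raw string similarity. The weighted sum is only one common way to combine matchers. The synonym groups, the 0.7/0.3 weights, and all function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical thesaurus of known synonyms (auxiliary information).
SYNONYMS = [{"ssn", "id"}, {"marks", "grades"}, {"dept", "department"}]


def linguistic_score(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    if any(a in group and b in group for group in SYNONYMS):
        return 1.0  # a thesaurus hit overrides raw string similarity
    return SequenceMatcher(None, a, b).ratio()


def type_score(t1: str, t2: str) -> float:
    # Deliberately simple constraint check: identical declared types only.
    return 1.0 if t1.lower() == t2.lower() else 0.0


def hybrid_score(elem_a, elem_b, w_name=0.7, w_type=0.3):
    """Weighted combination of a linguistic and a constraint-based matcher.

    Elements are (name, data_type) pairs; the weights are illustrative assumptions.
    """
    (name_a, type_a), (name_b, type_b) = elem_a, elem_b
    return w_name * linguistic_score(name_a, name_b) + w_type * type_score(type_a, type_b)


print(hybrid_score(("SSN", "char"), ("ID", "char")))
print(hybrid_score(("Marks", "int"), ("Grades", "char")))
```

"SSN" and "ID" share no characters, so a purely string-based matcher would miss them. The thesaurus lets the hybrid matcher recover that correspondence.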
Reusing matching information
Another initiative has been to reuse previous matching information as auxiliary information for future matching tasks. The motivation for this work is that structures or substructures often repeat, for example in schemas in the e-commerce domain. However, such reuse of previous matches must be applied carefully, because it may make sense only for some parts of a new schema or only in some domains. For example, Salary and Income may be considered identical in a payroll application but not in a tax reporting application. Several open challenges in such reuse deserve further work.
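One simple way to respect the domain caveat is to scope stored matches by domain, so that a match confirmed in a payroll application is not offered in a tax application. The sketch below is a hypothetical `MatchRepository`, not a description of any particular system. It stores unordered, case-insensitive name pairs per domain.

```python
from collections import defaultdict


class MatchRepository:
    """Hypothetical store of previously confirmed matches, scoped by domain."""

    def __init__(self) -> None:
        # domain -> set of unordered element-name pairs
        self._matches: dict[str, set[frozenset[str]]] = defaultdict(set)

    @staticmethod
    def _key(a: str, b: str) -> frozenset[str]:
        # Unordered, case-insensitive pair, so (A, B) and (b, a) are the same match.
        return frozenset((a.lower(), b.lower()))

    def record(self, domain: str, a: str, b: str) -> None:
        """Remember that a user or matcher confirmed a ~ b within `domain`."""
        self._matches[domain].add(self._key(a, b))

    def suggest(self, domain: str, a: str, b: str) -> bool:
        """Reuse a previous match only if it was confirmed in the same domain."""
        return self._key(a, b) in self._matches.get(domain, set())


repo = MatchRepository()
repo.record("payroll", "Salary", "Income")
print(repo.suggest("payroll", "Income", "Salary"))  # True
print(repo.suggest("tax", "Salary", "Income"))      # False
```

Scoping by domain is the coarsest possible policy. Deciding which parts of a new schema a past match should apply to remains one of the open questions mentioned above.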
Sample Prototypes
Implementations of such matching techniques can typically be classified as either rule-based or learner-based systems. Because these approaches are complementary, a number of applications combine them, depending on the nature of the domain or application under consideration.

Identified relationships

The relationship types between objects that are identified at the end of a matching process typically have set semantics, such as overlap, disjointness, exclusion, equivalence, or subsumption. The meaning of these relationships is given by their logical encodings. Among other efforts, an early attempt used description logics for schema integration and for identifying such relationships. Several state-of-the-art matching tools today, including those benchmarked in the Ontology Alignment Evaluation Initiative, can identify many such simple and complex matches between objects.
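To illustrate the set semantics, the following sketch classifies the relation between two elements from their extensions, that is, the sets of instances they denote. This is a deliberate simplification: as noted above, the relationships are defined by logical encodings, and description-logic tools reason over those definitions rather than over sampled instances. The sketch also does not distinguish exclusion from disjointness. The instance sets are hypothetical.

```python
def set_relation(a: set, b: set) -> str:
    """Classify the set-theoretic relation between the extensions of two elements."""
    if a == b:
        return "equivalence"
    if a < b:
        return "subsumed-by"   # every instance of a is also an instance of b
    if a > b:
        return "subsumes"      # every instance of b is also an instance of a
    if a & b:
        return "overlap"       # some, but not all, instances are shared
    return "disjointness"      # no shared instances


# Hypothetical extensions of DB1.Student and DB2.Grad-Student.
students = {"ann", "bob", "cy", "dee"}
grad_students = {"cy", "dee"}
print(set_relation(grad_students, students))  # subsumed-by
```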

Evaluation of quality

The quality of schema matching is commonly measured by precision and recall. Precision measures the fraction of correctly matched pairs among all pairs that were matched. Recall measures the fraction of the actual (correct) pairs that have been matched.
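Both measures can be computed by comparing the set of pairs a matcher found against a reference set of correct pairs. The reference set below reuses the article's DB1/DB2 example. The incorrect pair ("DB1.Marks", "DB2.ID") is a hypothetical matcher error added for illustration.

```python
def precision_recall(found: set, reference: set) -> tuple[float, float]:
    """Precision = correct / found; recall = correct / reference (gold standard)."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


reference = {("DB1.SSN", "DB2.ID"), ("DB1.Student", "DB2.Grad-Student"),
             ("DB1.Marks", "DB2.Grades")}
found = {("DB1.SSN", "DB2.ID"), ("DB1.Student", "DB2.Grad-Student"),
         ("DB1.Marks", "DB2.ID")}  # last pair: a hypothetical wrong match

print(precision_recall(found, reference))  # both 2/3: two of three found pairs are correct
```

Reporting 0.0 for an empty set is a convention chosen here to avoid division by zero. Other conventions exist.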
