Publication:
Similarity Space and Its Applications

Dissertation, open access

Authors

Rozinek, Ondřej

Publisher

Univerzita Pardubice

Abstract

Mathematical spaces have been studied for centuries and belong to the basic mathematical theories used in various real-world applications. In general, a mathematical space is a set of mathematical objects with an associated structure. This structure can be specified by a number of operations on the objects of the set, and these operations must satisfy certain axioms of the space. Similarity and dissimilarity functions are widely used in many research areas, including information retrieval, data mining, machine learning, cluster analysis, database search, and protein sequence comparison. When a dissimilarity function is used, a distance metric is normally required. Similarity functions, by contrast, are widely used, yet there is no formally accepted definition of the concept. This dissertation introduces the term similarity space for the first time. A significant contribution of this dissertation is the identification of a class of functions that satisfy the axioms of similarity space, alongside the development of novel mathematical theorems and definitions that extend our understanding of similarity. This includes the exploration of duality between similarity and metric spaces, the introduction of normalization transformations that address an open problem, and the establishment of new descriptions and definitions for convergence, continuity, and other fundamental properties within similarity spaces. A significant section is dedicated to developing a new fixed-point theory in similarity space, establishing solutions for differential equations, and introducing a new convergence criterion for the Newton method. Another theoretical contribution is the novel application of similarity space in linear regression.
Within the framework of Natural Language Processing (NLP) and Artificial Intelligence (AI), this dissertation applies these theoretical insights to real-world challenges, particularly approximate string matching, complex fuzzy record matching, and deduplication. By developing a novel convolution-based string matching model, proposing an advanced mathematical model for fuzzy record similarity, and introducing an optimal Q-gram filter for bipartite matching, this research presents novel solutions that significantly improve on state-of-the-art methods in terms of efficiency, accuracy, and applicability. In conclusion, this dissertation not only advances the theoretical understanding of similarity spaces but also demonstrates their potential for application in data processing and analysis. By bridging the gap between abstract mathematical theory and practical computational challenges, this work lays the groundwork for future innovations across a broad range of fields.

Keywords

similarity metric, similarity space, normalized similarity, edit distance, Jaccard coefficient, Q-gram filter, indexing method, approximate string matching, record linkage, entity resolution, record deduplication, similarity search, similarity join, linear regression, fixed point
