Similarity Space and Its Applications

Rozinek, Ondřej

Dissertation (open access)

Files

Full text of the dissertation (6.77 MB)

Supervisor's review (81.36 KB)

Opponent's review (138.37 KB)

Opponent's review (94.03 KB)

Date

2024

Authors

Rozinek, Ondřej

Publisher

Univerzita Pardubice

Abstract

Mathematical spaces have been studied for centuries. They belong to the foundational theories of mathematics and are used in many real-world applications. In general, a mathematical space is a set of mathematical objects with an associated structure. This structure can be specified by a number of operations on the objects of the set, and these operations must satisfy the axioms of the space. Similarity and dissimilarity functions are widely used in many research areas: information retrieval, data mining, machine learning, cluster analysis, database search, protein sequence comparison and many more. When a dissimilarity function is used, it is normally required to be a distance metric. Similarity functions, on the other hand, are used just as widely, yet there is no formally accepted definition of the concept. This dissertation introduces, for the first time, the term similarity space. A significant contribution of the dissertation is the identification of a class of functions that satisfy the axioms of similarity space, together with new mathematical theorems and definitions that extend our understanding of similarity. These include the exploration of the duality between similarity spaces and metric spaces, the introduction of normalization transformations that provide a solution to a previously open problem, and new descriptions and definitions of convergence, continuity and other fundamental properties within similarity spaces. A substantial part of the dissertation develops a new fixed-point theory in similarity space, establishes solutions of differential equations, and introduces a new convergence criterion for the Newton method. A further theoretical contribution is a novel application of similarity space to linear regression.
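
The contrast between distance metrics and similarity functions can be illustrated with a generic example. The sketch below computes Levenshtein edit distance (one of the record's keywords) and converts it to a similarity in two textbook ways: s = 1/(1 + d), and the length-normalized s = 1 - d/max(|a|, |b|). These conversions are common illustrative choices, not the similarity-space axioms or the normalization transformations defined in the dissertation, and the helper names `inverse_similarity` and `normalized_edit_similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def inverse_similarity(d: float) -> float:
    """Hypothetical helper: map a distance d >= 0 to (0, 1], with d = 0 -> 1."""
    return 1.0 / (1.0 + d)


def normalized_edit_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1 - d(a, b) / max(|a|, |b|), which lies in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    d = levenshtein("kitten", "sitting")
    print(d)                                                          # 3
    print(inverse_similarity(d))                                      # 0.25
    print(round(normalized_edit_similarity("kitten", "sitting"), 3))  # 0.571
```

Both mappings send distance 0 to similarity 1, but 1/(1 + d) never reaches 0 and ignores string length, whereas the length-normalized form equals 0 for, e.g., "ab" versus "cd".
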
Within the framework of Natural Language Processing (NLP) and Artificial Intelligence (AI), the dissertation applies these theoretical insights to real-world problems, particularly approximate string matching, complex fuzzy record matching and deduplication. It develops a novel convolution-based string matching model, proposes an advanced mathematical model for fuzzy record similarity, and introduces an optimal Q-gram filter for bipartite matching; these solutions significantly improve on state-of-the-art methods in efficiency, accuracy and applicability. In conclusion, the dissertation not only advances the theoretical understanding of similarity spaces but also demonstrates their potential for applications in data processing and analysis. By bridging the gap between abstract mathematical theory and practical computational problems, the work lays the groundwork for future innovations across a broad range of fields.
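
Q-gram filtering for approximate string matching works by discarding candidate pairs that share too few q-grams before running the expensive edit-distance check. The sketch below shows the classic count filter. Each string is padded with q - 1 sentinel characters on each side, and two strings within edit distance k must then share at least max(|a|, |b|) - 1 - (k - 1)·q padded q-grams. This is the standard textbook filter, not the optimal Q-gram filter for bipartite matching proposed in the dissertation. The sentinel characters '#' and '$' and all function names are assumptions made for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, used only to verify candidates that pass the filter."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def padded_qgrams(s: str, q: int) -> Counter:
    """Multiset of q-grams of s padded with q - 1 sentinels on each side.

    Assumption: the sentinels '#' and '$' never occur in the input strings.
    """
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def passes_count_filter(a: str, b: str, k: int, q: int = 2) -> bool:
    """Necessary (not sufficient) condition for levenshtein(a, b) <= k.

    Each edit operation destroys at most q padded q-grams, so strings within
    distance k share at least max(|a|, |b|) - 1 - (k - 1) * q of them.
    """
    common = sum((padded_qgrams(a, q) & padded_qgrams(b, q)).values())  # multiset intersection
    return common >= max(len(a), len(b)) - 1 - (k - 1) * q


def approximate_matches(query: str, candidates: list[str], k: int, q: int = 2) -> list[str]:
    """Cheap count filter first, exact edit-distance check only on survivors."""
    survivors = [c for c in candidates if passes_count_filter(query, c, k, q)]
    return [c for c in survivors if levenshtein(query, c) <= k]


if __name__ == "__main__":
    words = ["similarly", "simularity", "dissimilar", "space"]
    print(approximate_matches("similarity", words, k=2))  # ['similarly', 'simularity']
```

The filter never discards a true match; it only saves edit-distance computations. In this example it rejects "dissimilar" and "space" without computing their edit distance to the query.
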

Keywords

similarity metric, similarity space, normalized similarity, edit distance, Jaccard coefficient, Q-gram filter, indexing method, approximate string matching, record linkage, entity resolution, record deduplication, similarity search, similarity join, linear regression, fixed point

Permanent identifier

https://hdl.handle.net/10195/83131

Collections

Dissertations FEI (Ph.D.)
Theses, dissertations, etc.
