Publication: Similarity Space and Its Applications
Dissertation (open access)
Date
Authors
Rozinek, Ondřej
Journal title
Journal ISSN
Volume title
Publisher
Univerzita Pardubice
Abstract
Mathematical spaces have been studied for centuries and are among the foundational mathematical
theories used in various real-world applications. In general, a mathematical
space is a set of mathematical objects with an associated structure. This structure can
be specified by a number of operations on the objects of the set. These operations must
satisfy the axioms of the given mathematical space.
Similarity and dissimilarity functions are widely used in many research areas, including information
retrieval, data mining, machine learning, and cluster analysis, and in applications such as
database search and protein sequence comparison. When a dissimilarity
function is used, a distance metric is normally required. Similarity functions, by contrast,
are widely used without any formally accepted definition of the concept.
This dissertation introduces the term similarity space for the first time.
A significant contribution of this dissertation is the identification of a class of functions
that satisfy the axioms of similarity space, alongside the development of novel mathematical
theorems and definitions that extend our understanding of similarity. This includes
the exploration of duality between similarity and metric spaces, the introduction of normalization
transformations that address a previously open problem, and the
establishment of new descriptions and definitions for convergence, continuity, and other
fundamental properties within similarity spaces. A significant section is dedicated to developing
a new fixed-point theory in similarity space, establishing solutions for differential
equations, and introducing a new convergence criterion for the Newton method. Another
theoretical contribution is the novel application of similarity space in linear regression.
Within the framework of Natural Language Processing (NLP) and Artificial Intelligence
(AI), this dissertation applies theoretical insights to address real-world challenges,
particularly in the areas of approximate string matching, complex fuzzy record matching
and deduplication. By developing a novel convolution-based string matching model,
proposing an advanced mathematical model for fuzzy record similarity, and introducing
an optimal Q-gram filter for bipartite matching, this research presents novel solutions that
significantly improve upon the state-of-the-art methods in terms of efficiency, accuracy,
and applicability.
In conclusion, this dissertation not only advances the theoretical understanding of similarity
spaces but also demonstrates their vast potential for application in data processing
and analysis. By bridging the gap between abstract mathematical theory and practical
computational challenges, this work lays the groundwork for future innovations across
a broad range of fields.
Description
Keywords
similarity metric, similarity space, normalized similarity, edit distance, Jaccard coefficient, Q-gram filter, indexing method, approximate string matching, record linkage, entity resolution, record deduplication, similarity search, similarity join, linear regression, fixed point