(Nelineární) stromové indexování a protisměrné vyhledávání
(Nonlinear) Tree Pattern Indexing and Backward Matching
Type of document
doctoral thesis
Author
Jan Trávníček
Supervisor
Janoušek Jan
Opponent
Demlová Marie
Field of study
Informatics
Study program
Informatics
Institutions assigning rank
Department of Theoretical Computer Science
Rights
A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. The use of the thesis should be in compliance with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and the citation ethics http://knihovny.cvut.cz/vychova/vskp.html
Abstract
Trees are one of the fundamental data structures used in Computer Science. The contributions of this dissertation thesis are best categorised as part of arbology research [52], a counterpart of stringology research. Arbology deals with trees represented in linear notations, i.e. as strings with additional properties that encode the tree structure. Many stringology algorithms may, with some care, be adapted to handle trees represented as strings in some linear notation.
This dissertation thesis is focused on finding all occurrences of tree patterns and nonlinear tree patterns inside a subject tree. Two general approaches to solving this problem are explored. The first approach preprocesses the subject tree and builds a complete index of it, capable of reporting the occurrences when queried with (nonlinear) tree patterns. The second approach is complementary to indexing: it preprocesses the (nonlinear) tree pattern and constructs a matching algorithm.
The results of the dissertation thesis are accordingly divided into two parts. The first, indexing, approach is covered by two different tree indexes: a nonlinear tree pattern pushdown automaton, which can be used to locate occurrences of (nonlinear) tree patterns, and a full and linear index, which locates occurrences of tree patterns and, in an extended variant, also of nonlinear tree patterns. The second, matching, approach is covered by a single algorithm, a backward linearised tree pattern matching algorithm, which is a variant of the backward pattern matching algorithms known from the area of strings. The algorithm is designed to work with many linear representations of trees. An extension of this algorithm for nonlinear tree patterns is also presented.
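The idea that a tree can be handled as a string with extra structural properties can be illustrated with one common linear notation, prefix ranked notation, where each node is written in pre-order as its symbol tagged with its arity. The minimal sketch below assumes trees are given as nested (symbol, children) tuples. The helper names (`prefix_ranked`, `subtree_jump_table`, `is_subtree`, `t`) are hypothetical and are not taken from the thesis. It shows two such properties: a subtree jump table, which gives for each position the index just past the subtree rooted there, and an arity checksum test that decides whether a factor of the notation is itself a complete subtree.

```python
from typing import List, Tuple

# Hypothetical representation: a ranked tree is (symbol, children).
Tree = Tuple[str, tuple]
Token = Tuple[str, int]  # (symbol, arity)


def t(symbol: str, *children: Tree) -> Tree:
    """Convenience constructor for the assumed tuple representation."""
    return (symbol, tuple(children))


def prefix_ranked(tree: Tree) -> List[Token]:
    """Linearise a tree in prefix ranked notation (pre-order, symbol + arity)."""
    symbol, children = tree
    tokens: List[Token] = [(symbol, len(children))]
    for child in children:
        tokens.extend(prefix_ranked(child))
    return tokens


def subtree_jump_table(tokens: List[Token]) -> List[int]:
    """jump[i] is the index just past the subtree whose root is at position i.

    Assumes `tokens` is a well-formed prefix ranked linearisation.
    """
    jump = [0] * len(tokens)
    # Right to left, so the jump entries of all children are already known.
    for i in range(len(tokens) - 1, -1, -1):
        end = i + 1  # the first child (if any) starts right after its parent
        for _ in range(tokens[i][1]):
            end = jump[end]  # skip over one whole child subtree
        jump[i] = end
    return jump


def is_subtree(factor: List[Token]) -> bool:
    """Arity checksum: a factor encodes exactly one subtree iff the sum of
    (arity - 1) over it is -1 and stays >= 0 on every proper prefix."""
    total = 0
    for k, (_, arity) in enumerate(factor):
        total += arity - 1
        if total < 0 and k < len(factor) - 1:
            return False
    return total == -1


tree = t("a", t("b", t("c"), t("c")), t("c"))
tokens = prefix_ranked(tree)
print(" ".join(f"{s}{a}" for s, a in tokens))            # a2 b2 c0 c0 c0
print(subtree_jump_table(tokens))                        # [5, 4, 3, 4, 5]
print(is_subtree(tokens[1:4]), is_subtree(tokens[1:3]))  # True False
```

With `jump[i]` available, an algorithm working on the linear notation can skip an entire subtree in one step, which is the kind of operation needed when a wildcard in a tree pattern stands for an arbitrary subtree.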
A tree pattern represents a subgraph of a tree that is rooted in some node of the tree; its leaves may contain a wildcard symbol, which represents any subtree. A nonlinear tree pattern additionally contains nonlinear variables in its leaves; each of them again represents any subtree, but all occurrences of the same nonlinear variable must represent the same subtree. Given a tree with $n$ nodes, the number of distinct tree patterns is at most $2^{n-1} + n - 1$ and the number of distinct nonlinear tree patterns is at most $(2 + v)^{n-1} + n - 1$, where $v$ is the number of nonlinear variables allowed in the nonlinear tree patterns.
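The semantics of the wildcard and of nonlinear variables can be made concrete with a deliberately naive matcher that tries the pattern at every node of the subject tree and reports the pre-order positions of the occurrences. This is only an illustration of the definitions above, not the indexing or backward matching methods of the thesis. The conventions are assumptions chosen for this sketch: the wildcard is written `S`, the nonlinear variables are `X` and `Y`, these symbols never occur in subject trees, and trees are nested (symbol, children) tuples built with a hypothetical helper `t`.

```python
from typing import Dict, Iterator, List, Tuple

Tree = Tuple[str, tuple]  # (symbol, children), hypothetical representation

WILDCARD = "S"            # assumed wildcard symbol
VARIABLES = {"X", "Y"}    # assumed nonlinear variable symbols


def t(symbol: str, *children: Tree) -> Tree:
    return (symbol, tuple(children))


def match_at(pattern: Tree, subject: Tree, binding: Dict[str, Tree]) -> bool:
    """Does the pattern match the subtree `subject`, consistently with `binding`?"""
    symbol, children = pattern
    if symbol == WILDCARD:
        return True  # the wildcard matches any subtree
    if symbol in VARIABLES:
        if symbol not in binding:
            binding[symbol] = subject  # first occurrence binds the variable
            return True
        return binding[symbol] == subject  # later occurrences need an equal subtree
    s_symbol, s_children = subject
    if symbol != s_symbol or len(children) != len(s_children):
        return False
    # Left to right, so bindings made in earlier children constrain later ones.
    return all(match_at(p, s, binding) for p, s in zip(children, s_children))


def preorder(tree: Tree) -> Iterator[Tree]:
    yield tree
    for child in tree[1]:
        yield from preorder(child)


def occurrences(subject: Tree, pattern: Tree) -> List[int]:
    """Pre-order positions of all nodes at which the pattern occurs."""
    return [i for i, node in enumerate(preorder(subject))
            if match_at(pattern, node, {})]


subject = t("a", t("b", t("c"), t("c")), t("b", t("c"), t("d")))
print(occurrences(subject, t("b", t("S"), t("S"))))  # [1, 4]
print(occurrences(subject, t("b", t("X"), t("X"))))  # [1]
```

The second query shows the nonlinear constraint: the first `b` node qualifies because both of its children are the same subtree `c`, while the second `b` node has two different children and is rejected. Pre-order positions are also the positions of the occurrences' roots in a prefix linearisation of the subject tree.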
Related items
Showing items related by title, author, creator and subject.
- Vyhledávání ve stromech na principu mrtvých zón (Tree pattern matching based on the dead-zone principle)
Author: Obůrka Robin; Supervisor: Trávníček Jan; Opponent: Janoušek Jan
(Czech Technical University in Prague. Computing and Information Centre, 2016-04-23) The thesis presents two new tree pattern matching algorithms: a forward algorithm (based on the Morris-Pratt algorithm) and an algorithm based on the dead-zone principle. The algorithms find all occurrences of a given tree ...
- Vyhledávání vzorků ve stromech a indexování stromů za použití automatů zpracovávajících řetězce (Tree pattern matching and tree indexing using string-processing automata)
Author: Eliška Šestáková; Supervisor: Janoušek Jan; Opponent: Meduna Alexander
(Czech Technical University in Prague. Computing and Information Centre, 2023-06-30) The tree pattern matching problem can be defined as finding all occurrences of a pattern in an input tree. This problem is often likened to the problem of pattern matching in strings. One of the approaches used in ...
- Metody pro přibližné vyhledávání vzorků v řídkých multidimensionálních polích pomocí metod strojového učení (Methods for approximate pattern matching in sparse multidimensional arrays using machine learning methods)
Author: Kučerová Anna; Supervisor: Krčál Luboš; Opponent: Holub Jan
(Czech Technical University in Prague. Computing and Information Centre, 2017-05-09) The main goal of this thesis is to design a solution for approximate pattern matching that uses one of the machine learning methods. This is achieved by using hashing and already existing algorithms. Hashing is used to find ...