Model Machine Learning-based Feature Ranking and Feature Selection
Document type
Doctoral thesis
Author
Pilný, Aleš
Supervisor
Šnorek, Miroslav
Field of study
Informatika a výpočetní technika
Study programme
Elektrotechnika a informatika
Degree-granting institution
České vysoké učení technické v Praze. Fakulta elektrotechnická. Katedra počítačů
Abstract
This thesis introduces two novel machine learning methods for feature ranking and feature
selection. The methods build on data mining and knowledge discovery techniques.
The methods are based on artificial neural networks and evolutionary computation. The
proposed methods are designed to adapt to the problem at hand and thus to provide
algorithms and results that are robust and efficient in comparison with other techniques
used on real-world problems. In the past, research in this field has focused on a specific
kind of problem, such as classification or regression. This thesis shows the applicability
of the presented approaches to both types of problems: classification and regression.
Feature ranking and feature selection play an important part in the whole knowledge
discovery process. Successful approaches often exploit some expert knowledge specific to a
given task, which might be difficult or even impossible to transfer to a different task. The
main contribution of the thesis is the development of techniques that do not require such
expert input. This is achieved by integrating the construction of the data mining model
into the method.
The proposed approach not only delivers the solution, but also derives a mathematical
expression that justifies the outcome. This expression is automatically evolved during
the data mining process. In the case of the artificial neural network, the expression
represents a data mining model that provides the results of classification or regression.
In the case of genetic programming, the expression represents how the attribute importance
was determined and describes the relationship between particular attributes and output variables.
The methods were experimentally evaluated on both synthetic data, with feature importance
known in advance, and standard real datasets. The quality of the proposed feature
selection and ranking is on par with state-of-the-art approaches, in a number of cases even
achieving the best performance. Our methods were successfully applied to different real-world
disciplines, including anthropology and dental medicine, for the prediction of age from
teeth mineralization and for the modelling of oral exostoses.
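
The idea of evaluating a ranking on synthetic data with feature importance known in advance can be illustrated with a minimal sketch. This is not the neural-network or genetic-programming method of the thesis: it swaps in a plain absolute Pearson correlation as the ranking score, and the function names, weights, noise level and sample size are assumptions chosen only for illustration.

```python
import random


def make_synthetic(n_samples=500, weights=(3.0, 1.0, 0.0), noise=0.1, seed=0):
    """Generate independent uniform features and a noisy linear target.

    The weights play the role of the feature importance known in advance
    (illustrative values, not taken from the thesis).
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.uniform(-1.0, 1.0) for _ in weights]
        target = sum(w * v for w, v in zip(weights, row)) + rng.gauss(0.0, noise)
        X.append(row)
        y.append(target)
    return X, y


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def rank_features(X, y):
    """Return feature indices ordered from highest to lowest |correlation| with y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


X, y = make_synthetic()
ranking = rank_features(X, y)
print("ranking (most important first):", ranking)
```

Because the generating weights are known, the recovered ranking can be compared directly with the expected order of the features, which is the kind of check that synthetic data makes possible and real datasets do not.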