Model Machine Learning-based Feature Ranking and Feature Selection
Document type: doctoral thesis
Field of study: Computer Science and Engineering (Informatika a výpočetní technika)
Study programme: Electrical Engineering and Information Technology (Elektrotechnika a informatika)
Degree-granting institution: České vysoké učení technické v Praze. Fakulta elektrotechnická. Katedra počítačů
This thesis introduces two novel machine learning methods for feature ranking and feature selection. The methods build on data mining and knowledge discovery techniques and are based on artificial neural networks and evolutionary computation. They are designed to adapt to the problem at hand and thus to provide robust and efficient algorithms, as shown by comparison with other commonly used techniques on real-world problems. In the past, research in this field has focused on a specific kind of problem, such as classification or regression. This thesis shows the applicability of the presented approaches to both types of problems. Feature ranking and feature selection play an important part in the whole knowledge discovery process. Successful approaches often exploit expert knowledge specific to a given task, which might be difficult or even impossible to transfer to a different task. The main contribution of the thesis is the development of techniques that do not require such expert input. This is achieved by integrating the construction of the data mining model into the method. The proposed approach not only delivers the solution but also derives a mathematical expression that justifies the outcome. This expression is evolved automatically during the data mining process. In the case of the artificial neural network, the expression represents a data mining model that provides the results of classification or regression. In the case of genetic programming, the expression represents how the attribute importance was determined and describes the relationship between particular attributes and output variables. The methods were experimentally evaluated both on synthetic data, with feature importance known in advance, and on standard real datasets. The quality of the proposed feature selection and ranking is on par with state-of-the-art approaches, in a number of cases even achieving the best performance.
Our methods were successfully applied in different real-world disciplines, including anthropology and dental medicine, for the prediction of age from teeth mineralization and the modelling of oral exostoses.