Shlukování RNA-seq reads podle genové exprese

Hana Mertanová

Clustering of RNA-seq Reads by Gene Expression Levels

Document type

bachelor thesis

Author

Hana Mertanová

Supervisor

Ryšavý Petr

Reviewer

Horák Karel

Field of study

Computer and Information Science

Study program

Open Informatics

Degree-granting institution

Department of Cybernetics

Rights

A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis may be made for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act http://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics http://knihovny.cvut.cz/vychova/vskp.html

Abstract

Sequencing technologies produce large amounts of bioinformatic data. A wide range of information can be extracted from these data, such as the structure of DNA, the condition of cells, and more. In this thesis, we introduce the basic concepts and methods used to process sequencing data, focusing on gene expression analysis. The standard approach aligns the input sequences to regions of a reference sequence. Unlike these reference-based pipelines, our goal is to group the input sequences by the gene they belong to without any knowledge of the reference. Finally, we compare our solution with a standard reference-based algorithm.

Keywords: clustering, reads, gene expression, reference-free.