
Statistical analysis and modelling of gene count data in metagenomics

Abstract
Microorganisms form complex communities that play an integral part in all ecosystems on Earth. Metagenomics enables the study of microbial communities through sequencing of random DNA fragments from the collective genome of all organisms present. Metagenomic data is discrete and high-dimensional and contains high levels of both biological and technical variability, which makes statistical analysis challenging. This thesis aims to improve the statistical analysis of metagenomic data in two ways: by characterising the variance structure present in metagenomic data, and by developing and evaluating methods for identifying differentially abundant genes between experimental conditions. In Paper I we evaluate and compare the statistical performance of 14 methods previously used for metagenomic data. In Paper II we implement an overdispersed Poisson model and use it to show that the biological variability differs considerably between genes. The model is used to evaluate a range of assumptions for the variance parameter, and we show that correct modelling of the variance is vital for reducing the number of false positives. In Paper III we extend the model used in Paper II to incorporate zero-inflation. Using the extended model, we show that metagenomic data does indeed contain substantial levels of zero-inflation, and we demonstrate that the new model has high power to detect differentially abundant genes. In Paper IV we suggest improvements to the annotation and quantification of gene content in metagenomic data. Our proposed method, HirBin, uses a data-centric approach to identify effects at a finer resolution, which in turn allows for more accurate biological conclusions. This thesis highlights the importance of statistical modelling and of appropriate assumptions in the analysis of metagenomic data. The presented results may also guide researchers in selecting and further refining statistical tools for reliable analysis of metagenomic data.
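The abstract refers to an overdispersed Poisson model (Paper II) and its zero-inflated extension (Paper III). As a rough illustration of what these two properties mean for the counts of a single gene, the Python sketch below assumes a zero-inflated negative binomial (Poisson-gamma) distribution. This is a generic textbook formulation chosen for illustration and is not necessarily the model or parameterisation used in the thesis; the function names and the parameters mu (mean), phi (dispersion) and pi0 (probability of an extra zero) are hypothetical.

```python
import math

# Illustrative sketch only: assumes a zero-inflated negative binomial model.
# Function and parameter names are hypothetical, not taken from the thesis.


def nb_pmf(k: int, mu: float, phi: float) -> float:
    """Negative binomial (Poisson-gamma) pmf with mean mu and dispersion phi.

    Under this parameterisation Var = mu + phi * mu**2, so the extra
    (biological) variance grows with phi; phi -> 0 recovers the Poisson.
    """
    r = 1.0 / phi      # shape of the gamma mixing distribution
    p = r / (r + mu)   # NB(r, p) probability chosen so that the mean is mu
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k: int, mu: float, phi: float, pi0: float) -> float:
    """Zero-inflated NB: with probability pi0 the count is an extra (structural) zero."""
    base = nb_pmf(k, mu, phi)
    if k == 0:
        return pi0 + (1.0 - pi0) * base
    return (1.0 - pi0) * base


def zinb_mean_var(mu: float, phi: float, pi0: float) -> tuple[float, float]:
    """Closed-form mean and variance of the zero-inflated NB above."""
    mean = (1.0 - pi0) * mu
    # E[X^2] = (1 - pi0) * (Var_NB + mu^2), with Var_NB = mu + phi * mu^2
    second_moment = (1.0 - pi0) * (mu + phi * mu**2 + mu**2)
    return mean, second_moment - mean**2


if __name__ == "__main__":
    mu, phi, pi0 = 5.0, 0.5, 0.3
    print("P(0) without zero-inflation:", nb_pmf(0, mu, phi))
    print("P(0) with zero-inflation:   ", zinb_pmf(0, mu, phi, pi0))
    print("ZINB (mean, variance):", zinb_mean_var(mu, phi, pi0))
```

With mu = 5, phi = 0.5 and pi0 = 0.3, the probability of a zero count rises from (2/7)^2 ≈ 0.082 to about 0.357, and the variance (17.5) is five times the mean (3.5), whereas a Poisson variable has variance equal to its mean. A model that ignores either effect will misjudge how much two conditions can differ by chance.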
Papers
I. Jonsson, V., Österlund, T., Nerman, O., Kristiansson, E. (2016). Statistical evaluation of methods for identification of differentially abundant genes in comparative metagenomics. BMC Genomics, 17(1), 1. doi:10.1186/s12864-016-2386-y
 
II. Jonsson, V., Österlund, T., Nerman, O., Kristiansson, E. (2016). Variability in metagenomic count data and its influence on the identification of differentially abundant genes. Journal of Computational Biology, ahead of print. doi:10.1089/cmb.2016.0180
 
III. Jonsson, V., Österlund, T., Nerman, O., Kristiansson, E. (2017). A zero-inflated model for improved inference of metagenomic gene count data. Manuscript.
 
IV. Österlund, T., Jonsson, V., Kristiansson, E. (2017). HirBin: High-resolution identification of differentially abundant functions in metagenomes. Submitted.
 
Degree level
Doctor of Philosophy
University
University of Gothenburg. Faculty of Science
Department
Department of Mathematical Sciences
Public defence
Friday 17 February 2017, 10:00, Pascal, Mathematical Sciences, Chalmers tvärgata 3
Date of defence
2017-02-17
Email
v.a.jonsson@gmail.com
URL:
http://hdl.handle.net/2077/48788
Collections
  • Doctoral Theses, Department of Biological and Environmental Sciences
  • Doctoral Theses from University of Gothenburg
File(s)
Thesis frame (6.193Mb)
Abstract (717.1Kb)
Date
2017-01-26
Author
Jonsson, Viktor
Keywords
metagenomics
statistical modelling
hierarchical statistical models
gene ranking
overdispersion
zero-inflation
false discovery rate
receiver operating characteristic curves
Publication type
Doctoral thesis
ISBN
978-91-629-0089-2 (PRINT)
978-91-629-0090-8 (PDF)
Language
English

DSpace software copyright © 2002-2016 DuraSpace
gup@ub.gu.se | Technical help
Theme by Atmire NV
 

 
