Advancing Evolutionary Biology: Genomics, Bayesian Statistics, and Machine Learning
Abstract
In recent decades, evolutionary biology has entered the era of big data, which has transformed it into an increasingly computational discipline. In this thesis I present novel computational methods together with their application in empirical case studies. The chapters are divided into three fields of computational biology: genomics, Bayesian statistics, and machine learning. While these categories are not mutually exclusive, they represent different domains of methodological expertise.
Within the field of genomics, I focus on the computational processing and analysis of DNA data produced with target capture, a pre-sequencing enrichment method commonly used in phylogenetic studies. Using an empirical case study, I demonstrate how common computational processing workflows introduce biases into phylogenetic results, and I present an improved workflow that addresses these issues. Next, I introduce a novel computational pipeline for processing target capture data, intended for general use. In an in-depth review paper on target capture, I provide general guidelines and considerations for successfully carrying out a target capture project. Within the context of Bayesian statistics, I develop a new computer program that predicts future extinctions using custom-made Bayesian components. In a separate chapter, I apply this program to model future extinctions of mammals and contrast these predictions with estimates of past extinction rates, which were produced from fossil data by a set of recently developed Bayesian algorithms. Finally, I touch upon newly emerging machine learning algorithms and investigate how their utility for biological problems can be improved, particularly by explicitly modeling the uncertainty in their predictions.
The empirical results presented here shed new light on the evolutionary dynamics of different organism groups and showcase the utility of the methods and workflows developed in this thesis. To make these methodological advancements accessible to the whole research community, I embed them in well-documented open-access programs. This will hopefully foster the use of these methods in future studies and contribute to more informed decision-making when applying computational methods to a given biological problem.
Parts of work
Andermann, Tobias, Alexandre M. Fernandes, Urban Olsson, Mats Töpel, Bernard Pfeil, Bengt Oxelman, Alexandre Aleixo, Brant C. Faircloth, and Alexandre Antonelli. 2019. “Allele Phasing Greatly Improves the Phylogenetic Utility of Ultraconserved Elements.” Systematic Biology 68 (1): 32–46. doi:10.1093/sysbio/syy039
Andermann, Tobias, Ángela Cano, Alexander Zizka, Christine D. Bacon, and Alexandre Antonelli. 2018. “SECAPR—a Bioinformatics Pipeline for the Rapid and User-Friendly Processing of Targeted Enriched Illumina Sequences, from Raw Reads to Alignments.” PeerJ 6 (July): e5175. doi:10.7717/peerj.5175
Andermann, Tobias, Maria Fernanda Torres Jiménez, Pável Matos-Maraví, Romina Batista, José L. Blanco-Pastor, A. Lovisa S. Gustafsson, Logan Kistler, Isabel M. Liberal, Bengt Oxelman, Christine D. Bacon, and Alexandre Antonelli. 2020. “A Guide to Carrying Out a Phylogenomic Target Sequence Capture Project.” Frontiers in Genetics 10. doi:10.3389/fgene.2019.01407
Andermann, Tobias, Søren Faurby, Robert Cooke, Daniele Silvestro, and Alexandre Antonelli. 2020. “iucn_sim: A New Program to Simulate Future Extinctions Based on IUCN Threat Status.” Ecography (in print). doi:10.1111/ecog.05110
Andermann, Tobias, Søren Faurby, Samuel T. Turvey, Alexandre Antonelli, and Daniele Silvestro. 2020. “The Past and Future Human Impact on Mammalian Diversity.” Science Advances 6 (36): eabb2313. doi:10.1126/sciadv.abb2313
Silvestro, Daniele, and Tobias Andermann. 2020. “Prior Choice Affects Ability of Bayesian Neural Networks to Identify Unknowns.” ArXiv Preprint arXiv:2005.04987. http://arxiv.org/abs/2005.04987
Degree
Doctor of Philosophy
University
University of Gothenburg. Faculty of Science
Institution
Department of Biological and Environmental Sciences ; Institutionen för biologi och miljövetenskap
Public defence
Friday, 18 December 2020, at 14:00, Hörsalen (lecture hall), Botanhuset, Department of Biological and Environmental Sciences, Carl Skottsbergs gata 22B, Gothenburg
Date of defence
2020-12-18
E-mail
tobias.andermann@bioenv.gu.se
Date
2020-11-20
Author
Andermann, Tobias
Keywords
computational biology
bioinformatics
phylogenetics
neural networks
NGS
target capture
Illumina sequencing
fossils
IUCN conservation status
extinction rates
Publication type
Doctoral thesis
ISBN
978-91-8009-136-7
978-91-8009-137-4
Language
eng