Practical application of machine learning for analyses of biological matrices and environmental phenomena.
Abstract
This thesis presents research aimed at advancing the understanding of machine learning methods as tools for studying complex matrices and environmental phenomena. Several machine learning methods, in the form of linear projection algorithms and statistical experimental designs, were applied to the qualitative analysis of different matrices. The linear projection algorithms used included principal component analysis (PCA), partial least squares (PLS), orthogonal partial least squares (OPLS), and transposed orthogonal partial least squares (T-OPLS). Various statistical designs of experiments (DoE) were also implemented, including the face-centred composite (CCF) design, a simplex mixture design, and the definitive screening (DS) design. The analysed matrices included mammalian cells, wood, and a protein mixture. In addition to biological matrices, this work presents research aimed at forming a multivariate understanding of a specific environmental phenomenon: the biogenic production of volatile halogenated organic compounds. Through these investigations, several challenges in machine learning were examined.
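For readers unfamiliar with the CCF design named above, the sketch below shows how such a design is laid out in coded units. A 2^k factorial sits at the cube corners, the axial points lie on the cube faces (alpha = 1), and a set of replicated centre points completes the design. The function names, the number of factors and the number of centre points are illustrative assumptions. They do not describe the designs actually used in the thesis.

```python
from itertools import product

def face_centred_composite(k, n_center=3):
    """Face-centred central composite (CCF) design in coded units (-1, 0, +1).

    k: number of factors; n_center: replicated centre points
    (both hypothetical choices for illustration).
    """
    # 2^k full factorial at the corners of the cube
    corners = [list(p) for p in product((-1, 1), repeat=k)]
    # Axial (star) points on the faces of the cube, i.e. alpha = 1
    axial = []
    for i in range(k):
        for level in (-1, 1):
            point = [0] * k
            point[i] = level
            axial.append(point)
    # Replicated centre points, used to estimate pure experimental error
    centre = [[0] * k for _ in range(n_center)]
    return corners + axial + centre

def decode(run, lows, highs):
    """Map coded levels (-1..+1) onto real factor settings (hypothetical helper)."""
    return [lo + (c + 1) / 2 * (hi - lo) for c, lo, hi in zip(run, lows, highs)]

design = face_centred_composite(3)
print(len(design))  # 8 corners + 6 axial + 3 centre = 17 runs
```

Because every factor is only ever set to three levels, a CCF design is convenient when the factor ranges cannot be extended beyond the cube. Responses measured at these runs are typically fitted with a quadratic model.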
Several linear projection algorithms were applied to the spectral interpretation of hyperspectral images of human blood cells and of the rat PC12 cell line, at signal levels close to the detection limit. The results revealed both the benefits and the shortcomings of T-OPLS under such conditions. A deeper understanding of the T-OPLS algorithm was gained by examining a protein-buffer mixture. The thesis therefore provides the first extensive examination of this algorithm and of its performance in the analysis of nonlinear, co-dependent data. The research presented here also provides an extensive account of how linear projection algorithms, with or without DoE, may contribute to the qualitative interpretation of nonlinear spectroscopic data.
A simplex mixture design and PLS were successfully used to quantify polyethylene glycol (PEG) in waterlogged archaeological wood. This study contributed both to the field of wood conservation and to the understanding of how the machine learning methods used perform.
Lastly, the biogenic production of volatile halogenated organic compounds (VHOCs) was examined. The research reported in this thesis was the first of its kind to apply DoE to biogenic VHOC production. The results indicate that previously reported VHOC formation mechanisms depend on several abiotic factors, which makes the connection between those factors and VHOC formation more complicated than previously assumed. This first multivariate examination of biogenic VHOC formation thus contributed to a deeper understanding of how VHOCs form. It also emphasized the need for multivariate approaches, in particular DoE, in future studies.
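The abstract does not state which type of simplex mixture design was used for the PEG calibration. As an assumption, the sketch below builds a {q, m} simplex-lattice design, one common mixture design. Each of q hypothetical components takes proportions 0, 1/m, ..., 1, and every blend sums to 1. Exact fractions are used so that the sum-to-one constraint holds without rounding error.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(q, m):
    """{q, m} simplex-lattice mixture design (illustrative, not the thesis design).

    Returns every blend of q components whose proportions are
    multiples of 1/m and sum exactly to 1.
    """
    points = []
    for combo in product(range(m + 1), repeat=q):
        # Keep only the lattice points on the simplex (proportions sum to 1)
        if sum(combo) == m:
            points.append(tuple(Fraction(c, m) for c in combo))
    return points

for blend in simplex_lattice(3, 2):  # 6 blends for three hypothetical components
    print([float(x) for x in blend])
```

The number of runs is C(q + m - 1, m), so the design grows quickly with the lattice degree m. In a calibration setting, one would then measure spectra of the prepared blends and regress the known PEG proportion on them, for example with PLS.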
Level of examination
Doctor of Philosophy
University
University of Gothenburg (Göteborgs universitet), Faculty of Science (Naturvetenskapliga fakulteten)
Department
Department of Marine Sciences (Institutionen för marina vetenskaper)
Public defence
Friday 5 October 2020, 10:00, Lecture hall (Hörsal), Carl Skottsbergsgata 22B
Date of public defence
2020-10-05
Email
alexandra.walsh@chem.gu.se
Date
2020-09-11
Author
Walsh, Alexandra
Keywords
Machine learning
Surface-enhanced Raman spectroscopy
Acute lymphatic leukaemia
Volatile halogenated organic carbons
Marine algae
Doxorubicin
Principal component analysis
Multivariate statistics
Design of experiments
Waterlogged archaeological wood
Transposed orthogonal partial least squares
Publication type
Doctoral thesis
ISBN
978-91-8009-023-0
Language
English