Benchmarking Machine Learning Methods for Peptide Activity Predictions
Abstract
One of the main challenges in the drug discovery process is to find a suitable compound
for further analysis. The compound must act on the target relevant to the
specific disease while also having the properties required of a safe and
effective drug candidate. Finding and optimizing such compounds is a
long and expensive process, so using machine learning algorithms to predict
the properties of compounds can speed up the process and reduce the cost. To use
the algorithms, the information about the compounds must be translated into a
numerical representation. The choice of representation and algorithm is critical,
since the predictions must be reliable to avoid late-stage failures in the
drug discovery process.
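
As an illustration of what translating a peptide into a numerical representation can look like, the following is a minimal sketch of one-hot encoding for an amino acid sequence. The function name one_hot_encode, the alphabet ordering, and the zero-padding to a fixed max_length are assumptions made for this example and are not taken from the thesis.

```python
# Standard 20 amino acids in one-letter code (the ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_length):
    """Return a flat 0/1 list of length max_length * 20 for one peptide.

    Positions past the end of the sequence stay all-zero (padding),
    which is an assumption of this sketch.
    """
    if len(sequence) > max_length:
        raise ValueError("sequence is longer than max_length")
    vector = [0] * (max_length * len(AMINO_ACIDS))
    for position, residue in enumerate(sequence.upper()):
        if residue not in AA_INDEX:
            raise ValueError(f"unknown residue: {residue!r}")
        # Each position owns a block of 20 slots; set the slot for this residue.
        vector[position * len(AMINO_ACIDS) + AA_INDEX[residue]] = 1
    return vector


encoded = one_hot_encode("ACD", max_length=4)
print(len(encoded))  # 80 (4 positions x 20 amino acids)
print(sum(encoded))  # 3 (one active slot per residue)
```

Padding to a common length gives every peptide a feature vector of the same size, which fixed-input models such as support vector machines and random forests require.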
The objective of this thesis was to investigate whether a molecular representation
combined with a machine learning model could accurately predict the potency of
peptides. This was done through a benchmarking study in which different sequence-based
descriptors and predictive models were combined to see whether one combination
worked well for various types of peptides. The descriptors were Z-scales, pseudo
amino acid composition, and one-hot representation; these were combined with two
machine learning models, a support vector classifier and a random forest
classifier. The results show that the one-hot representation outperforms Z-scales
and pseudo amino acid composition, whereas the best-performing predictive model
depends on the characteristics of the peptides.
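
The benchmarking idea of pairing each descriptor with each classifier can be sketched as a small grid loop. This is only an illustration using scikit-learn: the feature matrices are random placeholders rather than real descriptors, the labels are synthetic, and the hyperparameters, the 5-fold cross-validation, and the default accuracy metric are assumptions that do not reflect the protocol used in the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 "peptides" with made-up binary activity labels.
n_samples = 60
labels = np.array([0, 1] * (n_samples // 2))

# Placeholder feature matrices; in a real study these would hold
# Z-scales, pseudo amino acid composition, or one-hot features.
descriptors = {
    "descriptor_a": rng.normal(size=(n_samples, 15)),
    "descriptor_b": rng.normal(size=(n_samples, 40)),
}
models = {
    "svc": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Evaluate every descriptor/model pair with 5-fold cross-validation.
results = {}
for d_name, features in descriptors.items():
    for m_name, model in models.items():
        scores = cross_val_score(model, features, labels, cv=5)
        results[(d_name, m_name)] = scores.mean()

for key, score in sorted(results.items()):
    print(key, round(score, 3))
```

Because the features here are random noise, the printed scores carry no meaning; the sketch only shows how the descriptor and model combinations are enumerated and scored in a uniform way.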
Examination level
Student essay
Collections
Date
2022-10-14
Authors
Knutson, Boel
Meskini Moudi, Lida
Keywords
Drug discovery
peptide
classification
molecular representation
Z-scales
pseudo amino acid composition
one-hot representation
random forests
support vector machines
Language
eng