Learning the shapes of protein pockets
Abstract
The comparison of protein pockets plays an important role in drug discovery:
identifying binding sites with similar structures can help find hits and
characterize the function of proteins. Traditionally, the geometry of cavities
has been described with scalar features, which do not fully capture their shape.
In this work, we propose a method that builds geometrical descriptors of pocket
shape with Euclidean neural networks, allowing us to encode the physical features
of each pocket as an embedding. As a result, cavities can be compared by computing
the Euclidean distance between their respective embeddings. To ensure that the
generated embeddings contain relevant geometrical information, our model was
trained on a supervised classification task: predicting whether a given pocket is
druggable. For this purpose, a new dataset was built from the existing sc-PDB
database, whose cavities served as the reference set of druggable pockets; the
protein cavity detection algorithm Fpocket was then applied to generate decoys.
The supervised model is evaluated by predicting druggability on held-out data,
while the utility of the learned embeddings is assessed by comparing how a pocket
changes during a dynamic simulation. The results are encouraging and point to a
possible paradigm shift in the way pocket shape can be learned. All code is
available at
https://github.com/acorrochanon/Pocket-shapes.
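The abstract compares cavities by the Euclidean distance between their embeddings. The sketch below illustrates only that comparison step, under the assumption that each pocket has already been turned into a fixed-length vector of floats by the trained network. The helper names `embedding_distance` and `rank_similar_pockets`, along with the toy vectors, are hypothetical and are not taken from the project repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def embedding_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two pocket embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.dist(a, b)


def rank_similar_pockets(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (pocket_id, distance) pairs sorted from most to least similar."""
    scored = [(pid, embedding_distance(query, emb)) for pid, emb in library.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real ones would come from the model.
    library = {
        "pocket_A": [0.1, 0.9, 0.0],
        "pocket_B": [0.8, 0.1, 0.3],
        "pocket_C": [0.3, 0.7, 0.1],
    }
    query = [0.15, 0.85, 0.05]
    for pid, d in rank_similar_pockets(query, library):
        print(f"{pid}: {d:.3f}")
```

A smaller distance means the two shapes are closer in the learned space. The ranking is only meaningful if the embeddings are invariant to how a pocket is rotated or translated, which is what an E(3)-equivariant network is meant to provide.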
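The dataset combines sc-PDB cavities as druggable examples with Fpocket pockets as decoys. The abstract does not say how the decoys were selected, so the sketch below rests on an explicit assumption: any Fpocket pocket on the same protein whose centroid lies more than a hypothetical `min_separation` cutoff (in angstroms) from the sc-PDB site is labeled a decoy. Each pocket is represented here as a plain list of 3D points. `LabeledPocket`, `build_examples`, and the 4.0 Å default are all illustrative inventions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class LabeledPocket:
    protein_id: str
    pocket_id: str
    center: Point
    druggable: int  # 1 = sc-PDB reference site, 0 = Fpocket decoy


def centroid(points: Sequence[Point]) -> Point:
    """Arithmetic mean of a pocket's 3D points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def build_examples(
    protein_id: str,
    scpdb_site: Sequence[Point],
    fpocket_pockets: Sequence[Sequence[Point]],
    min_separation: float = 4.0,
) -> List[LabeledPocket]:
    """Label the sc-PDB site as positive and distant Fpocket pockets as decoys.

    ASSUMPTION: the centroid-distance rule and the cutoff are hypothetical,
    not taken from the original work.
    """
    site_center = centroid(scpdb_site)
    examples = [LabeledPocket(protein_id, "scpdb_site", site_center, 1)]
    for i, pocket in enumerate(fpocket_pockets):
        center = centroid(pocket)
        # Skip Fpocket pockets that overlap the known binding site.
        if math.dist(center, site_center) > min_separation:
            examples.append(LabeledPocket(protein_id, f"fpocket_{i}", center, 0))
    return examples
```

In a toy call such as `build_examples("1abc", site_points, [near_pocket, far_pocket])`, where `"1abc"` is a placeholder identifier, the pocket that overlaps the sc-PDB site is dropped and only the distant one is kept as a negative example.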
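The abstract evaluates the work in two ways: druggability prediction on held-out data, and how a pocket changes during a dynamic simulation. The minimal sketch below shows one possible form of each measurement. It assumes binary predictions are already available for the held-out pockets, and that one embedding per simulation frame has already been computed. The metric choices (plain accuracy, and each frame's distance from the first frame) and the function names are assumptions for illustration, not the essay's documented protocol.

```python
import math
from typing import List, Sequence


def accuracy(predicted: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of held-out pockets whose druggability label is predicted correctly."""
    if len(predicted) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty prediction and label lists")
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def embedding_drift(frames: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance of each frame's pocket embedding from the first frame."""
    reference = frames[0]
    return [math.dist(reference, emb) for emb in frames]


if __name__ == "__main__":
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))        # held-out accuracy
    print(embedding_drift([[0, 0], [3, 4], [0, 1]]))   # per-frame drift
```

A drift curve that stays low while the pocket keeps its shape, and rises when the cavity opens or collapses, would suggest that the embedding tracks geometry rather than noise.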
Degree
Student essay
Date
2022-10-14
Author
Corrochano, Alejandro
Gharbi, Yossra
Keywords
Protein
cavity
ligand-binding
3D-equivariance
shape
latent space
e3nn
Fpocket
sc-PDB
Language
eng