Learning the shapes of protein pockets

Corrochano, Alejandro; Gharbi, Yossra

Abstract

The comparison of protein pockets plays an important role in drug discovery. Identifying binding sites with similar structures can assist in finding hits and in characterizing the function of proteins. Traditionally, the geometry of cavities has been described with scalar features, which do not fully capture the shape. In this work, we propose a method that creates geometrical descriptors of the pocket shape based on Euclidean neural networks, allowing us to encode the physical features of the pockets. As a result, we can compare cavities by computing the Euclidean distance between their respective embeddings. To ensure that the generated embeddings contain relevant geometrical information, our model was trained on a supervised classification task: predicting whether a given pocket is druggable. For this purpose, a new dataset was built from the existing sc-PDB database, which served as the reference for defining the druggable cavities, and the protein cavity detection algorithm Fpocket was applied to generate decoys. The supervised model is evaluated by predicting druggability on held-out data, while the utility of the learned embeddings is assessed by comparing how a pocket changes during a dynamic simulation. The results are encouraging and point to a possible paradigm shift in the way pocket shape can be learned. All code is available at https://github.com/acorrochanon/Pocket-shapes.
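As an illustration of the comparison step described in the abstract, the following minimal sketch computes Euclidean distances between pocket embeddings and uses them to rank candidates and to track one pocket across simulation frames. It is not taken from the authors' repository: the function names, the pocket names, and the 4-dimensional vectors are hypothetical placeholders, and real embeddings would come from the trained model.

```python
import math


def embedding_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.dist(a, b)


def rank_by_similarity(query, candidates):
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda name: embedding_distance(query, candidates[name]),
    )


def drift_from_reference(frames):
    """Distance of each frame's embedding to the first frame's embedding."""
    reference = frames[0]
    return [embedding_distance(reference, frame) for frame in frames]


# Hypothetical embeddings (assumption: made-up values, not model output).
query = [0.0, 1.0, 0.0, 2.0]
candidates = {
    "pocket_a": [0.0, 1.0, 0.0, 2.5],
    "pocket_b": [3.0, 1.0, 4.0, 2.0],
}
ranking = rank_by_similarity(query, candidates)

# Hypothetical embeddings of one pocket at three simulation frames.
frames = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
drift = drift_from_reference(frames)
```

Under this scheme a smaller distance means more similar shapes, so `ranking` lists the closest pocket first, and `drift` gives one value per frame that grows as the pocket departs from its initial shape.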

Degree

Student essay

Date

2022-10-14

Author

Corrochano, Alejandro

Gharbi, Yossra

Keywords

Protein

cavity

ligand-binding

3D-equivariance

shape

latent space

e3nn

Fpocket

sc-PDB

Language

eng
