Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery. (arXiv:2306.12802v1 [cs.LG])
By: Hoang Thanh Lam, Marco Luca Sbodio, Marcos Martínez Gallindo, Mykhaylo Zayats, Raúl Fernández-Díaz, Víctor Valls, Gabriele Picco, Cesar Berrospi Ramis, Vanessa López
Posted: June 23, 2023
Recent research in representation learning utilizes large databases of
proteins or molecules to acquire knowledge of drug and protein structures
through unsupervised learning techniques. These pre-trained representations
have proven to significantly enhance the accuracy of subsequent tasks, such as
predicting the affinity between drugs and target proteins. In this study, we
demonstrate that incorporating knowledge graphs from diverse sources and
modalities into protein sequence or SMILES representations further enriches
those representations and achieves state-of-the-art results on established
benchmark datasets. We provide preprocessed and integrated data obtained from
seven public sources, comprising over 30M triples. We also release the models
pre-trained on this data, together with their results on three widely used
benchmark datasets for drug-target binding affinity prediction from the
Therapeutics Data Commons (TDC), and the source code for training models on
the benchmark datasets. Our objective in releasing these
pre-trained models, accompanied by clean data for model pretraining and
benchmark results, is to encourage research in knowledge-enhanced
representation learning.