Liu, Dan and Young, Francesca and Lamb, Kieran D. and Claudio Quiros, Adalberto and Pancheva, Alexandrina and Miller, Crispin and Macdonald, Craig and Robertson, David L. and Yuan, Ke (2024) PLM-interact: extending protein language models to predict protein-protein interactions. bioRxiv.

349398.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Computational prediction of protein structure from amino acid sequences alone has been achieved with unprecedented accuracy, yet the prediction of protein-protein interactions (PPIs) remains an outstanding challenge. Here we assess the ability of protein language models (PLMs), routinely applied to protein folding, to be retrained for PPI prediction. Existing PPI prediction models that exploit PLMs use a pre-trained PLM feature set, ignoring that the proteins are physically interacting. Our novel method, PLM-interact, goes beyond a single protein, jointly encoding protein pairs to learn their relationships, analogous to the next-sentence prediction task from natural language processing. This approach provides a significant improvement in performance: Trained on human-human PPIs, PLM-interact predicts mouse, fly, worm, E. coli and yeast PPIs, with 16-28% improvements in AUPR compared with state-of-the-art PPI models. Additionally, it can detect changes that disrupt or cause PPIs and be applied to virus-host PPI prediction. Our work demonstrates that large language models can be extended to learn the intricate relationships among biomolecules from their sequences alone.
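The difference between encoding each protein on its own and jointly encoding a pair can be illustrated with a minimal sketch. It assumes a BERT-style next-sentence-prediction input layout; the special-token names, the segment-id scheme, and the function names are hypothetical and are not taken from PLM-interact itself.

```python
from typing import List, Tuple

# Hypothetical special tokens; the real model's vocabulary may differ.
CLS, SEP = "<cls>", "<sep>"


def encode_separately(seq_a: str, seq_b: str) -> Tuple[List[str], List[str]]:
    """Single-protein inputs: each sequence is tokenised on its own,
    so an encoder never sees both proteins in the same context."""
    return [CLS] + list(seq_a) + [SEP], [CLS] + list(seq_b) + [SEP]


def encode_pair(seq_a: str, seq_b: str) -> Tuple[List[str], List[int]]:
    """Joint input: both sequences share one token sequence, in the
    manner of next-sentence prediction, so attention can span the pair."""
    tokens = [CLS] + list(seq_a) + [SEP] + list(seq_b) + [SEP]
    # Segment id 0 covers <cls>, protein A and the first <sep>;
    # segment id 1 covers protein B and the final <sep>.
    segments = [0] * (len(seq_a) + 2) + [1] * (len(seq_b) + 1)
    return tokens, segments


# Toy residues only; real protein sequences are far longer.
tokens, segments = encode_pair("MKT", "GAV")
single_a, single_b = encode_separately("MKT", "GAV")
```

In this layout a binary classifier on top of the joint sequence would play the role that the "is next sentence" label plays in natural language processing, here read as "interacts" versus "does not interact".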

Information
URI https://pub.demo35.eprints-hosting.org/id/eprint/107