Liu, Dan and Young, Francesca and Lamb, Kieran D. and Claudio Quiros, Adalberto and Pancheva, Alexandrina and Miller, Crispin and Macdonald, Craig and Robertson, David L. and Yuan, Ke (2024) PLM-interact: extending protein language models to predict protein-protein interactions. bioRxiv.
Computational prediction of protein structure from amino acid sequences alone has been achieved with unprecedented accuracy, yet the prediction of protein-protein interactions (PPIs) remains an outstanding challenge. Here we assess whether protein language models (PLMs), routinely applied to protein folding, can be retrained for PPI prediction. Existing PPI prediction models that exploit PLMs use a pre-trained PLM feature set, ignoring the fact that the proteins are physically interacting. Our novel method, PLM-interact, goes beyond a single protein, jointly encoding protein pairs to learn their relationships, analogous to the next-sentence prediction task from natural language processing. This approach provides a significant improvement in performance: trained on human-human PPIs, PLM-interact predicts mouse, fly, worm, E. coli and yeast PPIs with 16–28% improvements in AUPR compared with state-of-the-art PPI models. Additionally, it can detect changes that disrupt or cause PPIs and can be applied to virus-host PPI prediction. Our work demonstrates that large language models can be extended to learn the intricate relationships among biomolecules from their sequences alone.
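The abstract's two technical ideas, encoding both proteins in one input the way BERT-style next-sentence prediction encodes two sentences, and scoring predictions by AUPR, can be illustrated with a small sketch. It is not the authors' implementation: the special tokens `<cls>`/`<sep>`/`<pad>`, the toy vocabulary, the truncation rule and the helper names `encode_pair` and `average_precision` are all hypothetical. `average_precision` is one common step-wise estimator of the area under the precision-recall curve.

```python
from typing import Dict, List, Sequence

# Hypothetical special tokens and vocabulary; real PLMs (e.g. ESM-2) define
# their own, and PLM-interact's exact tokenization is not stated here.
CLS, SEP, PAD = "<cls>", "<sep>", "<pad>"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB: Dict[str, int] = {t: i for i, t in enumerate([PAD, CLS, SEP] + list(AMINO_ACIDS))}


def encode_pair(seq_a: str, seq_b: str, max_len: int = 16) -> List[int]:
    """Jointly encode two proteins as one input: <cls> A <sep> B <sep>.

    If the pair is too long, each sequence keeps at least half of the
    remaining token budget (assumed rule), then the result is padded.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for the 3 special tokens")
    budget = max_len - 3  # tokens left after <cls> and two <sep>
    half = budget // 2
    a = seq_a[: max(half, budget - len(seq_b))]
    b = seq_b[: budget - len(a)]
    tokens = [CLS] + list(a) + [SEP] + list(b) + [SEP]
    ids = [VOCAB[t] for t in tokens]
    return ids + [VOCAB[PAD]] * (max_len - len(ids))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPR estimate: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos if n_pos else 0.0


if __name__ == "__main__":
    print(encode_pair("MKT", "GAV", max_len=12))
    print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 4))
```

With this toy vocabulary, `encode_pair("MKT", "GAV", max_len=12)` yields `[1, 13, 11, 19, 2, 8, 3, 20, 2, 0, 0, 0]`. In a full model these IDs would be fed through the PLM, and a binary "interacts / does not interact" head on the `<cls>` embedding would play the role of the next-sentence classifier. Because both proteins share one attention context, residues of one chain can attend to the other, which is what separates joint encoding from concatenating two independently computed embeddings.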
Title | PLM-interact: extending protein language models to predict protein-protein interactions |
---|---|
Creators | Liu, Dan and Young, Francesca and Lamb, Kieran D. and Claudio Quiros, Adalberto and Pancheva, Alexandrina and Miller, Crispin and Macdonald, Craig and Robertson, David L. and Yuan, Ke |
Identification Number | 10.1101/2024.11.05.622169 |
Date | 27 November 2024 |
Divisions | College of Medical Veterinary and Life Sciences > School of Cancer Sciences; College of Medical Veterinary and Life Sciences > School of Infection & Immunity; College of Science and Engineering > School of Computing Science |
Additional Information | The authors acknowledge funding from the European Union’s Horizon 2020 research and innovation program, under the Marie Sklodowska-Curie Actions Innovative Training Networks grant agreement no. 955974 (VIROINF) for DL, a UK Medical Research Council (MRC) Doctoral Training Programme in Precision Medicine studentship (MR/N013166/1) for KDL and MRC grants: MC_UU_00034/5, MC_UU_00034/6 and MR/V01157X/1. KY acknowledges support from Cancer Research UK (EDDPGM-Nov21\100001 and DRCMDP-Nov23/100010), Biotechnology and Biological Sciences Research Council (BBSRC) BB/V016067/1, Prostate Cancer UK MA-TIA22-001 and EU Horizon 2020 grant ID 101016851. This work used the DiRAC Extreme Scaling service (Tursa) at the University of Edinburgh, managed by the Edinburgh Parallel Computing Centre on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). The DiRAC service at Edinburgh was funded by BEIS, UKRI and STFC capital funding and STFC operations grants. DiRAC is part of the UKRI Digital Research Infrastructure. |
URI | https://pub.demo35.eprints-hosting.org/id/eprint/107 |
Item Type | Article |
Depositing User | Unnamed user with email ejo1f20@soton.ac.uk |
Date Deposited | 11 Jun 2025 16:35 |
Revision | 16 |
Last Modified | 12 Jun 2025 13:04 |