Identifying and quantifying the proteins in a biological sample is only one piece of a large and complex puzzle, because proteins seldom act alone. Characterizing how proteins interact with one another is therefore crucial for understanding a given protein's function. For example, perturbations in a protein's interaction partners can indicate a particular disease state.
Immunoprecipitation coupled to tandem mass spectrometry (IP-MS/MS) is often used to identify protein-protein interactions (PPIs) because it is high-throughput and yields accurate, robust results. Although these methods also produce false-positive identifications, combining experimental negative controls with computational modeling can filter these out effectively. This task becomes much more difficult, however, when the biological sample of interest is human plasma. The negative controls traditionally used with cell lines do not translate to human plasma, and the sheer abundance of a select few proteins (e.g., albumin, immunoglobulins) generates an immense amount of noise that current state-of-the-art algorithms are not optimized to handle.
To address this issue, we developed MAGPIE (Machine learning Assessment with loGistic regression of Protein-protein IntEractions) for assessing the confidence of PPIs in human plasma. Our collaborators at the Institut de recherches cliniques de Montréal generated a unique IP-MS/MS dataset in which known human plasma proteins were targeted with highly selective antibodies in samples that had not been depleted of highly abundant proteins. For the negative controls, they used antibodies targeting proteins known not to be present in human plasma, reasoning that any identifications from these experiments would represent non-specific binding. This design allowed us to build a tool that models the background noise in the data and filters out false-positive identifications.
Using principal component analysis (PCA) and hierarchical clustering, we first demonstrated that the negative control data indeed captured what was likely a collection of non-specific binding events in our samples: the negative controls clustered separately from experiments using antibodies that target known human plasma proteins. We then trained a logistic regression model. Positive training examples were putative PPIs with Z-scores greater than or equal to 3 relative to their abundances in the negative controls, and negative training examples were identifications randomly sampled from the collection of negative controls.
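The labelling and training steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MAGPIE's implementation: the Z-score is assumed to be computed per prey as Z = (x − μ) / σ, where μ and σ are the mean and sample standard deviation of that prey's abundances across the negative-control IPs; the features MAGPIE actually uses are not described in this summary, so the two-number feature vectors below are hypothetical placeholders; and the logistic regression is fit with plain batch gradient descent rather than any particular library. The helper names (`z_score`, `label_positives`, `fit_logistic`) and the `BAIT1`/`PREY_A` identifiers are likewise hypothetical.

```python
import math
import random
import statistics

# --- Positive labels: Z-score of a prey's abundance vs. the negative controls ---
def z_score(value, control_values):
    """Z-score of `value` relative to the control abundances (needs >= 2 values)."""
    mu = statistics.mean(control_values)
    sd = statistics.stdev(control_values)
    if sd == 0:
        # Assumption: any excess over a constant background is treated as infinite.
        return math.inf if value > mu else 0.0
    return (value - mu) / sd

def label_positives(putative, controls, threshold=3.0):
    """putative: {(bait, prey): abundance in the targeted IP}
    controls: {prey: [abundance in each negative-control IP]}
    Returns the (bait, prey) pairs whose Z-score is >= threshold."""
    positives = []
    for (bait, prey), value in putative.items():
        # Assumption: a prey never seen in the controls has a background of zero.
        background = controls.get(prey, [0.0, 0.0])
        if z_score(value, background) >= threshold:
            positives.append((bait, prey))
    return positives

# --- Logistic regression fit by batch gradient descent on the mean log-loss ---
def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Returns (weights, bias) after `epochs` full-batch gradient steps."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# --- Toy usage: all names and numbers are hypothetical ---
controls = {"PREY_A": [1.0, 2.0, 1.5, 2.5], "PREY_B": [100.0, 110.0, 95.0, 105.0]}
putative = {("BAIT1", "PREY_A"): 12.0, ("BAIT1", "PREY_B"): 104.0}
positives = label_positives(putative, controls)  # only PREY_A clears Z >= 3

# In a real dataset these would be features of the pairs returned by
# label_positives; here they are toy values for a handful of positives.
pos_features = [[3.0, 0.1], [2.5, 0.2], [4.0, 0.0]]
# Hypothetical features of identifications from the negative-control IPs,
# randomly sampled to form the negative training examples.
control_identifications = [[0.2, 0.9], [0.1, 0.8], [-0.3, 1.0], [0.0, 0.7], [0.4, 0.95]]
neg_features = random.Random(0).sample(control_identifications, k=len(pos_features))

X = pos_features + neg_features
y = [1] * len(pos_features) + [0] * len(neg_features)
model = fit_logistic(X, y)
print(positives)
print(round(predict_proba(model, [3.5, 0.1]), 3), round(predict_proba(model, [-0.5, 1.0]), 3))
```

Sampling negatives from the control identifications, rather than using all of them, keeps the two classes balanced in this sketch; whether MAGPIE balances its classes the same way is not stated here.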
From a total of 882 putative PPIs, MAGPIE identified 68 at a logistic regression probability cutoff of 0.99, with a false discovery rate (FDR) of 20.77%. To benchmark these results, we applied the SAINT algorithm to our dataset; at the same probability cutoff of 0.99, it identified 18 putative PPIs with an FDR of 41.53%. Although SAINT is the current state of the art for assessing the confidence of PPIs, these results highlight how difficult it is to filter false-positive identifications from such a noisy dataset. Notably, all 18 of SAINT's identifications were also identified by MAGPIE (Figure 1). We further show that MAGPIE's high-confidence identifications recover known and predicted interactions reported in the STRING database for the PCSK9 and SNCA proteins. Although this evidence is indirect, it provides an additional layer of confidence in the results.
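The overlap comparison summarized in Figure 1 amounts to set operations on each tool's calls at the 0.99 cutoff. The sketch below assumes each tool's output has already been reduced to a mapping from (bait, prey) pairs to probabilities and that the cutoff is inclusive (probability ≥ 0.99); the helper `high_confidence` and all pair names and scores are hypothetical. FDR estimation is deliberately left out, because this summary does not describe how the reported FDRs were computed.

```python
def high_confidence(probabilities, cutoff=0.99):
    """Return the (bait, prey) pairs whose probability meets the cutoff (inclusive)."""
    return {pair for pair, p in probabilities.items() if p >= cutoff}

# Hypothetical scores from each tool for the same putative PPIs.
magpie_scores = {("BAIT1", "PREY_A"): 0.995, ("BAIT1", "PREY_B"): 0.992, ("BAIT2", "PREY_C"): 0.40}
saint_scores = {("BAIT1", "PREY_A"): 0.991, ("BAIT1", "PREY_B"): 0.60, ("BAIT2", "PREY_C"): 0.10}

magpie_hits = high_confidence(magpie_scores)
saint_hits = high_confidence(saint_scores)

# The three regions of a two-set Venn diagram.
shared = magpie_hits & saint_hits
magpie_only = magpie_hits - saint_hits
saint_only = saint_hits - magpie_hits

print(f"MAGPIE only: {len(magpie_only)}, shared: {len(shared)}, SAINT only: {len(saint_only)}")
# "All of SAINT's calls were also made by MAGPIE" is a subset test.
print("SAINT hits are a subset of MAGPIE hits:", saint_hits <= magpie_hits)
```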
In summary, we developed MAGPIE, the first tool of its kind for assessing the confidence of protein-protein interactions in human plasma, and showed that it outperformed the SAINT algorithm on this dataset. This work helps pave the way for characterizing PPIs in plasma samples that are closer to their native environment, because the samples need not be depleted of the highly abundant proteins that cause non-specific binding and false-positive identifications. Such a tool may provide novel insights into biological mechanisms and diseases involving proteins that enter the circulation.
Hashimoto-Roth E., Forget D., Gaspar VP., Bennett SAL., Gauthier M-S., Coulombe B. & Lavallée-Adam M. (2025) MAGPIE: A Machine Learning Approach to Decipher Protein-Protein Interactions in Human Plasma. J Proteome Res, 24(2), 383–396, doi: 10.1021/acs.jproteome.4c00160.