Proteins in parallel: New tool to compare protein expression could accelerate research

  • Published: Oct 12, 2015
  • Author: Ryan De Vooght-Johnson
  • Channels: Laboratory Informatics

Untapped potential

As the foundation of biological processes, proteins are central to modern biomedical research. Studying proteins can help scientists to understand disease mechanisms, while analysing differences in their expression between samples can identify biomarkers for diagnosis.

This requires the ability to compare the expression of large numbers of proteins across multiple samples, something that remains a challenge for liquid chromatography–tandem mass spectrometry (LC–MS/MS)-based methods. Tandem MS provides reliable elution time and sequence information for identified peptides, which can be used to find their corresponding peaks in LC–MS for quantification, but it can only sample a small fraction of the total proteins in a run.

This means only a small number of commonly identified peptides are quantified and compared across multiple samples. This limits the efforts of clinical researchers and impedes large studies that need to compare the protein expression profiles of hundreds of patients.

A recently developed algorithm called PeakLink attempts to solve this problem. It can accurately link LC–MS peaks without tandem MS identifications to their corresponding peaks with MS/MS identification in other runs, allowing identification across multiple samples from different instruments, tissues and labs. Unfortunately, PeakLink cannot yet be used practically, as existing software architectures do not provide access to peak elution profiles from multiple LC–MS/MS samples simultaneously.
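The idea of linking an identified peak in one run to an unidentified peak in another can be illustrated with a deliberately naive sketch. This is not PeakLink's method, which the article does not describe; it is a plain tolerance match on mass-to-charge ratio and elution time. The `Peak` class, the `link_peak` function, the tolerance values and the toy peptide name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peak:
    mz: float                      # mass-to-charge ratio
    rt: float                      # elution time in minutes
    peptide: Optional[str] = None  # set only when tandem MS identified the peak


def link_peak(identified: Peak, candidates: List[Peak],
              mz_tol: float = 0.01, rt_tol: float = 1.0) -> Optional[Peak]:
    """Return the unidentified candidate closest in elution time,
    considering only candidates within both tolerances (hypothetical rule)."""
    in_window = [
        p for p in candidates
        if p.peptide is None
        and abs(p.mz - identified.mz) <= mz_tol
        and abs(p.rt - identified.rt) <= rt_tol
    ]
    return min(in_window, key=lambda p: abs(p.rt - identified.rt), default=None)


# Toy data: one identified peak in run A, three unidentified peaks in run B.
run_a_peak = Peak(mz=523.774, rt=41.2, peptide="PEPTIDEK")
run_b_peaks = [Peak(523.776, 41.6), Peak(523.779, 44.9), Peak(610.301, 41.3)]

match = link_peak(run_a_peak, run_b_peaks)
print(match)
```

A real linker would also have to correct for elution time drift between runs and score competing candidates, which is why access to full peak elution profiles from several samples at once matters.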

Recognising this, researchers at the University of Texas at San Antonio developed an all-new software package to realise PeakLink's potential. The software, called MZDASoft, enables large-scale comparison of protein expression levels over multiple samples using LC–MS/MS.

Large-scale comparison of protein expression

MZDASoft extracts LC–MS peak features using the core part of the software, the Parallel Peak Extractor (PPE). Its quantification application, MZDASoftQuant, includes two distinct modules, one of which, TandemQuant, quantifies tandem MS-identified peptides and provides the statistics required by PeakLink. This arrangement increases processing speed, as it separates feature extraction (traditionally the most time-consuming task) from other steps.

The package is remarkably efficient because feature extraction is processed in parallel. This allows large studies, containing hundreds of gigabytes of data from multiple patients, time points and conditions, to be processed quickly. PPE also saves the extracted features to database files, which can be directly searched by various applications. Moreover, it extracts features without requiring any information on the analyte, which means it can be used in various LC–MS-based applications.
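The pattern described here, extracting features from each run independently and storing them in a searchable database, can be sketched in a few lines. This is a minimal illustration and not PPE's implementation: the local-maximum "extractor", the table layout and the function names are assumptions, and a thread pool stands in for whatever parallel mechanism the real software uses.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def extract_features(run):
    """Toy extractor: report each local intensity maximum as (run, scan, intensity)."""
    name, trace = run
    return [
        (name, i, trace[i])
        for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] > trace[i + 1]
    ]


def extract_all(runs, db_path=":memory:"):
    """Extract features from every run concurrently, then store them in one table."""
    with ThreadPoolExecutor() as pool:
        per_run = list(pool.map(extract_features, runs.items()))
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features (run TEXT, scan INTEGER, intensity REAL)"
    )
    for rows in per_run:
        conn.executemany("INSERT INTO features VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Toy intensity traces standing in for two LC-MS runs.
runs = {
    "sample_1": [0, 2, 9, 3, 1, 4, 8, 2],
    "sample_2": [1, 5, 2, 2, 7, 3, 0, 0],
}
conn = extract_all(runs)
rows = conn.execute("SELECT run, scan FROM features ORDER BY run, scan").fetchall()
print(rows)
```

Because no run depends on another during extraction, the work divides cleanly. For CPU-bound extraction in Python, `ProcessPoolExecutor` would be the natural substitute for the thread pool. Once the features are in a database, later steps can query several samples together without re-reading the raw data.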

While all these features are important, the most innovative aspect of MZDASoft is that it supports data integration across multiple samples. This increases protein comparison coverage, allowing researchers to compare samples from different disease conditions.

Test cases

The developers tested the software with the SILAC (stable isotope labelling by amino acids in cell culture) technique, a labelling method that is widely used for quantifying proteins with MS. They used two datasets, a yeast dataset labelled with pre-mixed heavy-to-light ratios (HLRs) and a dataset obtained from breast cancer research labelled using Super-SILAC (SILAC for tissues).

They tested MZDASoft’s ability to compare proteins and accurately quantify expression, as well as its processing speed and data storage requirements against MaxQuant – a popular software package used for analysing large MS data sets.

When comparing the total number of proteins quantified across the yeast datasets, MZDASoftQuant increased protein quantification coverage 4.3-fold. Using the Super-SILAC dataset, it increased coverage from under 1000 to over 6000 proteins across five samples.

Although the primary goal of the software is to increase protein comparison coverage across multiple samples, the accuracy of quantification is also important. Comparing pre-defined yeast HLRs to those measured by MZDASoft showed that quantification accuracy is not compromised. In fact, accuracy improves as a result of stricter quantification criteria and boundary detection techniques.
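A check of this kind, comparing a measured heavy-to-light ratio against the ratio at which the sample was mixed, might look like the sketch below. The intensities and the expected ratio are made-up values, and the log2 error metric is one common choice rather than necessarily the one the authors used.

```python
import math


def heavy_to_light_ratio(heavy_intensity, light_intensity):
    """Ratio of the heavy-labelled to the light-labelled peak intensity."""
    return heavy_intensity / light_intensity


def log2_error(measured, expected):
    """Signed error on a log2 scale; 0 means the measured ratio matches exactly."""
    return math.log2(measured / expected)


# Toy values: a peptide pair from a sample assumed to be pre-mixed at HLR 2.
measured = heavy_to_light_ratio(heavy_intensity=4.2e6, light_intensity=2.0e6)
error = log2_error(measured, expected=2.0)
print(f"measured HLR = {measured:.2f}, log2 error = {error:.3f}")
```

Working on a log scale treats over- and under-estimation symmetrically, so a ratio measured as twice the expected value scores the same magnitude of error as one measured as half.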

Processing speed was also enhanced. Although a direct comparison was not possible (as MaxQuant does not provide peak linking functionality), the researchers say MaxQuant would take around 90 hours to process 134 GB of data, which MZDASoft could process in just 8 hours.
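Using only the figures quoted above, the implied speed-up and throughput can be worked out directly. Note that the 90-hour figure is the researchers' estimate, not a measured run.

```python
data_gb = 134
maxquant_hours = 90   # researchers' estimate, as quoted in the article
mzdasoft_hours = 8

speedup = maxquant_hours / mzdasoft_hours
mzdasoft_rate = data_gb / mzdasoft_hours   # GB per hour
maxquant_rate = data_gb / maxquant_hours   # GB per hour

print(f"speed-up: {speedup:.2f}x")
print(f"MZDASoft: {mzdasoft_rate:.2f} GB/h, MaxQuant: {maxquant_rate:.2f} GB/h")
```

On these numbers MZDASoft is roughly 11 times faster, processing about 17 GB per hour against about 1.5 GB per hour.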

This novel software architecture allows the most accurate peak linking algorithm to be implemented, vastly increasing the total number of proteins that can be compared across samples. In the test cases, 100–500% more proteins could be compared across multiple runs than with MaxQuant. The authors hope that MZDASoft will enable more rapid clinical research and development.

Related Links

To download a sample script, visit http://compgenomics.utsa.edu/zgroup/MZDASoft/download.html.

Rapid Communications in Mass Spectrometry, 2015, 29(19), 1841-1848, MZDASoft: a software architecture that enables large-scale comparison of protein expression levels over multiple samples based on liquid chromatography/tandem mass spectrometry.

Article by Ryan De Vooght-Johnson

The views represented in this article are solely those of the author and do not necessarily represent those of John Wiley and Sons, Ltd.


Copyright © 2017 John Wiley & Sons, Inc. All Rights Reserved