ASPIRER: A new computational approach for identifying non-classical secreted proteins based on deep learning

Wang, X.; Li, F.; Xu, J.; Rong, J.; Webb, G.I.; Ge, Z.; Li, J.; Song, J.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/139704

Type:	Journal article
Title:	ASPIRER: A new computational approach for identifying non-classical secreted proteins based on deep learning
Author:	Wang, X.; Li, F.; Xu, J.; Rong, J.; Webb, G.I.; Ge, Z.; Li, J.; Song, J.
Citation:	Briefings in Bioinformatics, 2022; 23(2):1-12
Publisher:	Oxford University Press (OUP)
Issue Date:	2022
ISSN:	1467-5463 1477-4054
Statement of Responsibility:	Xiaoyu Wang, Fuyi Li, Jing Xu, Jia Rong, Geoffrey I. Webb, Zongyuan Ge, Jian Li and Jiangning Song
Abstract:	Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several NCSP predictors have been proposed to identify NCSPs, and most of them employ the whole amino acid sequence of NCSPs to construct the model. However, the sequence length of different proteins varies greatly. In addition, not all regions of the protein are equally important, and some local regions are not relevant to secretion. The functional regions of the protein, particularly the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model. Five-fold cross-validation and independent tests demonstrate that ASPIRER outperforms existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and for prioritization of candidate proteins for follow-up experimental validation.
Keywords:	non-classical secreted protein; bioinformatics; machine learning; deep learning; feature engineering; predictor
Rights:	© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
DOI:	10.1093/bib/bbac031
Grant ID:	http://purl.org/au-research/grants/arc/DP120104460 http://purl.org/au-research/grants/arc/LP110200333 http://purl.org/au-research/grants/nhmrc/1127948 http://purl.org/au-research/grants/nhmrc/1144652
Published version:	http://dx.doi.org/10.1093/bib/bbac031
Appears in Collections:	Molecular and Biomedical Science publications
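
Illustrative sketch (not taken from the article): the abstract describes combining a whole-sequence model with an N-terminal sequence-based model. The Python below shows one way the N-terminal input could be prepared and how two model probabilities might be fused. The window length of 50, the padding symbol "X", the one-hot encoding, and the weighted-average fusion rule are all assumptions made for illustration; the function names are hypothetical and do not come from the ASPIRER source code.

```python
# Illustrative sketch only. The window length, padding symbol, encoding, and
# fusion rule are assumptions, not details taken from the ASPIRER paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for sequences shorter than the window


def n_terminal_window(sequence: str, length: int = 50) -> str:
    """Return the first `length` residues, right-padded with PAD if shorter."""
    window = sequence[:length].upper()
    return window + PAD * (length - len(window))


def one_hot(window: str) -> list[list[int]]:
    """Encode each residue as a 20-dimensional indicator vector.

    PAD and any non-standard residue map to an all-zero vector.
    """
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS] for residue in window]


def combine_scores(p_whole: float, p_nterm: float, weight: float = 0.5) -> float:
    """Weighted average of two model probabilities (hypothetical fusion rule)."""
    return weight * p_whole + (1.0 - weight) * p_nterm
```

In this sketch, fixing the window length gives every protein an input of the same shape regardless of its full length, which is the practical concern the abstract raises about whole-sequence models.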

Files in This Item:

File	Description	Size	Format
hdl_139704.pdf	Published version	828.62 kB	Adobe PDF

Adelaide Research & Scholarship