Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction

Zhang, M.; Jia, C.; Li, F.; Li, C.; Zhu, Y.; Akutsu, T.; Webb, G.I.; Zou, Q.; Coin, L.J.M.; Song, J.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/139707


Type:	Journal article
Title:	Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction
Author:	Zhang, M.; Jia, C.; Li, F.; Li, C.; Zhu, Y.; Akutsu, T.; Webb, G.I.; Zou, Q.; Coin, L.J.M.; Song, J.
Citation:	Briefings in Bioinformatics, 2022; 23(2):1-25
Publisher:	Oxford University Press (OUP)
Issue Date:	2022
ISSN:	1467-5463; 1477-4054
Statement of Responsibility:	Meng Zhang, Cangzhi Jia, Fuyi Li, Chen Li, Yan Zhu, Tatsuya Akutsu, Geoffrey I. Webb, Quan Zou, Lachlan J.M. Coin and Jiangning Song
Abstract:	Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing more training data for computational approaches to both prokaryotic and eukaryotic promoter prediction. However, accurately identifying species-specific promoter sequences with computational approaches remains a significant challenge. To advance computational support for promoter prediction, in this study we curated 58 comprehensive, up-to-date benchmark datasets for 7 species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to help the research community assess the relative functionality of alternative approaches and to support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors for promoter identification published since 2000 (40 for prokaryotic promoters, 61 for eukaryotic promoters and 5 for both) and systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. Using these benchmark datasets, we benchmarked 19 predictors with functioning webservers or local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and to facilitate more rigorous comparison of new techniques in the future.
Keywords:	machine learning; deep learning; promoter identification; performance evaluation
Rights:	© The Author(s) 2022. Published by Oxford University Press. All rights reserved.
DOI:	10.1093/bib/bbab551
Grant ID:	http://purl.org/au-research/grants/nhmrc/1144652 http://purl.org/au-research/grants/nhmrc/1127948 http://purl.org/au-research/grants/nhmrc/1103384 http://purl.org/au-research/grants/nhmrc/1195743 http://purl.org/au-research/grants/nhmrc/1143366
Published version:	http://dx.doi.org/10.1093/bib/bbab551
Appears in Collections:	Molecular and Biomedical Science publications



Adelaide Research & Scholarship