Setting up a meta-threading pipeline for high-throughput structural bioinformatics: eThread software distribution, walkthrough and resource profiling

Title: Setting up a meta-threading pipeline for high-throughput structural bioinformatics: eThread software distribution, walkthrough and resource profiling
Publication Type: Journal Article
Year of Publication: 2012
Authors: Brylinski M, Feinstein WP
Journal: J Comput Sci Syst Biol
Volume: 6
Issue: 1
Pagination: 001-010
Date Published: 2012
Abstract

eThread, a meta-threading and machine learning-based approach, is designed to effectively identify structural templates for use in protein structure and function modeling from genomic data. This is an essential methodology for high-throughput structural bioinformatics and critical for systems biology, where extensive knowledge of protein structures and functions at the systems level is a prerequisite. Because eThread integrates a diverse collection of algorithms, its deployment on a large multi-core system requires comprehensive profiling to ensure the optimal utilization of available resources. Resource profiling of eThread and the single-threading component algorithms indicates a wide range of demands with respect to wall clock time and host memory. Depending on the threading algorithm used, the modeling of a single protein sequence of up to 600 residues in length takes minutes to hours. Full meta-threading of one gene product from the E. coli proteome requires ~12 h on average on a single state-of-the-art computing core. Depending on the target sequence length, the subsequent three-dimensional structure modeling using eThread/Modeller and eThread/TASSER-Lite takes an additional 1-3 days of computing time. Using the entire proteome of E. coli, we demonstrate that parallel computing on a multi-core system follows Gustafson-Barsis' law and can significantly reduce the production time of eThread. Furthermore, graphics processing units can speed up portions of the calculations; however, fully utilizing this technology in protein threading requires substantial code development. eThread is freely available to the academic and non-commercial community as a user-friendly web service at http://www.brylinski.org/ethread. We also provide source code and step-by-step instructions for the local software installation, as well as a case study demonstrating the complete procedure for protein structure modeling.
We hope that genome-wide high-throughput structural bioinformatics using eThread will significantly expand our knowledge of protein structures and their molecular functions and contribute to the thriving area of systems biology.
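
To make the scaling claim concrete, the following minimal Python sketch evaluates Gustafson-Barsis' law, S = N - s(N - 1), and gives an idealized wall-clock estimate for a batch of independent targets. It is an illustration only, not part of the eThread distribution: the function names, the serial fraction s, and the target and core counts in the usage note are hypothetical, and only the ~12 h per-target figure is taken from the abstract.

```python
import math


def gustafson_speedup(n_cores: int, serial_fraction: float) -> float:
    """Scaled speedup S = N - s * (N - 1) under Gustafson-Barsis' law.

    serial_fraction (s) is the share of run time that cannot be
    parallelized; its value is an assumption, not a measured quantity.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    return n_cores - serial_fraction * (n_cores - 1)


def batch_wall_hours(n_targets: int, n_cores: int,
                     hours_per_target: float = 12.0) -> float:
    """Idealized wall-clock hours for independent targets spread over cores.

    Assumes one target per core at a time, identical run time per target
    (default: the ~12 h average quoted in the abstract), and no scheduling
    or I/O overhead.
    """
    if n_targets < 0 or n_cores < 1:
        raise ValueError("n_targets must be >= 0 and n_cores >= 1")
    rounds = math.ceil(n_targets / n_cores)
    return rounds * hours_per_target
```

For example, with a hypothetical serial fraction of 0.05, `gustafson_speedup(16, 0.05)` gives a scaled speedup of about 15.25 on 16 cores, and `batch_wall_hours(100, 16)` estimates 84 hours for 100 targets. Real runs will deviate from both figures, because per-target time depends strongly on sequence length and on the threading algorithm used.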

DOI: 10.4172/jcsb.1000094
Alternate Journal: Journal of Computer Science & Systems Biology
© Michal Brylinski