Understanding the three-dimensional (3D) structures of proteins is fundamental to deciphering their biological functions. These structures provide crucial insights not only for understanding nature’s molecular mechanisms but also for advancing drug discovery and protein engineering. Because experimental structure determination can be time-consuming and challenging, computational protein structure prediction has emerged as a vital complementary approach, especially when homologous structures are available. The growing wealth of both sequence and structural data has further increased the importance of template-based modeling, also known as homology or comparative modeling, in structural biology.
Traditional template-based modeling relies heavily on accurate homolog detection and sequence alignment, but the focus has shifted towards improving models beyond what the available templates provide, and in particular towards refining initial models. Achieving significant improvements in refinement has proven to be a formidable task, as shown by the assessments in the Critical Assessment of protein Structure Prediction (CASP) experiments. In CASP9, only a handful of groups demonstrated genuine refinement improvements, and the largest improvement, achieved by the ‘Seok-server’, was a mere 0.37%, which highlights the difficulty of the problem.
This article introduces GalaxyWEB, a web server that addresses these challenges by offering two functionalities: protein structure prediction from sequence by template-based modeling, and refinement of user-provided models. GalaxyWEB is based on the ‘Seok-server’ method, which was recognized as a top performer in CASP9. To deliver the service efficiently, GalaxyWEB runs a lighter version of the original algorithm, with optimized sampling in both the model building and refinement stages, at the cost of only a small loss in accuracy.
GalaxyWEB’s template-based modeling strategy uses information from multiple templates to construct reliable core regions of proteins. It then applies an ab initio refinement method to rebuild unreliable loops or termini, regions that are often critical for protein function but challenging for traditional methods. This ab initio treatment of less conserved regions is particularly valuable for functional and design studies, because these regions frequently determine the specific activities of proteins within a family.
GalaxyWEB Methodology: Combining Template-Based Modeling with Ab Initio Refinement
The GalaxyWEB server operates through a sophisticated pipeline encompassing both structure prediction (GalaxyTBM) and refinement (GalaxyREFINE), as illustrated in Figure 1. The process commences with template selection using a refined HHsearch approach. To enhance the identification of relevant templates, especially for difficult targets, GalaxyWEB re-scores HHsearch results by emphasizing the secondary structure score. This re-ranking score is calculated as a weighted sum of the Z-scores for both sequence and secondary structure from HHsearch.
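The re-ranking step can be illustrated with a minimal Python sketch. It assumes each HHsearch hit has already been reduced to a raw sequence score and a raw secondary structure score; the function names, the tuple layout, and the weights `w_seq` and `w_ss` are hypothetical, since the actual weights used by GalaxyWEB are not given here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores as (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: no hit stands out on this term.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rerank(hits, w_seq=1.0, w_ss=2.0):
    """Re-rank hits by a weighted sum of sequence and secondary structure Z-scores.

    hits: list of (template_id, seq_score, ss_score) tuples.
    w_seq, w_ss: illustrative weights (assumptions, not GalaxyWEB's values).
    Returns a list of (template_id, combined_score), best first.
    """
    seq_z = z_scores([h[1] for h in hits])
    ss_z = z_scores([h[2] for h in hits])
    combined = [
        (h[0], w_seq * a + w_ss * b) for h, a, b in zip(hits, seq_z, ss_z)
    ]
    return sorted(combined, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example_hits = [("a", 100.0, 5.0), ("b", 90.0, 20.0), ("c", 50.0, 6.0)]
    for template_id, score in rerank(example_hits):
        print(template_id, round(score, 3))
```

Giving the secondary structure term a larger weight than the sequence term, as in this example, moves hit `b` ahead of hit `a` even though `a` has the higher raw sequence score, which is the intended effect of emphasizing secondary structure for difficult targets.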
Among the top 20 re-ranked homologs, GalaxyWEB selects multiple templates, discarding structural outliers based on pairwise TM-scores for aligned core regions. On average, 4.55 templates are selected for single-domain CASP9 targets. PROMALS3D is then utilized for multiple sequence alignment of core regions, excluding unaligned termini. Terminus sequence alignments are subsequently appended. Initial model structures are generated from the selected templates and sequence alignment through Conformational Space Annealing (CSA) global optimization. This optimization is driven by restraints derived from templates using an in-house method, employing sum-of-potentials similar to those developed by Thompson et al., but with a wider range of application between Cα pairs, comparable to MODELLER.
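One simple way to discard structural outliers from pairwise TM-scores is sketched below in Python. This is an illustration only, not the GalaxyWEB procedure: the iterative rule, the function name `filter_outliers`, and the `min_mean_tm` threshold are assumptions, and pairwise TM-scores over the aligned core are treated as symmetric and precomputed.

```python
def filter_outliers(ids, tm, min_mean_tm=0.5):
    """Drop templates that are structurally dissimilar to the rest.

    ids: candidate template identifiers, in ranked order.
    tm: dict mapping frozenset({id_a, id_b}) to a pairwise TM-score,
        assumed symmetric and computed on aligned core regions.
    min_mean_tm: hypothetical threshold on the mean TM-score of one
        template against all other templates still kept.

    At each step the template with the lowest mean TM-score to the others
    is removed, until every remaining template meets the threshold.
    """
    kept = list(ids)
    while len(kept) > 1:
        means = {
            a: sum(tm[frozenset((a, b))] for b in kept if b != a)
            / (len(kept) - 1)
            for a in kept
        }
        worst = min(kept, key=lambda a: means[a])
        if means[worst] >= min_mean_tm:
            break
        kept.remove(worst)
    return kept


if __name__ == "__main__":
    pairwise = {
        frozenset(("t1", "t2")): 0.80,
        frozenset(("t1", "t3")): 0.75,
        frozenset(("t2", "t3")): 0.70,
        frozenset(("t1", "t4")): 0.30,
        frozenset(("t2", "t4")): 0.35,
        frozenset(("t3", "t4")): 0.25,
    }
    print(filter_outliers(["t1", "t2", "t3", "t4"], pairwise))
```

In the example, `t4` has low TM-scores against every other candidate and is removed, while the three mutually similar templates are kept for multiple-template modeling.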
Unreliable Local Regions (ULRs) are then identified within the initial model. GalaxyWEB reconstructs up to three ULRs simultaneously using CSA optimization with a hybrid energy function that combines physics-based and knowledge-based terms. During CSA optimization, the triaxial loop closure algorithm is used extensively to generate geometrically consistent backbone structures for loops. This lighter protocol, optimized for web server efficiency, requires considerably fewer computational resources than the original Seok-server method.
Performance Evaluation: GalaxyWEB Benchmarked Against CASP Standards
To assess the performance of the GalaxyWEB server, especially given its streamlined methodology compared to the original Seok-server, it was re-evaluated on the 68 single-domain targets from CASP9. The backbone structure quality, measured by the average Global Distance Test Total Score (GDT-TS), decreased slightly from 68.5 for Seok-server to 67.6 for GalaxyWEB. This minor reduction is attributed to the lighter optimization protocols implemented for efficiency. Nevertheless, GalaxyWEB’s performance remains comparable to that of the top six server methods in CASP9. Notably, refinement by GalaxyWEB improved initial model structures in 65% of cases when local structure quality was assessed using Root Mean Square Deviation (RMSD). Further details on the refinement method’s performance are available in a separate publication.
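For readers unfamiliar with the metric, GDT-TS is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in the model that lie within the cutoff of their positions in the experimental structure. The Python sketch below is a simplification: it assumes the per-residue Cα distances come from one fixed superposition, whereas the full GDT procedure searches for the superposition that maximizes the count at each cutoff. The function name `gdt_ts` is illustrative.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS (0 to 100) from per-residue C-alpha distances.

    distances: model-to-native C-alpha distances in angstroms, one per
        residue, assumed to come from a single fixed superposition.
    cutoffs: the standard GDT-TS distance thresholds in angstroms.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))
```

For the four example distances, the fractions within 1, 2, 4 and 8 Å are 0.25, 0.5, 0.75 and 0.75, so the score is 56.25.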
The GalaxyWEB Server: Accessibility and Functionality
The GalaxyWEB server is powered by a cluster of four Linux servers, each equipped with eight 2.33 GHz Intel Xeon processor cores. The web application is built using Python and a MySQL database. The core structure prediction and refinement pipeline integrates HHsearch, PROMALS3D, and the in-house GALAXY program package, written in Fortran 90. JMol is incorporated for interactive visualization of predicted protein structures.
Users can access GalaxyWEB at http://galaxy.seoklab.org/ to perform both protein structure prediction and refinement tasks. For structure prediction, users simply input a protein sequence in FASTA format. For refinement jobs, users upload a model structure in PDB format and specify the residue ranges defining the regions for refinement. The expected runtime for structure prediction is approximately 7 hours for a 500-residue protein, while refinement of a 26-residue loop or terminus takes around 2 hours. GalaxyWEB presents the five best models on the website for immediate viewing and download, as shown in Figure 2. Users can also download the complete set of generated models as a tar archive.
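Before submitting a refinement job, it can help to check the residue ranges locally. The Python sketch below is a hypothetical client-side helper, not part of GalaxyWEB: the comma-separated `start-end` input format, the function name `parse_ranges`, and the validation rules are all assumptions made for illustration.

```python
def parse_ranges(text, seq_len):
    """Parse residue ranges such as "10-25, 40-52" into sorted (start, end) pairs.

    text: comma-separated ranges in a hypothetical "start-end" format,
        using 1-based inclusive residue numbers.
    seq_len: number of residues in the uploaded model.
    Raises ValueError for malformed, out-of-bounds, or overlapping ranges.
    """
    ranges = []
    for part in text.split(","):
        start_s, sep, end_s = part.strip().partition("-")
        if not sep:
            raise ValueError(f"expected 'start-end', got {part.strip()!r}")
        start, end = int(start_s), int(end_s)
        if not 1 <= start <= end <= seq_len:
            raise ValueError(f"range {start}-{end} outside 1-{seq_len}")
        ranges.append((start, end))
    ranges.sort()
    # After sorting, overlaps can only occur between neighbouring ranges.
    for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
        if next_start <= prev_end:
            raise ValueError("ranges overlap")
    return ranges


if __name__ == "__main__":
    print(parse_ranges("40-52, 10-25", seq_len=120))
```

Sorting the ranges first keeps the overlap check to a single pass over neighbouring pairs.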
Conclusion: GalaxyWEB – Advancing Protein Structure Prediction through Ab Initio Innovation
GalaxyWEB is a web server for protein structure prediction and refinement that is distinguished by its ab initio refinement capabilities. Unlike many other protein structure servers, GalaxyWEB identifies and refines unreliable regions where template information is lacking or inconsistent. This ab initio loop and terminus modeling capability, supported by the method’s performance in CASP9, makes GalaxyWEB useful for improving the accuracy of protein structure predictions, particularly in challenging regions. Researchers can use GalaxyWEB not only for structure prediction from sequence but also to refine models generated by other methods.
Funding: National Research Foundation of Korea; Ministry of Education, Science and Technology [2011-0012456]; Center for Marine Natural Products and Drug Discovery (CMDD); Ministry of Land, Transport and Maritime Affairs of Korea; Seoul National University.
Conflict of interest statement: None declared.