Documentation for (PS)2
(PS)2 is an automatic homology modeling server. The method uses a new substitution matrix, S2A2, that combines
sequence and secondary structure information to detect homologous proteins with remote similarity and to build the target-template alignment. The final
three-dimensional structure is built with the modeling package MODELLER. After a predicted model is generated,
the programs ProQ and ProQres are used to evaluate its quality based on the LGscore and MaxSub scores. Finally,
the predicted model is displayed with AstexViewer and automatically sent to the user.
Figure 1. Overview of the (PS)2 server using the protein sequence of telomere replication protein Est3 in Saccharomyces cerevisiae as query. (A) Input format of the (PS)2 server. (B) Search results for a query protein, comprising the target name, sequence, predicted secondary structure, a graph of the aligned regions and the hit list of templates for the query. (C) The selected template, target-template alignment and predicted structure of Est3. (D) Visualization of the predicted structure of Est3. (E) The model-quality evaluation.
Option "Automatic": The server automatically selects the modeling template(s).
Option "Manual": Users select the modeling template(s) themselves.
Option "Use this template": Users specify a particular PDB entry as the template.
S2A2 substitution matrix
S2A2 is a 60x60 substitution matrix based on the secondary structure propensities of the 20 amino acids.
It is effective both for detecting remote homologs and for target-template alignment.
Figure 2. The S2A2 substitution matrix.
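As an illustration of how a 60x60 matrix of this kind can be addressed, the sketch below maps each (amino acid, secondary structure state) pair to a row or column index. It assumes that the 60 symbols come from 20 amino acids in three secondary structure states (H for helix, E for strand, C for coil). The state letters, the ordering, and the names s2a2_index and lookup are hypothetical and are not taken from the server.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_STATES = "HEC"                      # assumed states: helix, strand, coil


def s2a2_index(residue: str, ss_state: str) -> int:
    """Return a 0-based index in [0, 60) for a (residue, state) pair.

    The layout (all helix symbols first, then strand, then coil) is an
    illustrative assumption, not the server's documented ordering.
    """
    return SS_STATES.index(ss_state) * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)


def lookup(matrix, res_a: str, ss_a: str, res_b: str, ss_b: str):
    """Look up the substitution score between two (residue, state) symbols."""
    return matrix[s2a2_index(res_a, ss_a)][s2a2_index(res_b, ss_b)]
```

With a layout like this, an alanine in a helix and an alanine in a strand are different symbols and can therefore receive different substitution scores.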
MODELLER is used for homology or comparative modeling of protein three-dimensional structures. The
user provides an alignment of a sequence to be modeled with known related structures and MODELLER
automatically calculates a model containing all non-hydrogen atoms.
Sali A & Blundell TL: Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 1993, 234: 779-815.
ProQ was proposed by Wallner. It is a neural-network-based predictor that uses a number of structural features to predict the quality of a protein model. ProQ is optimized to find correct models, in contrast to other methods, which are optimized to find native structures. Two quality measures are predicted: LGscore and MaxSub.
Different ranges of quality:
Measure    Correct    Good     Very good
LGscore    > 1.5      > 3      > 5
MaxSub     > 0.1      > 0.5    > 0.8
Wallner B & Elofsson A: Can correct protein models be identified? Protein Sci. 2003, 12: 1073-1086.
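The quality ranges listed above can be applied programmatically. The following is a minimal sketch that labels a pair of predicted scores using those thresholds. The function name classify_proq and the label used for scores below the lowest threshold are illustrative choices, not part of ProQ.

```python
LABELS = ("correct", "good", "very good")
THRESHOLDS = {
    "LGscore": (1.5, 3.0, 5.0),
    "MaxSub": (0.1, 0.5, 0.8),
}


def classify_proq(lgscore: float, maxsub: float) -> dict:
    """Label each score with the highest quality range it exceeds."""
    result = {}
    for name, value in (("LGscore", lgscore), ("MaxSub", maxsub)):
        label = "below correct"  # hypothetical label for scores under the first threshold
        for candidate, bound in zip(LABELS, THRESHOLDS[name]):
            if value > bound:  # ranges are strict ">" comparisons
                label = candidate
        result[name] = label
    return result
```

For example, a model with LGscore 4.2 and MaxSub 0.3 would be labeled "good" by LGscore and "correct" by MaxSub.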
ProQres was proposed by Wallner. It is a neural-network-based predictor that uses a number of structural features to predict the quality of different parts of a protein model.
The quality ranges from 0 to 1, where 1 corresponds to a perfect prediction. The predicted score for each residue is the S-score = 1/(1 + (rmsd/5)^2). The sum of this score is used in MaxSub, LGscore and TM-score.
Wallner B & Elofsson A: Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci. 2006, 15: 900-913.
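To make the per-residue score concrete, the sketch below evaluates the S-score formula given above for a list of per-residue deviations. The function names and the example deviations are illustrative. The scale value of 5 is taken from the formula as written in this documentation.

```python
def s_score(rmsd: float, scale: float = 5.0) -> float:
    """S-score = 1 / (1 + (rmsd / scale)^2); 1.0 means no deviation."""
    return 1.0 / (1.0 + (rmsd / scale) ** 2)


def per_residue_s_scores(deviations):
    """Apply the S-score to each residue's deviation (in angstroms)."""
    return [s_score(d) for d in deviations]


# Hypothetical deviations for a four-residue stretch.
scores = per_residue_s_scores([0.0, 2.5, 5.0, 10.0])
```

A residue with zero deviation scores 1, a deviation equal to the scale value scores 0.5, and the score approaches 0 as the deviation grows.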
SW-score is the Smith-Waterman score. It is a raw alignment score, calculated as the sum of the substitution scores (from the S2A2 matrix) and the gap scores.
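The sketch below shows how a raw Smith-Waterman score of this form can be computed: the best local alignment score is the maximum over a dynamic-programming table of accumulated substitution and gap scores. It assumes a linear gap penalty and a toy match/mismatch scorer; the server's actual gap scheme and the S2A2 values are not given in this documentation, so gap, toy_score, and smith_waterman_score are placeholders.

```python
def smith_waterman_score(seq_a, seq_b, substitution, gap=-4):
    """Return the best local alignment score (raw score) of two sequences.

    substitution(x, y) gives the score for aligning symbols x and y;
    gap is the (negative) score added per gap position (linear gaps assumed).
    """
    cols = len(seq_b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(seq_a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + substitution(seq_a[i - 1], seq_b[j - 1])
            up = prev[j] + gap        # gap in seq_b
            left = curr[j - 1] + gap  # gap in seq_a
            curr[j] = max(0, diag, up, left)  # local alignment never drops below 0
            best = max(best, curr[j])
        prev = curr
    return best


def toy_score(x, y):
    """Toy stand-in for a substitution matrix: +2 match, -1 mismatch."""
    return 2 if x == y else -1
```

Because the symbols are only passed to the substitution function, the sequences could equally be lists of (residue, secondary structure) pairs scored by a 60x60 matrix.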
The bit-score is derived from the raw alignment score S (the SW-score), taking into account the statistical properties of the scoring system used. Because bit-scores are normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
The E-value is the expectation value: the number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E-value, the more significant the score.
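The relation between the raw score, the bit-score, and the E-value can be sketched as follows. This assumes the standard Karlin-Altschul formulation used by BLAST-style searches, bit-score = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-bit-score). Whether the server uses exactly this form is an assumption, and the parameters lam, k, query_len, and db_len are placeholders, not values from (PS)2.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score S using the statistical parameters
    lambda and K of the scoring system (Karlin-Altschul form, assumed)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bits`
    in a search space of size query_len * db_len."""
    return query_len * db_len * 2.0 ** (-bits)
```

Under this formulation each additional bit halves the E-value, which is why bit-scores from searches with different scoring systems can be compared directly.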
The Global Distance Test Total Score (GDT_TS) of Cα atoms is used to assess the correctness of the predicted model. GDT_TS is commonly used in modeling studies and in the CASP community. GDT_TS is defined as
GDT_TS = (GDT_1 + GDT_2 + GDT_4 + GDT_8) / (4N)
where N is the total number of residues of the target; GDT_d is the number of aligned residues whose Cα-atom distance between the native structure and the predicted model is less than d Å after superposition of the two structures; and d is 1, 2, 4, or 8 Å.
Zemla A: LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003, 31: 3370-3374.
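As a worked illustration of the GDT_TS definition, the sketch below counts the residues whose Cα distance falls under each cutoff and averages the four fractions. It assumes the distances come from a single superposition that has already been computed; programs such as LGA search for superpositions per cutoff, which this sketch does not attempt. The function name gdt_ts and the example distances are hypothetical.

```python
def gdt_ts(distances, n_residues=None, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Return GDT_TS as a fraction in [0, 1].

    distances: Cα-Cα distances (angstroms) for the aligned residues,
               measured after superposition.
    n_residues: total number of residues N in the target; defaults to
                the number of aligned residues.
    """
    n = len(distances) if n_residues is None else n_residues
    if n <= 0:
        raise ValueError("the target must contain at least one residue")
    # GDT_d: number of aligned residues closer than d angstroms.
    counts = [sum(1 for dist in distances if dist < d) for d in cutoffs]
    return sum(counts) / (len(cutoffs) * n)


# Hypothetical distances for four aligned residues.
example = gdt_ts([0.5, 1.5, 3.0, 9.0])
```

Here the counts under 1, 2, 4, and 8 Å are 1, 2, 3, and 3, so the result is 9/16; multiplying by 100 gives the percentage form.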
Figure 3 shows the correlation between E-values and GDT_TS scores for 121 targets in CASP8; the Pearson correlation coefficient is 0.65. According to the GDT_TS scores, our server often yields reliable predicted structures (i.e. GDT_TS score >= 60%) if the E-value is <= 10^-2.
Figure 3. The correlation between E-values and GDT_TS scores for 121 targets in CASP8. Our server often yields reliable predicted structures if the E-value is at most 10^-2.