Part 5 - Building a model from homology constraints

Superposition of target (S-HPCDH) and template (R-HPCDH)

Background

In general, proteins with similar amino acid sequences (primary structure) have similar three-dimensional structures. This forms the basis of comparative protein modeling. We discovered in Part 4 that the two HPCDH sequences share 41% sequence identity. At this level of similarity, an accurate sequence alignment between the two is likely to contain only minor errors, if any. Yet even with an accurate alignment, the structures will likely have diverged enough that small shifts in the main chain trace and side chain conformations should be expected (something that could be confirmed should the S-HPCDH structure be solved at some point). Nevertheless, we may be able to construct a model of S-HPCDH based on the structure of R-HPCDH that is accurate enough to give insight into the basis of stereospecificity. Comparative modeling (sometimes called homology modeling or template-based modeling) is essentially a sophisticated copying procedure.
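To make the idea of percent sequence identity concrete, here is a minimal sketch of how it can be computed from two aligned sequences. The function name and the toy sequences are hypothetical (they are not the HPCDH sequences), and the convention used here, counting only columns where both sequences have a residue, is an assumption; other tools divide by the full alignment length or by the shorter sequence length and so report slightly different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are not counted in this convention
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned sequences (hypothetical, for illustration only).
toy_target = "MKV-LTGAS"
toy_template = "MRVALTG-S"
print(round(percent_identity(toy_target, toy_template), 1))  # 85.7 (6 of 7 columns)
```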

MODELLER is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints.

MODELLER is most frequently used for homology or comparative protein structure modeling: The user provides an alignment of a sequence to be modeled with known related structures and MODELLER will automatically calculate a model with all non-hydrogen atoms.

More generally, the inputs to the program are restraints on the spatial structure of the amino acid sequence(s) and ligands to be modeled. The output is a 3D structure that satisfies these restraints as well as possible. Restraints can in principle be derived from a number of different sources. These include related protein structures (comparative modeling), NMR experiments (NMR refinement), rules of secondary structure packing (combinatorial modeling), cross-linking experiments, fluorescence spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis, intuition, residue-residue and atom-atom potentials of mean force, etc. The restraints can operate on distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo atoms. Presently, MODELLER automatically derives the restraints only from the known related structures and their alignment with the target sequence.

A 3D model is obtained by optimization of a molecular probability density function (pdf). The molecular pdf for comparative modeling is optimized with the variable target function procedure in Cartesian space that employs methods of conjugate gradients and molecular dynamics with simulated annealing.
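The idea of building a structure by satisfying restraints can be illustrated with a toy example. The sketch below is not MODELLER's algorithm: it treats each restraint as a Gaussian on an interatomic distance, uses the negative log of that Gaussian (dropping constants) as the objective, and minimizes it with plain gradient descent rather than conjugate gradients or molecular dynamics. All names, the three-point system, and the distance values are hypothetical.

```python
import math


def objective(coords, restraints):
    """Sum of (d - d0)^2 / (2 sigma^2) over all distance restraints."""
    total = 0.0
    for i, j, d0, sigma in restraints:
        d = math.dist(coords[i], coords[j])
        total += (d - d0) ** 2 / (2.0 * sigma ** 2)
    return total


def gradient(coords, restraints):
    """Analytic gradient of the objective with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d0, sigma in restraints:
        d = math.dist(coords[i], coords[j])
        if d == 0.0:
            continue  # direction undefined for coincident points
        scale = (d - d0) / (sigma ** 2 * d)
        for k in range(3):
            g = scale * (coords[i][k] - coords[j][k])
            grad[i][k] += g
            grad[j][k] -= g
    return grad


def minimize(coords, restraints, step=0.1, n_steps=500):
    """Plain gradient descent (a stand-in for a real optimizer)."""
    coords = [list(p) for p in coords]
    for _ in range(n_steps):
        grad = gradient(coords, restraints)
        for p, g in zip(coords, grad):
            for k in range(3):
                p[k] -= step * g[k]
    return coords


# Three points and three target distances (hypothetical values).
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
restraints = [(0, 1, 3.8, 1.0), (1, 2, 3.8, 1.0), (0, 2, 5.5, 1.0)]
final = minimize(start, restraints)
print(objective(start, restraints), objective(final, restraints))
```

In a real modeling run the restraints number in the thousands and frequently conflict, so the optimum is a compromise rather than an exact solution as in this toy case.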

The following are some general rules on how comparative modeling programs work:

  1. When the aligned residues in the template and target sequences are identical, most programs will place the target residue in the same 3D position as the template residue.
  2. When the aligned residues in the template and target sequences differ, the program will place the target residue in nearly the same 3D position, using a set of rules for transforming one residue type into the other.
  3. Insertions in the target sequence are modeled in various ways and often contain significant errors. Researchers may employ physics-based simulation methods to predict these regions or search for other known structures to model these sections (multiple templates).
  4. Trying to satisfy all of these restraints at once can often result in unphysical atomic overlaps or clashes. Therefore, knowledge-based and physics-based potential energy functions, along with energy minimization routines, are employed to remove these clashes.
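The first three rules above can be sketched as a simple classification of alignment columns. This is only an illustration of the bookkeeping, not of how any particular program (MODELLER included) is implemented; the function name, the labels, and the toy sequences are hypothetical.

```python
from collections import Counter


def classify_columns(target_aln, template_aln, gap="-"):
    """Label each alignment column by how a modeling program might treat it."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for t, s in zip(target_aln, template_aln):
        if t == gap and s == gap:
            labels.append("skip")        # empty column, nothing to model
        elif s == gap:
            labels.append("insertion")   # target residue with no template (rule 3)
        elif t == gap:
            labels.append("deletion")    # template residue absent from the target
        elif t == s:
            labels.append("copy")        # identical residue (rule 1)
        else:
            labels.append("substitute")  # different residue, same position (rule 2)
    return labels


# Toy aligned sequences (hypothetical, for illustration only).
labels = classify_columns("MKV-LTGAS", "MRVALTG-S")
print(Counter(labels))
```

For the toy alignment this gives six "copy" columns and one each of "substitute", "deletion", and "insertion". The insertion columns are the ones where the model is least reliable.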

In this step, we simply show you the input to the popular comparative protein modeling program MODELLER for constructing a model of S-HPCDH. The program requires three files: a sequence alignment file (its construction is shown in Step 4 for interested students), a PDB file of the template, and a Python input script for the program. The template PDB file comes from the Protein Data Bank, where it has the code 2cfc [PDB].
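For orientation, MODELLER alignment files use a PIR-style layout: each entry starts with a ">P1;code" line, followed by a colon-separated header line, followed by the aligned sequence terminated by "*". The sketch below parses such text. The parser and the toy alignment are hypothetical; in particular the sequences and the residue range in the header are made up and are not the tutorial's HCDS_0.ali file. Only the codes HPCDH and HCDS and the PDB code 2CFC are taken from this tutorial.

```python
# Toy alignment text (hypothetical sequences and residue ranges).
TOY_ALIGNMENT = """\
>P1;HPCDH
structureX:2CFC:1:A:8:A::::
MRVALTG-S*
>P1;HCDS
sequence:HCDS:::::::0.00: 0.00
MKV-LTGAS*
"""


def parse_pir(text):
    """Return {code: {'kind', 'atom_file', 'sequence'}} for each entry."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = {}
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">P1;"):
            i += 1  # ignore anything outside an entry (e.g. comment lines)
            continue
        code = lines[i][4:]
        fields = lines[i + 1].split(":")
        i += 2
        parts = []
        while i < len(lines):
            parts.append(lines[i])
            i += 1
            if parts[-1].endswith("*"):
                break  # '*' terminates the sequence
        entries[code] = {
            "kind": fields[0],        # 'structureX' for a template, 'sequence' for a target
            "atom_file": fields[1],   # PDB file name or code
            "sequence": "".join(parts).rstrip("*"),
        }
    return entries


entries = parse_pir(TOY_ALIGNMENT)
for code, entry in entries.items():
    print(code, entry["kind"], entry["atom_file"], len(entry["sequence"]))
```

A quick check like this can catch two common problems before running MODELLER: entry codes that do not match the names used in the script, and aligned sequences of unequal length.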

You have several options for building the model:


Option 1) Skip to results

In Part 6, we will visualize the S-HPCDH model generated through comparative modeling. You do not actually have to run MODELLER as we have provided the results for you in the next step.


Option 2) Install and Run Modeller

    # File: model-hcds.py
    # Homology modelling by the automodel class

    from modeller import *              # Load standard MODELLER classes (log, environ)
    from modeller.automodel import *    # Load the automodel class

    log.minimal()    # request minimal output
    env = environ()  # create a new MODELLER environment to build this model in

    # directories for input atom files
    env.io.atom_files_directory = './:../atom_files'

    a = automodel(env,
                  alnfile  = 'HCDS_0.ali',     # alignment filename
                  knowns   = 'HPCDH',          # codes of the templates
                  sequence = 'HCDS',           # code of the target
                  assess_methods = assess.DOPE)  # score each model with DOPE
    a.starting_model = 1                # index of the first model
    a.ending_model   = 1                # index of the last model
                                        # (determines how many models to calculate)

    a.md_level = None                   # skip the molecular dynamics refinement
    a.make()                            # do the actual homology modelling

  1. Follow the instructions on the MODELLER installation page to install the program.
  2. Make sure you saved the target-template alignment as HCDS_0.ali
  3. Next, you will need a MODELLER command file to configure and run the program. This is a Python file with the file extension .py.
    You can download model-hcds.py (its contents are listed above).
  4. Download the template PDB file 2CFC and save it as 2CFC
  5. Once you have these three files in the current directory, you are ready to run MODELLER.
    $> mod9v4 model-hcds.py

This should generate your model HCDS.B99990001.pdb, as well as a MODELLER output log.
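One simple sanity check on the output is to count the residues in the model and compare that number with the length of the target sequence. The sketch below counts C-alpha atoms using the fixed-column layout of PDB ATOM records. The helper names and the three-residue test structure are hypothetical; to check a real run you would read the text of HCDS.B99990001.pdb instead.

```python
def count_residues(pdb_text):
    """Count residues by collecting unique (chain, residue number) pairs of CA atoms."""
    seen = set()
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16, chain in column 22, residue number in 23-27.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            seen.add((line[21], line[22:27]))
    return len(seen)


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format one ATOM record in PDB fixed-column layout (for the toy data only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, x, y, z)


# Toy three-residue structure (hypothetical coordinates).
toy_pdb = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, " CA ", "MET", "A", 1, 1.5, 0.0, 0.0),
    atom_line(3, " CA ", "LYS", "A", 2, 5.3, 0.0, 0.0),
    atom_line(4, " CA ", "VAL", "A", 3, 9.1, 0.0, 0.0),
    "END",
])
print(count_residues(toy_pdb))  # 3

# For a real run (assumes the file exists in the current directory):
# with open("HCDS.B99990001.pdb") as handle:
#     print(count_residues(handle.read()))
```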


Option 3) Submit to a server

In the previous step, you created an alignment in MODELLER format. You will need to copy that alignment verbatim to correctly build the model.

  1. Go to the NRBSC Modeller server
  2. Click the New Model link
  3. Paste your sequence alignment into the form.
  4. Enter the PDB code of your template as 2CFC
  5. Enter the MODELLER license key as well as your email address and submit the form

If there were errors, check your alignment and resubmit. If everything looks OK, the server will build your model and display the results.