Example

In this example, we show how to use eFindSite to predict the binding pockets of a given protein. Our target is chain A of glutathione S-transferase (PDB-ID: 13gsA), which is 210 aa long; its amino acid sequence is shown below.


13gsA.fasta
>13gsA 210
MPPYTVVYFPVRGRCAALRMLLADQGQSWKEEVVTVETWQEGSLKASCLYGQLPKFQDGDLTLYQSNTIL
RHLGRTLGLYGKDQQEAALVDMVNDGVEDLRCKYISLIYTNYEAGKDDYVKALPGQLKPFETLLSQNQGG
KTFIVGDQISFADYNLLDLLLIHEVLAPGCLDAFPLLSAYVGRLSARPKLKAFLASPEYVNLPINGNGKQ
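
Before running threading, it can be worth confirming that the FASTA file holds the full 210-residue sequence. The helper below is a minimal sketch, not part of eFindSite; the function name fasta_length is hypothetical, and it assumes the file contains a single sequence.

```shell
# Print the number of residues in a single-sequence FASTA file.
# fasta_length is a hypothetical helper, not an eFindSite tool.
fasta_length() {
    # Drop header lines starting with '>', strip all whitespace,
    # then count the remaining characters (one per residue).
    grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c | tr -d ' '
}
```

For the sequence shown above, fasta_length 13gsA.fasta should print 210.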

First, using the target sequence, you need to run threading or a similar procedure to collect evolutionarily related templates. Here, we will use eThread, but in principle, you can use other techniques as well; see notes on template libraries and mapping files.

The crystal structure of this protein has been solved and can be obtained from the PDB. Note that the original PDB file contains two chains, A and B. Since we focus on chain A, you need to edit the structure file so that the input file for eFindSite contains chain A only.
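
One possible way to do this editing is with awk, relying on the fixed-column PDB format, in which the record name occupies columns 1-6 and the chain identifier is the single character in column 22. The function below is only a sketch: the name extract_chain and the input file name 13gs.pdb are assumptions, and dropping HETATM records (so that the bound ligand is not passed to eFindSite) is a choice made for this example rather than a stated requirement.

```shell
# Keep only the ATOM and TER records of one chain from a PDB file.
# extract_chain is a hypothetical helper, not an eFindSite tool.
extract_chain() {
    # $1: input PDB file, $2: chain identifier (e.g. A)
    # Column 22 of ATOM/TER records holds the chain identifier.
    awk -v chain="$2" '
        (substr($0, 1, 6) == "ATOM  " || substr($0, 1, 3) == "TER") && substr($0, 22, 1) == chain { print }
        END { print "END" }
    ' "$1"
}
```

For example, extract_chain 13gs.pdb A > 13gsA.pdb would write chain A to the file used in the rest of this example. Alternate locations and multiple models are not handled by this sketch.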

In addition, you will need a secondary structure prediction from PSIPRED and a sequence profile file. The construction of sequence profiles is discussed in the previous section; instructions for PSIPRED are here.

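For eFindSite, four input files are required: the target structure (13gsA.pdb), the list of templates from eThread (13gsA.ethread-fun), the secondary structure predicted by PSIPRED (13gsA.ss), and the sequence profile (13gsA.prf).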

Note: If you use a threading algorithm other than eThread, the list of templates should have only one column that lists template IDs, e.g. 13gsA.templates.
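
As an illustration of building such a one-column list, the sketch below assumes that your threading program writes a whitespace-delimited table with the template ID in the first column and optional comment lines starting with #. That layout, the function name template_ids, and the file name 13gsA.hits are all hypothetical; adapt the column number to the actual output of your tool.

```shell
# Reduce a multi-column hit table to a one-column list of template IDs.
# template_ids is a hypothetical helper; the input layout is an assumption.
template_ids() {
    # Skip blank lines and '#' comments, then print each ID from
    # column 1 once, in order of first appearance.
    awk 'NF && $1 !~ /^#/ && !seen[$1]++ { print $1 }' "$1"
}
```

For example, template_ids 13gsA.hits > 13gsA.templates would produce a file in the one-column form described in the note above.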

Let's put these files in a directory called example/. Now, export the environment variables (Mac OS X users skip EF_APL):

[example]$ export EF_LIB=/usr/local/libraries/efindsite-lib-latest

[example]$ export EF_MAP=/usr/local/libraries/efindsite-lib-latest/ethread-lib-jan2012.map

[example]$ export EF_MOD=/usr/local/efindsite-1.0/mod

[example]$ export EF_APL=/usr/local/efindsite-1.0/lib/apclusterunix64.so

and run efindsite:

[example]$ /usr/local/efindsite-1.0/bin/efindsite -s 13gsA.pdb -t 13gsA.ethread-fun -i 13gsA.ss -e 13gsA.prf -o 13gsA-efindsite
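
If the run fails, a likely cause is an unset variable or a missing input file. The function below is a small pre-flight check, offered as a sketch rather than as part of eFindSite; the name check_efindsite_setup is hypothetical. It tests only the three variables needed on every platform and leaves out EF_APL, since Mac OS X users skip it.

```shell
# Report unset environment variables or missing input files.
# Returns 0 when everything is in place, 1 otherwise.
# check_efindsite_setup is a hypothetical helper, not an eFindSite tool.
check_efindsite_setup() {
    status=0
    # EF_APL is not checked because it is skipped on Mac OS X.
    for var in EF_LIB EF_MAP EF_MOD; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "unset variable: $var" >&2
            status=1
        elif [ ! -e "$value" ]; then
            echo "$var points to a missing path: $value" >&2
            status=1
        fi
    done
    # Remaining arguments are the input files given to efindsite.
    for file in "$@"; do
        if [ ! -s "$file" ]; then
            echo "missing or empty input file: $file" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the run above, it could be called as check_efindsite_setup 13gsA.pdb 13gsA.ethread-fun 13gsA.ss 13gsA.prf; any problems are printed to standard error.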

You should get 5 output files:

Note that in eFindSite, you can optionally include the chemical properties of binding ligands if such information is available. Let's assume that we know our target binds glutathione; in that case, you can download glutathione from e.g. PubChem (CID: 124886) and then provide an SDF-formatted file that contains glutathione (CID_124886.sdf) using the -l option as follows:

[example]$ /usr/local/efindsite-1.0/bin/efindsite -s 13gsA.pdb -t 13gsA.ethread-fun -i 13gsA.ss -e 13gsA.prf -l CID_124886.sdf -o 13gsA-efindsite

The number and format of the output files will be exactly the same; however, pocket ranking and confidence estimates may differ because additional information on the binding ligand(s) is included in the prediction procedure. The auxiliary ligand file may contain more than one compound.
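
In the SDF format, each compound record ends with a line containing only $$$$, so a multi-compound file can be built by concatenating single-compound files. The two helpers below sketch this; their names are hypothetical, and they assume every input file already ends with a $$$$ line followed by a newline.

```shell
# Concatenate SDF files into one multi-compound file.
# combine_sdf and count_sdf_records are hypothetical helpers.
combine_sdf() {
    # $1: output file; remaining arguments: input SDF files
    out=$1
    shift
    cat "$@" > "$out"
}

# Count compounds in an SDF file by counting "$$$$" record terminators.
count_sdf_records() {
    grep -c '^\$\$\$\$' "$1"
}
```

For example, combine_sdf ligands.sdf CID_124886.sdf other_ligand.sdf (where other_ligand.sdf stands for any second compound you have downloaded) produces a file that can be passed with -l, and count_sdf_records ligands.sdf reports how many compounds it holds.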

The images below visualize the results. The figure on the left shows the target structure used as input for eFindSite. The crystal structure contains a glutathione molecule bound in the active site. We did not provide this information to eFindSite; however, we can use the bound ligand to assess the accuracy of the binding pocket and residue prediction. This is shown in the figure on the right, where the bound ligand is represented by sticks colored by atom type, and the predicted binding pocket and residues are represented by a gray surface and green sticks, respectively.

[Figure, left: Target structure. Figure, right: Predicted binding pocket and residues.]

The format and content of the output files are described in the following sections.

© Michal Brylinski
This website is hosted at the CCT