Ligand-based virtual screening

In this section, we demonstrate how to perform ligand-based virtual screening using efindsite_scr. As shown earlier in this tutorial, 11 binding sites have been predicted by eFindSite for our target protein, glutathione S-transferase chain A (PDB-ID: 13gsA). Here, we will virtually screen the top-ranked predicted pocket against the KEGG compound library comprising 11,265 molecules.

efindsite_scr requires two input files:

  • 13gsA-efindsite.pockets.dat, which contains information on the putative pockets identified by eFindSite
  • escreen-keggcomp-mar2012.gz, the screening library. Additional eFindSite-compatible compound collections are available here. For convenience, you can place this file in /usr/local/libraries/.

Next, export this environment variable if you have not done so already:

[example]$ export EF_MOD=/usr/local/efindsite-1.0/mod

and run efindsite_scr:

[example]$ /usr/local/efindsite-1.0/bin/efindsite_scr -p 13gsA-efindsite.pockets.dat -s /usr/local/libraries/escreen-keggcomp-mar2012.gz -o 13gsA-escreen-keggcomp
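If you prefer to launch the screen from a script, a minimal Python sketch is to build the same command line and set EF_MOD in the child process environment. It uses only the -p, -s and -o options shown above and assumes the install and library paths from this example; the helper names (build_command, build_env, run_screen) are hypothetical.

```python
import os
import subprocess

# Paths assumed from the example above; adjust them to your installation.
EF_HOME = "/usr/local/efindsite-1.0"
LIBRARY = "/usr/local/libraries/escreen-keggcomp-mar2012.gz"


def build_command(pockets: str, library: str, output: str) -> list[str]:
    """Assemble the efindsite_scr call using only the -p/-s/-o options shown above."""
    return [
        os.path.join(EF_HOME, "bin", "efindsite_scr"),
        "-p", pockets,   # pockets predicted by eFindSite
        "-s", library,   # screening library
        "-o", output,    # ranked output file
    ]


def build_env() -> dict[str, str]:
    """Copy the current environment and point EF_MOD at the eFindSite mod directory."""
    env = dict(os.environ)
    env["EF_MOD"] = os.path.join(EF_HOME, "mod")
    return env


def run_screen(pockets: str, library: str, output: str) -> None:
    """Run efindsite_scr; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(pockets, library, output), env=build_env(), check=True)


# Usage (hypothetical helper, same inputs as the command above):
# run_screen("13gsA-efindsite.pockets.dat", LIBRARY, "13gsA-escreen-keggcomp")
```

Passing the arguments as a list avoids shell quoting issues, and setting EF_MOD only in the child environment leaves your interactive shell untouched.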

In this example, we use the default scoring function, data fusion using the sum rule; other available scoring functions are described here. The output file 13gsA-escreen-keggcomp contains a ranked list of compounds, each assigned a Tanimoto score and a Z-score:

1 C00051 0.000133 1.950014
2 C02320 0.000222 1.949668
3 C02471 0.000311 1.949322

Each row lists one compound and the following data:

  • compound rank (1)
  • compound ID (C00051)
  • Tanimoto score (0.000133)
  • Z-score (1.950014)
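To post-process the results, the rows can be read into simple records. This is a minimal Python sketch that assumes every row has exactly the four whitespace-separated columns shown in the excerpt above (rank, compound ID, Tanimoto score, Z-score); if the actual file contains a header or extra columns, the parser would need adjusting. The Hit class and parse_hits function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    rank: int
    compound_id: str
    tanimoto: float
    z_score: float


def parse_hits(lines):
    """Parse rows of the form 'rank compound_id tanimoto z_score'; blank lines are skipped."""
    hits = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"unexpected row: {line!r}")
        rank, cid, tanimoto, z = fields
        hits.append(Hit(int(rank), cid, float(tanimoto), float(z)))
    return hits


# The three rows from the excerpt above.
sample = """1 C00051 0.000133 1.950014
2 C02320 0.000222 1.949668
3 C02471 0.000311 1.949322"""

for hit in parse_hits(sample.splitlines()):
    print(hit.rank, hit.compound_id, hit.z_score)
```

Because parse_hits accepts any iterable of lines, it can read the output file directly, e.g. `with open("13gsA-escreen-keggcomp") as fh: hits = parse_hits(fh)`.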

In general, the higher the Z-score of the top-ranked compound, the more confident the ranking. Below are the chemical structures of the top 3 compounds selected from the KEGG library of 11,265 molecules. These hits are consistent with our target being glutathione S-transferase; the top-ranked compound, C00051, is glutathione itself:

[Structures of the top 3 compounds; Z-scores in parentheses] #1: C00051 (1.950014), #2: C02320 (1.949668), #3: C02471 (1.949322)
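The default scoring function mentioned above, data fusion using the sum rule, generally means combining several per-compound scores by adding them. The Python sketch below shows that textbook form only; it is not eFindSite's implementation. The min-max normalization, the two hypothetical similarity measures and the compound IDs X1 to X3 are all assumptions for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one scoring function's values to [0, 1]; if all scores are equal, map them to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {cid: (s - lo) / span if span else 0.0 for cid, s in scores.items()}


def sum_rule(score_sets: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Fuse several scoring functions by summing normalized scores; highest fused score first.

    A compound missing from a score set simply contributes nothing for that set.
    """
    fused: dict[str, float] = {}
    for scores in score_sets:
        for cid, s in min_max(scores).items():
            fused[cid] = fused.get(cid, 0.0) + s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from two similarity measures for three compounds.
measure_a = {"X1": 0.9, "X2": 0.5, "X3": 0.1}
measure_b = {"X1": 0.2, "X2": 0.8, "X3": 0.0}
print(sum_rule([measure_a, measure_b]))
```

Normalizing before summing keeps one measure with a wider numeric range from dominating the fused ranking; here X2 ranks first because it scores well on both measures, even though X1 wins on measure_a alone.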


© Michal Brylinski
This website is hosted at the CCT