Programs

The eFindSite software distribution includes three main programs: efindsite, efindsite_scr and efindsite_map. To list the available options for a program, run it with no arguments. Each program is described in full below.

efindsite

efindsite predicts binding pockets, binding residues and ligands from a given protein structure and threading data.

Mandatory arguments:

-s input_file, where input_file is the target protein structure in PDB format; you can use either an experimental structure or a protein model

-t template-fun, where template-fun is a text file that contains information on the protein templates identified for your target, e.g. by protein threading

-i sec_struct.ss, where sec_struct.ss is a secondary structure profile constructed for the target by PSIPRED

-e seq.prf, where seq.prf is a sequence profile of the target; it can be generated, e.g., by SP3 or SPARKS

-o output_name, where output_name is the base name of the five output files that store the binding site prediction results; each file has a different extension, see below
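As a rough illustration, the five mandatory arguments listed above can be assembled into a command line from Python. This is only a sketch: the helper build_efindsite_cmd and the file names (target.pdb, target-fun, target.ss, target.prf, target) are hypothetical placeholders, and only the flags themselves are taken from the list above.

```python
from typing import List


def build_efindsite_cmd(structure: str, templates: str, sec_struct: str,
                        profile: str, output_name: str) -> List[str]:
    """Assemble an efindsite command line from the five mandatory arguments."""
    return [
        "efindsite",
        "-s", structure,    # target structure in PDB format
        "-t", templates,    # template information, e.g. from threading
        "-i", sec_struct,   # secondary structure profile (PSIPRED)
        "-e", profile,      # sequence profile
        "-o", output_name,  # base name of the output files
    ]


# Hypothetical file names, used only for illustration.
cmd = build_efindsite_cmd("target.pdb", "target-fun", "target.ss",
                          "target.prf", "target")
print(" ".join(cmd))
```

Assuming efindsite is installed and on the PATH, the list could then be passed to subprocess.run(cmd, check=True).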

Optional arguments:

-l input_ligands, where input_ligands is an SDF file that contains auxiliary ligands

-b seq_cut, where seq_cut is a threshold on the sequence identity between the target protein and the templates. Note that this argument should be used only for benchmarks. The default value is 1.0, which means that all identified templates will be used to predict binding sites

-m tm_score, where tm_score is a TM-score threshold. The default value is 0.4, which means that only those template structures whose TM-score to native is ≥ 0.4 will be used

-x template_max, where template_max is the maximum number of templates, default 250

-r binres_cut, where binres_cut is a probability threshold to assign residues as ligand-binding, default 0.25

-n binres_min, where binres_min is the minimum number of binding residues, default 1

-d poc_cluster_cut, where poc_cluster_cut is the cutoff used for pocket clustering (in the Affinity Propagation method, it is called the preference factor), default 8.0 [Å]

-g P/L lets you choose the pocket clustering method: P - Affinity Propagation (Linux only), L - average linkage, default either P (Linux) or L (other)

-f fin_cluster_cut, where fin_cluster_cut is a ligand fingerprint clustering cutoff, default 0.7

-c T/A lets you choose a ligand fingerprint scoring function: T - classical Tanimoto coefficient, A - average Tanimoto coefficient, default T
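The documented defaults can be kept in one place so that only non-default values are passed on the command line. The sketch below is an illustration, not part of eFindSite: the names DEFAULTS, CHOICES and optional_args are hypothetical, and the values are simply copied from the option list above.

```python
from typing import Dict, List, Set

# Defaults copied from the option list above. -g is left out because its
# default depends on the platform, and -l (auxiliary ligands) has no default.
DEFAULTS: Dict[str, str] = {
    "-b": "1.0", "-m": "0.4", "-x": "250", "-r": "0.25",
    "-n": "1", "-d": "8.0", "-f": "0.7", "-c": "T",
}
# Options restricted to a fixed set of letters.
CHOICES: Dict[str, Set[str]] = {"-g": {"P", "L"}, "-c": {"T", "A"}}
KNOWN = set(DEFAULTS) | {"-g", "-l"}


def optional_args(overrides: Dict[str, str]) -> List[str]:
    """Return flag/value pairs for every override that differs from its default."""
    args: List[str] = []
    for flag, value in overrides.items():
        if flag not in KNOWN:
            raise ValueError(f"unknown efindsite option: {flag}")
        if flag in CHOICES and value not in CHOICES[flag]:
            raise ValueError(f"{flag} must be one of {sorted(CHOICES[flag])}")
        if DEFAULTS.get(flag) == value:
            continue  # same as the documented default, no need to pass it
        args += [flag, value]
    return args


extra = optional_args({"-m": "0.5", "-x": "250", "-c": "A"})
print(extra)  # ['-m', '0.5', '-c', 'A']
```

Values are compared as strings, so "0.40" would be passed explicitly even though it equals the default numerically; that is harmless but worth knowing.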

Output files:

output_name.pockets.dat: detailed information on the detected pockets including pocket confidence, list of templates and ligands, binding residues, chemical properties of putative binding ligands, etc.

output_name.pockets.pdb: geometric centers of detected binding pockets in PDB format

output_name.alignments.dat: structure alignments between the target and template proteins constructed by fr-TM-align

output_name.ligands.sdf: ligands extracted from template complexes in SDF format

output_name.templates.pdb: template proteins in PDB format aligned onto the target
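After a run, it can be useful to confirm that all five files were written. The following sketch derives the expected file names from output_name using the extensions listed above; the helper names expected_outputs and missing_outputs, and the base name "target", are hypothetical.

```python
from pathlib import Path
from typing import List

# The five extensions listed above.
SUFFIXES = [
    "pockets.dat",
    "pockets.pdb",
    "alignments.dat",
    "ligands.sdf",
    "templates.pdb",
]


def expected_outputs(output_name: str) -> List[Path]:
    """Return the five file paths efindsite is documented to write."""
    return [Path(f"{output_name}.{suffix}") for suffix in SUFFIXES]


def missing_outputs(output_name: str) -> List[Path]:
    """Return the expected output files that do not exist on disk."""
    return [p for p in expected_outputs(output_name) if not p.is_file()]


for path in expected_outputs("target"):
    print(path)
```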

efindsite_scr

efindsite_scr performs ligand-based virtual screening against binding pockets detected by efindsite.

Mandatory arguments:

-p pocket.dat, where pocket.dat is a file containing the binding pockets predicted by efindsite

-s cmp_lib, where cmp_lib is a compound library used for virtual screening (gzipped); several libraries are available here

-o output_file, where output_file will contain the ranked library compounds, each assigned a Tanimoto score and a Z-score
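Because the compound library is expected to be gzipped, a quick check before launching a screen can catch a wrong file early. The sketch below tests for the two-byte gzip signature and then assembles the mandatory arguments; the helpers is_gzipped and build_scr_cmd and all file names are hypothetical, and the pockets file is assumed to be the .pockets.dat output of efindsite.

```python
from typing import List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file


def is_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def build_scr_cmd(pockets: str, library: str, output_file: str) -> List[str]:
    """Assemble an efindsite_scr command line from the mandatory arguments."""
    return ["efindsite_scr", "-p", pockets, "-s", library, "-o", output_file]


# Hypothetical file names, used only for illustration.
cmd = build_scr_cmd("target.pockets.dat", "library.gz", "target.ranking")
print(" ".join(cmd))
```

In practice one would call is_gzipped on the library path first and stop with an error message if it returns False.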

Optional arguments:

-n poc_num, where poc_num indicates which predicted binding site will be used in virtual screening, default 1

-m score_func, where score_func selects the scoring method (default sum); the following functions are currently implemented:

  • single scoring functions:
    • tst - classical Tanimoto coefficient for Daylight
    • tsa - average Tanimoto coefficient for Daylight
    • tsc - continuous Tanimoto coefficient for Daylight
    • tmt - classical Tanimoto coefficient for MACCS
    • tma - average Tanimoto coefficient for MACCS
    • tmc - continuous Tanimoto coefficient for MACCS
  • combined scoring functions:
    • sum - data fusion using sum rule
    • max - data fusion using max rule
    • min - data fusion using min rule
    • svm - machine learning using Support Vector Machines (SVM)
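To make the terminology in this list concrete, the sketch below implements commonly used definitions of the three Tanimoto variants and of the sum, max and min fusion rules. These are textbook formulas, and it is an assumption that efindsite_scr follows them exactly: here the average coefficient is taken as the mean of the Tanimoto coefficients computed on the "on" bits and on the "off" bits, and the fusion rules are applied directly to a list of scores for one compound. All function names and example fingerprints are hypothetical.

```python
from typing import Callable, Dict, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Classical Tanimoto coefficient for two 0/1 fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def average_tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean of the coefficients computed on the 'on' bits and on the 'off' bits."""
    inv_a = [1 - x for x in a]  # assumes strictly 0/1 values
    inv_b = [1 - y for y in b]
    return 0.5 * (tanimoto(a, b) + tanimoto(inv_a, inv_b))


def continuous_tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous form: a.b / (a.a + b.b - a.b), for count or real vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


# Simple data fusion rules applied to several scores of one compound.
FUSION_RULES: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "max": max,
    "min": min,
}


def fuse(scores: Sequence[float], rule: str) -> float:
    """Combine several scores with the named fusion rule."""
    return FUSION_RULES[rule](scores)


fp_a = [1, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0]
print(tanimoto(fp_a, fp_b))             # 0.5
print(average_tanimoto(fp_a, fp_b))     # 0.5
print(continuous_tanimoto(fp_a, fp_b))  # 0.5
print(fuse([0.2, 0.5], "max"))          # 0.5
```

For 0/1 vectors the continuous form reduces to the classical coefficient, which is why the first and third values agree in this example.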

Output files:

Library compounds ranked by a Tanimoto score and the Z-score
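A Z-score expresses how far a compound's score lies from the library mean in units of standard deviation. The sketch below ranks compounds and attaches such a Z-score; it assumes the population standard deviation and an in-memory dictionary of scores, neither of which is specified by the description above, and the function name and compound names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_with_zscores(scores: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Sort compounds by score (best first) and attach a Z-score to each."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, score, (score - mu) / sigma if sigma else 0.0)
        for name, score in ranked
    ]


# Hypothetical compounds and Tanimoto scores.
ranking = rank_with_zscores({"cmpd_a": 0.8, "cmpd_b": 0.5, "cmpd_c": 0.2})
for name, score, z in ranking:
    print(f"{name}\t{score:.3f}\t{z:+.3f}")
```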

efindsite_map

efindsite_map prepares a mapping file that maps threading templates to ligand-bound templates used by efindsite.

Mandatory arguments:

-t thread_lib, where thread_lib is a complete threading library in FASTA format

-p efindsite_lib, where efindsite_lib is the eFindSite library in FASTA format (look for pdb.fasta)

-o output_file, where output_file is the mapping file linking threading and eFindSite libraries

Optional arguments:

-a proc_num, where proc_num is the number of processors to be used, default 1
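Since both libraries are FASTA files, counting their records is a cheap sanity check before building the mapping. The sketch below counts header lines and assembles the efindsite_map command line, adding -a only when more than the default single processor is requested. The helper names and the file names other than pdb.fasta are hypothetical.

```python
from typing import List, Optional


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def build_map_cmd(thread_lib: str, efindsite_lib: str, output_file: str,
                  processors: Optional[int] = None) -> List[str]:
    """Assemble an efindsite_map command line."""
    cmd = ["efindsite_map", "-t", thread_lib, "-p", efindsite_lib,
           "-o", output_file]
    if processors is not None and processors != 1:
        cmd += ["-a", str(processors)]  # 1 is the documented default
    return cmd


# Hypothetical file names, except pdb.fasta, which is mentioned above.
cmd = build_map_cmd("threading.fasta", "pdb.fasta", "library.map", processors=4)
print(" ".join(cmd))
```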

Output files:

A mapping file required to run efindsite

© Michal Brylinski