Processing large datasets using Xeon Phi

In this example, we show how to use eFindSite on computing systems equipped with Xeon Phi accelerators to predict binding pockets for large sets of proteins. The dataset used in this tutorial comprises 501 target proteins, each 200-500 amino acids in length, with 50-250 templates per target (dataset-phi.tar.gz). Note that the input files for each target are located in a separate directory. As in the serial version, the required input files are:

  • SOME_ID.pdb (target structure in PDB format)
  • SOME_ID.ethread-fun (list of templates from eThread)
  • SOME_ID.ss (secondary structure profile)
  • SOME_ID.prf (sequence profile)

In addition, you will need a list of target proteins to be processed in parallel using eFindSite. In this example, the list of targets is saved as dataset-phi.lst.
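
Before starting a long run, it can be useful to confirm that every target in the list has all four input files. The function below is a minimal sketch, not part of eFindSite: the name check_inputs is hypothetical, and it assumes that the list holds one target ID per line and that the files for each target sit in ROOT/SOME_ID/ (the format of the list and the exact layout are assumptions, so adjust them to your data).

```shell
# check_inputs LIST ROOT   (hypothetical helper, not shipped with eFindSite)
# Prints one line per missing input file; returns non-zero if any are missing.
# Assumes one target ID per line in LIST and files stored as ROOT/ID/ID.<ext>.
check_inputs() {
    local list="$1" root="$2" missing=0 id ext
    while IFS= read -r id || [ -n "$id" ]; do
        [ -z "$id" ] && continue                 # skip blank lines
        for ext in pdb ethread-fun ss prf; do
            if [ ! -f "$root/$id/$id.$ext" ]; then
                echo "missing: $root/$id/$id.$ext"
                missing=1
            fi
        done
    done < "$list"
    return "$missing"
}
```

With the dataset unpacked, a call such as check_inputs dataset-phi.lst dataset-phi would print nothing if every target is complete.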

Let's put these files in a directory called example/ (you need to unpack dataset-phi.tar.gz before you run eFindSite). Now, export the following environment variables:

[example]$ export EF_LIB=/usr/local/libraries/efindsite-lib-latest

[example]$ export EF_MAP=/usr/local/libraries/efindsite-lib-latest/ethread-lib-jan2012.map

[example]$ export EF_MOD=/usr/local/efindsite-1.2-phi/mod

[example]$ export EF_APL=/usr/local/efindsite-1.2-phi/lib/apclusterunix64.so

[example]$ export EF_OMP=/usr/local/efindsite-1.2-phi/bin/efindsite_omp

[example]$ export EF_MIC=/usr/local/efindsite-1.2-phi/bin/efindsite_mic

and run efindsite_phi:

[example]$ perl /usr/local/efindsite-1.2-phi/bin/efindsite_phi -l dataset-phi.lst
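
Because a run over the whole dataset may take a while, it may be worth confirming beforehand that all six variables are set and point to existing paths. The following is only a sketch under that assumption; check_env is a hypothetical helper, not part of eFindSite, and it checks nothing beyond the existence of each path.

```shell
# check_env   (hypothetical helper, not shipped with eFindSite)
# Reports eFindSite variables that are unset or point to a path that does not exist.
# Returns non-zero if at least one problem is found.
check_env() {
    local status=0 name value
    for name in EF_LIB EF_MAP EF_MOD EF_APL EF_OMP EF_MIC; do
        eval "value=\${$name:-}"                 # portable indirect lookup of $name
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -e "$value" ]; then
            echo "$name points to a missing path: $value"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_env in the same shell session that exported the variables prints nothing when all six paths exist.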

For each target protein, you should get five output files in the corresponding directory dataset-phi/SOME_ID/ (the format of the output files is identical to that of the files created by the serial version):

  • SOME_ID-efindsite.pockets.dat (detailed info on predicted pockets)
  • SOME_ID-efindsite.pockets.pdb (predicted pockets in PDB format)
  • SOME_ID-efindsite.alignments.dat (structure alignments of templates onto the target in FASTA format)
  • SOME_ID-efindsite.templates.pdb (template structures aligned onto the target in PDB format)
  • SOME_ID-efindsite.ligands.sdf (extracted binding ligands in SDF format)
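
With several hundred targets, checking the output directories by hand is tedious. The sketch below lists every target that lacks at least one of the five files named above. The function list_incomplete is hypothetical and not part of eFindSite; it assumes one target ID per line in the list and only tests that each file exists, not that its content is valid.

```shell
# list_incomplete LIST ROOT   (hypothetical helper, not shipped with eFindSite)
# Prints the ID of every target missing at least one of the five output files.
# Assumes one target ID per line in LIST and outputs in ROOT/ID/.
list_incomplete() {
    local list="$1" root="$2" id suffix
    while IFS= read -r id || [ -n "$id" ]; do
        [ -z "$id" ] && continue                 # skip blank lines
        for suffix in pockets.dat pockets.pdb alignments.dat templates.pdb ligands.sdf; do
            if [ ! -f "$root/$id/$id-efindsite.$suffix" ]; then
                echo "$id"
                break                            # report each target once
            fi
        done
    done < "$list"
}
```

For example, list_incomplete dataset-phi.lst dataset-phi > incomplete.lst would collect the affected IDs in a new file (incomplete.lst is an arbitrary name), which could then be inspected before deciding whether to rerun those targets.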

 

© Michal Brylinski