Molecular Synthesis

Assuming that organic compounds are composed of rigid fragments connected by flexible linkers, a molecule can be decomposed into its building blocks while tracking their atomic connectivity. Here, we developed eSynth, an automated method that synthesizes new compounds by reconnecting these building blocks according to their connectivity patterns, using an exhaustive graph-based search algorithm. eSynth makes it possible to rapidly construct virtual screening libraries for targeted drug discovery.
If you find eSynth useful, please cite the following paper:

eSynth is also available from GitHub.

eSynth is free of charge for non-commercial use under the terms of the GNU General Public License. It is distributed without any warranty, but with best wishes.



Source codes

Version (hover) Download Size MD5  

 
1.1 latest esynth-1.1.tar.gz 1.5M 9a24c948dbe8ac56afa9e9074406aa44  
1.0 latest esynth-1.0.tar.gz 82K 8b71400072b3a772b3b0ba251affc1c9  
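
The MD5 column can be used to check a download before unpacking it. The following is a minimal sketch, assuming a system that provides the GNU coreutils md5sum command; verify_md5 is a hypothetical helper name and is not part of the eSynth distribution.

```shell
# verify_md5 FILE EXPECTED_MD5
# Succeeds (exit status 0) only when the MD5 digest of FILE equals EXPECTED_MD5.
# NOTE: verify_md5 is a hypothetical helper, not shipped with eSynth.
verify_md5() {
    local file=$1 expected=$2 actual
    # md5sum prints "<digest>  -" when it reads from standard input
    actual=$(md5sum < "$file") || return 1
    actual=${actual%% *}        # keep only the digest
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

For example, verify_md5 esynth-1.1.tar.gz 9a24c948dbe8ac56afa9e9074406aa44 should print OK for an intact copy of the 1.1 archive.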

 

Compiling

The following libraries are required (modify the Makefile to update paths accordingly):

NOTE: OpenBabel (2.3.2) should be compiled from source code. You may also need to remove any generic installation of OpenBabel to prevent path conflicts.

Object files

All object files are compiled and placed in the obj directory; make clean will empty this folder. Before executing esynth, you may need to export the LD_LIBRARY_PATH environment variable, e.g.:

[example]$ export LD_LIBRARY_PATH=${HOME}/apps/openbabel-2.3.2-install/lib/:${HOME}/apps/gsl-1.16-install/lib/

Executable

make will create the esynth application. All output molecules are written in the SMILES format in the output directory synth_output_dir. Each file will contain up to 250,000 molecules and will be compressed using the zlib compression algorithm.
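
To inspect the results, the compressed output files have to be decompressed first. The sketch below counts the molecules in an output directory under two assumptions that the text above does not confirm: that the files are gzip-compatible (readable with gzip -dc), and that each line holds one SMILES string. count_smiles is a hypothetical helper name.

```shell
# count_smiles DIR
# Prints the total number of lines (assumed: one molecule per line) across
# all files in DIR.
# ASSUMPTION: the files are gzip-compatible; raw zlib streams would need a
# different decompressor.
count_smiles() {
    local dir=$1 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip non-files and an unmatched glob
        gzip -dc "$f"
    done | awk 'END { print NR }'
}
```

Called with the name of the output directory as its only argument, the function prints a single number.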

Options

esynth is the executable. Command-line arguments are prefixed with - and may include the following:

-serial specifies a serial execution
-threaded specifies a threaded execution; experimental, do not use
-odir <directory> specifies the name of the directory where output will be placed; default is esynth_output_dir
-tc <value> defines the Tanimoto coefficient threshold as a value between 0 and 1; default is 0.95
-smi-only specifies that all molecules are handled as SMILES objects
-nopen specifies that OpenBabel is used only to read the initial input from the SDF files and to write the resulting output in the SMI format
-prob-level <value> specifies the level at which to begin pruning molecules for probability purposes
-lip turns on the Lipinski compliance check for molecules (off by default)

NOTE: Using the -lip option is highly recommended; it prevents an unrestricted search from constructing very large molecules.
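
For reference, the Tanimoto coefficient behind the -tc threshold is commonly computed on molecular fingerprints as the number of shared on-bits divided by the number of bits set in either fingerprint. The sketch below computes it for two lists of on-bit indices. It illustrates the coefficient only and makes no claim about how esynth generates fingerprints or applies the threshold; tanimoto is a hypothetical helper name.

```shell
# tanimoto "BITS_A" "BITS_B"
# Each argument is a space-separated list of on-bit indices.
# Prints |A and B| / |A or B| with three decimals.
# NOTE: hypothetical helper for illustration, not part of eSynth.
tanimoto() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, " "); nb = split(b, y, " ")
        for (i = 1; i <= na; i++) A[x[i]] = 1
        for (i = 1; i <= nb; i++) B[y[i]] = 1
        # count bits in A, and those also present in B
        for (k in A) { uni++; if (k in B) common++ }
        # add bits that occur only in B
        for (k in B) if (!(k in A)) uni++
        # two empty fingerprints: report 0 by convention in this sketch
        if (uni == 0) { print "0.000"; exit }
        printf "%.3f\n", common / uni
    }'
}
```

tanimoto "1 4 7 9" "4 7 8" prints 0.400 (2 shared bits out of 5 distinct bits), which is well below the default threshold of 0.95.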

Individual criteria for Lipinski compliance can be set using the following command-line options:

-mw <value> specifies the upper bound for molecular weight
-hd <value> specifies the upper bound for the number of H-bond donors
-ha <value> specifies the upper bound for the number of H-bond acceptors
-lp <value> specifies the upper bound for the octanol-water partition coefficient

NOTE: These values take effect only when -lip is also specified.
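
The four bounds act together as a simple filter: a molecule passes only if every property is at or below its upper bound. The sketch below shows that logic for precomputed property values. The default bounds in it are the commonly cited rule-of-five limits (molecular weight 500, 5 donors, 10 acceptors, partition coefficient 5), which may differ from the defaults built into esynth; lipinski_ok is a hypothetical helper name.

```shell
# lipinski_ok MW HBD HBA LOGP [MAX_MW MAX_HBD MAX_HBA MAX_LOGP]
# Succeeds (exit status 0) when every property is at or below its bound.
# ASSUMPTION: the default bounds are the classic rule-of-five values, not
# values taken from the eSynth source code.
lipinski_ok() {
    awk -v mw="$1" -v hd="$2" -v ha="$3" -v lp="$4" \
        -v max_mw="${5:-500}" -v max_hd="${6:-5}" \
        -v max_ha="${7:-10}" -v max_lp="${8:-5}" 'BEGIN {
        ok = (mw <= max_mw && hd <= max_hd && ha <= max_ha && lp <= max_lp)
        exit !ok    # awk exit status 0 means the molecule passes
    }'
}
```

lipinski_ok 350.4 2 5 3.1 succeeds, while lipinski_ok 650 2 5 3.1 fails on molecular weight unless a higher bound is passed as the fifth argument, in the same way that -mw raises the bound for esynth.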

Example

A typical run:

[example]$ ./esynth -lip -nopen -serial -smi-only <linkers sdfs> <rigid sdfs>

Sample:

[example]$ ./esynth -lip -nopen -serial -smi-only -prob-level 2 l-test-linker1.sdf l-test-linker2.sdf r-test-rigid1.sdf r-test-rigid2.sdf

NOTE: All data files (*.sdf) must exist in the same directory as the executable.

Example run:

[example]$ cd example-fragments/test/

[example-fragments/test]$ ../../esynth `cat list` -lip -nopen

This is the simplest run; options can be added as explained above. The output molecules in the SMILES format will be written to synth_output_dir.

Datasets

Molecular synthesis with eSynth was validated against the Database of Useful Decoys, Enhanced (DUD-E). Active compounds for each DUD-E target were first clustered into chemically dissimilar groups. Subsequently, each group considered as the validation set was reconstructed using dissimilar molecules pooled from other clusters. This protocol mimics a real application, where one expects to discover novel compounds based on a small set of already developed bioactives.

The dataset provides input, output and validation files for each cluster associated with a given DUD-E target. Validation molecules (actives) are in the SDF format, e.g. thb-cluster1-actives.sdf. The corresponding sets of linkers and rigids (extracted from other clusters) are in thb-1-linkers.sdf and thb-1-rigids.sdf, respectively. Output molecules constructed from these fragments by esynth are in the SMILES format, e.g. thb-cluster1-esynth.smi.
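
The naming scheme can be sketched as a small shell function. It assumes that the pattern shown for thb cluster 1 holds for every target and cluster, which should be checked against the unpacked archive; dataset_files is a hypothetical helper name.

```shell
# dataset_files TARGET CLUSTER
# Prints the four file names expected for one cluster of one target:
# actives, linkers, rigids and esynth output, in that order.
# ASSUMPTION: the pattern is inferred from the thb cluster 1 example only.
dataset_files() {
    local target=$1 cluster=$2
    printf '%s\n' \
        "${target}-cluster${cluster}-actives.sdf" \
        "${target}-${cluster}-linkers.sdf" \
        "${target}-${cluster}-rigids.sdf" \
        "${target}-cluster${cluster}-esynth.smi"
}
```

dataset_files thb 1 prints the four file names quoted in the paragraph above.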

The entire dataset can be downloaded using wget. First, download the list of targets, esynth-dataset.lst, and then use the following command:

[home]$ wget --base=http://brylinski.cct.lsu.edu/pub/data/esynth/ -i esynth-dataset.lst

The entire database is very large (22G). If you are interested in selected targets, use individual links:

Target Download Size MD5  

 
aa2ar aa2ar-esynth.tar.gz 2.0G ce33c03b0b57e1b2fbab1a332a022f2b  
abl1 abl1-esynth.tar.gz 12M 275929b234949a70f5033d18d7026985  
ace ace-esynth.tar.gz 2.5G 08ed302123b566b6ca848da3df84ac89  
aces aces-esynth.tar.gz 13M bfb89c8512a387a657c80b776251507c  
ada17 ada17-esynth.tar.gz 6.2M dcb8d2d3acffb59e02af91c0dd2b5cd6  
ada ada-esynth.tar.gz 4.2M f743d162e9aebaca40542aeb59080c03  
adrb1 adrb1-esynth.tar.gz 2.5M 46f97d194c7820d5929a6e3b6e7232d7  
adrb2 adrb2-esynth.tar.gz 2.0M 840f768a46ad022f7da31c8e0f2874a5  
akt1 akt1-esynth.tar.gz 778M 66b7deb2414422f8f7f668f0e95e4ce3  
akt2 akt2-esynth.tar.gz 757M 06cda354702108f5129897c8bf88c112  
aldr aldr-esynth.tar.gz 51M 85d58601b64ed9cfa553a0643ac8d338  
ampc ampc-esynth.tar.gz 975K 0ab3a39bac558a47deaf78dacbf518c4  
andr andr-esynth.tar.gz 4.0M 61d689a4d903ca0ae681fc2bf4c87292  
aofb aofb-esynth.tar.gz 1.2M 18cb3199c13b82cba5ab4286e60df723  
bace1 bace1-esynth.tar.gz 2.9M d017c44456ba1b926f293d8659f05ac1  
braf braf-esynth.tar.gz 145M ff3233117282b94e2990a1dfbd07df73  
cah2 cah2-esynth.tar.gz 12M 4a472dcb4446c5c642e5e44fbbc67b9c  
casp3 casp3-esynth.tar.gz 1.2M f5880365faa2c79716db670b8ef8d96f  
cdk2 cdk2-esynth.tar.gz 7.8M 729203d0b8d4a39cca8161b1f8dba608  
comt comt-esynth.tar.gz 588K 75c4cbdfca37b857cae351a8b1a63250  
cp2c9 cp2c9-esynth.tar.gz 3.1M 7cd0214bbfa888df3f27f4765eff8f66  
cp3a4 cp3a4-esynth.tar.gz 2.6M 73285b1c6a884fb68bf81b9df4bdb43a  
csf1r csf1r-esynth.tar.gz 3.2M 8baaf3b98bd5a0ec2df8a608f3fea728  
cxcr4 cxcr4-esynth.tar.gz 423M bf0531ca381de7da19b4f2d1fb928e07  
def def-esynth.tar.gz 974M 890ab524fbcc14fbb1880cb464c179de  
dhi1 dhi1-esynth.tar.gz 4.5M 429e83b299265a19a327e9cab48a36c3  
dpp4 dpp4-esynth.tar.gz 833M d7ee217bc4bd508b0250b5eec07f2be0  
drd3 drd3-esynth.tar.gz 23M 09514d5dd33abbf290bea2264ca2e49d  
dyr dyr-esynth.tar.gz 2.6M 18a6641e308722e22462a19cb0dba34b  
egfr egfr-esynth.tar.gz 8.0M 90893d7e5ae59751101ec56ffdeeb7b8  
esr1 esr1-esynth.tar.gz 57M e8866aa03315e4bca67ff190b57dfdf3  
esr2 esr2-esynth.tar.gz 27M 149c3858fef363893c9e0c683647e3f2  
fa10 fa10-esynth.tar.gz 6.8M 4ba8b87282f0b859c4a51a97b72d2683  
fa7 fa7-esynth.tar.gz 459K 431df5b74db22e27acc23cabde1d358a  
fabp4 fabp4-esynth.tar.gz 101K 39593f5103db60643d9575bb668e85ec  
fak1 fak1-esynth.tar.gz 1.2M 1fe69d76ecb83e80ad8fbf940d55ff92  
fkb1a fkb1a-esynth.tar.gz 1.7G 9e2d196148e22d6117fe5e73524cad89  
fnta fnta-esynth.tar.gz 2.2G 513c8c05c16ab3a2c6c27be2870430c9  
fpps fpps-esynth.tar.gz 145K 79523b05739d0461073f4bd6a1966632  
gcr gcr-esynth.tar.gz 2.6M 0ec07811f5c01695e84ed32593c1a75b  
glcm glcm-esynth.tar.gz 35M c03cc12a97d02705bb7e58276b765ddb  
gria2 gria2-esynth.tar.gz 1.6M 21e360144c276a2066fec67668bfa5d2  
grik1 grik1-esynth.tar.gz 6.0M 9576ae6861c9eedf9501f77c313b99b1  
hdac2 hdac2-esynth.tar.gz 1.7M 38d19ba57660eb4b917c2e3909262a7f  
hdac8 hdac8-esynth.tar.gz 1.3M 014fcf60141e51ad4a3d96de53191540  
hivint hivint-esynth.tar.gz 3.5M a25757f909ae7f57b057401b453745e1  
hivpr hivpr-esynth.tar.gz 19M 4befa7aec22f8539a36fae2e1b5a195a  
hivrt hivrt-esynth.tar.gz 308M 2688a83ea4b99be200a76558e6c06d30  
hmdh hmdh-esynth.tar.gz 886K 7a3f6f50fcdda26f17bf0ed34d1c000f  
hs90a hs90a-esynth.tar.gz 24M d18b41797102b3d0c6c73b903b2e16ff  
hxk4 hxk4-esynth.tar.gz 2.8M 54a1a3b3e33769de4a08dbc2e5c0a06d  
igf1r igf1r-esynth.tar.gz 848K 91f4cc2199deee0242e38a74298cc285  
inha inha-esynth.tar.gz 9.8M 244676c009be75d8cd341afc97ab59b0  
ital ital-esynth.tar.gz 811K 3d4d7f47e85d9720a517703933d413cc  
jak2 jak2-esynth.tar.gz 1.2M 46079dd767c059750cf52f4ed4f49988  
kif11 kif11-esynth.tar.gz 182M 8b4bd381092eb492d92b2f503d07b29a  
kit kit-esynth.tar.gz 20M 57643f7ac8b3bc4ef7631e6ecd546fdf  
kith kith-esynth.tar.gz 157K 6ae40ebd50cefdbf7cec4fecd8af9bbe  
kpcb kpcb-esynth.tar.gz 491M d096ad5bf07ef98dff8c844193b61471  
lck lck-esynth.tar.gz 5.8M c60ade1afba80cd853f86a99a9ca8cfc  
lkha4 lkha4-esynth.tar.gz 42M df3108ae669c48d7a1a4bc6696d4e021  
mapk2 mapk2-esynth.tar.gz 18M 1e204ed0050b0fcbf4cbd72dc7b5025f  
mcr mcr-esynth.tar.gz 158M e8bab400b16cdde0a8dc90bb332898b9  
met met-esynth.tar.gz 1.7M 2e2df11c470274c08ad03e547f103c0a  
mk01 mk01-esynth.tar.gz 922K 433c55074afe91259edd5675aa4b36d5  
mk10 mk10-esynth.tar.gz 789K 8aa798b27f6bc2374244e7eb8ebd49ef  
mk14 mk14-esynth.tar.gz 9.2M 9578ec5bbacaa837a7ea865eaadf3617  
mmp13 mmp13-esynth.tar.gz 7.6M 288177381c83f0bf5b3934cde9f3bac9  
mp2k1 mp2k1-esynth.tar.gz 1.8M d791e1f209f55e7af80aeedec709410d  
nos1 nos1-esynth.tar.gz 697K bef0747af42007f9cfca366b05238125  
nram nram-esynth.tar.gz 340M b6acaab90e92f973afdb976e9841edac  
pa2ga pa2ga-esynth.tar.gz 5.5M 7c8abaea0992bfde0a914b0b91aff671  
parp1 parp1-esynth.tar.gz 31M abb7a542da158015a50b7643b180e369  
pde5a pde5a-esynth.tar.gz 3.3M cda106641c599528fbf6189c3842bad2  
pgh1 pgh1-esynth.tar.gz 4.1M a5273f05d083d56a20872a9b8cba4d36  
pgh2 pgh2-esynth.tar.gz 10M 5a22121a29326c9f23cce655dedf53b5  
plk1 plk1-esynth.tar.gz 425K af6e10c1f88ce93a75ac9381fe1a5477  
pnph pnph-esynth.tar.gz 268M 2c67aedd75f3080e76ca6aa6929ad973  
ppara ppara-esynth.tar.gz 3.2M f18d1eb8dd514de92471434d2df33c93  
ppard ppard-esynth.tar.gz 1.9M 452077c073252b19cbf08d04ad28d9a8  
pparg pparg-esynth.tar.gz 5.4M c538120a26f3158d5b2020e23666a4e9  
prgr prgr-esynth.tar.gz 1012M cacfec8553cd557bfc8f53af87d2b589  
ptn1 ptn1-esynth.tar.gz 36M a449a1de3c119416b84f22f9409b4848  
pur2 pur2-esynth.tar.gz 6.0M 111ecb432b4db1186852930f4aaec823  
pygm pygm-esynth.tar.gz 202M ad3fa5eba1c747d181a0c0ce63055b94  
pyrd pyrd-esynth.tar.gz 1.2G 579f0c8f5c00435e1e7945661c93956c  
reni reni-esynth.tar.gz 554K 01e8be41a089791482a84f3560054453  
rock1 rock1-esynth.tar.gz 839K 976c507848b58b09c45dc30f954aec8c  
rxra rxra-esynth.tar.gz 25M 69b8a64fab40e8c84cc1ca2895e4296a  
sahh sahh-esynth.tar.gz 231M 632589b99676c893f620d8a57a0d21f5  
src src-esynth.tar.gz 26M 6c9a95b376a0a44c8aef86781c9f758d  
tgfr1 tgfr1-esynth.tar.gz 1.3M 74a4be08ba256197aaea513754e6dd3b  
thb thb-esynth.tar.gz 46M fdbadb00ed89c26dcd0313dfb6c9b120  
thrb thrb-esynth.tar.gz 1.1G fc0a9a70c1aaa5a1bc1cfc7355835167  
try1 try1-esynth.tar.gz 103M 3ffadc597f5994b850f2fcfc48cbd34b  
tryb1 tryb1-esynth.tar.gz 3.2G d418586d70708b020a72c6c19e50e42a  
tysy tysy-esynth.tar.gz 6.9M 3d522f5e737a4b8bd931974e45ef06ac  
urok urok-esynth.tar.gz 1.2M 0dfcf0191832fdb23be3bc4f13f96a03  
vgfr2 vgfr2-esynth.tar.gz 6.0M bca704a2c833ce46495996fc8c32b5b5  
wee1 wee1-esynth.tar.gz 45M 9fc0bef0d6e33b07df9d9c2cba7b1716  
xiap xiap-esynth.tar.gz 228M e017ecb6977aca69253830b3b09ccc03  
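
The MD5 column of this table can be turned into a checklist for md5sum -c, which verifies several downloaded archives at once. The following is a minimal sketch, assuming that the rows are saved to a text file in the four-column layout shown above and that GNU md5sum is available; table_to_md5list and the file names targets.txt and esynth.md5 are hypothetical.

```shell
# table_to_md5list < TABLE
# Converts rows of the form "target archive size md5" into the
# "md5  archive" format expected by md5sum -c.
# The header line, blank lines and malformed rows are skipped.
table_to_md5list() {
    awk 'NF == 4 && length($4) == 32 && $4 ~ /^[0-9a-f]+$/ { print $4 "  " $2 }'
}
```

A possible use, run in the directory that holds the downloaded archives: table_to_md5list < targets.txt > esynth.md5, followed by md5sum -c esynth.md5.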

 

© Michal Brylinski
This website is hosted at the CCT