|
Tutorials
The workflow in DRDOCK can be summarized into four modules: structure preprocessing, docking/ranking 2016 FDA-approved drugs, 10 ns NPT MD simulation in water, and result rendering (Figure 1). The user-submitted protein target in PDB format is first preprocessed before docking, including the determination of the pKa and protonation state of ionizable residues, optimizing the orientation of rotatable residue side-chains, and disulfide bond detection [1]. As an optional service, the user can choose to relax the target structure, without bound co-crystallizers, by 1 ns MD simulation in water before docking. The service then docks 2016 FDA-approved drugs to the target, and features inferred from docking poses are used to calculate the LOD scores for drug ranking [1]. Users then view the docking poses online on the results page and submit the interested docking poses for 10 ns NPT MD simulations of the protein-drug complexes in water. The resulting trajectories are used for assessing the drug binding stability (i.e., whether the drugs stay in the binding site through the simulation) and binding free energy by MM/GBSA [1]. The trajectories are then updated to the results page for the playback of manipulatable 3D animations.
Figure 1. The workflow in DRDOCK. DRDOCK begins with
preprocessing the user-submitted protein target. Users can choose to directly
use the input structure for docking or optionally perform a one ns MD
simulation to minimally relax the structure in water before docking. The
preprocessed structure is then used to dock 2016 FDA-approved drugs, where the
LOD scores calculated from the docking poses are used for drug ranking. Users
then view the docking poses on the online results page and pick interesting
docking poses to simulate the dynamic protein-drug interaction in 10 ns NPT MD
simulations with water. The resulting trajectory is used for binding free
energy calculation by MM/GBSA and can be played on the online results page. The
figure is reproduced from Tsai & Yang, 2021 [1].
DRDOCK requires users to submit a well-prepared single-chain
protein structure in PDB format and the corresponding residues IDs of the
target sites, such as active sites for enzymes, that are used to identify the
closest docking poses the users should interest in (Figure 2). Since the integrity of the protein
structure is essential for meaningful MD simulations, the protein structure
should be prepared for preventing any missing backbone heavy atoms (CA, C, N,
O). The missing of any side-chain heavy atoms is allowed since they can be
automatically patched according to pre-defined amino acid templates but may
slightly distort the predicted binding poses in docking. DRDOCK requires the
protein structure to only comprise 20 standard amino acids, of which the force
constants and atom charges are already well developed and calibrated in AMBER
ff14SB force field (Maier et al., 2015) [2]. For those structures
containing some of the standard amino acid-derived residues, DRDOCK can
tolerate that by mutating the residues back to their closest standard amino
acids. Otherwise, the users should manually modify and mutate back those
residues according to the original protein sequence. The most convenient way to
achieve this is to submit the desired target protein structure with
non-standard amino acids with its corresponding original peptide sequence to
the SWISS-MODEL web server (Waterhouse et al., 2018) [3] under the “template mode”,
which can automatically mutate any residue that is different from the original
peptide sequence back and patch any missing heavy atoms, which is our
recommended way to prepare a submission-ready target structure.
Figure 2. The user submission form
and results page of DRDOCK. (A) The job submission form in the main page of DRDOCK. The user is
required to provide a well-prepared single-chain protein structure (“PDB
file”). The user can assign a specific chain ID, model number for NMR-resolved
structures, a desired segment in the specified protein chain (in “Residues
range”). Users are required to specify the IDs of residues that comprise a
target site (e.g. enzyme active site residues). Lastly, the user should provide
a valid email address for receiving the job status and results from DRDOCK. The
orange fields are required while grey fields are optional. (B) The results page of DRDOCK. The drugs are rank-ordered based on
our ranking algorithm that ranks docked poses as described in Tsai
& Yang, 2021 [1]. The drug name, 2D structure, description, basic properties, and
PubChem link are shown in the drug panel. The rank-ordered drug list in CSV
format can be downloaded (the download button in the purple bar atop). To
submit poses for MD simulations, click the flask icon to enter the simulation
submission mode, where a submission panel and a “submit” button will show up in
the functional bar atop (beneath the navigation bar). A checkmark also shows up
and replaces the flask icon for chosen drugs. Users can also switch the docking
poses to submit different poses for the same drug. Finally, click the “submit”
button in the submission panel to submit all the checked poses to the web
server. Users can browse the rank-ordered drugs based on the simulation results
[1] by toggling the “MD”
at the upper right corner (highlighted in the black box in Figure 2B). Other details
about DRDOCK, including that our file parser can allow several types of
non-standard amino acids, can be found in Tsai & Yang, 2021 [1]. The
figure is reproduced from Tsai & Yang, 2021 [1].
DRDOCK is designed for allowing real-time access to the
latest calculation results while the entire drug screening is still in the
process. The drugs and their docking poses shown in the results page are rank-ordered
based on the LOD scores, described in Tsai & Yang, 2021 [1]. The drug name, 2D structure,
drug properties, and external link to PubChem (Kim et al., 2019) [4] can be viewed in each drug
panel (Figure 3, top). Docking
poses for a drug can be selected by the switch at the top right (Figure 3, top). Any docking
pose can be further submitted for a 10 ns MD simulation (Figure 2 and 3). When the simulation is finished,
a superposed trajectory can be visualized in the embedded NGL viewer by
clicking the “view trajectory” button (Figure 3, bottom), and the corresponding features
and the estimated binding free energies for each frame can be seen in the
floated left panel (Figure 3,
bottom). For a better use of the computing resources, simulations for at most 20
selected poses can be submitted for each uploaded protein target. Finally, a
complete list of rank-ordered drugs, all the docking poses, MD simulation
trajectories, calculated features can be downloaded by clicking the “download”
button (Figure 2 and 3).
Figure 3. The
functionality and info-display in the drug panel. The user can visualize the
docking pose by clicking the eye-shape “view docking pose” button, and an
extended space showing the binding mode of protein and drug will show up (top),
clicking the “view docking features” button would render us a list of properties
derived from the docking results (middle). Clicking the video-shape of “view
trajectory” button pops open an NGL viewer (Rose et al., 2018) [5] and plays the MD animation for
the interacting drug-target complex if the pose were chosen for simulations. With
this, MM/GBSA-estimated binding free energy is listed on the side, which is
important for re-ranking the docking results. The pose and trajectory can be
downloaded from the “download” button. The figure is reproduced from Tsai
& Yang, 2021 [1].
We treat the drug screening problem as a ranking problem to
reduce the disparity coming from different docking software used, different
sampling depth and parameterization details of the drug force field, etc. The
ranking of drugs is decided by a score based on the distance of pose features’
statistics distributions, sampled within our benchmark set, from the true
binders and the decoys. The features include the original value of Vina-predicted
binding affinity (affinity), the distance between COMs of a drug pose and
assigned target residues (termed as “COM distance”), and the number of poses
belonging to the top ten of best affinities in the group clustered based on an adjacency-based
clustering method [1] (adjacency-based
clustering). To get such statistics, we pooled all the generated poses from the
16 protein-drug complexes in the training set, derived from a 2016 drug library
(see Methods in [1]), and derived the distribution of the true binding poses and
decoys for each of the features associated with each pose (Figure 4). These distributions provided the statistical basis to
annotate a sampled docking pose to be a true binder or decoy, based on its
relative probability. Specifically, we calculated a log-odds (LOD) score for
each of the sampled poses according to the following equation (Eq. 1):
|
, where x represents a sampled docking pose; f represents one of the specified features, e.g. the pose affinity, the pose’s distance to the target site,
and the size of the poses cluster. is the value of feature f for the pose located. and are the
probability distributions of feature f for docking poses sampled from the true binders and the
decoys, respectively. Noted that are binned values and therefore different
values within the same bin have the same or values.
If there is no count for a bin, a small probability (10-7) is
assigned for that bin. The LOD score takes the idea of relative entropy,
representing how more or less likely a given sampled pose can result from a
true binder than from a decoy. A higher LOD score means the pose is more likely
from a true binder. The 2016 drugs are then rank-ordered based on their best
LOD score among the 20 sampled poses.
Figure 4. The difference in the distributions of the affinities
(top), COM distance (middle), and adjacency-based clustering (bottom) between
the “answer” poses (the true binder pose, left columns) and decoys (right
columns).
While docking serves as a quick screening tool for in
silico drug screening, as a trade-off, the lack of considering the
solvation effect and molecular dynamics involved in the protein-ligand
interaction introduces false positives [6].
Explicit solvent MD simulations using classical force fields optimized for
biological molecules are recognized to better describe the protein-ligand
binding dynamics in a timescale that allowed for weak binders to dissociate [7, 8, 9, 10]. The MD
simulations for user-selected protein-pose complexes were performed in an
explicit solvent, 310K, and 1 atm NPT ensemble for 10 ns after preliminary
energy minimization and equilibration of the system. During the simulation, a
snapshot is taken every 0.1 ns, resulting in a trajectory composed of 100
frames. The simulation results were summarized using the designed indicators,
including the mean binding free energy from MM/GBSA calculation over sampled
snapshots, the drug leaving time when the drug center of mass moving away from
the target site more than 10 Å, and the largest distance of the drug COM to the
target sites sampled during the simulations. A simulated drug got a higher rank
if having a favorable mean binding free energy and staying long enough in the
target site (or never leaving) during the 10 ns simulation. For more detail,
please refer to [1].
[1] Tsai, K.-L., Yang, L.-W., 2021. DRDOCK: A drug repurposing platform integrating automated docking, simulations and a log-odds-based drug ranking scheme. doi:10.1101/2021.01.31.429052
[2] Maier, J.A., Martinez, C., Kasavajhala, K., Wickstrom, L., Hauser, K.E., Simmerling, C., 2015. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation 11, 3696–3713. doi:10.1021/acs.jctc.5b00255
[3] Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T., de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T., 2018. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Research 46. doi:10.1093/nar/gky427
[4] Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen, P.A., Yu, B., Zaslavsky, L., Zhang, J., Bolton, E.E., 2018. PubChem 2019 update: improved access to chemical data. Nucleic Acids Research 47. doi:10.1093/nar/gky1033
[5] Rose, A.S., Bradley, A.R., Valasatava, Y., Duarte, J.M., Prlić, A., Rose, P.W., 2018. NGL viewer: web-based molecular graphics for large complexes. Bioinformatics 34, 3755–3758. doi:10.1093/bioinformatics/bty419
[6] Pantsar, T., Poso, A., 2018. Binding Affinity via Docking: Fact and Fiction. Molecules 23, 1899. doi:10.3390/molecules23081899
[7] Huang, D., Caflisch, A., 2011. Small Molecule Binding to Proteins: Affinity and Binding/Unbinding Dynamics from Atomistic Simulations. ChemMedChem 6, 1578–1580. doi:10.1002/cmdc.201100237
[8] Liu, K., Kokubo, H., 2017. Exploring the Stability of Ligand Binding Modes to Proteins by Molecular Dynamics Simulations: A Cross-docking Study. Journal of Chemical Information and Modeling 57, 2514–2522. doi:10.1021/acs.jcim.7b00412
[9] Liu, P.-F., Tsai, K.-L., Hsu, C.-J., Tsai, W.-L., Cheng, J.-S., Chang, H.-W., Shiau, C.-W., Goan, Y.-G., Tseng, H.-H., Wu, C.-H., Reed, J.C., Yang, L.-W., Shu, C.-W., 2018. Drug Repurposing Screening Identifies Tioconazole as an ATG4 Inhibitor that Suppresses Autophagy and Sensitizes Cancer Cells to Chemotherapy. Theranostics 8, 830–845. doi:10.7150/thno.22012
[10] Mollica, L., Decherchi, S., Zia, S.R., Gaspari, R., Cavalli, A., Rocchia, W., 2015. Kinetics of protein-ligand unbinding via smoothed potential molecular dynamics simulations. Scientific Reports 5. doi:10.1038/srep11539
Reference: Tsai,KL. and and Yang,LW. (2021) DRDOCK: A DrugRepurposing platform integrating automated docking, simulation and a log-odds-based drug ranking scheme. doi: 10.1101/2021.01.31.429052 (To be peer-reviewed) Contact: The server is maintained by the Yang Lab at the Institute of Bioinformatics and Structural Biology at National Tsing Hua University, Taiwan. For questions and comments please contact Kun-Lin Tsai.
|