|
Tutorials
1. Submit protein target for drug screening
2. Reassess the drug binding by MD simulations
3. User’s guide
3-1. Workflow
3-2. User submission
3-3. Result rendering
3-4. Algorithm
4. References
1.
Submit protein target
for drug screening
DRDOCK automatically performs drug screening for 2016
FDA-approved drugs on a user-submitted protein target. To submit a protein
target, DRDOCK accepts two types of submission, 1) a 3D protein structure file
in PDB format and 2) a protein sequence composing one-letter amino acid codes.
In this tutorial, we will target the SARS-CoV-2 non-structural protein 16
(Nsp16) for drug screening to find drugs competing for the binding site with
the methyl donor S-Adenosylmethionine (SAM) required for the post-translational
modification of the 5’ cap structure of the nascent transcribed viral mRNA.
To submit 1) a 3D protein
structure, a file containing the atom coordinates of the protein target in a standard
PDB format file is required. An example PDB file of SARS-CoV-2 Nsp16 can be
downloaded on the Protein Data Bank (PDB code: 6WVN) or directly by this link. The PDB file contains (the information can be found on the PDB page with the
structure shown in their 3D structure viewer) two peptide chains, one for
the non-structural protein 16 protein (Nsp16, chain A) and the other one for
the non-structural protein 10 protein (Nsp10, chain B), and co-crystallized
with several small molecules and ions, including SAM that accompanied in chain
A with a residue ID of 7112. This PDB file is used as the starting point for
DRDOCK submission.
Now, visit the home page of
DRDOCK where a form by default for protein structure file submission is
presented. Point to the button in the first item of the form labeled with “PDB
file” and marked with an orange brick on the left, which indicates the field of
the item is a required input for submission. Click the upload button and choose
the downloaded PDF file, and the file name for submission will be presented on
the right of the button.
Our target protein, nsp16, is
presented in the PDB file as the “A” chain. For a PDB file containing multiple
peptide chains, specify the chain ID, which is “A” in our case, in the left
field of the second item labeled with “Chain ID”. If this optional field is not
filled, by default, DRDOCK selects the first peptide chain presented in the PDB
file, which is still the “A” chain in our case.
Next, DRDOCK requires the user
to assign one or more target residues, where the center of mass will be the
target site where the drugs are desired to dock. Assign the target residue in
the field of the fourth item as “7112”, the residue ID of SAM in the PDB file.
If multiple residues (for example, the catalytic triad) are desired for target
site assignment, type the residue IDs separated by comma (,) or dash (-) if the
residue IDs are consecutive.
If you prefer receiving an email
notification of the link to the result page after a successful submission and
when the drug screening is completed (see below), key in an email address in
the optional field of the fifth item of the form. If you would like to let
DRDOCK perform a short MD simulation to relax the protein structure a bit in
the water box before the drug screening, click the “Relax the structure by MD
before docking” option.
Click the “submit” button below
the form for job submission. A processing icon will appear with the status
indicated below.
If the input protein structure
and provided information are processed successfully, DRDOCK will automatically
redirect to the result page, with the URL as “https://dyn.life.nthu.edu.tw/drdock/results/<your job ID>/”. You can memo the job
ID and re-visit the result page later after the drug screening is completed.
If you provide an email address in the submission form, you
will also receive an email notification once your input has been successfully processed,
with the link to the result page included.
To check the job status, click the “Server Status” tab in
the navigation bar above. All of the submitted job IDs queued or running in the
server, as indicated by “Q” or “R” in the “Status” column, are listed.
If the submission fails to be processed, a notification page
will be shown, where the possible reasons and solutions are provided, with the
processing log and error messages presented in the box below.
To submit by 2) the protein target sequence, a peptide
sequence encoded with one letter amino acid code should be provided. An example
can be found in PDB code 6WVN and accessed by this link in FASTA format. Copy the first two lines
since the nsp16 protein is in the “A” chain. Go to the home page. Click “AlphaFold2
modeling” below the submission form to switch to sequence submission mode.
Paste the copied FASTA sequence to the text field below the “Protein sequence”
label.
Because the FASTA sequence of the protein target does not
contain the coordinate information of the SAM molecule, we use residue IDs 104
and 135 instead (Notice the residue ID is numbered according to the submitted
peptide sequence and starting from 1), where the center of mass is used as the
target site for approximation of the location of SAM. Click “Submit” for job
submission. The target protein structure will be modeled according to the submitted
protein sequence by AlphaFold2 before subsequent drug screening.
Once the drug screening is completed, the state icon will
change from “Running”
to “Complete” in the top left below the purple functional bar.
If an optional email address is provided in the submission
form, an email notification of job completion will also be sent.
The drug screening result is presented following the ranked
order of drugs according to the LOD scores of the top-ranked poses in each drug
and the ranking rules of DRDOCK, in the “docking” tab of the top-right toggle.
To view the docking pose with the protein target in the 3D
structure, click the “view pose” button represented with an eye icon. The
target protein is shown in cartoon, the drug docking pose in thick sticks, and
the protein residues contacted with the drug docking pose in thin sticks. The
target site is indicated by a yellow ball.
Rotate the 3D structure by mouse-clicking and dragging
inside the area of the expanded structure viewer. Drag and move the entire
complex by right-clicking and dragging. Resize the molecules using the mouse
wheel.
Click the “list icon” to show calculated feature values,
including pose LOD score, affinity, the distance to the target site, and the
poses clustering size, derived from the docking results.
Click the “download icon” to download the preprocessed protein
target PDB file used for docking, the resulting docking pose (in PDB and PDBQT
format), and the run log file outputted by the docking software.
Click the pose switch to browse other lower-ranked poses.
2.
Reassess the drug
bindings by MD simulations
DRDOCK provides each submitted protein target with 20
selected protein-drug docking poses for the binding reassessment using MD
simulations. To submit a drug docking pose for MD simulation and binding free
energy calculation, click the “MD icon” for poses of interest.
For selected poses, the “MD icon” will be marked with a tick.
Click the tick again will cancel the selection.
When selecting a pose for MD simulation, the current
selected and the remaining available number of poses for simulations will be
shown in the above functional bar.
You can also choose to let DRDOCK automatically select the top-ranked
pose of each top-ranked drug for the remaining quota.
To unchoose all the selected
poses, click the close icon on the right.
To submit the selected poses for MD simulations, click the “submit”
button.
When the submitted simulations are completed, the results
can be accessed under the same URL of the result page, with the “MD icon”
replaced with a “video player icon”.
If an optional email address is provided in the submission
form, an email notification of job completion with the link to the result page
will also be sent.
You can also click the “MD” toggle on the top-right corner
to only show the drugs and poses with simulations results, rank-ordered
according to the calculated binding free energy based on the simulations
trajectories and the ranking rules of DRDOCK.
Click the “video player icon” to open the simulation player.
Play the simulation trajectory by clicking the “play button” on the player
control island at the right-bottom corner.
Drag the “progress bar” to switch to the specific snapshot
of the trajectory as indicated by the number shown on the right of the player
control island.
The representation of the protein target, drug, and target
site is the same as that in the view of the docking pose. The feature island
floating on the top-left corner shows the calculated feature values per
snapshot (the top panel) and the summarized feature values for the entire trajectory
(the bottom panel).
The trajectory is allowed for download, including the
topology files (supplied with both AMBER (.parm7) and
CHARMM (.psf) formats) and the coordinates files
(supplied with both NetCDF and DCD formats). These
files are added to the folder from the updated download link. Click the “download
icon” to download the trajectory files.
The workflow in DRDOCK can be summarized into four modules: structure preprocessing, docking/ranking 2016 FDA-approved drugs, 10 ns NPT MD simulation in water, and result rendering (Figure 1). The user-submitted protein target in PDB format is first preprocessed before docking, including the determination of the pKa and protonation state of ionizable residues, optimizing the orientation of rotatable residue side-chains, and disulfide bond detection [1]. As an optional service, the user can choose to relax the target structure, without bound co-crystallizers, by 1 ns MD simulation in water before docking. The service then docks 2016 FDA-approved drugs to the target, and features inferred from docking poses are used to calculate the LOD scores for drug ranking [1]. Users then view the docking poses online on the results page and submit the interested docking poses for 10 ns NPT MD simulations of the protein-drug complexes in water. The resulting trajectories are used for assessing the drug binding stability (i.e., whether the drugs stay in the binding site through the simulation) and binding free energy by MM/GBSA [1]. The trajectories are then updated to the results page for the playback of manipulatable 3D animations.
Figure 1. The workflow in DRDOCK. DRDOCK begins with
preprocessing the user-submitted protein target. Users can choose to directly
use the input structure for docking or optionally perform a one ns MD
simulation to minimally relax the structure in water before docking. The
preprocessed structure is then used to dock 2016 FDA-approved drugs, where the
LOD scores calculated from the docking poses are used for drug ranking. Users
then view the docking poses on the online results page and pick interesting
docking poses to simulate the dynamic protein-drug interaction in 10 ns NPT MD
simulations with water. The resulting trajectory is used for binding free energy
calculation by MM/GBSA and can be played on the online results page. The figure
is reproduced from Tsai & Yang,
2021 [1].
DRDOCK requires users to submit a well-prepared single-chain
protein structure in PDB format and the corresponding residues IDs of the
target sites, such as active sites for enzymes, that are used to identify the
closest docking poses the users should interest in (Figure
2). Since the integrity of the protein structure is essential
for meaningful MD simulations, the protein structure should be prepared for
preventing any missing backbone heavy atoms (CA, C, N, O). The missing of any
side-chain heavy atoms is allowed since they can be automatically patched
according to pre-defined amino acid templates but may slightly distort the
predicted binding poses in docking. DRDOCK requires the protein structure to
only comprise 20 standard amino acids, of which the force constants and atom
charges are already well developed and calibrated in AMBER ff14SB force field (Maier et al., 2015) [2]. For those structures containing some of the standard
amino acid-derived residues, DRDOCK can tolerate that by mutating the residues
back to their closest standard amino acids. Otherwise, the users should
manually modify and mutate back those residues according to the original
protein sequence. The most convenient way to achieve this is to submit the
desired target protein structure with non-standard amino acids with its
corresponding original peptide sequence to the SWISS-MODEL web server (Waterhouse et al., 2018) [3] under the “template mode”, which can automatically mutate
any residue that is different from the original peptide sequence back and patch
any missing heavy atoms, which is our recommended way to prepare a
submission-ready target structure.
Figure 2. The user submission form and results page of DRDOCK. (A) The job submission form in the main page of DRDOCK. The user is required to provide a well-prepared single-chain protein structure (“PDB file”). The user can assign a specific chain ID, model number for NMR-resolved structures, a desired segment in the specified protein chain (in “Residues range”). Users are required to specify the IDs of residues that comprise a target site (e.g. enzyme active site residues). Lastly, the user should provide a valid email address for receiving the job status and results from DRDOCK. The orange fields are required while grey fields are optional. (B) The results page of DRDOCK. The drugs are rank-ordered based on our ranking algorithm that ranks docked poses as described in Tsai & Yang, 2021 [1]. The drug name, 2D structure, description, basic properties, and PubChem link are shown in the drug panel. The rank-ordered drug list in CSV format can be downloaded (the download button in the purple bar atop). To submit poses for MD simulations, click the flask icon to enter the simulation submission mode, where a submission panel and a “submit” button will show up in the functional bar atop (beneath the navigation bar). A checkmark also shows up and replaces the flask icon for chosen drugs. Users can also switch the docking poses to submit different poses for the same drug. Finally, click the “submit” button in the submission panel to submit all the checked poses to the web server. Users can browse the rank-ordered drugs based on the simulation results [1] by toggling the “MD” at the upper right corner (highlighted in the black box in Figure 2B). Other details about DRDOCK, including that our file parser can allow several types of non-standard amino acids, can be found in Tsai & Yang, 2021 [1]. The figure is reproduced from Tsai & Yang, 2021 [1].
DRDOCK is designed for allowing real-time access to the
latest calculation results while the entire drug screening is still in the
process. The drugs and their docking poses shown in the results page are
rank-ordered based on the LOD scores, described in Tsai
& Yang, 2021 [1]. The
drug name, 2D structure, drug properties, and external link to PubChem (Kim et al., 2019) [4] can be viewed in each drug panel (Figure
3, top). Docking poses for a drug can be selected by the
switch at the top right (Figure 3, top).
Any docking pose can be further submitted for a 10 ns MD simulation (Figure 2 and 3). When
the simulation is finished, a superposed trajectory can be visualized in the
embedded NGL viewer by clicking the “view trajectory” button (Figure 3, bottom), and the corresponding features and the estimated
binding free energies for each frame can be seen in the floated left panel (Figure 3, bottom). For a better use of the computing resources,
simulations for at most 20 selected poses can be submitted for each uploaded
protein target. Finally, a complete list of rank-ordered drugs, all the docking
poses, MD simulation trajectories, calculated features can be downloaded by
clicking the “download” button (Figure 2 and 3).
Figure 3. The functionality
and info-display in the drug panel. The user can visualize the
docking pose by clicking the eye-shape “view docking pose” button, and an
extended space showing the binding mode of protein and drug will show up (top),
clicking the “view docking features” button would render us a list of
properties derived from the docking results (middle). Clicking the video-shape
of “view trajectory” button pops open an NGL viewer (Rose
et al., 2018) [5] and
plays the MD animation for the interacting drug-target complex if the pose were
chosen for simulations. With this, MM/GBSA-estimated binding free energy is
listed on the side, which is important for re-ranking the docking results. The
pose and trajectory can be downloaded from the “download” button. The figure is
reproduced from Tsai & Yang, 2021
[1].
We treat the drug screening problem as a ranking problem to
reduce the disparity coming from different docking software used, different sampling
depth and parameterization details of the drug force field, etc. The ranking of
drugs is decided by a score based on the distance of pose features’ statistics
distributions, sampled within our benchmark set, from the true binders and the
decoys. The features include the original value of Vina-predicted binding
affinity (affinity), the distance between COMs of a drug pose and assigned
target residues (termed as “COM distance”), and the number of poses belonging
to the top ten of best affinities in the group clustered based on an adjacency-based
clustering method [1] (adjacency-based
clustering). To get such statistics, we pooled all the generated poses from the
16 protein-drug complexes in the training set, derived from a 2016 drug library
(see Methods in [1]), and derived the distribution of the true binding poses
and decoys for each of the features associated with each pose (Figure 4). These distributions provided the statistical basis to
annotate a sampled docking pose to be a true binder or decoy, based on its
relative probability. Specifically, we calculated a log-odds (LOD) score for
each of the sampled poses according to the following equation (Eq. 1):
|
, where x represents a sampled docking pose; f represents one of the specified features, e.g. the pose affinity, the pose’s distance to the target
site, and the size of the poses cluster. is the value of feature f for the pose located. and are the
probability distributions of feature f for docking poses sampled from the true binders and the
decoys, respectively. Noted that are binned values and therefore different
values within the same bin have the same or values.
If there is no count for a bin, a small probability (10-7) is
assigned for that bin. The LOD score takes the idea of relative entropy,
representing how more or less likely a given sampled pose can result from a
true binder than from a decoy. A higher LOD score means the pose is more likely
from a true binder. The 2016 drugs are then rank-ordered based on their best
LOD score among the 20 sampled poses.
Figure 4. The difference in the distributions of the affinities
(top), COM distance (middle), and adjacency-based clustering (bottom) between
the “answer” poses (the true binder pose, left columns) and decoys (right
columns).
While docking serves as a quick screening tool for in
silico drug screening, as a trade-off, the lack of considering the
solvation effect and molecular dynamics involved in the protein-ligand
interaction introduces false positives [6].
Explicit solvent MD simulations using classical force fields optimized for
biological molecules are recognized to better describe the protein-ligand
binding dynamics in a timescale that allowed for weak binders to dissociate [7, 8, 9, 10]. The MD
simulations for user-selected protein-pose complexes were performed in an
explicit solvent, 310K, and 1 atm NPT ensemble for 10 ns after preliminary
energy minimization and equilibration of the system. During the simulation, a
snapshot is taken every 0.1 ns, resulting in a trajectory composed of 100
frames. The simulation results were summarized using the designed indicators,
including the mean binding free energy from MM/GBSA calculation over sampled
snapshots, the drug leaving time when the drug center of mass moving away from
the target site more than 10 Å, and the largest distance of the drug COM to the
target sites sampled during the simulations. A simulated drug got a higher rank
if having a favorable mean binding free energy and staying long enough in the
target site (or never leaving) during the 10 ns simulation. For more detail,
please refer to [1].
[1] Tsai, K.-L., Yang, L.-W., 2021. DRDOCK: A drug repurposing platform integrating automated docking, simulations and a log-odds-based drug ranking scheme. doi:10.1101/2021.01.31.429052
[2] Maier, J.A., Martinez, C., Kasavajhala, K., Wickstrom, L., Hauser, K.E., Simmerling, C., 2015. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation 11, 3696–3713. doi:10.1021/acs.jctc.5b00255
[3] Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T., de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T., 2018. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Research 46. doi:10.1093/nar/gky427
[4] Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen, P.A., Yu, B., Zaslavsky, L., Zhang, J., Bolton, E.E., 2018. PubChem 2019 update: improved access to chemical data. Nucleic Acids Research 47. doi:10.1093/nar/gky1033
[5] Rose, A.S., Bradley, A.R., Valasatava, Y., Duarte, J.M., Prlić, A., Rose, P.W., 2018. NGL viewer: web-based molecular graphics for large complexes. Bioinformatics 34, 3755–3758. doi:10.1093/bioinformatics/bty419
[6] Pantsar, T., Poso, A., 2018. Binding Affinity via Docking: Fact and Fiction. Molecules 23, 1899. doi:10.3390/molecules23081899
[7] Huang, D., Caflisch, A., 2011. Small Molecule Binding to Proteins: Affinity and Binding/Unbinding Dynamics from Atomistic Simulations. ChemMedChem 6, 1578–1580. doi:10.1002/cmdc.201100237
[8] Liu, K., Kokubo, H., 2017. Exploring the Stability of Ligand Binding Modes to Proteins by Molecular Dynamics Simulations: A Cross-docking Study. Journal of Chemical Information and Modeling 57, 2514–2522. doi:10.1021/acs.jcim.7b00412
[9] Liu, P.-F., Tsai, K.-L., Hsu, C.-J., Tsai, W.-L., Cheng, J.-S., Chang, H.-W., Shiau, C.-W., Goan, Y.-G., Tseng, H.-H., Wu, C.-H., Reed, J.C., Yang, L.-W., Shu, C.-W., 2018. Drug Repurposing Screening Identifies Tioconazole as an ATG4 Inhibitor that Suppresses Autophagy and Sensitizes Cancer Cells to Chemotherapy. Theranostics 8, 830–845. doi:10.7150/thno.22012
[10] Mollica, L., Decherchi, S., Zia, S.R., Gaspari, R., Cavalli, A., Rocchia, W., 2015. Kinetics of protein-ligand unbinding via smoothed potential molecular dynamics simulations. Scientific Reports 5. doi:10.1038/srep11539
Reference: Tsai,KL. and and Yang,LW. (2021) DRDOCK: A DrugRepurposing platform integrating automated docking, simulation and a log-odds-based drug ranking scheme. doi: 10.1101/2021.01.31.429052 (To be peer-reviewed) Contact: The server is maintained by the Yang Lab at the Institute of Bioinformatics and Structural Biology at National Tsing Hua University, Taiwan. For questions and comments please contact Kun-Lin Tsai.
|