Recent Research Papers

INTRODUCTION

Here are the peer-reviewed results from Folding@home. We stress that it can take quite a while to go from a result to a published, peer-reviewed article (often as much as a year). Also, these articles are written for fellow scientists, so they are fairly technical. However, these papers represent our publicly available progress to date, with lots more on the way.


TABLE OF CONTENTS

(Reverse chronological order)

- 55. N-Body simulation on GPUs
- 54. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics
- 53. Heterogeneity Even at the Speed Limit of Folding: Large-scale Molecular Dynamics Study of a Fast-folding Variant of the Villin Headpiece
- 52. Control of Membrane Fusion Mechanism by Lipid Composition: Predictions from Ensemble Molecular Dynamics.
- 51. Persistent voids: a new structural metric for membrane fusion.
- 50. Protein folding under confinement: a role for solvent.
- 49. Automatic State Decomposition Algorithm.
- 48. Storage@home: Petascale Distributed Storage
- 47. Predicting structure and dynamics of loosely-ordered protein complexes: influenza hemagglutinin fusion peptide.
- 46. A Bayesian Update Method for Adaptive Weighted Sampling.
- 45. Local structure formation in simulations of two small proteins.
- 44. Kinetic Definition of Protein Folding Transition State Ensembles and Reaction Coordinates.
- 43. Parallelized Over Parts Computation of Absolute Binding Free Energy with Docking and Molecular Dynamics.
- 42. Folding Simulations of the Villin Headpiece in All-Atom Detail.
- 41. Ensemble molecular dynamics yields submillisecond kinetics and intermediates of membrane fusion
- 40. Electric Fields at the Active Site of an Enzyme: Direct Comparison of Experiment with Theory.
- 39. A novel approach for computational alanine scanning: application to the p53 oligomerization domain.
- 38. Validation of Markov state models using Shannon's entropy.
- 37. On the role of chemical detail in simulating protein folding kinetics.
- 36. Nanotube confinement denatures protein helices.
- 35. The solvation interface is a determining factor in peptide conformational preferences.
- 34. Can conformational change be described by only a few normal modes?
- 33. How large is alpha-helix in solution? Studies of the radii of gyration of helical peptides by SAXS and MD.
- 32. Error Analysis in Markovian State Models for protein folding.
- 31. Direct calculation of the binding free energies of FKBP ligands using the Fujitsu BioServer massively parallel computer.
- 30. A New Set of Molecular Mechanics Parameters for Hydroxyproline and Its Use in Molecular Dynamics Simulations of Collagen-Like Peptides.
- 29. Comparison of efficiency and bias of free energies computed by exponential averaging, the Bennett acceptance ratio, and thermodynamic integration.
- 28. Solvation free energies of amino acid side chain analogs for common molecular mechanics water models.
- 27. Foldamer dynamics expressed via Markov state models. I. Explicit solvent molecular-dynamics simulations in acetonitrile, chloroform, methanol, and water.
- 26. Foldamer dynamics expressed via Markov state models. II. State space decomposition.
- 25. Unusual compactness of a polyproline type II structure.
- 24. How well can simulation predict protein folding kinetics and thermodynamics?
- 23. Empirical Force-Field Assessment: The Interplay Between Backbone Torsions and Noncovalent Term Scaling.
- 22. Exploring the Helix-Coil Transition via All-atom Equilibrium Ensemble Simulations.
- 21. Does Water Play a Structural Role in the Folding of Small Nucleic Acids?
- 20. Dimerization of the p53 Oligomerization Domain: Identification of a Folding Nucleus by Molecular Dynamics Simulations.
- 19. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a tryptophan zipper beta hairpin.
- 18. Simulations of the role of water in the protein-folding mechanism.
- 17. Trp zipper folding kinetics by molecular dynamics and temperature-jump spectroscopy.
- 16. Does Native State Topology Determine the RNA Folding Mechanism?
- 15. Structural correspondence between the alpha-helix and the random-flight chain resolves how unfolded proteins can have native-like properties.
- 14. Equilibrium Free Energies from Nonequilibrium Measurements Using Maximum-Likelihood Methods.
- 13. Extremely precise free energy calculations of amino acid side chain analogs: Comparison of common molecular mechanics force fields for proteins.
- 12. Solvent Viscosity Dependence of the Folding Rate of a Small Protein: Distributed Computing Study.
- 11. Insights Into Nucleic Acid Conformational Dynamics from Massively Parallel Stochastic Simulations.
- 10. Multiplexed-Replica Exchange Molecular Dynamics Method for Protein Folding Simulation.
- 9. The Trp Cage: Folding Kinetics and Unfolded State Topology via Molecular Dynamics Simulations.
- 8. Absolute comparison of simulated and experimental protein-folding dynamics.
- 7. Native-like Mean Structure in the Unfolded Ensemble of Small Proteins.
- 6. Simulation of Folding of a Small Alpha-helical Protein in Atomistic Detail using Worldwide-distributed Computing.
- 5. Folding@home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology.
- 4. Atomistic protein folding simulations on the submillisecond timescale using worldwide distributed computing.
- 3. β-Hairpin Folding Simulations in Atomistic Detail Using an Implicit Solvent Model.
- 2. Mathematical Foundations of ensemble dynamics.
- 1. Screen savers of the world, Unite!

55. N-Body simulation on GPUs

Erich Elsen, Mike Houston, V. Vishal, Eric Darve, Pat Hanrahan, and Vijay Pande. Proceedings of the 2006 ACM/IEEE conference on Supercomputing (2006).

SUMMARY. This paper is a bit old and should have been included on this web page some time ago. It details our first efforts with GPUs for molecular dynamics. This work led to the GPU1 FAH core. We have other papers in the works describing the successor to the GPU1 core as well as the PS3 core.

ABSTRACT. Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this poster we show how graphics processors can be used for N-body simulations to obtain large improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is 25x that of an Intel Pentium 4, and 2x that of specialized hardware such as GRAPE-6A, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@Home.

You can find more information at the DOI for ACM or download the preprint PDF.
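To make the O(N^2) force calculation mentioned in the abstract concrete, here is a minimal CPU sketch in plain Python. It is not the GPU kernel from the paper; it only shows the all-pairs loop structure that such a kernel parallelizes. The function name, the gravitational form of the interaction, and the `softening` parameter are assumptions chosen for illustration.

```python
import math

def all_pairs_accelerations(positions, masses, softening=1e-3, G=1.0):
    """Return the acceleration on each body by direct O(N^2) summation.

    positions: list of (x, y, z) tuples; masses: list of floats.
    The softening length (an illustrative assumption) avoids division
    by zero when two bodies come very close together.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * masses[j] * inv_r3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

Every body's inner loop is independent of the others, which is why this kind of calculation maps well onto hardware with many parallel arithmetic units.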

54. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics

Nina Singhal Heinrichs and Vijay S. Pande.

SUMMARY. This paper lays out how one can revamp FAH calculations to make them considerably more efficient, perhaps reducing the needed computer time by as much as 1000x. The basic idea is that we use FAH to build a model of the problem in question (a so-called Markovian state model, or MSM) and then use the MSM to predict experimental quantities. When using an MSM to make predictions, the question is usually whether we have done enough computation to make a sufficiently precise prediction. By calculating the uncertainty on the fly, we can now send FAH clients to the parts of the problem that limit the precision. We show that this approach can be considerably more efficient (1000x) than just running with even sampling. This approach is being incorporated into the FAH server code. One exciting ramification of this work concerns the broader usefulness of MSMs. They were originally formulated as a means to use a large distributed cluster (like Folding@home with 300,000 processors) to try to reproduce what a single, hypothetical machine that is 300,000x faster (which doesn't exist) could do. However, even if that 300,000x faster machine did exist, we show that our approach would be more efficient than a single, long trajectory, suggesting that MSM-based methods should be useful for a very broad set of computer hardware, not just distributed computing platforms.
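The idea of steering new simulations toward the most uncertain part of the model can be sketched as follows. This is a deliberately simplified stand-in, not the eigenvalue-uncertainty calculation from the paper: it scores each state by the posterior variance of its outgoing transition probabilities under a Dirichlet model with a uniform prior, then picks the highest-scoring state. The function names and the prior are assumptions made for illustration.

```python
def row_uncertainty(counts_row, prior=1.0):
    """Sum of Dirichlet posterior variances for one state's outgoing transitions.

    counts_row[j] is the number of observed transitions to state j.
    The uniform prior is an illustrative assumption.
    """
    alphas = [c + prior for c in counts_row]
    a0 = sum(alphas)
    # Var[p_j] = a_j * (a0 - a_j) / (a0^2 * (a0 + 1)) for a Dirichlet distribution
    return sum(a * (a0 - a) for a in alphas) / (a0 * a0 * (a0 + 1.0))

def next_state_to_sample(counts, prior=1.0):
    """Return (index of the most uncertain state, list of all scores)."""
    scores = [row_uncertainty(row, prior) for row in counts]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores

# Hypothetical transition counts between three states
counts = [
    [90, 10, 0],
    [5, 3, 2],    # few observations, so this row is poorly determined
    [0, 20, 80],
]
state, scores = next_state_to_sample(counts)
```

In this toy example the second state has by far the fewest observed transitions, so it receives the highest score and would be the next place to start trajectories.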

53. Heterogeneity Even at the Speed Limit of Folding: Large-scale Molecular Dynamics Study of a Fast-folding Variant of the Villin Headpiece

D. Ensign, P. M. Kasson, and V. S. Pande. Journal of Molecular Biology (2007)

SUMMARY: This paper describes the first set of results generated using the SMP clients. The main advantage of using SMP for these sorts of calculations is that the amount of computation that one client can do is several times larger than the traditional clients. This means that our simulations can get many times longer than before; in fact, this has allowed us to generate several hundred folding trajectories of the fastest-folding protein known, the HP35-NleNle variant of the villin headpiece subdomain. In this paper, because our simulation time scales compare well to the 700-nanosecond experimental folding time of this protein, AND we've generated enough trajectories to get good statistics, we can shed some light on the experimental results. To summarize the result, the first helix of the protein was thought to be highly structured in the unfolded state of the protein; we've suggested that structure in this part of the molecule is not enough to lead to fast folding, and that longer time scales than the 700-ns mark may be present in this system.

Check out the movie: it shows some simulation we did for this work, although watching one trajectory is emphatically NOT statistically significant! Some more visualizations of villin from our earlier work can be found on this page.

We have also made the raw data available to researchers on a SimTk.org page. This site includes the raw data, as well as scripts to automate the process and a VMD plugin to allow for browsing of the data. Please contact simbiosfeedback@stanford.edu if you need help with doing this.

ABSTRACT: We have performed molecular dynamics simulations on a set of nine unfolded conformations of the fastest-folding protein yet discovered, a variant of the villin headpiece subdomain (HP-35 NleNle). The simulations were generated using a new distributed computing method, yielding hundreds of trajectories each on a time scale comparable to the experimental folding time, despite the large (10,000 atom) size of the simulation system. This strategy eliminates the need to assume a two-state kinetic model or to build a Markov state model. The relaxation to the folded state at 300 K from the unfolded configurations (generated by simulation at 373 K) was monitored by a method intended to reflect the experimental observable (quenching of tryptophan by histidine). We also monitored the relaxation to the native state by directly comparing structural snapshots with the native state. The rate of relaxation to the native state and the number of resolvable kinetic time scales both depend upon starting structure. Moreover, starting structures with folding rates most similar to experiment show some native-like structure in the N-terminal helix (helix 1) and the phenylalanine residues constituting the hydrophobic core, suggesting that these elements may exist in the experimentally relevant unfolded state. Our large-scale simulation data reveal kinetic complexity not resolved in the experimental data. Based on these findings, we propose additional experiments to further probe the kinetics of villin folding.

52. Control of Membrane Fusion Mechanism by Lipid Composition: Predictions from Ensemble Molecular Dynamics.

P. M. Kasson and V. S. Pande. PLoS Computational Biology (2007)

SUMMARY: Here, we use molecular dynamics simulations of lipid vesicle fusion under different lipid compositions to generate a more detailed explanation for how composition controls membrane fusion. We predict that lipid composition affects both the initial process of forming a contact stalk between two vesicles and the formation of a metastable hemifused intermediate. These two roles act in concert to change both the rate of fusion and the level of detectable fusion intermediates. We also present initial results on fusion of vesicles at different membrane curvatures. Recent experimental results suggest that the creation of highly curved membranes is important to fusion of synaptic vesicles. Our simulations cover a curvature regime similar to these experimental systems. In combination with previous results, we predict that the effect of lipid composition on fusion is general across different membrane curvatures, but that the rate of fusion is controlled by both composition and curvature.

ABSTRACT: Membrane fusion is critical to biological processes such as viral infection, endocrine hormone secretion, and neurotransmission, yet the precise mechanistic details of the fusion process remain unknown. Current experimental and computational model systems approximate the complex physiological membrane environment for fusion using one or a few protein and lipid species. Here, we report results of a computational model system for fusion in which the ratio of lipid components was systematically varied, using thousands of simulations of up to a microsecond in length to predict the effects of lipid composition on both fusion kinetics and mechanism. In our simulations, increased phosphatidylcholine content in vesicles causes increased activation energies for formation of the initial stalk-like intermediate for fusion and of hemifusion intermediates, in accordance with previous continuum-mechanics theoretical treatments. We also use our large simulation dataset to quantitatively compare the mechanism by which vesicles fuse at different lipid compositions, showing a significant difference in fusion kinetics and mechanism at different compositions simulated. As physiological membranes have different compositions in the inner and outer leaflets, we examine the effect of such asymmetry, as well as the effect of membrane curvature on fusion. These predicted effects of lipid composition on fusion mechanism both underscore the way in which experimental model system construction may affect the observed mechanism of fusion and illustrate a potential mechanism for cellular regulation of the fusion process by altering membrane composition.

51. Persistent voids: a new structural metric for membrane fusion.

P. M. Kasson, A. Zomorodian, S. Park, N. Singhal, L. J. Guibas, and V. S. Pande. Bioinformatics (2007)

SUMMARY: One challenge in analyzing membrane fusion pathways is simply characterizing the structural intermediates involved. This paper describes use of methods from computational topology and geometry to better measure changes in vesicle structure relevant to fusion.

ABSTRACT: MOTIVATION: Membrane fusion constitutes a key stage in cellular processes such as synaptic neurotransmission and infection by enveloped viruses. Current experimental assays for fusion have thus far been unable to resolve early fusion events in fine structural detail. We have previously used molecular dynamics simulations to develop mechanistic models of fusion by small lipid vesicles. Here, we introduce a novel structural measurement of vesicle topology and fusion geometry: persistent voids. RESULTS: Persistent voids calculations enable systematic measurement of structural changes in vesicle fusion by assessing fusion stalk widths. They also constitute a generally applicable technique for assessing lipid topological change. We use persistent voids to compute dynamic relationships between hemifusion neck widening and formation of a full fusion pore in our simulation data. We predict that a tightly coordinated process of hemifusion neck expansion and pore formation is responsible for the rapid vesicle fusion mechanism, while isolated enlargement of the hemifusion diaphragm leads to the formation of a metastable hemifused intermediate. These findings suggest that rapid fusion between small vesicles proceeds via a small hemifusion diaphragm rather than a fully expanded one.

50. Protein folding under confinement: a role for solvent.

D. Lucent, V. Vishal, V. S. Pande. Proceedings of the National Academy of Sciences (2007)

Local PDF

SUMMARY: When proteins fold inside a cell, they are frequently subjected to various amounts of spatial confinement. Specifically, misfolded or unfolded proteins can be encapsulated inside a helper molecule called a chaperonin. These chaperonins are involved with helping proteins fold inside cells. Here we investigate how confinement affects protein folding using a simple model: a fast folding mini-protein confined to a nanopore. We find that if we confine the protein, but allow the surrounding water molecules to pass freely in and out of the nanopore, the protein is more likely to reach the folded state. On the other hand, if we make the nanopore water-tight, we find that the protein is less likely to fold. Specifically, it is pushed into a small non-native globule. This suggests that when thinking of folding inside a confined space (like a chaperonin) it is important to remember both protein and water are confined, and this confined water can have an effect on protein folding.

ABSTRACT: Although most experimental and theoretical studies of protein folding involve proteins in vitro, the effects of spatial confinement may complicate protein folding in vivo. In this study, we examine the folding dynamics of villin (a small fast folding protein) with explicit solvent confined to an inert nanopore. We have calculated the probability of folding before unfolding (Pfold) under various confinement regimes. Using Pfold correlation techniques, we observed two competing effects. Confining protein alone promotes folding by destabilizing the unfolded state. In contrast, confining both protein and solvent gives rise to a solvent-mediated effect that destabilizes the native state. When both protein and solvent are confined we see unfolding to a compact unfolded state different from the unfolded state seen in bulk. Thus, we demonstrate that the confinement of solvent has a significant impact on protein kinetics and thermodynamics. We conclude with a discussion of the implications of these results for folding in confined environments such as the chaperonin cavity in vivo.
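The probability of folding before unfolding can be illustrated with a toy model. The sketch below is not the villin simulation from the paper: it replaces the protein with an unbiased one-dimensional random walk between an "unfolded" boundary and a "folded" boundary, launches many short trajectories from one starting point, and counts the fraction that reach the folded boundary first. The lattice, the boundaries, and the function name are all assumptions made for illustration.

```python
import random

def estimate_pfold(start, folded=10, unfolded=0, trials=4000, seed=0):
    """Estimate the probability of reaching `folded` before `unfolded`.

    Each trial is an unbiased random walk on the integers starting at
    `start`; this toy coordinate stands in for a real folding coordinate.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    folded_first = 0
    for _ in range(trials):
        x = start
        while unfolded < x < folded:
            x += 1 if rng.random() < 0.5 else -1
        if x >= folded:
            folded_first += 1
    return folded_first / trials

# For an unbiased walk the exact answer is (start - unfolded) / (folded - unfolded),
# so a start at 3 on a 0..10 lattice should give an estimate near 0.3.
p = estimate_pfold(3)
```

In a real study each "trial" is an independent molecular dynamics trajectory started from the same conformation, which is why this kind of calculation suits distributed computing.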

49. Automatic State Decomposition Algorithm.

J. Chodera, N. Singhal, V. S. Pande, K. Dill, and W. Swope. Journal of Chemical Physics (2007)

Local PDF

SUMMARY: In order to break up calculations to run on Folding@home and then piece them back together so that they act like a single, very, very, very fast computer, we need special algorithms. We are constantly trying to improve our methods here, and this paper represents our latest state of the art.

ABSTRACT: To meet the challenge of modeling the conformational dynamics of biological macromolecules over long timescales, much recent effort has been devoted to constructing stochastic kinetic models, often in the form of discrete-state Markov models, from short molecular dynamics simulations. To construct useful models that faithfully represent dynamics at the timescales of interest, it is necessary to decompose configuration space into a set of kinetically metastable states. Previous attempts to define these states have relied upon either prior knowledge of the slow degrees of freedom or on the application of conformational clustering techniques which assume that conformationally distinct clusters are also kinetically distinct. Here, we present a first version of an automatic algorithm for the discovery of kinetically metastable states that is generally applicable to solvated macromolecules. Given molecular dynamics trajectories initiated from a well-defined starting distribution, the algorithm discovers long-lived, kinetically metastable states through successive iterations of partitioning and aggregating conformation space into kinetically related regions. We apply this method to three peptides in explicit solvent (terminally blocked alanine, the engineered 12-residue beta-hairpin trpzip2, and the 21-residue helical Fs peptide) to assess its ability to generate physically meaningful states and faithful kinetic models.

48. Storage@home: Petascale Distributed Storage

Adam L Beberg and Vijay S. Pande. IPDPS (2007)

SUMMARY: Storage@home is a distributed storage infrastructure developed to solve the problem of backing up and sharing petabytes of scientific results using a distributed model of volunteer managed hosts. Data is maintained by a mixture of replication and monitoring, with repairs done as needed.

47. Predicting structure and dynamics of loosely-ordered protein complexes: influenza hemagglutinin fusion peptide.

P. Kasson and V. S. Pande. PSB (2006)

SUMMARY: We have been applying Folding@home to study the nature of key proteins involved in how flu (the influenza virus) gains access into host cells. This paper reflects our first work in this direction.

46. A Bayesian Update Method for Adaptive Weighted Sampling.

S. Park and V. S. Pande. Physical Review E (2006)

SUMMARY: We've developed a new way to do a particular type of calculation (for protein thermodynamics) on Folding@home. This paper lays out how this works and gives some demonstrations.

45. Local structure formation in simulations of two small proteins.

Guha Jayachandran, V. Vishal, Angel E. García and V. S. Pande. Journal of Structural Biology (2006)

Local PDF

ABSTRACT: Massively parallel all-atom, explicit solvent molecular dynamics simulations were used to explore the formation and existence of local structure in two small alpha-helical proteins, the villin headpiece and the helical fragment B of protein A. We report on the existence of transient helices and combinations of helices in the unfolded ensemble, and on the order of formation of helices, which appears to largely agree with previous experimental results. Transient local structure is observed even in the absence of overall native structure. We also calculate sets of residue-residue pairs that are statistically predictive of the formation of given local structures in our simulations.

Some more visualizations of villin from our earlier work can be found on this page.

44. Kinetic Definition of Protein Folding Transition State Ensembles and Reaction Coordinates.

C. Snow and V. S. Pande. Biophysical Journal (2006)

ABSTRACT: Using distributed molecular dynamics simulations we located 4 distinct folding transitions for a 39 residue beta-beta-alpha-beta protein fold. We introduce and subsequently determine the transmission probability, Ptrans, of 500 conformations along each free energy barrier at room temperature, and determined which conformations were transition state ensemble members (Ptrans ≈ 0.5). We ran similar simulations at 82°C, determined the change in Ptrans with temperature for all 2,000 conformations, and observed Hammond behavior directly using Ptrans correlation. The polymer temperature increase only slightly perturbed the transition probabilities. We propose that diffusion along Ptrans may provide the configurational diffusion rate at the top of the barrier. Specifically, given a transition state conformation x0 with estimated Ptrans = 0.5, we selected a large set of subsequent conformations from independent trajectories, each exactly a small time δt after x0 (250ps). Then we calculated Ptrans for each of the new trial conformations. The P(Ptrans|δt=250ps) distribution reflects diffusion along an ideal kinetic reaction coordinate. This approach provides a novel perspective on the nature of a protein folding transition, and provides a framework for quantitative study of activated relaxation kinetics.

43. Parallelized Over Parts Computation of Absolute Binding Free Energy with Docking and Molecular Dynamics.

Guha Jayachandran, M. R. Shirts, S. Park, and V. S. Pande. Journal of Chemical Physics (2006)

ABSTRACT: We present a technique for biomolecular free energy calculations that exploits highly parallelized sampling to significantly reduce the time to results. The technique combines free energies for multiple, nonoverlapping configurational macrostates and is naturally suited to distributed computing. We describe a methodology that uses this technique with docking, molecular dynamics, and free energy perturbation to compute absolute free energies of binding quickly compared to previous methods. The method does not require a priori knowledge of the binding pose as long as the docking technique used can generate reasonable binding modes. We demonstrate the method on the protein FKBP12 and eight of its inhibitors.

42. Folding Simulations of the Villin Headpiece in All-Atom Detail.

Guha Jayachandran, V. Vishal, and V. S. Pande. Journal of Chemical Physics (2006)

SUMMARY: We have developed a new method which greatly extends Folding@home's ability to simulate long timescales. This new method (MSM) will be applied to essentially all new Folding@home projects. This paper demonstrates MSMs applied to a challenging target -- the villin headpiece.

ABSTRACT: We report on the use of large-scale distributed computing simulation and novel analysis techniques for examining the dynamics of a small protein. Matters addressed include folding rate, very long timescale kinetics, ensemble properties, and interaction with water. The target system for the study, the villin headpiece, has been of great interest to experimentalists and theorists both. Sampling totaled nearly 500 μs, the most extensive published to date for a system of villin's size in explicit solvent with all atom detail, and was in the form of tens of thousands of independent molecular dynamics trajectories, each several tens of nanoseconds in length. We report on kinetics sensitivity analyses that, using a set of short simulations, probed the role of water in villin's folding and sensitivity to the simulation's electrostatics treatment. By constructing Markovian state models from the collected data, we were able to propagate dynamics to times far beyond those directly simulated and to rapidly compute mean first passage times, long time kinetics (tens of microseconds), and evolution of ensemble property distributions over long times, otherwise currently impossible. We also tested our MSM by using it to predict the structure of villin de novo.
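As a rough illustration of how a Markovian state model lets one propagate dynamics beyond the simulated time and compute mean first passage times, here is a minimal pure-Python sketch. It is a generic textbook construction under stated assumptions, not the analysis code used in the paper: transition counts at a fixed lag time are row-normalized into a transition matrix, a state distribution is advanced by repeated multiplication, and mean first passage times are obtained by fixed-point iteration. All function names and the example counts are hypothetical.

```python
def transition_matrix(counts):
    """Row-normalize a matrix of observed transition counts.

    Assumes every state has at least one observed outgoing transition.
    """
    return [[c / sum(row) for c in row] for row in counts]

def propagate(p, T, steps):
    """Advance a state distribution p by `steps` lag times: p <- p T."""
    n = len(T)
    for _ in range(steps):
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return p

def mean_first_passage_time(T, target, lag_time=1.0, iterations=10000):
    """Mean first passage time to `target` from every state.

    Iterates m_i = lag_time + sum over j != target of T[i][j] * m_j,
    with m_target = 0. Converges when the target is reachable from
    every state; the iteration count is an illustrative choice.
    """
    n = len(T)
    m = [0.0] * n
    for _ in range(iterations):
        m = [0.0 if i == target else
             lag_time + sum(T[i][j] * m[j] for j in range(n) if j != target)
             for i in range(n)]
    return m

# Hypothetical two-state example: state 0 = unfolded, state 1 = folded
T = transition_matrix([[9, 1], [0, 5]])
p_after_two_lags = propagate([1.0, 0.0], T, 2)
mfpt = mean_first_passage_time(T, target=1)
```

Because each multiplication advances the model by one lag time, many short trajectories can stand in for one long one, provided the states are chosen so that the Markov assumption holds at that lag time.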

41. Ensemble molecular dynamics yields submillisecond kinetics and intermediates of membrane fusion

P. Kasson, N. Kelley, N. Singhal, M. Vrljic, A. Brunger, and V. S. Pande. Proceedings of the National Academy of Sciences, USA

SUMMARY: These first results describe work we've been doing to study membrane fusion, the process by which two lipid membranes become one. This process is critical to proper functioning of the cell and also phenomena such as neurotransmission and infection by many viruses. We are seeking to understand how membrane fusion works so that we can eventually manipulate it. We hope such an understanding will lead to the development of new and more effective drugs to combat viral infection and treat neurologic diseases.

ABSTRACT: Lipid membrane fusion is critical to cellular transport and signaling processes such as constitutive secretion, neurotransmitter release, and infection by enveloped viruses. Here, we introduce a powerful computational methodology for simulating membrane fusion from a starting configuration designed to approximate activated prefusion assemblies from neuronal and viral fusion, producing results on a time scale and degree of mechanistic detail not previously possible to our knowledge. We use an approach to the long time scale simulation of fusion by constructing a Markovian state model with large-scale distributed computing, yielding an understanding of fusion mechanisms on time scales previously impossible to simulate to our knowledge. Our simulation data suggest a branched pathway for fusion, in which a common stalk-like intermediate can either rapidly form a fusion pore or remain in a metastable hemifused state that slowly forms fully fused vesicles. This branched reaction pathway provides a mechanistic explanation both for the biphasic fusion kinetics and the stable hemifused intermediates previously observed experimentally. Our distributed computing and Markovian state model approaches provide sufficient sampling to detect rare transitions, a systematic process for analyzing reaction pathways, and the ability to develop quantitative approximations of reaction kinetics for fusion.

40. Electric Fields at the Active Site of an Enzyme: Direct Comparison of Experiment with Theory.

Ian T. Suydam, Christopher D. Snow, Vijay S. Pande, Steven G. Boxer. Science (2006)

Full Text

SUMMARY: The ability to quantitatively predict electric fields in proteins has remained a great challenge. In this paper, we combine new experimental methods with new theoretical methods made possible by Folding@home distributed computing to greatly push the boundary of what one could previously predict. In particular, we see that a single structure is insufficient to make accurate predictions, suggesting that the ensemble approaches inherent to Folding@home may be important in predicting electrostatics in proteins.

ABSTRACT: The electric fields produced in folded proteins influence nearly every aspect of protein function. We present a vibrational spectroscopy technique that measures changes in electric field at a specific site of a protein as shifts in frequency (Stark shifts) of a calibrated nitrile vibration. A nitrile-containing inhibitor is used to deliver a unique probe vibration to the active site of human aldose reductase, and the response of the nitrile stretch frequency is measured for a series of mutations in the enzyme active site. These shifts yield quantitative information on electric fields that can be directly compared with electrostatics calculations. We show that extensive molecular dynamics simulations and ensemble averaging are required to reproduce the observed changes in field.

39. A novel approach for computational alanine scanning: application to the p53 oligomerization domain.

L.T. Chong, W. C. Swope, J. W. Pitera, and V. S. Pande. Journal of Molecular Biology (2006)

SUMMARY: Roughly half of all known cancers involve a mutation in a single protein: p53. This protein serves to protect us from getting cancer; when p53 fails, one often gets cancer. We have developed a new method for predicting how mutations in p53 would affect it. This new method is naturally suited for distributed computing and can predict several mutations found to date.

ABSTRACT: We have developed a novel computational alanine scanning approach that involves analysis of ensemble unfolding kinetics at high temperature to identify residues that are critical for the stability of a given protein. This approach has been applied to dimerization of the oligomerization domain (residues 326-355) of tumor suppressor p53. As validated by experimental results, our approach has reasonable success in identifying deleterious mutations, including mutations that have been linked to cancer. We discuss a method for determining the effect of mutations on the location of the dimerization transition state.

38. Validation of Markov state models using Shannon's entropy.

S. Park and V. S. Pande. Journal of Chemical Physics (2006)

SUMMARY: Markov State Models (MSMs) have become a major part of how Folding@home calculations are performed. In particular, the MSM technique is at the heart of how one can divide complex calculations like protein folding or lipid vesicle dynamics across 10,000 to 100,000 CPUs -- i.e. how distributed computing can tackle complex problems. This paper presents a new way to test the validity of the MSMs we generate, to make sure that the models are suitable and self-consistent.

ABSTRACT: Markov state models are kinetic models built from the dynamics of molecular simulation trajectories by grouping similar configurations into states and examining the transition probabilities between states. Here we present a procedure for validating the underlying Markov assumption in Markov state models based on information theory using Shannon's entropy. This entropy method is applied to a simple system and is compared with the previous eigenvalue method. The entropy method also provides a way to identify states that are least Markovian, which can then be divided into finer states to improve the model.
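
ILLUSTRATION: The construction described in the abstract (group configurations into states, then examine transition probabilities between states) can be sketched in a few lines. This is a minimal sketch, not the authors' code: it assumes the trajectories have already been assigned to integer state labels, and the function name and the lag parameter are illustrative choices. It does not implement the entropy-based validation itself.

```python
from collections import Counter


def transition_matrix(trajectories, n_states, lag=1):
    """Estimate row-normalized transition probabilities by counting.

    trajectories: list of sequences of integer state labels (assumed input).
    lag: number of frames between the start and end of a counted transition.
    """
    counts = Counter()
    for traj in trajectories:
        # Pair each frame with the frame `lag` steps later.
        for start, end in zip(traj[:-lag], traj[lag:]):
            counts[(start, end)] += 1

    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # No transitions observed out of state i; leave the row empty.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


if __name__ == "__main__":
    example = [[0, 0, 1, 1, 0], [1, 1, 1, 0]]
    for row in transition_matrix(example, n_states=2):
        print(row)
```

Each row of the result sums to one whenever the state was observed at least once as a starting point, so the matrix can be read as the probability of moving between states in one lag time.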

37. On the role of chemical detail in simulating protein folding kinetics.

Young Min Rhee and Vijay S. Pande. Chemical Physics (2006)

SUMMARY: How important are local chemical features of proteins during the folding process? We assess protein folding models with varying degrees of chemical detail to gain an understanding of how they perform relative to some of today's most sophisticated models.

ABSTRACT: Is an all-atom representation for protein and solvent necessary for simulating protein folding kinetics or can simpler models reproduce the results of more complex models? This question is relevant not just for simulation methodology, but also for the general understanding of the chemical details relevant for protein dynamics. With recent advances in computational methodology, it is now possible to simulate the folding kinetics of small proteins in all-atom detail. Therefore, with both detailed and simplified models of folding in hand, the outstanding questions are what the differences in these models are for the description of protein folding dynamics, and how we can quantitatively compare the folding mechanisms found in the models. To address the outstanding problem of how to determine the differences between folding mechanism in a sensitive and quantitative manner, we suggest a new method to quantify the non-linear correlation in folding commitment probability (Pfold) values. We use this method to probe the differences between a wide range of models for folding simulations, ranging from coarse grained Go models to all-atom models with implicit or explicit solvation. While the differences between less-detailed models (Go and implicit solvation models) and explicit solvation models are large, the differences within various explicit solvation models appear to be small, suggesting that the discrete nature of water may play a role in folding kinetics.

36. Nanotube confinement denatures protein helices.

Eric J. Sorin and Vijay S. Pande. JACS (2006)

ABSTRACT: In striking contrast to simple polymer physics theory, which does not account for solvent effects, we find that physical confinement of solvated biopolymers decreases solvent entropy, which in turn leads to a reduction in the organized structural content of the polymer. Since our theory is based on a fundamental property of water-protein statistical mechanics, we expect it to have broad implications in many biological and material science contexts.

35. The solvation interface is a determining factor in peptide conformational preferences.

Eric J. Sorin, Young Min Rhee, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2006)

SUMMARY: How complicated is a helix, and how is the complexity of helical structure affected by the solvent? Here we show, through a novel "computational hydrophobic titration" experiment, that many features of helices can be rationalized and/or explained by considering the interactions along the peptide-solvent interface.

TECHNICAL ABSTRACT: The 21-residue polyalanine-based Fs peptide was studied using thousands of long, explicit solvent, atomistic molecular dynamics simulations which reached equilibrium at the ensemble level. Peptide conformational preference as a function of hydrophobicity was examined using a spectrum of explicit solvent models, and the peptide length dependence of the hydrophilic and hydrophobic components of solvent-accessible surface area for several ideal conformational types was also considered. Our results demonstrate how the character of the solvation interface induces several conformational preferences, including a decrease in mean helical content with increased hydrophilicity, which occurs predominantly through reduced nucleation tendency and, to a lesser extent, destabilization of helical propagation. Interestingly, an opposing effect occurs through increased propensity for 3₁₀-helix conformations, as well as increased polyproline structure. Our observations provide a framework for understanding previous reports of conformational preferences in polyalanine-based peptides including (i) terminal 3₁₀-helix prominence, (ii) low π-helix propensity, (iii) increased polyproline conformations in short and unfolded peptides, and (iv) membrane helix stability in the presence and absence of water. These observations lend physical insight into the role of water in peptide conformational equilibria at the atomic level, and expand our view of the complexity of even the most "simple" of biopolymers. Whereas previous studies have focused predominantly on hydrophobic effects with respect to tertiary structure, this report highlights the need for consideration of such effects on the secondary structural level.

34. Can conformational change be described by only a few normal modes?

Paula Petrone and Vijay S. Pande. Biophysical Journal (2005)

SUMMARY: In allosteric regulation, protein activity is altered when ligand binding (or unbinding) causes changes in the protein conformation. Little is known about which aspects of the protein architecture are responsible for allosteric regulation; however, most of these changes involve collective displacements of atoms (domain and hinge-bending motions), which are likely to occur on the microsecond timescale. Normal mode analysis (NMA) decouples the complex motions and fluctuations of proteins into a linear combination of orthogonal basis vectors, each representing an independent concerted harmonic motion with a characteristic frequency. In principle, it would be a natural basis in which to represent conformational change that involves collective motions of atoms. This paper addresses the limitations of NMA, namely how many normal modes are necessary to achieve a certain degree of accuracy in the representation.

TECHNICAL ABSTRACT: We suggest a simple method to assess how many normal modes are needed to map a conformational change. By projecting the conformational change onto a subspace of the normal mode vectors and, using RMSD as a test of accuracy, we find that the first 20 modes only contribute 50% or less of the total conformational change in four test cases (myosin, calmodulin, NtrC, and hemoglobin). In some allosteric systems, like the molecular switch NtrC, the conformational change is localized to a limited number of residues. We find that many more modes are necessary to accurately map this collective displacement. In addition, the normal mode spectra can provide useful information about the details of the conformational change, especially when comparing structures with different bound ligands, in this case, calmodulin. Indeed, this approach presents normal mode analysis as a useful basis in which to capture the mechanism of conformational change, and shows that the number of normal modes needed to capture the essential collective motions of atoms should be chosen according to the required accuracy.
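
ILLUSTRATION: The projection idea in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's procedure: the displacement between two structures is given as a flat list of coordinate differences, the modes are assumed to be orthonormal vectors of the same length, and the function name is hypothetical. It reports the fraction of the squared displacement captured by the first k modes, rather than the RMSD measure used in the paper.

```python
def cumulative_overlap(displacement, modes):
    """Fraction of the squared displacement captured by the first k modes.

    displacement: flat list of coordinate differences between two structures.
    modes: list of orthonormal mode vectors, lowest frequency first (assumed).
    Returns a list whose k-th entry covers modes 0..k.
    """
    norm_sq = sum(d * d for d in displacement)
    captured = 0.0
    fractions = []
    for mode in modes:
        # Projection of the conformational change onto this mode.
        projection = sum(d * m for d, m in zip(displacement, mode))
        captured += projection * projection
        fractions.append(captured / norm_sq)
    return fractions


if __name__ == "__main__":
    change = [3.0, 4.0, 0.0]
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(cumulative_overlap(change, basis))
```

In this toy case the first mode captures 36% of the squared displacement and the first two capture all of it. For a real protein, the number of modes needed to pass a chosen threshold is the quantity of interest.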

33. How large is alpha-helix in solution? Studies of the radii of gyration of helical peptides by SAXS and MD.

Bojan Zagrovic, Guha Jayachandran, Ian S. Millett, Sebastian Doniach and Vijay S. Pande. Journal of Molecular Biology (2005)

SUMMARY: Direct comparisons are made between Folding@home simulations and experimental measurements (SAXS) to determine molecular size of helical peptides of varying length, revealing the compact nature of such helical peptides.

TECHNICAL ABSTRACT: Using synchrotron radiation and the small-angle X-ray scattering technique we have measured the radii of gyration of a series of alanine-based α-helix-forming peptides of the composition Ace-(AAKAA)n-GY-NH2, n = 2-7, in aqueous solvent at 10 °C. In contrast to other techniques typically used to study α-helices in isolation (such as nuclear magnetic resonance and circular dichroism), small-angle X-ray scattering reports on the global structure of a molecule and, as such, provides complementary information to these other, more sequence-local measuring techniques. The radii of gyration that we measure are, except for the 12-mer, lower than the radii of gyration of ideal α-helices or helices with frayed ends of the equivalent sequence-length. For example, the measured radius of gyration of the 37-mer is 14.2 Å, which is to be compared with the radius of gyration of an ideal 37-mer α-helix of 17.6 Å. Attempts are made to analyze the origin of this discrepancy in terms of the analytical Zimm-Bragg-Nagai (ZBN) theory, as well as distributed computing explicit solvent molecular dynamics simulations using two variants of the AMBER force-field. The ZBN theory, which treats helices as cylinders connected by random walk segments, predicts markedly larger radii of gyration than those measured. This is true even when the persistence length of the random walk parts is taken to be extremely short (about one residue). Similarly, the molecular dynamics simulations, at the level of sampling available to us, give inaccurate values of the radii of gyration of the molecules (by overestimating them by around 25% for longer peptides) and/or their helical content. We conclude that even at the short sequences examined here (≤37 amino acid residues), these α-helical peptides behave as fluctuating semi-broken rods rather than straight cylinders with frayed ends.
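
ILLUSTRATION: The radius of gyration compared in this abstract is the root-mean-square distance of the atoms from their common center. The sketch below is an assumption-laden simplification: it weights all points equally (SAXS measurements are weighted by scattering contrast), and the helper that builds an ideal helix uses only alpha-carbon positions with textbook α-helix geometry (1.5 Å rise and 100 degrees of rotation per residue, 2.3 Å radius). Both function names are hypothetical, and the Cα-only value it produces is not directly comparable to the all-atom figure quoted above.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


def ideal_helix_ca(n_residues, rise=1.5, radius=2.3, turn_deg=100.0):
    """Alpha-carbon trace of an idealized alpha-helix (textbook geometry)."""
    points = []
    for i in range(n_residues):
        angle = math.radians(turn_deg * i)
        points.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return points


if __name__ == "__main__":
    for length in (12, 17, 22, 27, 32, 37):
        print(length, round(radius_of_gyration(ideal_helix_ca(length)), 2))
```

For a straight rod the value grows almost linearly with length, which is why a measured value well below the ideal-helix figure points toward bent or partially broken helices.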

32. Error Analysis in Markovian State Models for protein folding.

Nina Singhal and Vijay S. Pande. Journal of Chemical Physics (2005)

SUMMARY: We validate the new Markovian State Model (MSM) for describing protein dynamics, and show how to efficiently calculate how accurate these models are. We also describe how to start new FAH simulations to best improve the accuracy of the model.

TECHNICAL ABSTRACT: In previous work, we described a Markovian state model (MSM) for analyzing molecular-dynamics trajectories, which involved grouping conformations into states and estimating the transition probabilities between states. In this paper, we analyze the errors in this model caused by finite sampling. We give different methods with various approximations to determine the precision of the reported mean first passage times. These approximations are validated on an 87 state toy Markovian system. In addition, we propose an efficient and practical sampling algorithm that uses these error calculations to build an MSM that has the same precision in mean first passage time values but requires an order of magnitude fewer samples. We also show how these methods can be scaled to large systems using sparse matrix methods.
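
ILLUSTRATION: The quantity whose precision is analyzed here, the mean first passage time (MFPT), follows from a transition matrix through the relation m_i = 1 + sum over j (excluding the target) of T_ij * m_j, with m = 0 at the target state. The sketch below solves that relation by simple fixed-point iteration. It is a minimal sketch assuming a small, dense matrix in which the target is reachable from every state; it does not reproduce the paper's error analysis or its sparse matrix methods, and the function name is hypothetical.

```python
def mean_first_passage_times(T, target, n_iter=100000, tol=1e-12):
    """MFPT (in units of the lag time) from every state to `target`.

    T: row-stochastic transition matrix as a list of lists (assumed input).
    """
    n = len(T)
    m = [0.0] * n
    for _ in range(n_iter):
        updated = [
            0.0 if i == target
            else 1.0 + sum(T[i][j] * m[j] for j in range(n) if j != target)
            for i in range(n)
        ]
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(updated, m)) < tol:
            return updated
        m = updated
    return m


if __name__ == "__main__":
    T = [[0.9, 0.1], [0.2, 0.8]]
    print(mean_first_passage_times(T, target=1))
```

With a 10% chance per step of leaving state 0 for state 1, the expected waiting time is ten steps, which the iteration recovers. Uncertainty in the estimated T_ij propagates directly into these times, which is the effect the paper quantifies.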

31. Direct calculation of the binding free energies of FKBP ligands using the Fujitsu BioServer massively parallel computer.

Hideaki Fujitani, Yoshiaki Tanida, Masakatsu Ito, Guha Jayachandran, Christopher D. Snow, Michael R. Shirts, Eric J. Sorin, and Vijay S. Pande. Journal of Chemical Physics (2005)

SUMMARY: Drug design calculations are generally very difficult. Here we show that calculations made previously on the Folding@home network are possible on a much smaller supercomputer system without loss of numerical precision.

TECHNICAL ABSTRACT: Direct calculations of the absolute binding free energies for eight FKBP ligands were performed using the Fujitsu BioServer massively parallel computer. Using the latest version of the general AMBER force field (GAFF) for ligand model parameters and the Bennett acceptance ratio for computing free energy differences, we obtained an excellent linear fit between the calculated and experimental binding free energies. The RMS error from a linear fit is 0.4 kcal/mol for eight ligand complexes. In comparison with a previous study of the binding energies of these same eight ligand complexes, these results suggest that the use of improved model parameters can lead to more predictive binding estimates, and that these estimates can be obtained with significantly less computer time than previously thought. These findings make such direct methods more attractive for use in rational drug design.

30. A New Set of Molecular Mechanics Parameters for Hydroxyproline and Its Use in Molecular Dynamics Simulations of Collagen-Like Peptides.

Sanghyun Park, Randall J. Radmer, Teri E. Klein, and Vijay S. Pande. Journal of Computational Chemistry (2005)

SUMMARY: Simulation of the collagen triple helix has been given less attention than more common protein "folds." Here we present newly derived parameters for such simulations to gain better agreement with experimental data, thereby offering insight into the stability of the triple helix structure.

TECHNICAL ABSTRACT: Recently, the importance of proline ring pucker conformations in collagen has been suggested in the context of hydroxylation of prolines. The previous molecular mechanics parameters for hydroxyproline, however, do not reproduce the correct pucker preference. We have developed a new set of parameters that reproduces the correct pucker preference. Our molecular dynamics simulations of proline and hydroxyproline monomers as well as collagen-like peptides, using the new parameters, support the theory that the role of hydroxylation in collagen is to stabilize the triple helix by adjusting to the right pucker conformation (and thus the right φ angle) in the Y position.

29. Comparison of efficiency and bias of free energies computed by exponential averaging, the Bennett acceptance ratio, and thermodynamic integration.

Michael R. Shirts & Vijay S. Pande. Journal of Chemical Physics (2005)

SUMMARY: We test new methods for free energy calculations -- relevant for our computational drug design methodology. We find that the BAR method we previously investigated is significantly better than methods commonly employed. We have already gotten a lot of positive feedback about this work from others in the field, as they have been starting to use the results of this work to improve their calculations as well.

TECHNICAL ABSTRACT: Recent work has demonstrated the Bennett acceptance ratio method is the best asymptotically unbiased method for determining the equilibrium free energy between two end states given work distributions collected from either equilibrium or non-equilibrium data. However, it is still not clear what the practical advantage of this acceptance ratio method is over other common methods in atomistic simulations. In this study, we first review theoretical estimates of the bias and variance of exponential averaging (EXP), thermodynamic integration (TI), and the Bennett acceptance ratio (BAR). In the process, we present a new simple scheme for computing the variance and bias of many estimators, and demonstrate the connections between BAR and the weighted histogram analysis method. Next, a series of analytically solvable toy problems is examined to shed more light on the relative performance in terms of the bias and efficiency of these three methods. Interestingly, it is impossible to conclusively identify a best method for calculating the free energy, as each of the three methods performs more efficiently than the others in at least one situation examined in these toy problems. Finally, sample problems of the insertion/deletion of both a Lennard-Jones particle and a much larger molecule in TIP3P water are examined by these three methods. In all tests of atomistic systems, free energies obtained with BAR have significantly lower bias and smaller variance than when using EXP or TI, especially when the overlap in phase space between end states is small. For example, BAR can extract as much information from multiple fast, far-from-equilibrium simulations as from fewer simulations near equilibrium, which EXP cannot. Although TI and sometimes even EXP can be somewhat more efficient in idealized toy problems, in the realistic atomistic situations tested in this paper, BAR is significantly more efficient than all other methods.
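
ILLUSTRATION: Two of the estimators compared in this abstract can be written compactly. The sketch below assumes work values expressed in units of kT, with forward work measured going from state A to B and reverse work measured going from B to A. EXP uses only the forward values; BAR uses both sets and solves Bennett's self-consistent equation, here by bisection. The function names, the bisection solver, and the tolerance are illustrative choices, not the paper's implementation, and thermodynamic integration is not shown.

```python
import math


def exp_estimate(forward_work):
    """Exponential averaging: dF = -ln < exp(-W) >, computed stably."""
    shift = min(forward_work)
    average = sum(math.exp(-(w - shift)) for w in forward_work) / len(forward_work)
    return shift - math.log(average)


def _fermi(x):
    """1 / (1 + exp(x)), arranged to avoid overflow for large |x|."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_estimate(forward_work, reverse_work, tol=1e-10):
    """Bennett acceptance ratio estimate of dF (in kT) by bisection."""
    m = math.log(len(forward_work) / len(reverse_work))

    def imbalance(df):
        lhs = sum(_fermi(m + w - df) for w in forward_work)
        rhs = sum(_fermi(-m + w + df) for w in reverse_work)
        return lhs - rhs  # increases monotonically with df

    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0:
        lo *= 2.0
    while imbalance(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    forward = [2.3, 1.9, 2.8, 2.1]
    reverse = [-1.6, -1.9, -1.2, -1.8]
    print(exp_estimate(forward), bar_estimate(forward, reverse))
```

EXP is dominated by the rare low-work samples, which is the source of its bias when phase-space overlap is poor. BAR weights both directions, which is consistent with the lower bias and variance reported above for the atomistic tests.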

28. Solvation free energies of amino acid side chain analogs for common molecular mechanics water models.

Michael R. Shirts & Vijay S. Pande. Journal of Chemical Physics (2005)

SUMMARY: This paper is a test of our methods for free energy calculation -- critical to our computational drug design methodology. We achieve a higher level of accuracy and precision than before. Moreover, our recent research in computational efficiency of free energy methods allows us to perform simulations on a local cluster that previously required large scale distributed computing, performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.

TECHNICAL ABSTRACT: Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). In order to examine the accuracy of a range of common water models used for protein simulation for their solute/solvent properties, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from the OPLS-AA parameter set with the TIP3P, TIP4P, SPC, SPC/E, TIP3P-MOD, and TIP4P-Ew water models. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02-0.06 kcal/mol, equivalent to that obtained in experimental hydration free energy measurements of the same molecules. We find that TIP3P-MOD, a model designed to give improved free energy of hydration for methane, gives uniformly the closest match to experiment; we also find that the ability to accurately model pure water properties does not necessarily predict ability to predict solute/solvent behavior. We also evaluate the free energies of a number of novel modifications of TIP3P designed as a proof of concept that it is possible to obtain much better solute/solvent free energetic behavior without substantially negatively affecting pure water properties. We decrease the average error to zero while reducing the rms error below that of any of the published water models, with measured liquid water properties remaining almost constant with respect to our perturbations. This demonstrates there is still both room for improvement within current fixed-charge biomolecular force fields and significant parameter flexibility to make these improvements. 
Recent research in computational efficiency of free energy methods allows us to perform simulations on a local cluster that previously required large scale distributed computing, performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.

27. Foldamer dynamics expressed via Markov state models. I. Explicit solvent molecular-dynamics simulations in acetonitrile, chloroform, methanol, and water.

Sidney Elmer, Sanghyun Park, & Vijay S. Pande. Journal of Chemical Physics (2005)

SUMMARY: Here, we lay out some of the first applications of a new method for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.

TECHNICAL ABSTRACT: In this article, we analyze the folding dynamics of an all-atom model of a polyphenylacetylene (pPA) 12-mer in explicit solvent for four common organic and aqueous solvents: acetonitrile, chloroform, methanol, and water. The solvent quality has a dramatic effect on the time scales in which pPA 12-mers fold. Acetonitrile was found to manifest ideal folding conditions as suggested by optimal folding times on the order of ~100-200 ns, depending on temperature. In contrast, chloroform and water were observed to hinder the folding of the pPA 12-mer due to extreme solvation conditions relative to acetonitrile; chloroform denatures the oligomer, whereas water promotes aggregation and traps. The pPA 12-mer in a pure methanol solution folded in ~400 ns at 300 K, compared to the experimental 12-mer folding time of ~160 ns measured in a 1:1 v/v THF/methanol solution. Requisite in drawing the aforementioned conclusions, analysis techniques based on Markov state models are applied to multiple short independent trajectories to extrapolate the long-time scale dynamics of the 12-mer in each respective solvent. We review the theory of Markov chains and derive a method to impose detailed balance on a transition probability matrix computed from simulation data.
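
ILLUSTRATION: Detailed balance requires pi_i * T_ij = pi_j * T_ji for every pair of states, where pi is the stationary distribution. The abstract does not spell out the paper's derivation, so the sketch below swaps in one simple and widely used alternative: symmetrizing the observed transition counts before normalizing each row. The function name is hypothetical, and this should not be read as the method derived in the paper.

```python
def reversible_transition_matrix(counts):
    """Build a transition matrix obeying detailed balance from raw counts.

    counts[i][j]: number of observed transitions from state i to state j.
    Returns (T, pi), where pi is the stationary distribution implied by
    the symmetrized counts.
    """
    n = len(counts)
    # Average forward and backward counts so that sym[i][j] == sym[j][i].
    sym = [[0.5 * (counts[i][j] + counts[j][i]) for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sym]
    total = sum(row_sums)
    T = [
        [sym[i][j] / row_sums[i] if row_sums[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    pi = [r / total for r in row_sums]
    return T, pi


if __name__ == "__main__":
    T, pi = reversible_transition_matrix([[8, 2], [4, 6]])
    print(T, pi)
    print(pi[0] * T[0][1], pi[1] * T[1][0])
```

Because the symmetrized counts are equal in both directions, the two printed fluxes agree, which is the detailed balance condition. Symmetrization assumes the data were sampled close to equilibrium, so it can bias the model when many short trajectories start far from it.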

26. Foldamer dynamics expressed via Markov state models. II. State space decomposition.

Sidney Elmer, Sanghyun Park, & Vijay S. Pande. Journal of Chemical Physics (2005)

SUMMARY: Here, we lay out some new methodology for simulation for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.

TECHNICAL ABSTRACT: The structural landscape of poly-phenylacetylene (pPA), otherwise known as m-phenylene ethynylene oligomers, has been shown to consist of a very diverse set of conformations, including helices, turns, and knots. Defining a state space decomposition to classify these conformations into easily identifiable states is an important step in understanding the dynamics in relation to Markov state models. We define the state decomposition of pPA oligomers in terms of the sequence of discretized dihedral angles between adjacent phenyl rings along the oligomer backbone. Furthermore, we derive in mathematical detail an approach to further reduce the number of states by grouping symmetrically equivalent states into a single parent state. A more challenging problem requires a formal definition for knotted states in the structural landscape. Assuming that the oligomer chain can only cross the ideal helix path once, we propose a technique to define a knotted state derived from a helical state determined by the position along the helical nucleus where the chain crosses the ideal helix path. Several examples of helical states and knotted states from the pPA 12-mer illustrate the principles outlined in this article.

25. Unusual compactness of a polyproline type II structure.

Bojan Zagrovic, Jan Lipfert, Eric J. Sorin, Ian S. Millett, Wilfred F. van Gunsteren, Sebastian Doniach & Vijay S. Pande. Proceedings of the National Academy of Sciences (2005)

SUMMARY: This study probes the structural character of a small peptide using experiment and simulation. It highlights the differences between global and local structural information, suggesting a new model for PPII conformational character, which is thought to be dominant in the unfolded state of proteins.

TECHNICAL ABSTRACT: Polyproline type II (PPII) helix has emerged recently as the dominant paradigm for describing the conformation of unfolded polypeptides. However, most experimental observables used to characterize unfolded proteins typically provide only short-range, sequence-local structural information that is both time- and ensemble-averaged, giving limited detail about the long-range structure of the chain. Here, we report a study of a long-range property: the radius of gyration of an alanine-based peptide, Ace-(diaminobutyric acid)2-(Ala)7-(ornithine)2-NH2. This molecule has previously been studied as a model for the unfolded state of proteins under folding conditions and is believed to adopt a PPII fold based on short-range techniques such as NMR and CD. By using synchrotron radiation and small-angle x-ray scattering, we have determined the radius of gyration of this peptide to be 7.4 ± 0.5 Å, which is significantly less than the value expected from an ideal PPII helix in solution (13.1 Å). To further study this contradiction, we have used molecular dynamics simulations using six variants of the AMBER force field and the GROMOS 53A6 force field. However, in all cases, the simulated ensembles underestimate the PPII content while overestimating the experimental radius of gyration. The conformational model that we propose, based on our small angle x-ray scattering results and what is known about this molecule from before, is that of a very flexible, fluctuating structure that on the level of individual residues explores a wide basin around the ideal PPII geometry but is never, or only rarely, in the ideal extended PPII helical conformation.

24. How well can simulation predict protein folding kinetics and thermodynamics?

Christopher D. Snow, Eric J. Sorin, Young Min Rhee, and Vijay S. Pande. Annual Review of Biophysics & Biomolecular Structure (2005)

SUMMARY: Rather than reporting new data from the Folding@home project, this review article offers an in-depth look at the current state-of-the-art in simulation-based prediction. This includes work by our group and others in the field, including many computational models and methods of extracting information that can be directly compared to experiment.

TECHNICAL ABSTRACT: Simulation of protein folding has come a long way in five years. Notably, new quantitative comparisons with experiments for small, rapidly folding proteins have become possible. As the only way to validate simulation methodology, this achievement marks a significant advance. Here, we detail these recent achievements and ask whether simulations have indeed rendered quantitative predictions in several areas, including protein folding kinetics, thermodynamics, and physics-based methods for structure prediction. We conclude by looking to the future of such comparisons between simulations and experiments.

23. Empirical Force-Field Assessment: The Interplay Between Backbone Torsions and Noncovalent Term Scaling.

Eric J. Sorin and Vijay S. Pande. Journal of Computational Chemistry (2005)

SUMMARY: How do the results of peptide simulations change with slight variations to the models employed? Here we answer this question with respect to very local changes in the energetics of the polymer, demonstrating the sensitivity of simulated bulk (i.e. ensemble averaged) structural equilibrium on the parameters of the model.

TECHNICAL ABSTRACT: The kinetic and thermodynamic aspects of the helix-coil transition in polyalanine-based peptides have been studied at the ensemble level using a distributed computing network. This study builds on a previous report, which critically assessed the performance of several contemporary force fields in reproducing experimental measurements and elucidated the complex nature of helix-coil systems. Here we consider the effects of modifying backbone torsions and the scaling of noncovalent interactions. Although these elements determine the potential of mean force between atoms separated by three covalent bonds (and thus largely determine the local conformational distributions observed in simulation), we demonstrate that the interplay between these factors is both complex and force field dependent. We quantitatively assess the heliophilicity of several helix-stabilizing potentials as well as the changes in heliophilicity resulting from such modifications, which can "make or break" the accuracy of a given force field, and our findings suggest that future force field development may need to better consider effects that vary with peptide length. This report also serves as an example of the utility of distributed computing in analyzing and improving upon contemporary force fields at the level of absolute ensemble equilibrium, the next step in force field development.

22. Exploring the Helix-Coil Transition via All-atom Equilibrium Ensemble Simulations.

Eric J. Sorin and Vijay S. Pande. Biophysical Journal (2005)

SUMMARY: How good are our models for folding? This question is important to address in order to understand the usefulness of our work, as well as the work of everyone in the atomistic simulation field in general. Here, we've done extremely extensive tests of models used in folding to show their strengths and weaknesses. Based on their weaknesses, we have proposed a new model which appears to have a much stronger agreement with experiment.

TECHNICAL ABSTRACT: The ensemble folding of two 21-residue α-helical peptides has been studied using all-atom simulations under several variants of the AMBER potential in explicit solvent using a global distributed computing network. Our extensive sampling, orders of magnitude greater than the experimental folding time, results in complete convergence to ensemble equilibrium. This allows for a quantitative assessment of these potentials, including a new variant of the AMBER-99 force field, denoted AMBER-99φ, which shows improved agreement with experimental kinetic and thermodynamic measurements. From bulk analysis of the simulated AMBER-99φ equilibrium, we find that the folding landscape is pseudo-two-state, with complexity arising from the broad, shallow character of the 'native' and 'unfolded' regions of the phase space. Each of these macrostates allows for configurational diffusion among a diverse ensemble of conformational microstates with greatly varying helical content and molecular size. Indeed, the observed structural dynamics are better represented as a conformational diffusion than as a simple exponential process, and equilibrium transition rates spanning several orders of magnitude are reported. After multiple nucleation steps, on average, helix formation proceeds via a kinetic "alignment" phase in which two or more short, low-entropy helical segments form a more ideal, single-helix structure.

21. Does Water Play a Structural Role in the Folding of Small Nucleic Acids?

Eric J. Sorin, Young Min Rhee, and Vijay S. Pande. Biophysical Journal (2005)

SUMMARY: While previous studies on the folding of nucleic acid hairpins have employed simplified models of either the nucleic acid or the solvent, this paper reports the first such study using an explicit treatment of the surrounding water and counterions. We show that accounting for water molecules in this manner is necessary to most accurately characterize the energetics of hairpin folding, whereas monovalent ions appear to play only a background role.

TECHNICAL ABSTRACT: Nucleic acid structure and dynamics are known to be closely coupled to local environmental conditions and, in particular, to the ionic character of the solvent. Here we consider what role the discrete properties of water and ions play in the collapse and folding of small nucleic acids. We study the folding of an experimentally well-characterized RNA hairpin-loop motif (sequence 5'-GGGC[GCAA]GCCU-3') via ensemble molecular dynamics simulation and, with nearly 500 μs of aggregate simulation time using an explicit representation of the ionic solvent, report successful ensemble folding simulations, with a predicted folding time of 8.8(±2.0) μs, in agreement with experimental measurements of ~10 μs. Comparing our results to previous folding simulations using the GB/SA continuum solvent model shows that accounting for water-mediated interactions is necessary to accurately characterize the free energy surface and stochastic nature of folding. The formation of secondary structure appears to be more rapid than the fastest ionic degrees of freedom, and counterions do not participate discretely in observed folding events. We find that hydrophobic collapse follows a predominantly expulsive mechanism in which a diffusion-search of early structural compaction is followed by final formation of native structure that occurs in tandem with solvent evacuation.

20. Dimerization of the p53 Oligomerization Domain: Identification of a Folding Nucleus by Molecular Dynamics Simulations.

Lillian T. Chong, Christopher D. Snow, Young Min Rhee, and Vijay S. Pande. Journal of Molecular Biology (2005)

SUMMARY: Roughly half of all known cancers result from mutations in p53. Our first work in the cancer area examines the tetramerization domain of p53. We predict how p53 folds and, in doing so, can predict which amino acid mutations would be relevant. Our predictions appear to agree with experiment and give a new interpretation to existing data.

TECHNICAL ABSTRACT: Dimerization of the p53 oligomerization domain involves coupled folding and binding of monomers. To examine the dimerization, we have performed molecular dynamics (MD) simulations of dimer folding from the rate-limiting transition state ensemble (TSE). Among 799 putative transition state structures that were selected from a large ensemble of high-temperature unfolding trajectories, 129 were identified as members of the TSE via calculation of a 50% transmission coefficient from at least 20 room-temperature simulations. This study is the first to examine the refolding of a protein dimer using MD simulations in explicit water, revealing a folding nucleus for dimerization. Our atomistic simulations are consistent with experiment and offer insight that was previously unobtainable.

19. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a tryptophan zipper beta hairpin.

Nina Singhal, Christopher D. Snow, and Vijay S. Pande. Journal of Chemical Physics (2004)

SUMMARY: How can Folding@home use thousands to millions of CPUs to efficiently simulate long timescale biomolecular dynamics? This paper outlines the "Markovian State Model" method which is the foundation of how most new Folding@home calculations are performed. The MSM method allows for a very efficient use of uncoupled simulations, as one would easily get from distributed computing.

TECHNICAL ABSTRACT: We propose an efficient method for the prediction of protein folding rate constants and mechanisms. We use molecular dynamics simulation data to build Markovian state models (MSMs), discrete representations of the pathways sampled. Using these MSMs, we can quickly calculate the folding probability (Pfold) and mean first passage time of all the sampled points. In addition, we provide techniques for evaluating these values under perturbed conditions without expensive recomputations. To demonstrate this method on a challenging system, we apply these techniques to a two-dimensional model energy landscape and the folding of a tryptophan zipper beta hairpin.
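To make the two quantities named in this abstract concrete, here is a minimal sketch on a made-up four-state chain. The transition matrix, the state labels, and the function names are illustrative assumptions, not taken from the paper, and plain fixed-point iteration stands in for whatever solver the authors actually used.

```python
# Hypothetical 4-state Markov model: state 0 = unfolded, state 3 = folded.
# T[i][j] is the probability of moving from state i to state j in one lag time.
T = [
    [0.90, 0.10, 0.00, 0.00],
    [0.10, 0.80, 0.10, 0.00],
    [0.00, 0.10, 0.80, 0.10],
    [0.00, 0.00, 0.10, 0.90],
]


def pfold(T, unfolded, folded, sweeps=10000):
    """Probability of reaching `folded` before `unfolded` from each state."""
    n = len(T)
    p = [0.0] * n
    p[folded] = 1.0  # boundary conditions: 0 at unfolded, 1 at folded
    for _ in range(sweeps):
        for i in range(n):
            if i in (unfolded, folded):
                continue
            # Pfold of a state is the average Pfold of where it goes next.
            p[i] = sum(T[i][j] * p[j] for j in range(n))
    return p


def mfpt(T, target, sweeps=10000):
    """Mean first passage time to `target`, in units of the lag time."""
    n = len(T)
    m = [0.0] * n  # the target itself has passage time 0
    for _ in range(sweeps):
        for i in range(n):
            if i == target:
                continue
            # One step is always spent, then continue from the next state.
            m[i] = 1.0 + sum(T[i][j] * m[j] for j in range(n))
    return m


committor = pfold(T, unfolded=0, folded=3)
passage = mfpt(T, target=3)
print(committor, passage)
```

On this toy chain the Pfold of state 1 works out to 1/3 and the mean first passage time from state 0 to state 3 is 60 lag times, which can be checked by solving the two small linear systems by hand.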

18. Simulations of the role of water in the protein-folding mechanism.

Young Min Rhee, Eric J. Sorin, Guha Jayachandran, Erik Lindahl, & Vijay S Pande. Proceedings of the National Academy of Sciences (2004)

ABSTRACT: There are many unresolved questions regarding the role of water in protein folding. Does water merely induce hydrophobic forces, or does the discrete nature of water play a structural role in folding? Are the nonadditive aspects of water important in determining the folding mechanism? To help to address these questions, we have performed simulations of the folding of a model protein (BBA5) in explicit solvent. Starting 10,000 independent trajectories from a fully unfolded conformation, we have observed numerous folding events, making this work a comprehensive study of the kinetics of protein folding starting from the unfolded state and reaching the folded state and with an explicit solvation model and experimentally validated rates. Indeed, both the raw TIP3P folding rate (4.5 ± 2.5 μs) and the diffusion-constant corrected rate (7.5 ± 4.2 μs) are in strong agreement with the experimentally observed rate of 7.5 ± 3.5 μs. To address the role of water in folding, the mechanism is compared with that predicted from implicit solvation simulations. An examination of solvent density near hydrophobic groups during folding suggests that in the case of BBA5, there are water-induced effects not captured by implicit solvation models, including signs of a concurrent mechanism of core collapse and desolvation.

17. Trp zipper folding kinetics by molecular dynamics and temperature-jump spectroscopy.

Christopher D. Snow, Linlin Qiu, Deguo Du, Feng Gai, Stephen J. Hagen, & Vijay S Pande. Proceedings of the National Academy of Sciences (2004)

ABSTRACT: We studied the microsecond folding dynamics of three hairpins (Trp zippers 1-3, TZ1-TZ3) by using temperature-jump fluorescence and atomistic molecular dynamics in implicit solvent. In addition, we studied TZ2 by using time-resolved IR spectroscopy. By using distributed computing, we obtained an aggregate simulation time of 22 ms. The simulations included 150, 212, and 48 folding events at room temperature for TZ1, TZ2, and TZ3, respectively. The all-atom optimized potentials for liquid simulations (OPLSaa) potential set predicted TZ1 and TZ2 properties well; the estimated folding rates agreed with the experimentally determined folding rates and native conformations were the global potential-energy minimum. The simulations also predicted reasonable unfolding activation enthalpies. This work, directly comparing large simulated folding ensembles with multiple spectroscopic probes, revealed both the surprising predictive ability of current models as well as their shortcomings. Specifically, for TZ1-TZ3, OPLS for united atom models had a nonnative free-energy minimum, and the folding rate for OPLSaa TZ3 was sensitive to the initial conformation. Finally, we characterized the transition state; all TZs fold by means of similar, native-like transition-state conformations.

16. Does Native State Topology Determine the RNA Folding Mechanism?

Eric J. Sorin, Bradley J. Nakatani, Young Min Rhee, Guha Jayachandran, V Vishal, & Vijay S Pande. Journal of Molecular Biology (2004)

ABSTRACT: Recent studies in protein folding suggest that native state topology plays a dominant role in determining the folding mechanism, yet an analogous statement has not been made for RNA, most likely due to the strong coupling between the ionic environment and conformational energetics that make RNA folding more complex than protein folding. Applying a distributed computing architecture to sample nearly 5000 complete tRNA folding events using a minimalist, atomistic model, we have characterized the role of native topology in tRNA folding dynamics: the simulated bulk folding behavior predicts well the experimentally observed folding mechanism. In contrast, single-molecule folding events display multiple discrete folding transitions and compose a largely diverse, heterogeneous dynamic ensemble. This both supports an emerging view of heterogeneous folding dynamics at the microscopic level and highlights the need for single-molecule experiments and both single-molecule and bulk simulations in interpreting bulk experimental measurements.

15. Structural correspondence between the alpha-helix and the random-flight chain resolves how unfolded proteins can have native-like properties.

Bojan Zagrovic & Vijay S Pande. Nature Structural Biology (2003)

ABSTRACT: Recently, we have proposed that, on average, the structure of the unfolded state of small, mostly alpha-helical proteins may be similar to the native structure (the 'mean-structure' hypothesis). After examining thousands of simulations of both the folded and the unfolded states of five polypeptides in atomistic detail at room temperature, we report here a result that seems at odds with the mean-structure hypothesis. Specifically, the average inter-residue distances in the collapsed unfolded structures agree well with the statistics of the ideal random-flight chain with link length of 3.8 Å (the length of one amino acid). A possible resolution of this apparent contradiction is offered by the observation that the inter-residue distances in a typical alpha-helix over short stretches are close to the average distances in an ideal random-flight chain.
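The comparison in the last sentence can be sketched numerically. The snippet below takes the link length of 3.8 to be in ångströms and compares the root-mean-square distance of an ideal random-flight chain, b·sqrt(n), with the Cα-Cα distance in an idealized α-helix. The helix radius, rise, and turn angle are common textbook values chosen here as assumptions; they are not quoted from the paper, and RMS distance is used as a simple stand-in for the averages the authors report.

```python
import math

LINK = 3.8                   # link length from the abstract, assumed angstroms
RADIUS = 2.3                 # assumed C-alpha helix radius (angstroms)
RISE = 1.5                   # assumed rise per residue (angstroms)
TURN = math.radians(100.0)   # assumed rotation per residue


def random_flight_rms(n):
    """RMS end-to-end distance of an ideal random-flight chain of n links."""
    return LINK * math.sqrt(n)


def helix_distance(n):
    """Distance between C-alpha atoms n residues apart on an ideal helix."""
    chord = 2.0 * RADIUS * math.sin(n * TURN / 2.0)  # in-plane separation
    return math.hypot(chord, RISE * n)               # add the axial rise


for n in range(1, 8):
    print(n, round(random_flight_rms(n), 2), round(helix_distance(n), 2))
```

With these assumed helix parameters the two distances stay within about 2 Å of each other for separations of one to seven residues, which is the kind of short-stretch agreement the abstract describes.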

14. Equilibrium Free Energies from Nonequilibrium Measurements Using Maximum-Likelihood Methods.

Michael R. Shirts, Eric Bair, Giles Hooker, and Vijay S Pande. Physical Review Letters (2003)

ABSTRACT: We present a maximum likelihood argument for the Bennett acceptance ratio method, and derive a simple formula for the variance of free energy estimates generated using this method. This derivation of the acceptance ratio method, using a form of logistic regression, a common statistical technique, allows us to shed additional light on the underlying physical and statistical properties of the method. For example, we demonstrate that the acceptance ratio method yields the lowest variance for any estimator of the free energy which is unbiased in the limit of large numbers of measurements.
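For readers unfamiliar with the acceptance ratio method, the sketch below solves its standard self-consistent equation by bisection. Work values are in units of kT, the function names are hypothetical, and the Gaussian test data are synthetic; this is a generic textbook form of the estimator, not code from the paper, and it omits the variance formula the abstract mentions.

```python
import math
import random


def fermi(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_free_energy(w_forward, w_reverse, tol=1e-10):
    """Solve the acceptance ratio equation for the free energy difference.

    w_forward: work values for the forward process (kT units)
    w_reverse: work values for the reverse process (kT units)
    """
    m = math.log(len(w_forward) / len(w_reverse))

    def g(df):
        # g is monotonically increasing in df and crosses zero at the estimate.
        lhs = sum(fermi(m + w - df) for w in w_forward)
        rhs = sum(fermi(-m + w + df) for w in w_reverse)
        return lhs - rhs

    lo, hi = -1.0, 1.0
    while g(lo) > 0:   # widen the bracket until it contains the root
        lo *= 2.0
    while g(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic Gaussian work distributions that satisfy the Crooks relation
# for a true free energy difference of 3 kT (illustrative data only).
rng = random.Random(0)
true_df, sigma = 3.0, 1.0
forward = [rng.gauss(true_df + sigma**2 / 2, sigma) for _ in range(2000)]
reverse = [rng.gauss(-true_df + sigma**2 / 2, sigma) for _ in range(2000)]
print(bar_free_energy(forward, reverse))
```

Bisection is used only because the equation is monotonic in the unknown, which makes the root easy to bracket; any one-dimensional root finder would do.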

13. Extremely precise free energy calculations of amino acid side chain analogs: Comparison of common molecular mechanics force fields for proteins.

Michael R. Shirts, Jed W. Pitera, William C. Swope, and Vijay S. Pande. Journal of Chemical Physics (2003)

ABSTRACT: Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). We use large-scale distributed computing to access sufficient computational resources to extensively sample molecular systems and thus reduce statistical uncertainty of measured free energies. In order to examine the accuracy of a range of common models used for protein simulation, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from recent versions of the OPLS-AA, CHARMM, and AMBER parameter sets in TIP3P water using thermodynamic integration. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02-0.05 kcal/mol, which are in general an order of magnitude smaller than those found in other studies. Notably, this level of precision is comparable to that obtained in experimental hydration free energy measurements of the same molecules. Root mean square differences from experiment over the set of molecules examined using AMBER-, CHARMM-, and OPLS-AA-derived parameters were 1.35 kcal/mol, 1.31 kcal/mol, and 0.85 kcal/mol, respectively. Under the simulation conditions used, these force fields tend to uniformly underestimate solubility of all the side chain analogs. The relative free energies of hydration between amino acid side chain analogs were closer to experiment but still exhibited significant deviations. Although extensive computational resources may be needed for large numbers of molecules, sufficient computational resources to calculate precise free energy calculations for small molecules are accessible to most researchers.
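As a rough illustration of thermodynamic integration and of how per-window sampling error feeds into the final uncertainty, the sketch below integrates hypothetical averages of dU/dλ with the trapezoid rule. The λ schedule, the numbers, and the function name are assumptions for illustration; the paper's actual integration scheme and error analysis may differ, and the errors are treated as independent between windows.

```python
import math


def ti_trapezoid(lambdas, means, errors):
    """Integrate <dU/dlambda> over lambda and propagate independent errors.

    lambdas: increasing coupling-parameter values
    means:   average dU/dlambda measured at each value
    errors:  standard error of each average
    """
    n = len(lambdas)
    weights = [0.0] * n
    for i in range(n - 1):
        half = 0.5 * (lambdas[i + 1] - lambdas[i])
        weights[i] += half        # each interval contributes half its width
        weights[i + 1] += half    # to the points at both of its ends
    delta_g = sum(w * m for w, m in zip(weights, means))
    uncertainty = math.sqrt(sum((w * e) ** 2 for w, e in zip(weights, errors)))
    return delta_g, uncertainty


# Hypothetical three-window example (values in kcal/mol).
dg, err = ti_trapezoid([0.0, 0.5, 1.0], [2.0, 4.0, 6.0], [0.1, 0.1, 0.1])
print(dg, err)
```

Because the uncertainty is a weighted sum of per-window errors, longer sampling in each window, as distributed computing allows, shrinks the final error bar directly.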

12. Solvent Viscosity Dependence of the Folding Rate of a Small Protein: Distributed Computing Study.

Bojan Zagrovic and Vijay S. Pande. Journal of Computational Chemistry (2003)

ABSTRACT: By using distributed computing techniques and a supercluster of more than 20,000 processors we simulated folding of a 20-residue Trp Cage miniprotein in atomistic detail with implicit GB/SA solvent at a variety of solvent viscosities (γ). This allowed us to analyze the dependence of folding rates on viscosity. In particular, we focused on the low-viscosity regime (values below the viscosity of water). In accordance with Kramers' theory, we observe approximately linear dependence of the folding rate on 1/γ for values from 1 to 10^(-1) that of water viscosity. However, for the regime between 10^(-4) and 10^(-1) that of water viscosity we observe power-law dependence of the form k ~ γ^(-1/5). These results suggest that estimating folding rates from molecular simulations run at low viscosity under the assumption of linear dependence of rate on inverse viscosity may lead to erroneous results.
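The size of the error warned about in the last sentence can be estimated with a small worked example. The sketch assumes, purely for illustration, that the two regimes join continuously at one tenth of water viscosity; the abstract does not state this, and the function name and the choice of simulated viscosity are hypothetical.

```python
def relative_rate(g):
    """Folding rate relative to the rate at water viscosity.

    g is the viscosity as a fraction of water viscosity. Inverse-linear
    above 0.1 and a -1/5 power law below, joined continuously at 0.1
    (the continuity at 0.1 is an assumption of this sketch).
    """
    if g >= 0.1:
        return 1.0 / g
    return 10.0 * (g / 0.1) ** (-0.2)


g_sim = 1e-3                        # hypothetical low-viscosity simulation
k_sim = relative_rate(g_sim)        # rate observed at that viscosity
naive_water_rate = k_sim * g_sim    # extrapolate assuming k ~ 1/viscosity
error_factor = relative_rate(1.0) / naive_water_rate
print(error_factor)
```

Under these assumptions, a rate measured at one thousandth of water viscosity and scaled linearly would underestimate the rate in water by a factor of 10^1.6, roughly 40.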

11. Insights Into Nucleic Acid Conformational Dynamics from Massively Parallel Stochastic Simulations.

Eric J. Sorin, Young Min Rhee, Bradley J. Nakatani & Vijay S. Pande. Biophysical Journal (2003)

ABSTRACT: The helical hairpin is one of the most ubiquitous and elementary secondary structural motifs in nucleic acids, capable of serving functional roles and participating in long-range tertiary contacts. Yet the self-assembly of these structures has not been well-characterized at the atomic level. With this in mind, the dynamics of nucleic acid hairpin formation and disruption have been studied using a novel computational tool: large-scale, parallel, atomistic molecular dynamics simulation employing an inhomogeneous distributed computer consisting of more than 40,000 processors. Using multiple methodologies, over 500 μs of atomistic simulation time has been collected for a large ensemble of hairpins (sequence 5'-GGGC[GCAA]GCCU-3'), allowing characterization of rare events not previously observable in simulation. From uncoupled ensemble dynamics simulations in unperturbed folding conditions, we report on 1), competing pathways between the folded and unfolded regions of the conformational space; 2), observed non-native stacking and basepairing traps; and 3), a helix unwinding-rewinding mode that is differentiated from the unfolding and folding dynamics. A heterogeneous transition state ensemble is characterized structurally through calculations of conformer-specific folding probabilities and a multiplexed replica exchange stochastic dynamics algorithm is used to derive an approximate folding landscape. A comparison between the observed folding mechanism and that of a peptide β-hairpin analog suggests that although native topology defines the character of the folding landscape, the statistical weighting of potential folding pathways is determined by the chemical nature of the polymer.

10. Multiplexed-Replica Exchange Molecular Dynamics Method for Protein Folding Simulation.

Young Min Rhee & Vijay S. Pande. Biophysical Journal (2003)

ABSTRACT: Simulating protein folding thermodynamics starting purely from a protein sequence is a grand challenge of computational biology. Here, we present an algorithm to calculate a canonical distribution from molecular dynamics simulation of protein folding. This algorithm is based on the replica exchange method where the kinetic trapping problem is overcome by exchanging noninteracting replicas simulated at different temperatures. Our algorithm uses multiplexed-replicas with a number of independent molecular dynamics runs at each temperature. Exchanges of configurations between these multiplexed-replicas are also tried, rendering the algorithm applicable to large-scale distributed computing (i.e., highly heterogeneous parallel computers with processors having different computational power). We demonstrate the enhanced sampling of this algorithm by simulating the folding thermodynamics of a 23 amino acid miniprotein. We show that better convergence is achieved compared to constant temperature molecular dynamics simulation, with an efficient scaling to a large number of computer processors. Indeed, this enhanced sampling results in (to our knowledge) the first example of a replica exchange algorithm that samples a folded structure starting from a completely unfolded state.
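The exchange step at the heart of any replica exchange scheme can be sketched as follows. This shows the standard Metropolis swap criterion between two replicas at different temperatures; the replica representation, the function names, and the example energies are assumptions for illustration, and the multiplexing and scheduling details of the published algorithm are not reproduced here.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis probability of exchanging two replicas.

    e_i, e_j: potential energies (kcal/mol); t_i, t_j: temperatures (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(rep_a, rep_b, rng=random):
    """Swap the temperatures of two replica records if the move is accepted.

    Each replica is a dict with keys "T" (temperature) and "E" (energy);
    swapping temperatures is equivalent to exchanging configurations.
    """
    p = swap_probability(rep_a["E"], rep_a["T"], rep_b["E"], rep_b["T"])
    if rng.random() < p:
        rep_a["T"], rep_b["T"] = rep_b["T"], rep_a["T"]
        return True
    return False


cold = {"T": 300.0, "E": -105.0}
hot = {"T": 330.0, "E": -100.0}
print(swap_probability(cold["E"], cold["T"], hot["E"], hot["T"]))
```

In a multiplexed setting, one would presumably draw the two records from pools of independent runs at neighbouring temperatures rather than from a single replica per temperature, which is what lets slow or absent machines drop out without stalling the exchange.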

9. The Trp Cage: Folding Kinetics and Unfolded State Topology via Molecular Dynamics Simulations.

Christopher D. Snow, Bojan Zagrovic, and Vijay S. Pande. Journal of the American Chemical Society (2002)

ABSTRACT: A number of rapidly folding proteins have been characterized in recent years (ref. 1). These small proteins can provide the first direct comparisons between simulated and experimental protein folding kinetics and pathways. Proteins have been characterized through thermodynamic sampling methods, unfolding simulations, and folding simulations using simple potentials. Here, as described recently, we use several thousand stochastic dynamics simulations in a generalized-Born implicit solvent (in atomic detail) to simulate the folding dynamics of the Trp cage mini-protein under experimental conditions (27 °C with full solvent viscosity, γ = 91 ps^(-1)). The Folding@home distributed computing project was used to generate an aggregate simulation time of ~100 μs (~250 CPU years). First we capture the rapid relaxation from an extended starting condition to a relaxed unfolded state ensemble of thousands of conformations. With continued simulation, a small fraction of these simulations reach the folded state. Furthermore, the topology of the collapsed unfolded state closely resembles the native state.

8. Absolute comparison of simulated and experimental protein-folding dynamics.

Christopher D. Snow, Houbi Nguyen, Vijay S. Pande, and Martin Gruebele. Nature (2002)

ABSTRACT: Protein folding is difficult to simulate with classical molecular dynamics. Secondary structure motifs such as α-helices and β-hairpins can form in 0.1-10 μs (ref. 1), whereas small proteins have been shown to fold completely in tens of microseconds. The longest folding simulation to date is a single 1-μs simulation of the villin headpiece; however, such single runs may miss many features of the folding process as it is a heterogeneous reaction involving an ensemble of transition states. Here, we have used a distributed computing implementation to produce tens of thousands of 5-20-ns trajectories (700 μs) to simulate mutants of the designed mini-protein BBA5. The fast relaxation dynamics these predict were compared with the results of laser temperature-jump experiments. Our computational predictions are in excellent agreement with the experimentally determined mean folding times and equilibrium constants. The rapid folding of BBA5 is due to the swift formation of secondary structure. The convergence of experimentally and computationally accessible timescales will allow the comparison of absolute quantities characterizing in vitro and in silico (computed) protein folding.

7. Native-like Mean Structure in the Unfolded Ensemble of Small Proteins.

Bojan Zagrovic, Christopher D. Snow, Siraj Khaliq, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2002)

ABSTRACT: The nature of the unfolded state plays a great role in our understanding of proteins. However, accurately studying the unfolded state with computer simulation is difficult, due to its complexity and the great deal of sampling required. Using a supercluster of over 10,000 processors we have performed close to 800 μs of molecular dynamics simulation in atomistic detail of the folded and unfolded states of three polypeptides from a range of structural classes: the all-alpha villin headpiece molecule, the beta hairpin tryptophan zipper, and a designed alpha-beta zinc finger mimic. A comparison between the folded and the unfolded ensembles reveals that, even though virtually none of the individual members of the unfolded ensemble exhibits native-like features, the mean unfolded structure (averaged over the entire unfolded ensemble) has a native-like geometry. This suggests several novel implications for protein folding and structure prediction as well as new interpretations for experiments which find structure in ensemble-averaged measurements.

6. Simulation of Folding of a Small Alpha-helical Protein in Atomistic Detail using Worldwide-distributed Computing.

Bojan Zagrovic, Christopher D. Snow, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2002)

ABSTRACT: By employing thousands of PCs and new worldwide-distributed computing techniques, we have simulated in atomistic detail the folding of a fast-folding 36-residue α-helical protein from the villin headpiece. The total simulated time exceeds 300 μs, orders of magnitude more than previous simulations of a molecule of this size. Starting from an extended state, we obtained an ensemble of folded structures, which is on average 1.7 Å and 1.9 Å away from the native state in Cα distance-based root-mean-square deviation (dRMS) and Cβ dRMS sense, respectively. The folding mechanism of villin is most consistent with the hydrophobic collapse view of folding: the molecule collapses non-specifically very quickly (20 ns), which greatly reduces the size of the conformational space that needs to be explored in search of the native state. The conformational search in the collapsed state appears to be rate-limited by the formation of the aromatic core: in a significant fraction of our simulations, the C-terminal phenylalanine residue packs improperly with the rest of the hydrophobic core. We suggest that the breaking of this interaction may be the rate-determining step in the course of folding. On the basis of our simulations we estimate the folding rate of villin to be approximately 5 μs. By analyzing the average features of the folded ensemble obtained by simulation, we see that the mean folded structure is more similar to the native fold than any individual folded structure. This finding highlights the need for simulating ensembles of molecules and averaging the results in an experiment-like fashion if meaningful comparison between simulation and experiment is to be attempted. Moreover, our results demonstrate that (1) the computational methodology exists to simulate the multi-microsecond regime using distributed computing and (2) that potential sets used to describe interatomic interactions may be sufficiently accurate to reach the folded state, at least for small proteins. We conclude with a comparison between our results and current protein-folding theory.
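The dRMS measure used in this abstract compares two structures through their internal distances, so no superposition is needed. Below is a minimal sketch of the usual definition (root-mean-square difference over all atom-pair distances); the function name and the toy coordinates are assumptions for illustration, and the atom selection used in the paper is not reproduced.

```python
import math
from itertools import combinations


def drms(coords_a, coords_b):
    """Distance-based RMS deviation between two structures.

    coords_a, coords_b: equal-length sequences of (x, y, z) positions,
    for example the C-alpha atoms of a model and of the native state.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two structures with the same atoms (at least 2)")
    pairs = list(combinations(range(len(coords_a)), 2))
    total = 0.0
    for i, j in pairs:
        d_a = math.dist(coords_a[i], coords_a[j])  # pair distance in A
        d_b = math.dist(coords_b[i], coords_b[j])  # same pair in B
        total += (d_a - d_b) ** 2
    return math.sqrt(total / len(pairs))


# Toy three-atom example: the second structure is the first scaled by 2.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
model = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(drms(native, model))
```

One consequence of working with distances alone is that a structure and its mirror image have a dRMS of zero, so the measure cannot by itself distinguish handedness.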

5. Folding@home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology.

Stefan M. Larson, Christopher D. Snow, Michael R. Shirts, and Vijay S. Pande. To appear in Computational Genomics, Richard Grant, editor, Horizon Press, (2002)

ABSTRACT: For decades, researchers have been applying computer simulation to address problems in biology. However, many of these "grand challenges" in computational biology, such as simulating how proteins fold, remained unsolved due to their great complexity. Indeed, even to simulate the fastest folding protein would require decades on the fastest modern CPUs. Here, we review novel methods to fundamentally speed such previously intractable problems using a new computational paradigm: distributed computing. By efficiently harnessing tens of thousands of computers throughout the world, we have been able to break previous computational barriers. However, distributed computing brings new challenges, such as how to efficiently divide a complex calculation among many PCs that are connected by relatively slow networking. Moreover, even if the challenge of accurately reproducing reality can be conquered, a new challenge emerges: how can we take the results of these simulations (typically tens to hundreds of gigabytes of raw data) and gain some insight into the questions at hand? This challenge of the analysis of the sea of data resulting from large-scale simulation will likely remain for decades to come.

4. Atomistic protein folding simulations on the submillisecond timescale using worldwide distributed computing.

Vijay Pande, et al. Peter Kollman Memorial Issue, Biopolymers (2002)

ABSTRACT: Atomistic simulations of protein folding have the potential to be a great complement to experimental studies, but have been severely limited by the time scales accessible with current computer hardware and algorithms. By employing a worldwide distributed computing network of tens of thousands of PCs and algorithms designed to efficiently utilize this new many-processor, highly heterogeneous, loosely coupled distributed computing paradigm, we have been able to simulate hundreds of microseconds of atomistic molecular dynamics. This has allowed us to directly simulate the folding mechanism and to accurately predict the folding rate of several fast-folding proteins and polymers, including a nonbiological helix, polypeptide α-helices, a β-hairpin, and a three-helix bundle protein from the villin headpiece. Our results demonstrate that one can reach the time scales needed to simulate fast folding using distributed computing, and that potential sets used to describe interatomic interactions are sufficiently accurate to reach the folded state with experimentally validated rates, at least for small proteins.

3. β-Hairpin Folding Simulations in Atomistic Detail Using an Implicit Solvent Model.

Bojan Zagrovic, Eric J. Sorin, and Vijay Pande. Journal of Molecular Biology (2001)

ABSTRACT: We have used distributed computing techniques and a supercluster of thousands of computer processors to study folding of the C-terminal β-hairpin from protein G in atomistic detail using the GB/SA implicit solvent model at 300 K. We have simulated a total of nearly 38 μs of folding time and obtained eight complete and independent folding trajectories. Starting from an extended state, we observe relaxation to an unfolded state characterized by non-specific, temporary hydrogen bonding. This is followed by the appearance of interactions between hydrophobic residues that stabilize a bent intermediate. Final formation of the complete hydrophobic core occurs cooperatively at the same time that the final hydrogen bonding pattern appears. The folded hairpin structures we observe all contain a closely packed hydrophobic core and proper β-sheet backbone dihedral angles, but they differ in backbone hydrogen bonding pattern. We show that this is consistent with the existing experimental data on the hairpin alone in solution. Our analysis also reveals short-lived semi-helical intermediates which denote a thermodynamic trap. Our results are consistent with a three-state mechanism with a single rate-limiting step in which a varying final hydrogen bond pattern is apparent, and semi-helical off-pathway intermediates may appear early in the folding process. We include details of the ensemble dynamics methodology and a discussion of our achievements using this new computational device for studying dynamics at the atomic level.

2. Mathematical Foundations of ensemble dynamics.

Michael R. Shirts and Vijay Pande. Physical Review Letters (2001)

ABSTRACT: A set of parallel replicas of a single simulation can be statistically coupled to closely approximate long trajectories. In many cases, this produces nearly linear speedup over a single simulation (M times faster with M simulations), rendering previously intractable problems within reach of large computer clusters. Interestingly, by varying the coupling of the parallel simulations, it is possible in some systems to obtain greater than linear speedup. The methods are generalizable to any search algorithm with long residence times in intermediate states.
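The "M times faster with M simulations" statement can be illustrated in its simplest setting: a single barrier crossing with exponentially distributed waiting times, run as M independent copies. This sketch is an assumption-laden toy (single-exponential kinetics, no coupling between replicas, hypothetical function names); the coupling schemes and the greater-than-linear cases discussed in the paper are not modelled.

```python
import random


def first_crossing_time(rate, m, rng):
    """Wall-clock time until the first of m independent runs crosses.

    Each run waits an exponentially distributed time with the given rate.
    """
    return min(rng.expovariate(rate) for _ in range(m))


def mean_first_crossing(rate, m, trials=20000, seed=0):
    """Monte Carlo estimate of the mean first-crossing time for m runs."""
    rng = random.Random(seed)
    total = sum(first_crossing_time(rate, m, rng) for _ in range(trials))
    return total / trials


single = mean_first_crossing(rate=1.0, m=1)
parallel = mean_first_crossing(rate=1.0, m=10)
print(single, parallel, single / parallel)
```

The minimum of M independent exponential waiting times is itself exponential with M times the rate, so the estimated speedup should come out close to M; for non-exponential kinetics this simple argument no longer holds.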

1. Screen savers of the world, Unite!

Michael R. Shirts and Vijay Pande. Science (2000)

SUMMARY: Is distributed computing a fundamental advance or simply fashionable computing? In this brief letter, we show how distributed computing can be used to tackle problems which make even supercomputers quake. Indeed, we show how distributed computing has the ability to create a supercomputer thousands of times more powerful than any existing machine, due to the large number of processors on the internet (hundreds of millions) and the relatively small number of computer processors in supercomputers (thousands).


Last Updated on July 05, 2008, at 07:19 PM