Introduction

Proteins move on a variety of timescales, encompassing motions from the vibration of a single bond to the collective movement of whole domains. X-ray crystallography provides a static view of the structure of proteins. However, when only static structures are available the dynamic processes crucial to protein function are difficult to elucidate. Experimental techniques to explore the dynamics of proteins, such as nuclear magnetic resonance (NMR), are sophisticated and time-consuming. Molecular dynamics (MD) is a widespread computational method for predicting protein motions and generating ensembles of protein structures. It is effective at modeling motions up to the timescale of nanoseconds. However, the computational cost of modeling proteins on the scale of microseconds or milliseconds means that MD is not suitable for larger-scale transitions. Advanced MD methods such as targeted or accelerated MD can overcome this sampling problem, but these methods are not yet routinely applicable due to the parameterization required for each protein.

Various non-MD methods have been used to generate ensembles of protein structures from a crystal input structure, and hence explore protein dynamics. These ensembles have uses in flexible ligand docking, generating poses for protein-protein docking, predicting structures on trajectories between two crystal structures, and predicting flexible regions in proteins.

CONCOORD is a distance geometry method to generate structures from an input structure, and consists of a two-step process. First, the different types of chemical interactions in the input structure, e.g., hydrogen bonding and hydrophobic interactions, are converted to distance constraints with a given tolerance. Next, an iterative minimization procedure is performed to move a set of randomly placed coordinates such that most distance constraints are satisfied.
This generates a protein structure in a manner similar to the way a structure is produced from NMR constraints. The process is repeated to obtain an ensemble of structures. tCONCOORD extends CONCOORD and gives better sampling of proteins with large conformational changes by predicting hydrogen bonds in the structure that are liable to break.

Normal mode analysis (NMA) can also be used to generate conformations of proteins, usually by modeling the protein along the relevant vibrations. The NMSim web server finds flexible and rigid protein regions using the graph theoretical approach FIRST, then generates conformations along low-frequency normal modes. The generated structures are iteratively corrected to produce valid stereochemistry.

Modeling conformational transitions is essential in understanding biological processes such as allostery, whereby an effector at a site distant from the active site causes a change in structure or dynamics that leads to a functional change in the protein. Allostery can arise from non-covalent interactions (e.g., drug binding), covalent interactions (e.g., phosphorylation) and light absorption. This intrinsic property of proteins (Gunasekaran et al., 2004) is important in processes such as cellular signaling and disease, although most allosteric mechanisms remain an enigma and a universal mechanism has not been found (Nussinov and Tsai, 2013).

The discovery of new allosteric modulators is of pressing concern, due to their considerable potential as therapeutics (Lamba and Ghosh, 2012). Allosteric modulators have been elucidated for targets as diverse as the γ-aminobutyric acid receptor, hepatitis C virus polymerase, and RNA.
Allosteric modulator discovery by virtual screening is an exciting prospect furthered by the elucidation of previously unknown allosteric sites found on solved protein structures (Panjkovich and Daura, 2010). There is an increasing number of entries in the AlloSteric Database (ASD) (Shen et al., 2016), which currently contains more than 1,400 proteins. This shows that a large variety of proteins have allosteric character and implies that many proteins have allosteric character yet to be discovered. However, discovery of allosteric drugs presents challenges beyond those encountered in orthosteric drug discovery. Whether the drug will activate or inhibit the protein is difficult to predict, and in many cases the location of allosteric sites is unknown. Existing approaches for allosteric site prediction include using changes in flexibility on ligand binding (Mitternacht and Berezovsky, 2011; Panjkovich and Daura, 2012; Greener and Sternberg, 2015), machine learning on pocket features (Huang et al., 2013; Cimermancic et al., 2016) and structural conservation (Panjkovich and Daura, 2010).

Allostery can be thought of as a property of the ensemble of available protein structures (Motlagh et al., 2014). A perturbation at any site in the structure leads to a shift in the occupancy of states by the population. The conformational selection paradigm suggests that all states available to the protein pre-exist, but certain states (e.g., an allosteric inactive state) are only significantly populated when the allosteric modulator is present.
If a method can model the structural ensemble in such a way that the effect of modulators can be predicted, sites with allosteric character can be found.

Here we present a novel distance geometry-based method, named ExProSE (Exploration of Protein Structural Ensembles), for protein ensemble generation and allosteric site prediction. By using distance constraints from two crystal structures, ExProSE produces ensembles of protein structures that sample biologically relevant conformations. The ensemble differs from an ensemble arising from MD. The structures are not a snapshot in time on a trajectory; instead, each structure is generated independently. We show that ExProSE provides better coverage of the conformational space than existing methods. Allosteric sites on a set of proteins are predicted by examining the effect of potential modulators on the population distribution of the ensemble. To our knowledge, this is the first study to integrate available structural data into a general framework that allows exploration of protein dynamics and allostery, and that provides models for further studies such as ligand docking.
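
The two-step distance geometry procedure described above for CONCOORD (derive distance constraints with a tolerance from an input structure, move randomly placed coordinates until the constraints are satisfied, then repeat to build an ensemble) can be illustrated with a minimal sketch. It is not the CONCOORD or ExProSE implementation. It assumes a toy set of points standing in for atoms, a single distance cutoff in place of interaction-specific constraint types, and a simple pairwise correction scheme. All function names and parameter values are hypothetical.

```python
import math
import random


def build_constraints(coords, cutoff=8.0, tolerance=0.5):
    """Step 1: convert each pair closer than `cutoff` into a distance
    constraint (i, j, lower, upper) centred on the observed distance."""
    constraints = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d < cutoff:
                constraints.append((i, j, max(d - tolerance, 0.0), d + tolerance))
    return constraints


def count_violations(coords, constraints):
    """Number of constraints whose distance falls outside [lower, upper]."""
    return sum(
        1 for i, j, lo, hi in constraints
        if not lo <= math.dist(coords[i], coords[j]) <= hi
    )


def generate_structure(n_points, constraints, rng, box=20.0, max_sweeps=500):
    """Step 2: start from random coordinates and correct violated pairs
    until every constraint holds or `max_sweeps` is reached."""
    coords = [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(n_points)]
    for _ in range(max_sweeps):
        any_violated = False
        for i, j, lo, hi in constraints:
            d = math.dist(coords[i], coords[j])
            if lo <= d <= hi:
                continue
            any_violated = True
            target = rng.uniform(lo, hi)  # random distance inside the bounds
            # Move both points equally along their connecting vector,
            # rescaling their separation towards `target`.
            shift = 0.5 * (target - d) / max(d, 1e-9)
            for k in range(3):
                delta = (coords[j][k] - coords[i][k]) * shift
                coords[i][k] -= delta
                coords[j][k] += delta
        if not any_violated:
            break
    return coords


def generate_ensemble(reference, n_structures=5, seed=0, cutoff=8.0, tolerance=0.5):
    """Repeat step 2 from independent random starts to build an ensemble."""
    rng = random.Random(seed)
    constraints = build_constraints(reference, cutoff, tolerance)
    return [generate_structure(len(reference), constraints, rng)
            for _ in range(n_structures)]


# Toy input (an assumption for illustration): eight points on a helix
# standing in for the atoms of a crystal structure.
reference = [
    (2.3 * math.cos(math.radians(100 * i)),
     2.3 * math.sin(math.radians(100 * i)),
     1.5 * i)
    for i in range(8)
]
constraints = build_constraints(reference)
for model in generate_ensemble(reference):
    print(count_violations(model, constraints), "of", len(constraints),
          "constraints violated")
```

Each model starts from different random coordinates, so the ensemble members are generated independently of one another and do not form successive frames of a trajectory. Because distances alone do not encode handedness, a sketch like this can also return mirror images of the input.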