Optimizing and Learning Strategies for Protein Docking

Funding Agency: National Institutes of Health (NIH).

Award Number: 1R01GM135930

Principal Investigators: Pirooz Vakili, Yannis Paschalidis, and Sandor Vajda at Boston University.

PROJECT SUMMARY

Overview. The goal of the proposed research is twofold: (i) to generalize and further develop a set of new mathematical methods and algorithms, motivated by docking, that the proposing team has introduced; and (ii) to use these methods to make further contributions to the solution of the computational molecular/protein docking problem.

Intellectual merit. Protein docking is the problem of predicting the three-dimensional structure of a docked complex from the known structures of its individual components. Experimental techniques for this purpose are often expensive, time-consuming, and in some cases infeasible; hence the need for computational docking methods. Finding the docked complex (the native conformation) is generally formulated as the minimization of an energy-based scoring function. The scoring function is often composed of multiple energy terms that act at different spatial scales and exhibit multi-frequency behavior, leading to an enormous number of local minima. Furthermore, docking/binding involves conformational changes in the component molecules, which makes the search space highly complex. These features render the optimization problem extremely difficult. Most state-of-the-art docking protocols, including ours, employ a multi-stage and multi-scale approach. They begin with a global search of the conformational space using a simplified scoring function in order to identify promising regions. This stage is followed by local optimization using a more detailed and complete scoring function in order to remove clashes. In the final, so-called refinement, stage, the promising regions found in the first two stages are explored further using a medium-scale search in order to provide a set of final solutions. It has recently become evident that, because of inaccuracies in the scoring function/energy potentials, the optimization stages outlined above invariably generate a number of false positives in the final phase, namely conformations that have a low score but are far from the native conformation. This motivates us to introduce learning methods that combine energy with additional features in order to rank clusters of conformations at the refinement stage and improve the final solutions.
This proposal has two distinct thrusts: optimization and learning. On the optimization front, the project team has, in past research, formulated docking as an optimization problem on manifolds. In this project we introduce two novel elements into the manifold optimization formulation that we expect to lead to significant improvements in the performance of docking algorithms. On the learning front, we use novel robust optimization techniques to introduce a new and more rigorous approach to robust regression, classification, and outlier detection, in order to (i) obtain improved ranking of clusters in the refinement stage and (ii) address the important problem of distinguishing between binders and non-binders. The proposal identifies and describes the specific tasks, at different stages of the docking problem, that will be addressed using these methodologies.
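
The multi-stage protocol described under Intellectual merit can be illustrated with a deliberately small sketch. This is not the team's or ClusPro's implementation: the "pose" is a single real number, both scoring functions are made-up stand-ins, all function names are hypothetical, and ranking clusters by population is an assumption of this sketch rather than something stated in this summary. It only shows the general shape of a coarse global search, a local optimization with a more detailed score, and a final clustering step.

```python
import math

def coarse_score(x):
    # Smooth, simplified stand-in for the stage-1 scoring function (assumed).
    return 0.1 * (x - 2.0) ** 2

def detailed_score(x):
    # Adds a high-frequency term, which creates many local minima (assumed).
    return coarse_score(x) + 0.5 * math.sin(15.0 * x)

def global_search(lo, hi, n_grid, keep):
    # Stage 1: score a uniform grid with the simplified function, keep the best.
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    return sorted(grid, key=coarse_score)[:keep]

def local_minimize(x, step=0.05, tol=1e-6):
    # Stage 2: derivative-free descent on the detailed score.
    # Move to a neighbor only if it is strictly better; otherwise halve the step.
    fx = detailed_score(x)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            fc = detailed_score(cand)
            if fc < fx:
                x, fx, moved = cand, fc, True
                break
        if not moved:
            step /= 2.0
    return x, fx

def cluster(poses, radius):
    # Stage 3: greedy clustering. Poses are visited from best to worst score;
    # each joins the first cluster whose lowest-score member is within radius.
    clusters = []
    for x, fx in sorted(poses, key=lambda p: p[1]):
        for members in clusters:
            if abs(x - members[0][0]) <= radius:
                members.append((x, fx))
                break
        else:
            clusters.append([(x, fx)])
    return clusters

def dock(lo=-5.0, hi=5.0, n_grid=201, keep=20, radius=0.1):
    starts = global_search(lo, hi, n_grid, keep)
    refined = [local_minimize(x) for x in starts]
    # Rank clusters by population, largest first (an assumption of this sketch).
    return sorted(cluster(refined, radius), key=len, reverse=True)

if __name__ == "__main__":
    for rank, members in enumerate(dock(), start=1):
        x, fx = members[0]
        print(f"cluster {rank}: size={len(members)} pose={x:.4f} score={fx:.4f}")
```

Even in this toy setting, several refined poses end in different local minima of the detailed score, which mirrors why a ranking step over clusters is needed after optimization.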

Broader impacts. The team has already had definitive successes in protein-protein docking and has submitted some of the best predictions in the ongoing CAPRI (Critical Assessment of Predicted Interactions) worldwide protein docking experiment. It has also developed the ClusPro server, which according to CAPRI is the best automated server currently available. ClusPro has over 8,000 registered users, and structures generated by the server have been reported in over 500 research papers. We envision additional broader impacts from successful completion of the project, including (i) contributions to the theory of geometric optimization and statistical learning, (ii) improved effectiveness of docking protocols, and (iii) contributions to methodologies in other application areas, in particular robotics, computer vision, medical informatics, chemistry, and biology. The newly developed methods will be implemented in ClusPro, further improving the utility of the server and most likely further increasing its user base and impact. On the educational front, our plans include training and engagement of graduate and undergraduate students, outreach to high school students and teachers, and summer internships for interested high school students.