Refinement Methods for Protein Docking based on Exploring Multi-Dimensional Energy Funnels

Funding Agency: National Institute of General Medical Sciences, National Institutes of Health (NIGMS/NIH).

Award Number: 1-R01-GM093147-01.

Principal Investigators: Yannis Paschalidis and Dima Kozakov, in collaboration with Pirooz Vakili, Boston University.

Project Summary

All successful state-of-the-art protein docking methods employ a so-called multistage approach. In the first stage, a rough energy potential is used to score billions of conformations. In the second stage, the thousands of conformations with the best scores are retained and clustered according to a similarity metric. The cluster centers correspond to putative predictions (models). Recent work by the proposing team demonstrated that greater prediction quality can be achieved by properly exploring these clusters through a process called refinement. This work resulted in a prototype refinement approach, the Semi-Definite programming-based Underestimation (SDU) method.
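The first two stages described above can be illustrated with a minimal sketch. It is not the team's implementation: the function names `retain_best` and `greedy_cluster` are hypothetical, conformations are represented as toy 2-D points, lower scores are assumed to be better, and plain Euclidean distance stands in for whatever similarity metric a real docking pipeline would use. The clustering shown is a simple greedy scheme chosen for brevity.

```python
import math


def retain_best(poses, scores, k):
    """Keep the k poses with the lowest (assumed best) scores, best first."""
    order = sorted(range(len(poses)), key=lambda i: scores[i])[:k]
    return [poses[i] for i in order]


def greedy_cluster(poses, radius):
    """Greedy clustering: repeatedly take the pose with the most neighbors
    within `radius` as a cluster center, then remove its members."""
    remaining = list(range(len(poses)))
    clusters = []
    while remaining:
        def neighbors(i):
            # Unassigned poses within `radius` of pose i (includes i itself).
            return [j for j in remaining
                    if math.dist(poses[i], poses[j]) <= radius]

        # Ties go to the earliest pose in `remaining`, i.e. the best-scored one
        # when the input comes from retain_best.
        center = max(remaining, key=lambda i: len(neighbors(i)))
        members = neighbors(center)
        clusters.append((poses[center], [poses[j] for j in members]))
        remaining = [j for j in remaining if j not in members]
    return clusters


# Toy data (made up for illustration): two tight groups and one poor outlier.
poses = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 9.0)]
scores = [-3.0, -2.5, -2.0, -1.5, -1.0, 10.0]

kept = retain_best(poses, scores, k=5)        # drops the worst-scoring pose
for center, members in greedy_cluster(kept, radius=1.0):
    print("center:", center, "size:", len(members))
```

Each cluster center returned here plays the role of a putative model; refinement would then explore the conformations around that center.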

The central goal of the project is to build on the success of SDU and develop a new high-throughput refinement protocol able to produce predictions of near-crystallographic quality in the most computationally efficient manner. Efficiency will be achieved by leveraging the funnel-like shape that binding free energy potentials exhibit. The specific aims are: (1) the development of a new clustering method that can classify the conformations retained from a first-stage method into clusters suitable for the proposed refinement strategy; (2) the characterization of the structure of the multi-dimensional funnel corresponding to each cluster and the development of an efficient refinement strategy to explore this funnel; (3) the development of a side-chain positioning algorithm appropriate for docking by leveraging Markov random field theory; and (4) the dissemination of the algorithms developed through the release to the research community of a software package and an automated refinement server. It is anticipated that the computational efficiency gains of the proposed refinement protocol over alternative Monte Carlo methods will exceed two orders of magnitude, while at the same time significantly improving upon the accuracy achieved by earlier refinement approaches.

A novelty of the proposed work lies in its use of sophisticated machinery from the fields of optimization and decision theory, specially tailored to the biophysical properties of the docking problem. Techniques from convex and combinatorial optimization, machine learning, and Markov random fields are brought to bear on the refinement stage of multistage protein docking approaches. An important element of the work is the systematic characterization of multi-dimensional binding energy funnels. The existence of such funnels has long been conjectured, but it has not led to new docking approaches so far. The proposed algorithms aim to close this gap by devising efficient strategies to identify, characterize, and explore these funnels.