Computational biology is a vast field. It has made a number of significant fundamental and methodological contributions to many branches of the life, physical and computational sciences.
Computational biologists study and solve many technically challenging problems. We have not attempted to contribute to every area, so this overview of the community's work is broad but not fully inclusive and covers only specific topics.
Our own journey through the field has been guided by a vision motivated by a broad, challenging question that a large portion of the work in the field aims to answer:
Given the genome of a living organism and a large collection of measurements obtained from this organism, can we compute a reasonable knowledge base or model of the organism that will enable answering complex predictive queries about both the biology of the organism and phenotypes exhibited by this organism in different conditions?
In particular, as an important part of the puzzle, can we infer which variants in the biology of the organism (including its genome) are associated with aberrant behaviors leading to disease or pathogenicity?
While on the surface this goal appears beyond reach, substantial progress has been made through a truly community-based research program supported by individual inventiveness, small collaborations and broad consortia.
A classical biologist reading this introduction would by now be convinced that it is inconceivable to tackle a problem of this scale. After all, we do not fully understand the consequences of a single mutation in important proteins such as BRCA1 or EGFR, or whether the mutation might result in a loss of function or an activation leading to either increased risk of cancer or improved response to specific therapies.
However, neither planes nor computers were built overnight. We see this as an engineering problem for the community (rather than for a single scientist). The pheno-geno challenge is addressed by decomposing it into smaller, more tractable problems, each solved using principled approaches. In many cases these “small problems” have formed mini-fields of computational biology. Our group and our close collaborators have managed to make early and, we hope, systematic contributions to many pieces of this puzzle.
For each sub-problem we list our own publications as well as pointers (two or three at most) to selected papers in this space. No attempt to provide full coverage is made.
The pheno-geno challenge in computational biology
Step 1: Sequence and assemble the genome of the organism.
Step 2: Identify the functional parts list of the organism, including protein-coding genes (gene finding) and other functional or regulatory elements.
Step 3: Decipher the biochemical and biological function of these functional elements. Deciphering the function of an element means producing a PREDICTIVE understanding of that function, one that ties the DNA sequence coding for families of functional elements to the phenotype changes induced by changes in the sequence.
Biological Networks: Function in normal conditions and pathology
Step 4: Identify how these functional elements interact with each other. More specifically, identify the protein-protein and regulatory modules which form the building blocks of the organism.
Step 5: Assemble these modules into coherent pathways or complex networks controlling the biology of the organism.
Step 6: Associate the parts, modules, pathways and networks with both biological and clinical phenotypes.
Step 7: Integrate available data and models to produce causal or probabilistic networks, mathematical models or logical rules that can answer complex queries about the organism and the relationship of its genome to the phenotypes.
Step 8: Produce systematic procedures and technologies for disrupting disease by augmenting or rewiring the biological networks aberrantly activated in disease.
For most of these problems we deploy the PREDICT-VALIDATE CYCLE: build predictive models from available data, test their predictions experimentally in the laboratory, augment the models with the results, and repeat the cycle.
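
As a rough illustration of the predict-validate cycle, the loop can be sketched in Python. This is a toy sketch under stated assumptions, not the procedure used in any particular study: the "model" is a single threshold on a hypothetical one-dimensional condition, and the function `hidden_experiment` stands in for a laboratory experiment. All names here (`fit_threshold`, `propose`, `predict_validate_cycle`, `hidden_experiment`) are invented for illustration.

```python
from typing import Callable, List, Tuple

# (hypothetical condition, was the phenotype observed?)
Observation = Tuple[int, bool]


def bracket(data: List[Observation]) -> Tuple[int, int]:
    """Largest condition without the phenotype, smallest condition with it."""
    lo = max(x for x, seen in data if not seen)
    hi = min(x for x, seen in data if seen)
    return lo, hi


def fit_threshold(data: List[Observation]) -> float:
    """Toy predictive model: the midpoint of the current bracket."""
    lo, hi = bracket(data)
    return (lo + hi) / 2


def propose(threshold: float, data: List[Observation]) -> List[int]:
    """Choose the untested condition closest to the model's decision boundary."""
    lo, hi = bracket(data)
    untested = list(range(lo + 1, hi))
    return sorted(untested, key=lambda x: abs(x - threshold))[:1]


def predict_validate_cycle(
    data: List[Observation],
    experiment: Callable[[int], bool],
    max_rounds: int = 10,
) -> Tuple[float, List[Observation]]:
    """Build a model, test its most informative prediction, augment, repeat."""
    data = list(data)
    model = fit_threshold(data)               # build a model from available data
    for _ in range(max_rounds):
        candidates = propose(model, data)     # predictions worth testing
        if not candidates:                    # nothing left to test
            break
        data.extend((x, experiment(x)) for x in candidates)  # validate
        model = fit_threshold(data)           # augment the model
    return model, data


def hidden_experiment(x: int) -> bool:
    """Assumed ground truth for the toy example; unknown to the model."""
    return x >= 7


model, observations = predict_validate_cycle(
    [(0, False), (10, True)], hidden_experiment
)
print(model, len(observations))
```

In this toy setting each round spends one experiment on the condition the current model is least sure about, so the model converges on the hidden boundary after only a few experiments. Real applications would replace the threshold with a richer predictive model and the stand-in function with actual laboratory validation.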