Gene and Protein Function Prediction and Annotation



Selected Publications in Gene Function Prediction




Journal Publications


  1. UNPUBLISHED REPORT: Beth Logan, Pedro Moreno, Baris Suzek, Zhiping Weng and Simon Kasif, “A Study of Remote Homology Detection”, Cambridge Research Laboratory, Technical Report, CRL 2001/05, June 2001 (first version summer 2000). (** A very early proposal for using simple sequence kernels for protein similarity, remote homology detection and function prediction. Very simple, but shown to compete well with BLAST, Pfam and other more sophisticated approaches. **)
    Objective: Biochemical Function. Anton BP, Kasif S, Roberts RJ, Steffen M. Frontiers in Genetics, 2014.
  2. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps. Chang YC, Hu Z, Rachlin J, Anton BP, Kasif S, Roberts RJ, Steffen M. Nucleic Acids Res. 2015 Dec 3. pii: gkv1324. [Epub ahead of print] PMID: 26635392. (** The first database that allows searching of knowledge gaps – SEARCHING THE SPACE THAT WE DON’T KNOW **)
  3. The COMBREX project: design, methodology, and initial results. Anton BP, Chang YC, Brown P, Choi HP, Faller LL, Guleria J, Hu Z, Klitgord N, Levy-Moonshine A, Maksad A, Mazumdar V, McGettrick M, Osmani L, Pokrzywa R, Rachlin J, Swaminathan R, Allen B, Housman G, Monahan C, Rochussen K, Tao K, Bhagwat AS, Brenner SE, Columbus L, de Crécy-Lagard V, Ferguson D, Fomenkov A, Gadda G, Morgan RD, Osterman AL, Rodionov DA, Rodionova IA, Rudd KE, Söll D, Spain J, Xu SY, Bateman A, Blumenthal RM, Bollinger JM, Chang WS, Ferrer M, Friedberg I, Galperin MY, Gobeill J, Haft D, Hunt J, Karp P, Klimke W, Krebs C, Macelis D, Madupu R, Martin MJ, Miller JH, O’Donovan C, Palsson B, Ruch P, Setterdahl A, Sutton G, Tate J, Yakunin A, Tchigvintsev D, Plata G, Hu J, Greiner R, Horn D, Sjölander K, Salzberg SL, Vitkup D, Letovsky S, Segrè D, DeLisi C, Roberts RJ, Steffen M, Kasif S. PLoS Biology. 2013 Aug;11(8):e1001638. doi: 10.1371/journal.pbio.1001638. Epub 2013 Aug 27. PMID: 24013487
  4. Biochemical Characterization of Hypothetical Proteins from Helicobacter pylori. Choi HP, Juarez S, Ciordia S, Fernandez M, Bargiela R, Albar JP, Mazumdar V, Anton BP, Kasif S, Ferrer M, Steffen M. PLoS One. 2013 Jun 18;8(6):e66605. Print 2013. PMID: 23825549
  5. Thousands of new genes with strong computational support found in bacterial genomes, Derrick E. Wood, Henry Lin, Ami Levy-Moonshine, Rajiswari Swaminathan, Yi-Chien Chang, Brian P. Anton, Lais Osmani, Martin Steffen, Simon Kasif, and Steven L. Salzberg, Biology Direct, 2012.
  6. Bridging Computation and Experiments: Using Computers to Drive Biology, (in revision), Science.
  7. A Systems Biology Approach Links Inflammation to the Predisposition to Metabolic Diseases in Mouse Strains, Marcelo Mori*, Manway Liu*, Simon Kasif, Ron Kahn, Diabetes 2010.
  8. Transcriptome Analysis Reveals Parallel Dysregulation of Oxidative Metabolism and Inflammation in Muscle and Adipose Tissue With Progression of Insulin Resistance in Humans, Mary-Elizabeth Patti, Manway Liu, Wanzhu Jin, Carles Lerin, Jonathan Dreyfuss, Martha Vokes, Joshua Schroeder, Elizabeth Tatro, Peter Park, Isaac Kohane, Igor Leykin, Simon Kasif, and Allison Goldfine, in preparation.
  9. COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Roberts RJ, Chang YC, Hu Z, Rachlin JN, Anton BP, Pokrzywa RM, Choi HP, Faller LL, Guleria J, Housman G, Klitgord N, Mazumdar V, McGettrick MG, Osmani L, Swaminathan R, Tao KR, Letovsky S, Vitkup D, Segrè D, Salzberg SL, DeLisi C, Steffen M, Kasif S. Nucleic Acids Res. 2010 Nov 21. PMID: 21097892
  10. Functional Characterization of the YmcB and YqeV tRNA Methylthiotransferases of Bacillus subtilis. Brian P. Anton, Susan Russell, Patrick A. Limbach, Simon Kasif, Elisabeth A. Raleigh, and Richard J. Roberts, Nucleic Acids Research 2010.
  11. The Evolution of Gene Annotation, Simon Kasif and Martin Steffen, Nature Chemical Biology, Jan 2010.
  12. A predictive phosphorylation signature of lung cancer. Chang-Jiun Wu, Tianxi Cai, Klarisa Rikova, David Merberg, Simon Kasif*, Martin Steffen* (*co-corresponding authors), PLoS ONE, Nov 2009.
  13. Seeing the Forest for the Trees: Using the Gene Ontology to Restructure Hierarchical Clustering. Dotan-Cohen D, Kasif S, Melkman AA. Bioinformatics, 2009
  14. Biological process linkage networks. Dotan-Cohen D, Letovsky S, Melkman AA, Kasif S. PLoS ONE. 2009;4(4):e5313. Epub 2009 Apr 23. PMID 19390589
  15. Integration of relational and hierarchical network information for protein function prediction. Jiang X, Nariai N, Steffen M, Kasif S, Kolaczyk ED. BMC Bioinformatics. 2008 Aug 22; 9:350. PMID: 18721473
  16. Analysis of gene expression in a developmental context emphasizes distinct biological leitmotifs in human cancers. Naxerova K, Bult CJ, Peaston A, Fancher K, Knowles BB, Kasif S, Kohane IS. Genome Biol. 2008;9(7):R108. Epub 2008 Jul 8. PMID: 18611264
  17. Anton BP, Saleh L, Benner JS, Raleigh EA, Kasif S, Roberts RJ. RimO, a MiaB-like enzyme, methylthiolates the universally conserved Asp88 residue of ribosomal protein S12 in Escherichia coli. Proc Natl Acad Sci U S A. 2008 Feb 12;105(6):1826-31. Epub 2008 Feb 5. PMID: 18252828
  18. Context specific protein function prediction. Nariai N, Kasif S. Genome Inform. 2007; 18:173-82. PMID: 18546485
  19. Tullai JW, Schaffer ME, Mullenbrock S, Sholder G, Kasif S, Cooper GM. Immediate-early and delayed primary response genes are distinct in function and genomic architecture, J Biol Chem. 2007 Jun 15;
  20. Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 2007 Jun 15;3(6):e96. PMID: 17571924
  21. Nariai N, Kolaczyk ED, Kasif S. Probabilistic protein function prediction from heterogeneous genome-wide data. PLoS ONE. 2007 Mar 28;2:e337. PMID: 17396164
  22. Zhang L, Kasif S, Cantor CR. Quantifying DNA-protein binding specificities by using oligonucleotide mass tags and mass spectroscopy. Proc Natl Acad Sci U S A. 2007 Feb 27;104(9):3061-6. Epub 2007 Feb 20. PMID: 17360609
  23. Tullai JW, Chen J, Schaffer ME, Kamenetsky E, Kasif S, Cooper GM. Glycogen synthase kinase-3 represses cyclic AMP response element-binding protein (CREB)-targeted immediate early genes in quiescent cells. J Biol Chem. 2007 Mar 30;282(13):9482-91. Epub 2007 Feb 3. PMID: 17277356
  24. Alon N, Asodi V, Cantor C, Kasif S, Rachlin J. Multi-node graphs: a framework for multiplexed biological assays. J Comput Biol. 2006 Dec;13(10):1659-72. PMID: 17238837
  25. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol. 2007 Jan 9;5(1):e8. PMID: 17214507
  26. Murali TM, Wu CJ, Kasif S. The art of gene function prediction. Nature Biotechnology. 2006 Dec;24(12):1474-5. PMID: 17160037
  27. Gustafson AM, Snitkin ES, Parker SC, DeLisi C, Kasif S. Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genomics. 2006 Oct 19;7:265. PMID: 17052348
  28. Rachlin J, Cohen DD, Cantor C, Kasif S. Biological context networks: a mosaic view of the interactome. Mol Syst Biol (Nature/EMBO). 2006;2:66. Epub 2006 Nov 28. PMID: 17130868
  29. Lee S, Kohane I, Kasif S. Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes. BMC Genomics. 2005 Nov 27;6:168. PMID: 16309559
  30. Zheng Y, Anton BP, Roberts RJ, Kasif S. Phylogenetic detection of conserved gene clusters in microbial genomes. BMC Bioinformatics. 2005 Oct 3;6:243. PMID: 16202130
  31. Szustakowski JD, Kasif S, Weng Z. Less is more: towards an optimal universal description of protein folds. Bioinformatics. 2005 Sep 1;21 Suppl 2:ii66-ii71. PMID: 16204127
  32. Wu CJ, Kasif S. GEMS: a web server for biclustering analysis of expression data. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W596-9. PMID: 15980544
  33. Rachlin J, Ding C, Cantor C, Kasif S. MuPlex: multi-objective multiplex PCR assay design. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W544-7. PMID: 15980531
  34. Zheng Y, Roberts RJ, Kasif S, Guan C. Characterization of two new aminopeptidases in Escherichia coli. J Bacteriol. 2005 Jun;187(11):3671-7. PMID: 15901689
  35. Zheng Y, Roberts RJ, Kasif S. Identification of genes with fast-evolving regions in microbial genomes. Nucleic Acids Res. 2004 Dec 2;32(21):6347-57. Print 2004. PMID: 15576679
  36. Zhang L, Kasif S, Cantor CR, Broude NE. GC/AT-content spikes as genomic punctuation marks. Proc Natl Acad Sci U S A. 2004 Nov 30;101(48):16855-60. Epub 2004 Nov 17. PMID: 15548610
  37. Tullai JW, Schaffer ME, Mullenbrock S, Kasif S, Cooper GM. Identification of transcription factor binding sites upstream of human genes regulated by the phosphatidylinositol 3-kinase and MEK/ERK signaling pathways. J Biol Chem. 2004 May 7;279(19):20167-77. Epub 2004 Feb 9. PMID: 14769801
  38. Zheng Y, Roberts RJ, Kasif S. Segmentally variable genes: a new perspective on adaptation. PLoS Biol. 2004 Apr;2(4):E81. Epub 2004 Apr 13. PMID: 15094797
  39. Karaoz U, Murali TM, Letovsky S, Zheng Y, Ding C, Cantor CR, Kasif S. Whole-genome annotation by using evidence integration in functional-linkage networks. Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2888-93. Epub 2004 Feb 23. PMID: 4981259
  40. Wu CJ, Fu Y, Murali TM, Kasif S. Gene expression module discovery using Gibbs sampling. Genome Inform. 2004;15(1):239-48. PMID: 15712126
  41. Stitziel NO, Binkowski TA, Tseng YY, Kasif S, Liang J. topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D520-2. PMID: 14681472
  42. Su Y, Murali TM, Pavlovic V, Schaffer M, Kasif S. RankGene: identification of diagnostic genes based on expression data. Bioinformatics. 2003 Aug 12;19(12):1578-9. PMID: 12912841
  43. Wu J, Kasif S, DeLisi C. Identification of functional links between genes using phylogenetic profiles. Bioinformatics. 2003 Aug 12;19(12):1524-30. PMID: 12912833
  44. Stitziel NO, Tseng YY, Pervouchine D, Goddeau D, Kasif S, Liang J. Structural location of disease-associated single-nucleotide polymorphisms. J Mol Biol. 2003 Apr 11;327(5):1021-30. PMID: 12662927
  45. Letovsky S, Kasif S. Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics. 2003;19 Suppl 1:i197-204. PMID: 12855458
  46. Zheng Y, Roberts RJ, Kasif S. Genomic functional annotation using co-evolution profiles of gene clusters. Genome Biol. 2002 Oct 10;3(11):RESEARCH0060. Epub 2002 Oct 10. PMID: 12429059
  47. Zheng Y, Szustakowski JD, Fortnow L, Roberts RJ, Kasif S. Computational identification of operons in microbial genomes. Genome Res. 2002 Aug;12(8):1221-30. PMID: 12176930
  48. Pavlovic V, Garg A, Kasif S. A Bayesian framework for combining gene predictions. Bioinformatics. 2002 Jan;18(1):19-27. PMID: 11836207
  49. Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921.
  50. Cai D, Delcher A, Kao B, Kasif S. Modeling splice sites with Bayes networks. Bioinformatics. 2000 Feb;16(2):152-8. PMID: 10842737
  51. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999 Dec 1;27(23):4636-41. PMID: 10556321
  52. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. Alignment of whole genomes. Nucleic Acids Res. 1999 Jun 1;27(11):2369-76. PMID: 10325427
  53. Salzberg SL, Delcher AL, Kasif S, White O. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 1998 Jan 15;26(2):544-8. PMID: 9421513
  54. *Delcher, A., S. Kasif, H. Goldberg and W. Hsu, “Protein Secondary-Structure Modeling with Probabilistic Networks”, International Conference on Intelligent Systems and Molecular Biology, pp. 109–117, 1993.
  55. *Delcher, A., S. Kasif, H. Goldberg and W. Hsu, “Application of Probabilistic Causal Trees to Analysis of Protein Secondary Structure”, Proceedings of the National Conference on Artificial Intelligence, pp. 316–321, July 1993.