Genome Analysis

Lander ES et al,  Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921.




Google Scholar Publications


Selected Publications


Journal Publications


  1. Objective: Biochemical Function. Anton BP, Kasif S, Roberts RJ, Steffen M. Nature Frontiers in Genetics, 2014
  2. Hypermutable DNA chronicles the evolution of human colon cancer. Naxerova K, Brachtel E, Salk JJ, Seese AM, Power K, Abbasi B, Snuderl M, Chiang S, Kasif S, Jain RK. Proc Natl Acad Sci U S A. 2014. PMID: 24753616
  3. An aberrant transcription factor network essential for Wnt signaling and stem cell maintenance in glioblastoma. Rheinbay E, Suvà ML, Gillespie SM, Wakimoto H, Patel AP, Shahid M, Oksuz O, Rabkin SD, Martuza RL, Rivera MN, Louis DN, Kasif S, Chi AS, Bernstein BE. Cell Reports. 2013 May 30;3(5):1567-79.  2013.04.021. Epub 2013 May 23. PMID: 23707066
  4. The COMBREX project: design, methodology, and initial results. Anton BP, Chang YC, Brown P, Choi HP, Faller LL, Guleria J, Hu Z, Klitgord N, Levy-Moonshine A, Maksad A, Mazumdar V, McGettrick M, Osmani L, Pokrzywa R, Rachlin J, Swaminathan R, Allen B, Housman G, Monahan C, Rochussen K, Tao K, Bhagwat AS, Brenner SE, Columbus L, de Crécy-Lagard V, Ferguson D, Fomenkov A, Gadda G, Morgan RD, Osterman AL, Rodionov DA, Rodionova IA, Rudd KE, Söll D, Spain J, Xu SY, Bateman A, Blumenthal RM, Bollinger JM, Chang WS, Ferrer M, Friedberg I, Galperin MY, Gobeill J, Haft D, Hunt J, Karp P, Klimke W, Krebs C, Macelis D, Madupu R, Martin MJ, Miller JH, O’Donovan C, Palsson B, Ruch P, Setterdahl A, Sutton G, Tate J, Yakunin A, Tchigvintsev D, Plata G, Hu J, Greiner R, Horn D, Sjölander K, Salzberg SL, Vitkup D, Letovsky S, Segrè D, DeLisi C, Roberts RJ, Steffen M, Kasif S. PLoS Biology. 2013 Aug;11(8):e1001638. doi: 10.1371/journal.pbio.1001638. Epub 2013 Aug 27. PMID: 24013487
  5. Biochemical Characterization of Hypothetical Proteins from Helicobacter pylori. Choi HP, Juarez S, Ciordia S, Fernandez M, Bargiela R, Albar JP, Mazumdar V, Anton BP, Kasif S, Ferrer M, Steffen M. PLoS One. 2013 Jun 18;8(6):e66605. Print 2013. PMID: 23825549
  6. Deep sequencing of the oral microbiome reveals signatures of periodontal disease. Bo Liu, Lina Faller, Niels Klitgord, Varun Mazumdar, Mohammad Ghodsi, Daniel D. Sommer, Theodore R. Gibbons, Todd J. Treangen, Yi-Chien Chang, Shan Li, O. Colin Stine, Hatice Hasturk, Simon Kasif, Daniel Segrè, Mihai Pop, Salomon Amar. PLoS One, 2012.
  7. Network Signatures of Insulin Resistance and Life Span, M. Liu, M. Mori, IS. Kohane, ME Patti, A. Goldfine, CR Kahn, S. Kasif, in preparation.
  8. Chris Nogiec, Allison Burkhart, Simon Kasif* and ME Patti*, Metabolic Analysis of Insulin Resistance, in revision (* authors contributed equally).
  9. Thousands of new genes with strong computational support found in bacterial genomes, Derrick E. Wood, Henry Lin, Ami Levy-Moonshine, Rajiswari Swaminathan, Yi-Chien Chang, Brian P. Anton, Lais Osmani, Martin Steffen, Simon Kasif, and Steven L. Salzberg, Biology Direct, 2012.
  10. Bridging Computation and Experiments: Using Computers to Drive Biology, (in revision), Science.
  11. Towards a Personal Repeatome: A Genome-Wide Survey of Length Variation in Short Tandem Repeats in Human Gene Transcripts, in preparation.
  12. COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Roberts RJ, Chang YC, Hu Z, Rachlin JN, Anton BP, Pokrzywa RM, Choi HP, Faller LL, Guleria J, Housman G, Klitgord N, Mazumdar V, McGettrick MG, Osmani L, Swaminathan R, Tao KR, Letovsky S, Vitkup D, Segrè D, Salzberg SL, DeLisi C, Steffen M, Kasif S. Nucleic Acids Res. 2010 Nov 21. PMID: 21097892
  13. Functional Characterization of the YmcB and YqeV tRNA Methylthiotransferases of Bacillus subtilis. Brian P. Anton, Susan Russell, Patrick A. Limbach, Simon Kasif, Elisabeth A. Raleigh, and Richard J. Roberts. Nucleic Acids Research, 2010.
  14. The Evolution of Gene Annotation, Simon Kasif and Martin Steffen, Nature Chemical Biology, Jan 2010.
  15. Triplet repeat length bias and variation in the human transcriptome,  Molla M*, Delcher A*, Cantor CR, Kasif S, Proc. of National Academy of Sciences, Oct. 2009, PMID: 19805156
  16. Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance. Park PJ, Kong SW, Tebaldi T, Lai WR, Kasif S, Kohane IS. Bioinformatics. 2009 Dec 1;25(23):3121-7. Epub 2009 Sep 28. PMID: 19786482
  17. Seeing the Forest for the Trees: Using the Gene Ontology to Restructure Hierarchical Clustering. Dotan-Cohen D, Kasif S, Melkman AA. Bioinformatics, 2009.
  18. Biological process linkage networks. Dotan-Cohen D, Letovsky S, Melkman AA, Kasif S. PLoS ONE. 2009;4(4):e5313. Epub 2009 Apr 23. PMID: 19390589
  19. Quantitative analysis of single nucleotide polymorphisms within copy number variation. Lee S, Kasif S, Weng Z, Cantor CR. PLoS ONE. 2008;3(12):e3906. Epub 2008 Dec 18. PMID: 19093001
  20. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. Ku M, Koche RP, Rheinbay E, Mendenhall EM, Endoh M, Mikkelsen TS, Presser A, Nusbaum C, Xie X, Chi AS, Adli M, Kasif S, Ptaszek LM, Cowan CA, Lander ES, Koseki H, Bernstein BE. PLoS Genet. 2008 Oct;4(10):e1000242. Epub 2008 Oct 31. PMID: 18974828
  21. Integration of relational and hierarchical network information for protein function prediction. Jiang X, Nariai N, Steffen M, Kasif S, Kolaczyk ED. BMC Bioinformatics. 2008 Aug 22; 9:350. PMID: 18721473
  22. Analysis of gene expression in a developmental context emphasizes distinct biological leitmotifs in human cancers. Naxerova K, Bult CJ, Peaston A, Fancher K, Knowles BB, Kasif S, Kohane IS. Genome Biol. 2008;9(7):R108. Epub 2008 Jul 8. PMID: 18611264
  23. Anton BP, Saleh L, Benner JS, Raleigh EA, Kasif S, Roberts RJ. RimO, a MiaB-like enzyme, methylthiolates the universally conserved Asp88 residue of ribosomal protein S12 in Escherichia coli. Proc Natl Acad Sci U S A. 2008 Feb 12;105(6):1826-31. Epub 2008 Feb 5. PMID: 18252828
  24. Context specific protein function prediction. Nariai N, Kasif S. Genome Inform. 2007; 18:173-82. PMID: 18546485
  25. Tullai JW, Schaffer ME, Mullenbrock S, Sholder G, Kasif S, Cooper GM. Immediate-early and delayed primary response genes are distinct in function and genomic architecture, J Biol Chem. 2007 Jun 15;
  26. Nariai N, Kolaczyk ED, Kasif S. Probabilistic protein function prediction from heterogeneous genome-wide data. PLoS ONE. 2007 Mar 28;2:e337. PMID: 17396164
  27. Zhang L, Kasif S, Cantor CR. Quantifying DNA-protein binding specificities by using oligonucleotide mass tags and mass spectroscopy. Proc Natl Acad Sci U S A. 2007 Feb 27;104(9):3061-6. Epub 2007 Feb 20. PMID: 17360609
  28. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol. 2007 Jan 9;5(1):e8. PMID: 17214507
  29. Murali TM, Wu CJ, Kasif S. The art of gene function prediction. Nature Biotechnology. 2006 Dec;24(12):1474-5. PMID: 17160037
  30. Gustafson AM, Snitkin ES, Parker SC, DeLisi C, Kasif S. Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genomics. 2006 Oct 19;7:265. PMID: 17052348
  31. Lee S, Kasif S. The complete genome sequence of a dog: a perspective. Bioessays. 2006 Jun;28(6):569-73. PMID: 16700068
  32. Lee S, Kohane I, Kasif S. Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes. BMC Genomics. 2005 Nov 27;6:168. PMID: 16309559
  33. Zheng Y, Anton BP, Roberts RJ, Kasif S. Phylogenetic detection of conserved gene clusters in microbial genomes. BMC Bioinformatics. 2005 Oct 3;6:243. PMID: 16202130
  34. Szustakowski JD, Kasif S, Weng Z. Less is more: towards an optimal universal description of protein folds. Bioinformatics. 2005 Sep 1;21 Suppl 2:ii66-ii71. PMID: 16204127
  35. Rachlin J, Ding C, Cantor C, Kasif S. Computational tradeoffs in multiplex PCR assay design for SNP genotyping. BMC Genomics. 2005 Jul 25;6:102. PMID: 16042802
  36. Wu CJ, Kasif S.  GEMS: a web server for biclustering analysis of expression data. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W596-9. PMID: 15980544
  37. Rachlin J, Ding C, Cantor C, Kasif S.  MuPlex: multi-objective multiplex PCR assay design. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W544-7. PMID: 15980531
  38. Zheng Y, Roberts RJ, Kasif S, Guan C.  Characterization of two new aminopeptidases in Escherichia coli. J Bacteriol. 2005 Jun;187(11):3671-7. PMID: 15901689
  39. Zheng Y, Roberts RJ, Kasif S.  Identification of genes with fast-evolving regions in microbial genomes. Nucleic Acids Res. 2004 Dec 2;32(21):6347-57. Print 2004. PMID: 15576679
  40. Zhang L, Kasif S, Cantor CR, Broude NE. GC/AT-content spikes as genomic punctuation marks. Proc Natl Acad Sci U S A. 2004 Nov 30;101(48):16855-60. Epub 2004 Nov 17. PMID: 15548610
  41. Tullai JW, Schaffer ME, Mullenbrock S, Kasif S, Cooper GM. Identification of transcription factor binding sites upstream of human genes regulated by the phosphatidylinositol 3-kinase and MEK/ERK signaling pathways. J Biol Chem. 2004 May 7;279(19):20167-77. Epub 2004 Feb 9. PMID: 14769801
  42. Zheng Y, Roberts RJ, Kasif S.  Segmentally variable genes: a new perspective on adaptation. PLoS Biol. 2004 Apr;2(4):E81. Epub 2004 Apr 13. PMID: 15094797
  43. Karaoz U, Murali TM, Letovsky S, Zheng Y, Ding C, Cantor CR, Kasif S. Whole-genome annotation by using evidence integration in functional-linkage networks. Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2888-93. Epub 2004 Feb 23. PMID: 14981259
  44. Wu CJ, Fu Y, Murali TM, Kasif S.  Gene expression module discovery using Gibbs sampling. Genome Inform. 2004;15(1):239-48. PMID: 15712126
  45. Stitziel NO, Binkowski TA, Tseng YY, Kasif S, Liang J.  topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D520-2. PMID: 14681472
  46. Wu J, Kasif S, DeLisi C.  Identification of functional links between genes using phylogenetic profiles. Bioinformatics. 2003 Aug 12;19(12):1524-30. PMID: 12912833
  47. Zhang L, Pavlovic V, Cantor CR, Kasif S. Human-mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Res. 2003 Jun;13(6A):1190-202. Epub 2003 May 12. PMID: 12743024
  48. Pervouchine DD, Graber JH, Kasif S. On the normalization of RNA equilibrium free energy to the length of the sequence.  Nucleic Acids Res. 2003 May 1;31(9):e49. PMID: 12711694
  49. Stitziel NO, Tseng YY, Pervouchine D, Goddeau D, Kasif S, Liang J.  Structural location of disease-associated single-nucleotide polymorphisms. J Mol Biol. 2003 Apr 11;327(5):1021-30. PMID: 12662927
  50. Letovsky S, Kasif S.  Predicting protein function from protein/protein interaction data: a probabilistic approach.  Bioinformatics. 2003;19 Suppl 1:i197-204. PMID: 12855458
  51. Kasif S, Weng Z, Derti A, Beigel R, DeLisi C. A computational framework for optimal masking in the synthesis of oligonucleotide microarrays. Nucleic Acids Res. 2002 Oct 15;30(20):e106. PMID: 12384608
  52. Zheng Y, Roberts RJ, Kasif S. Genomic functional annotation using co-evolution profiles of gene clusters. Genome Biol. 2002 Oct 10;3(11):RESEARCH0060. Epub 2002 Oct 10. PMID: 12429059
  53. Zheng Y, Szustakowski JD, Fortnow L, Roberts RJ, Kasif S.  Computational identification of operons in microbial genomes. Genome Res. 2002 Aug;12(8):1221-30. PMID: 12176930
  54. Walker M, Pavlovic V, Kasif S. A comparative genomic method for computational identification of prokaryotic translation initiation sites. Nucleic Acids Res. 2002 Jul 15;30(14):3181-91. PMID: 12136100
  55. Pavlovic V, Garg A, Kasif S. A Bayesian framework for combining gene predictions. Bioinformatics. 2002 Jan;18(1):19-27. PMID: 11836207
  56. Lander ES et al,  Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921.
  57. Cai D, Delcher A, Kao B, Kasif S. Modeling splice sites with Bayes networks. Bioinformatics. 2000 Feb;16(2):152-8. PMID: 10842737
  58. Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL.  Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project. Genomics. 1999 Dec 15;62(3):500-7. PMID: 10644449
  59. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999 Dec 1;27(23):4636-41. PMID: 10556321
  60. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. Alignment of whole genomes. Nucleic Acids Res. 1999 Jun 1;27(11):2369-76. PMID: 10325427
  61. Salzberg SL, Delcher AL, Kasif S, White O.  Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 1998 Jan 15;26(2):544-8. PMID: 9421513
  62. Kasif, S.,  “Datascope: Mining Biological Sequences”, special issue on Data Mining, IEEE Intelligent Systems,  pp. 38–45, 1999.
  63. Kasif, S., “Towards a Constraint-Based Engineering Framework for Algorithm Design and Application”, Journal of Constraints, 1997.
  64. Delcher, A., A. Grove, S. Kasif and J. Pearl, “Logarithmic Time Queries and Updates in Probabilistic Networks”, Journal of Artificial Intelligence Research, Vol. 4, pp. 37–59, 1996.
  65. Waltz, D. and S. Kasif, “On Reasoning from Data”, Computing Surveys, 1996.

Books, Book Chapters and Special Reports

  1. Rich Roberts, Peter Karp, Simon Kasif and Stuart Linn, “An Experimental Approach to Gene Function”, Executive Report, American Academy for Microbiology.
  2. Satoru Miyano, Jill Mesirov, Simon Kasif, Sorin Istrail, Pavel Pevzner, Michael Waterman, Research in Computational Molecular Biology: 9th Annual International Conference, RECOMB 2005, Cambridge, MA, USA, May 14-18, 2005, Proceedings. Lecture Notes in Bioinformatics.


Unpublished Technical Reports

  1. Zhang, L., Pavlovic, E. Green, G. Cantor, C.R. and S. Kasif, “Novel Human Gene Discovery using Cross-Species Analysis: Is the Mouse the Right Reference Point?”.
  2. B. Logan, P. Moreno, B. Suzek, Z. Weng and S. Kasif, “Learning Remote-Homology Using Probabilistic Feature Vectors”, Cambridge Research Lab TR 2000.
  3. Walker, M. and Kasif, S., “Comparative Analysis of lipid biosynthesis and metabolism component of the DnaA regulon”.
  4. Joseph D. Szustakowski, Ulas Karaoz, Serafim Batzoglou, James Galagan, Tarjei Mikkelsen, Zhiping Weng, Joel H. Graber, S. Kasif, On the Organization of Ancient and Modern Genes in the Human Genome.