Machine Learning
We worked on a broad range of theoretical and practical problems in machine learning, focusing on either delivering widely used systems or studying creative, non-standard learning formalisms.
Some examples of this work include:
- One of the earliest theoretical frameworks for “exact” learning with limited memory over data streams (1990), where efficiency is measured by the number of passes through the data that the system must make to learn a simple concept or discover patterns in a data stream. We introduced a general theoretical framework for studying pattern discovery in data streams and proved both upper and lower bounds on the number of passes needed to discover the patterns.
- *Heath, D., S. Kasif, S. R. Kosaraju, S. Salzberg and G. Sullivan, “Learning Nested Concept Classes with Limited Storage”, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), pp. 777-782, 1991.
This area is now known as Data Streaming Algorithms and has produced hundreds of papers.
- An early attempt to formalize Probabilistic Databases based on Bayes networks. We introduced a new factorization of graphical models that enables solving inference problems in parallel logarithmic time, in logarithmic space, and in incremental (data stream model) logarithmic time.
- Delcher, A., A. Grove, S. Kasif and J. Pearl, “Logarithmic Time Queries and Updates in Probabilistic Networks”, Journal of Artificial Intelligence Research, Vol. 4, pp. 37–59, 1996.
- An early version of Learning with a Helpful Teacher, in which the teacher tries to teach a computer a concept by giving it examples to learn from.
- A very popular decision tree system, OC1, that focused on efficiency of implementation and made very early use of randomization in decision tree induction (prior to Random Forests) and of random projections.
- A new theoretical formalism of Learning Subgraphs with Queries.
- Applications to Computational Biology, Systems Biology and Bioinformatics.
- Early on we tried to shift the focus of Machine Learning to Learning Complex Behaviors. We organized an AAAI symposium on this topic in 1995-6 (with Stuart Russell at Berkeley). Today this area has finally become central to Machine Learning and goes by many names, including Deep Learning and Learning Representations.
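To make the idea of randomized oblique splits mentioned in the OC1 item above more concrete, here is a minimal sketch in Python. It is not the OC1 algorithm and is not taken from the papers listed on this page; it only illustrates the general idea of testing a linear combination of attributes (a hyperplane) rather than a single attribute, and of using random sampling to search for a good hyperplane. All names (`gini`, `split_impurity`, `random_oblique_split`) and the choice of Gini impurity as the scoring function are assumptions made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(points, labels, weights, bias):
    """Size-weighted Gini impurity after splitting on the sign of w . x + bias."""
    left, right = [], []
    for x, y in zip(points, labels):
        side = sum(w * xi for w, xi in zip(weights, x)) + bias
        (left if side <= 0 else right).append(y)
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def random_oblique_split(points, labels, trials=200, seed=0):
    """Sample random hyperplanes and keep the one with the lowest impurity.

    Hypothetical helper for illustration only: each candidate hyperplane has
    random coefficients and is shifted to pass through a random training point.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    best = None
    for _ in range(trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        anchor = rng.choice(points)
        bias = -sum(w * a for w, a in zip(weights, anchor))
        score = split_impurity(points, labels, weights, bias)
        if best is None or score < best[0]:
            best = (score, weights, bias)
    return best  # (impurity, weights, bias)
```

An axis-parallel test is the special case in which only one coefficient is non-zero, so the oblique search considers a strictly larger family of splits at the cost of a larger search space, which is where random sampling helps.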
Selected publications can be found below.
Journal Publications
- A Critical Assessment of Genomic and Clinical Breast Cancer Biomarkers. Chang-Jiun Wu, Tianxi Cai, Zoltan Szallasi, Kamila Naxerova, Martin Steffen, L.J. Wei, Isaac S. Kohane*, Simon Kasif*, co-corresponding authors, in revision.
- Integrative Pheno-genomic module discovery with application to breast cancer, Chang-Jiun Wu, Tianxi Cai, Isaac S. Kohane*, Simon Kasif*, in revision.
- A Systems Biology Approach Links Inflammation to the Predisposition to Metabolic Diseases in Mouse Strains, Marcelo Mori*, Manway Liu*, Simon Kasif, Ron Kahn, Diabetes 2010.
- Transcriptome Analysis Reveals Parallel Dysregulation of Oxidative Metabolism and Inflammation in Muscle and Adipose Tissue With Progression of Insulin Resistance in Humans, Mary-Elizabeth Patti, Manway Liu, Wanzhu Jin, Carles Lerin, Jonathan Dreyfuss, Martha Vokes, Joshua Schroeder, Elizabeth Tatro, Peter Park, Isaac Kohane, Igor Leykin, Simon Kasif, and Allison Goldfine, in preparation.
- COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Roberts RJ, Chang YC, Hu Z, Rachlin JN, Anton BP, Pokrzywa RM, Choi HP, Faller LL, Guleria J, Housman G, Klitgord N, Mazumdar V, McGettrick MG, Osmani L, Swaminathan R, Tao KR, Letovsky S, Vitkup D, Segrè D, Salzberg SL, DeLisi C, Steffen M, Kasif S. Nucleic Acids Res. 2010 Nov 21. PMID: 21097892
- Drug Response Phosphorylation Signatures of Lung Cancer, Chang-Jiun Wu, Martin Steffen, Simon Kasif in preparation.
- The Evolution of Gene Annotation, Simon Kasif and Martin Steffen, Nature Chemical Biology, Jan 2010.
- A Predictive phosphorylation signature of lung cancer. Chang-Jiun Wu, Tianxi Cai, Klarisa Rikova, David Merberg, Simon Kasif,* Martin Steffen*, co-corresponding authors, PLoS ONE, Nov 2009.
- Triplet repeat length bias and variation in the human transcriptome, Molla M*, Delcher A*, Cantor CR, Kasif S, Proc. of National Academy of Sciences, Oct. 2009, PMID: 19805156
- Integration of heterogeneous expression data sets extends the role of the retinol pathway in diabetes and insulin resistance. Park PJ, Kong SW, Tebaldi T, Lai WR, Kasif S, Kohane IS. Bioinformatics. 2009 Dec 1;25(23):3121-7. Epub 2009 Sep 28. PMID: 19786482
- Integration of relational and hierarchical network information for protein function prediction. Jiang X, Nariai N, Steffen M, Kasif S, Kolaczyk ED. BMC Bioinformatics. 2008 Aug 22; 9:350. PMID: 18721473
- Analysis of gene expression in a developmental context emphasizes distinct biological leitmotifs in human cancers. Naxerova K, Bult CJ, Peaston A, Fancher K, Knowles BB, Kasif S, Kohane IS. Genome Biol. 2008;9(7):R108. Epub 2008 Jul 8. PMID: 18611264
- Anton BP, Saleh L, Benner JS, Raleigh EA, Kasif S, Roberts RJ. RimO, a MiaB-like enzyme, methylthiolates the universally conserved Asp88 residue of ribosomal protein S12 in Escherichia coli. Proc Natl Acad Sci U S A. 2008 Feb 12;105(6):1826-31. Epub 2008 Feb 5. PMID: 18252828
- Context specific protein function prediction. Nariai N, Kasif S. Genome Inform. 2007; 18:173-82. PMID: 18546485
- Nariai N, Kolaczyk ED, Kasif S. Probabilistic protein function prediction from heterogeneous genome-wide data. PLoS ONE. 2007 Mar 28;2:e337. PMID: 17396164
- Alon N, Asodi V, Cantor C, Kasif S, Rachlin J. Multi-node graphs: a framework for multiplexed biological assays. J Comput Biol. 2006 Dec;13(10):1659-72. PMID: 17238837
- Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol. 2007 Jan 9;5(1):e8. PMID: 17214507
- Murali TM, Wu CJ, Kasif S. The art of gene function prediction. Nature Biotechnology. 2006 Dec;24(12):1474-5. PMID: 17160037
- Gustafson AM, Snitkin ES, Parker SC, DeLisi C, Kasif S. Towards the identification of essential genes using targeted genome sequencing and comparative analysis. BMC Genomics. 2006 Oct 19;7:265. PMID: 17052348
- Rachlin J, Cohen DD, Cantor C, Kasif S. Biological context networks: a mosaic view of the interactome. Nature/Embo. Mol Syst Biol. 2006;2:66. Epub 2006 Nov 28. PMID: 17130868
- Lee S, Kohane I, Kasif S. Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes. BMC Genomics. 2005 Nov 27;6:168. PMID: 16309559
- Zheng Y, Anton BP, Roberts RJ, Kasif S. Phylogenetic detection of conserved gene clusters in microbial genomes. BMC Bioinformatics. 2005 Oct 3;6:243. PMID: 16202130
- Szustakowski JD, Kasif S, Weng Z. Less is more: towards an optimal universal description of protein folds. Bioinformatics. 2005 Sep 1;21 Suppl 2:ii66-ii71. PMID: 16204127
- Wu CJ, Kasif S. GEMS: a web server for biclustering analysis of expression data. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W596-9. PMID: 15980544
- Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich and Benny Sudakov, “Learning a Hidden Matching”, SIAM Journal on Computing, 2004.
- Zheng Y, Roberts RJ, Kasif S. Identification of genes with fast-evolving regions in microbial genomes. Nucleic Acids Res. 2004 Dec 2;32(21):6347-57. Print 2004. PMID: 15576679
- Zhang L, Kasif S, Cantor CR, Broude NE. GC/AT-content spikes as genomic punctuation marks. Proc Natl Acad Sci U S A. 2004 Nov 30;101(48):16855-60. Epub 2004 Nov 17. PMID: 15548610
- Zheng Y, Roberts RJ, Kasif S. Segmentally variable genes: a new perspective on adaptation. PLoS Biol. 2004 Apr;2(4):E81. Epub 2004 Apr 13. PMID: 15094797
- Karaoz U, Murali TM, Letovsky S, Zheng Y, Ding C, Cantor CR, Kasif S. Whole-genome annotation by using evidence integration in functional-linkage networks. Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2888-93. Epub 2004 Feb 23. PMID: 4981259
- Wu CJ, Fu Y, Murali TM, Kasif S. Gene expression module discovery using Gibbs sampling. Genome Inform. 2004;15(1):239-48. PMID: 15712126
- Su Y, Murali TM, Pavlovic V, Schaffer M, Kasif S. RankGene: identification of diagnostic genes based on expression data. Bioinformatics. 2003 Aug 12;19(12):1578-9. PMID: 12912841
- Wu J, Kasif S, DeLisi C. Identification of functional links between genes using phylogenetic profiles. Bioinformatics. 2003 Aug 12;19(12):1524-30. PMID: 12912833
- Zhang L, Pavlovic V, Cantor CR, Kasif S. Human-mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Res. 2003 Jun;13(6A):1190-202. Epub 2003 May 12. PMID: 12743024
- Letovsky S, Kasif S. Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics. 2003;19 Suppl 1:i197-204. PMID: 12855458
- Zheng Y, Szustakowski JD, Fortnow L, Roberts RJ, Kasif S. Computational identification of operons in microbial genomes. Genome Res. 2002 Aug;12(8):1221-30. PMID: 12176930
- Walker M, Pavlovic V, Kasif S. A comparative genomic method for computational identification of prokaryotic translation initiation sites. Nucleic Acids Res. 2002 Jul 15;30(14):3181-91. PMID: 12136100
- Pavlovic V, Garg A, Kasif S. A Bayesian framework for combining gene predictions. Bioinformatics. 2002 Jan;18(1):19-27. PMID: 11836207
- Cai D, Delcher A, Kao B, Kasif S. Modeling splice sites with Bayes networks. Bioinformatics. 2000 Feb;16(2):152-8. PMID: 10842737
- Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999 Dec 1;27(23):4636-41. PMID: 10556321
- Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. Alignment of whole genomes. Nucleic Acids Res. 1999 Jun 1;27(11):2369-76. PMID: 10325427
- Salzberg SL, Delcher AL, Kasif S, White O. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 1998 Jan 15;26(2):544-8. PMID: 9421513
- Kasif, S., S. Salzberg, D. Waltz, J. Rachlin and D. Aha, “Towards a Framework for Memory-Based Reasoning”, Artificial Intelligence, pp. 287–311, 1998.
- Kasif, S., “Towards a Constraint-Based Engineering Framework for Algorithm Design and Application”, Journal of Constraints, 1997.
- Delcher, A., A. Grove, S. Kasif and J. Pearl, “Logarithmic Time Queries and Updates in Probabilistic Networks”, Journal of Artificial Intelligence Research, Vol. 4, pp. 37–59, 1996.
- Waltz, D. and S. Kasif, “On Reasoning from Data”, Computing Surveys, 1996.
- Salzberg, S., D. Heath, A. Delcher and S. Kasif, “Best Case Analysis of Nearest Neighbours Algorithms”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:6, pp. 599–608, June 1995.
- Delcher, A. and S. Kasif, “Term Matching on a Mesh-Connected Array of Processors”, Annals of Mathematics and Artificial Intelligence, Volume 14, pp.177–186, 1995.
- Murthy, S., S. Kasif and S. Salzberg, “A System for Induction of Oblique Decision Trees”, Journal of Artificial Intelligence Research, 2:1, pp. 1–33, 1994.
- Heath, D., S. Kasif, S. R. Kosaraju, S. Salzberg and G. Sullivan, “Learning Nested Concept Classes with Limited Memory”, Journal of Experimental and Theoretical AI, 1996 (accepted 1992).
- Kasif, S., “Optimal Parallel Algorithms for Quad-Tree Problems”, Journal of Computer Vision and Image Processing, pp.281–285, May 1994.
- Kasif, S. and A. Delcher, “Analysis of Local Consistency in Parallel Constraint Networks”, Artificial Intelligence, 69, pp.307–327, 1994.
- Heath, D. and S. Kasif, “On Voronoi Covers with Applications to Machine Learning”, Computational Geometry: Theory and Applications, pp. 289-305, Nov. 1993.
- Kasif, S., S. Banerjee, A. Delcher and G. Sullivan, “Some Results on the Complexity of Symmetric Connectionist Networks”, Annals of Mathematics and Artificial Intelligence, pp.327-344, Nov. 1993.
- Delcher, A. and S. Kasif, “Efficient Parallel Term Matching and Anti-Unification”, Journal of Automated Reasoning, pp. 391–406, 1992.
- Kasif, S., “On the Parallel Complexity of Discrete Relaxation in Constraint Networks”, Artificial Intelligence, pp. 275-286, October 1990.
- Kasif, S., L. Kitchen and A. Rosenfeld, “A Hough Transform Technique for Subgraph Isomorphism”, Pattern Recognition Letters. Vol.2, pp.83–88, December 1983.
- Kasif, S. and A. Rosenfeld, “Pyramid Linking as a Special Case of Isodata”, IEEE Transactions on Systems, Man and Cybernetics, Vol.SMC-13, No.1, January 1983.
Conference Papers (highly selective papers are marked with a *).
- *Kasif, S. and A. Rosenfeld, “The Fixpoints of Images and Scenes”, Conf. on Computer Vision and Pattern Recognition, pp.454-456, June 1983.
- *Kasif, S., “On the Parallel Complexity of Some Constraint Satisfaction Problems”, National Conf. on Artificial Intelligence (AAAI-86), pp.349-353, August 1986.
- Kasif, S., J. Reif and D. Sherlekar, “Formula Dissection: A Parallel Algorithm for Constraint Satisfaction”, IEEE Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, pp.51-57, October 1987.
- Delcher, A. and S. Kasif, “On the Complexity of Incremental Parallel Computations in Artificial Intelligence”, IEEE Workshop on Computer Architecture for Pattern Analysis and Machine Intelligence, pp.59-64, October 1987.
- Kasif, S., “Efficient Parallel Quad-Tree Algorithms”, Proc. of 1988 ICAI, Tel Aviv, Israel, pp.353-363, December 1988.
- Delcher, A. and S. Kasif, “Parallel Term Matching on Mesh Connected Computers”, Proc. of 1988 ICAI, Tel Aviv, Israel, pp. 441-452, December 1988.
- *Kasif, S., “Parallel Solutions to Constraint Satisfaction Problems”, IEEE Conf. on Principles of Knowledge Representation and Reasoning, pp. 180-187, May 1989.
- *Delcher, A. and S. Kasif, “Parallel Term Matching and Anti-Unification”, International Conf. on Logic Programming, pp.355–369, June 1990.
- *Heath, D., S. Kasif, S. R. Kosaraju, S. Salzberg and G. Sullivan, “Learning Nested Concept Classes with Limited Storage”, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), pp. 777-782, 1991.
- *Salzberg, S., D. Heath, A. Delcher and S. Kasif, “Learning with a Helpful Teacher”, Proceedings of the International Joint Conference on Artificial Intelligence, (IJCAI-91), pp. 705-711, 1991.
- *S. Kasif and A. Delcher, “Improved Decision Making in Game Trees: Recovering from Pathology”, Proceedings of the National Conference on Artificial Intelligence (AAAI-92), pp. 513-518, July 1992.
- D. Heath, S. Kasif and S. Salzberg, “Learning Oblique Decision Trees”, Computational Learning Theory and Natural Learning Systems, 1992.
- Kasif, S. and A. Delcher, “Analysis of Local Consistency in Parallel Constraint Networks”, International Conference on Artificial Intelligence and Vision, pp. 217-231, 1992.
- S. Kasif, “Iterative Focusing and Hashing: An Alternative to Alpha-Beta”, International Conference on Artificial Intelligence and Vision, pp. 59-72, 1992.
- *D. Heath, S. Kasif and S. Salzberg, “Learning Oblique Decision Trees”, Proceedings of the International Joint Conference on Artificial Intelligence, (IJCAI 93), pp. 1002–1007, August 1993.
- *Murthy, S., S. Kasif, S. Salzberg and R. Beigel, “OC1: A Randomized Algorithm for Building Oblique Decision Trees”, Proceedings of the National Conference on Artificial Intelligence (AAAI-93), pp. 322–327, July 1993.
- Heath, D., S. Kasif and S. Salzberg, “k-DT: A Multi-Tree Learning Method”, Proceedings of the Second International Workshop on Multi-strategy Learning (pp. 138–149), Harpers Ferry, West Virginia, 1993.
- *Delcher, A., S. Kasif, H. Goldberg and W. Hsu, “Protein Secondary-Structure Modeling with Probabilistic Networks”, International Conference on Intelligent Systems and Molecular Biology, pp. 109–117, 1993.
- *Delcher, A., S. Kasif, H. Goldberg and W. Hsu, “Application of Probabilistic Causal Trees to Analysis of Protein Secondary Structure”, Proceedings of the National Conference on Artificial Intelligence, pp. 316–321, July 1993.
- *Bright, J., S. Kasif and L. Stiller, “Exploiting Algebraic Structure in Parallel State-Space Search”, Proc. of the 11-th National Conf. on Artificial Intelligence (AAAI-94), pp. 1341-1346, July 1994; preliminary version presented at the AAAI Symposium on Massively Parallel AI, March 1993.
- *Rachlin, J., S. Kasif, S. Salzberg and D. Aha, “Toward a Better Understanding of Memory-Based Classifiers” (plenary talk), Proceedings of the 11-th Intern. Conf. on Machine Learning, pp. 242–250, July 1994.
- *Fulton, T., S. Kasif and S. Salzberg, “Efficient Algorithms for Finding Multi-Way Splits for Decision Trees”, JHU TR, December 1993; Proceedings of the 12-th Intern. Conf. on Machine Learning, July 1995.
- *Delcher, A., A. Grove, S. Kasif and J. Pearl, “Logarithmic Time Queries and Updates in Probabilistic Networks”, Proceedings of the 1995 Conference on Uncertainty in AI, August 1995.
- D. Dobkin, D. Gunopulos and S. Kasif, “Induction of Low-Depth Decision Trees”, International Conference on Mathematics and Artificial Intelligence, 1996.
- S. Weiss, S. Kasif, and E. Brill, “Towards a Framework for Adaptive Information Retrieval”, AAAI Spring Symposium on Information Retrieval (1996).
- *T. Fulton, S. Kasif, S. Salzberg, and D. Waltz, “Local Induction of Decision Trees”, Proceedings of the 1996 Conference on Knowledge Discovery in Databases, August 1996.
- R. Grossman, S. Bailey and S. Kasif, “Papyrus: A System for Distributed Data Mining”, Workshop on Distributed Data Mining, NYC, 1998.
- *Beigel, R., N. Alon, S. Apaydin, L. Fortnow, and S. Kasif, “An Optimal Multiplex PCR Protocol for Closing Gaps in Whole Genomes”, RECOMB, April 2001.
- *Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich and Benny Sudakov, “Learning a Hidden Matching”, Foundations of Computer Science, FOCS 2002.
- T.M. Murali and S. Kasif, “Extracting Conserved Gene Expression Motifs from Microarray Data”, Pacific Symposium on Biocomputing, January 2003.
- B. Logan, P. Moreno, B. Suzek, Z. Weng, and S. Kasif, “Remote Homology Detection Using Feature Vectors Formed Using Alignments of Small Motifs”, RECOMB 2002 (poster and patent 2000).
- D. Pervouchine, J. Graber, and S. Kasif, “Stable RNA Secondary Structure of Human Donor Splice Sites”, RECOMB 2002 (poster).
- M. Walker, V. Pavlovic, and S. Kasif, “A Comparative Genomic Method for Computational Identification of Prokaryotic Translation Initiation Sites”, RECOMB 2002 (poster).
- Y. Zheng, R. J. Roberts, and S. Kasif, “Computational Identification of Operons in Microbial Genomes”, RECOMB 2002 (poster).
- Murali TM, Kasif S. Extracting conserved gene expression motifs from gene expression data. Pac Symp Biocomput. 2003;:77-88. PMID: 12603019
- *S. Letovsky and S. Kasif, “A Probabilistic Approach to Gene Function Assignment and Propagation in Protein Interaction Networks”, June 2003, ISMB 2003.
Books, Book Chapters and Special Reports
- S. Kasif and A. Delcher, “Analysis of Local Consistency in Parallel Constraint Networks”, Principles and Practice of Constraint Programming, MIT Press, editors Pascal Van Hentenryck and V.J. Saraswat, 1994.
- Heath, D., S. Kasif and S. Salzberg, “Committees of Decision Trees”, Cognitive Technology, North Holland Publishers, 1995.
- Simon Kasif and Stuart Russell (Eds.), Proceedings of the AAAI Fall Symposium on Learning Complex Behaviors, AAAI Press, 1996.
- J. Flanagan, T. Huang, P. Jones, and S. Kasif, “Human Centered Systems: Information, Interactivity and Intelligence”, Executive NSF Steering Committee Report, July 1997.
- R. Grossman, S. Kasif, J. Ullman, et al, “Data Mining”, Executive Committee NSF Report, 1998.
- Kasif, S. and A. Delcher, “Biological Data Modeling using Probabilistic Networks”, in Salzberg, S., D. Searls, and S. Kasif, “Computational Methods in Molecular Biology”, Elsevier, Publ. 1998.
- S. Salzberg, D. Searls and S. Kasif, eds, “Computational Methods in Molecular Biology”, Elsevier Publ. 1998. (2nd Printing, February 1999).
- Rich Roberts, Peter Karp, Simon Kasif and Stuart Linn, “An Experimental Approach to Gene Function”, Executive Report, American Academy for Microbiology.