My Research Journey (under construction)
I am sharing these highlights not to brag about accomplishments but to pass on lessons learned from the many exceptional mentors, students, fellows, collaborators and unusual colleagues I was fortunate to work with over the course of my career.
Because of my unusual style of collaboration, the story also tracks major developments and historical twists in several fields, such as Artificial Intelligence, Parallel Computing, Machine Learning, Computational Biology/Bioinformatics, Genomics, Systems Biology, Genomic Network Systems Biology and Medicine, AI applications to Biology and Medicine, Science Informatics, Community Science, and more.
This review is chronological and spans more than 30 years of work. If you are interested in only one topic, please skip to it.
I co-mentored more than 60 students and fellows, and I collaborated directly (one on one) with more than 70 scientists (not including consortium-level collaborators). The full list can be found here:
My eco-system link
The main lesson from my long career so far is:
Dreaming is easy. Building is hard. Combining dreaming and building is the hardest.
Our eco-system (lab members and affiliates, collaborators, mentors, resources and, more than anyone else in academia, our students and fellows) makes “dreaming and building” possible. However, we collectively must not lose track of dreaming. If we only keep up with the present, we will be constantly surprised by the future.
Chapter 1: PhD Research (Parallel AI, Parallel Logic Programming (Deductive Databases), Computer Vision)
(mentors: Jack Minker and Azriel Rosenfeld)
“Turf matters as much as seeds.” Many years after my PhD, I was on an advisory board for the Alberta Innovation Academy with several exceptional individuals. One of them was Richard Taylor, a Nobel laureate in Physics. Dick said repeatedly, “I grew up on a farm, and turf matters as much as seeds.” This lesson is often severely underestimated by scientists and college administrators who see science as an individual sport. It rarely is.
I started my academic journey in Artificial Intelligence.
I became interested in AI early on, mostly because of Zohar Manna’s work on “Program Synthesis”, which I discovered while taking a course from Amir Pnueli. Zohar’s approach to program synthesis was based on deductive reasoning. Brilliant work, but I felt that deduction could (and should) be integrated with induction (learning).
Originally, I was hoping to marry computer vision with logic because I thought it would be fundamental and a key to progress in general AI.
I brought my own ideas into this mix, but the work I did with my PhD co-mentors Azriel and Jack remained two largely independent strands, with some conceptual leakage but not a real marriage.
I believe that the integration of logic and perception remains, 40 years later, a major unsolved AI problem.
I took an unusual path during my PhD because I was co-mentored by two legendary AI researchers who were very different ideologically, philosophically and methodologically. Jack Minker followed the logical foundations of AI and focused on Logic Programming. He was one of the fathers of Deductive Databases, or what is sometimes called Datalog. Azriel Rosenfeld was one of the fathers of Computer Vision, focusing on Perception, Cognition, Computer Vision, Constraints, Neuronal and Pyramidal Networks. However, they shared one important thing: a BIG heart and empathy on a rare scale.
I learned many things from Azriel and Jack but the most important lessons were:
Lessons 3 & 4
Have an open mind. There is more than one way to study AI (in contrast to several old AI “gurus” who treated AI as a religion rather than as an expansive field similar to Biology and Physics). Open AI should mean “Open-minded AI” and be inclusive.
“Educating the mind without educating the heart is no education at all” (attributed to Aristotle). Both of my PhD mentors subscribed to it: they routinely engaged in humanitarian and other activities outside their immediate research area.
I co-authored 11+ papers during my PhD. All credit goes to the unusual and creative environment that Azriel and Jack had in their centers.
For reasons I still cannot fully understand, the connectionist, constraint network and semantic network parts of the AI world firmly believed in parallelism (Azriel Rosenfeld was definitely a supporter of parallelism), whereas mainstream logic and rule-based AI scientists had little interest in it. The main lesson for young scientists: don’t judge, just follow your passion. Our work with Jack Minker and his group included the first Parallel Logic Programming System that was actually implemented and tested on a real parallel processor (ZMOB, a 128-processor machine invented by Chuck Rieger). Chuck was an AI visionary who shared our belief that parallel computing is an important, perhaps even essential, trait of AI systems. This vision was later reignited by Marvin Minsky and Danny Hillis with the Connection Machine (a 64,000+ processor machine).
This was an extremely complex implementation, since the machine had NO operating system and we had to implement all messaging and parallel execution on our own. Many highly capable people were involved in this project, but much credit goes to Madhur Kohli (a very gifted systems researcher), who carried the heaviest programming load on his shoulders, and of course to Jack for his early vision.
We pioneered the first implementation of fork-join in parallel AI. Fork-join is a fundamental primitive in parallel programming: a task forks into subtasks that run in parallel, then joins by waiting for all of them to finish before continuing. Many years later it was extended significantly into the famous and widely used MapReduce paradigm.
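As a rough illustration of the fork-join pattern in a deductive-database setting, the Python sketch below forks each goal of a conjunctive query into its own task and then joins the branches by intersecting their answer sets. It is a toy under stated assumptions: the fact format, the `solve_goal` and `fork_join_query` names and the sample facts are hypothetical, and threads stand in for processors. It does not reproduce how PRISM actually scheduled goals on ZMOB.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical facts of the form (predicate, subject, object).
FACTS = {
    ("likes", "ann", "wine"),
    ("likes", "bob", "wine"),
    ("likes", "ann", "cheese"),
    ("likes", "cat", "cheese"),
}


def solve_goal(facts, predicate, obj):
    """Answer one goal predicate(X, obj): the set of bindings for X."""
    return {s for (p, s, o) in facts if p == predicate and o == obj}


def fork_join_query(facts, goals):
    """Solve a conjunction of goals that all share the variable X.

    Fork: each goal is submitted as an independent task.
    Join: wait for every task, then intersect the binding sets for X.
    """
    if not goals:
        return set()  # simplifying choice: an empty query has no answers here
    with ThreadPoolExecutor(max_workers=len(goals)) as pool:
        # Fork: one task per goal.
        futures = [pool.submit(solve_goal, facts, pred, obj) for pred, obj in goals]
        # Join: block until every branch has produced its answers.
        branches = [f.result() for f in futures]
    return set.intersection(*branches)


# Query: likes(X, wine), likes(X, cheese)
print(fork_join_query(FACTS, [("likes", "wine"), ("likes", "cheese")]))
```

With the sample facts the query prints `{'ann'}`. The join here is a relational intersection on the shared variable, which is one simple way the fork-join idea meets database-style joins; goals that depend on each other's bindings would need a different schedule.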
- Eisinger, N., S. Kasif and J. Minker, “Logic Programming: A Parallel Approach”, First International Logic Programming Conf., Faculté des Sciences de Luminy, Marseille, France, pp. 71–77, September 1982.
- Kasif, S., M. Kohli and J. Minker, “PRISM—A Parallel Inference System Based on Logic”, Logic Programming Workshop, Portugal, pp. 123–152, June 1983.
- *Kasif, S., M. Kohli and J. Minker, “Control Facilities of PRISM—A Parallel Inference System Based on Logic”, International Joint Conf. on Artificial Intelligence, August 1983. (Introduces fork-join in parallel logic programming.)
- Chakravarthy, U. S., S. Kasif, M. Kohli, J. Minker and D. Cao, “Logic Programming on ZMOB: A Highly Parallel Machine”, Proc. 1982 International Conf. on Parallel Processing, IEEE Press, New York, pp. 347–349, 1982.
- Asper, C., D. Cao, U. S. Chakravarthy, A. Csoek-Poeskh, S. Kasif, M. Kohli, J. Minker, R. Piazza and D. Wang, “Parallel Problem Solving on ZMOB”, Proc. Trends and Applications 83, pp. 142–146, 1983.
- *Kasif, S. and J. Minker, “The Intelligent Channel: A Scheme for Result Sharing in Parallel Logic Programs”, International Joint Conf. on Artificial Intelligence, pp. 29–31, August 1985.
- Kasif, S. and A. Rosenfeld, “Pyramid Linking as a Special Case of Isodata”, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-13, No. 1, January 1983. (An early approach to unsupervised deep learning in pyramidal networks, used for image segmentation. We proved that the unsupervised learning algorithm converges.)
- *Kasif, S. and A. Rosenfeld, “The Fixpoints of Images and Scenes”, Conf. on Computer Vision and Pattern Recognition, pp. 454–456, June 1983. (A very abstract formulation of cognition that proposes a mathematical foundation for image/scene interpretation using monotone operators in Banach spaces.)
- Kasif, S., L. Kitchen and A. Rosenfeld, “A Hough Transform Technique for Subgraph Isomorphism”, Pattern Recognition Letters, Vol. 2, pp. 83–88, December 1983. (An early proposal for matching images and graphs using graph walks. We cannot, of course, take any credit for the follow-ups, but today random walks and their spectral network formulations are widely used for graph and network matching.)
- Kasif, S., “Parallel Searching and Merging on ZMOB”, Technical Report, University of Maryland, 1984. (Fast parallel searching and merging to implement joins and other queries in deductive databases.)
- Kasif, S., “Control and Data Driven Execution of Logic Programs: A Comparison”, Journal of Parallel Programming, Vol. 15, No. 1, pp. 73–100, February 1987. (Thesis chapter: a complexity analysis of fork-join control vs. data flow in parallel logic programs; it took a while to publish.)
Fast Chapter 2: Academic Job Search
In 1985 AI was experiencing explosive growth. The timing was fortunate for anyone in AI seeking academic positions.
My area of parallel AI was particularly hot, in part because of Japan’s Fifth Generation Computer Systems project, which forced the US to respond.
But I suspect that the positive response to my applications was driven in part by the generosity of my letter writers and their considerable stature. This academic joke is a humorous, self-deprecating take on faculty interviews. There is a lot of good work out there, of course, but the joke is still amusing and provides perspective.
I visited eight universities and received seven offers straight out of PhD.
I had a very unusual approach to job applications (not recommended). We intersected the list of pleasant, relatively small cities ranked high for quality of life in the Almanac with the list of top-20 CS departments. I did not apply to any universities in major cities or in places that were not ranked high for quality of life in the Almanac. We also excluded the West Coast. At the end of the interviews I was mostly considering the University of Wisconsin (Madison) and Duke University. In the end, however, for complex personal and geographic reasons (to stay in the DC area), I applied late to Johns Hopkins University (which was neither a top-20 CS department nor in a city ranked high in the Almanac) and joined the department of EECS a week or so after my interview. Johns Hopkins is a unique institution, and I met many lifelong friends there. I was particularly influenced by Terry Sejnowski, Fred Jelinek, Vernon Mountcastle, Ham Smith, Edi Karni (I audited his Decision Theory class as a faculty guest), Sol Snyder (I audited his Neuroscience class as a faculty guest) and many direct collaborators and students.
As mentioned, our academic generation was very lucky. Five years later the market dried up: AI entered a winter, and it became very challenging to obtain an academic position in AI.
In 2020 (today), AI applicants are lucky again, as deep learning is driving enormous interest and hiring in AI.
I am describing this experience in detail because many young PhDs take their interviews far too personally. Success in faculty interviews depends heavily on many factors that are completely independent of the candidate’s ability and accomplishments. So relax and enjoy the ride. Five years later it may change completely :)
Chapter 3: Early Academic Research (Parallel AI)
(with Art Delcher and Lewis Stiller)
After joining Hopkins, I didn’t want to continue heavy software development and implementation of parallel AI systems. I realized that parallel machines were not ready for us: the overhead of developing the operating system, messaging and parallel execution on raw architectures was exceptionally challenging but had little to do with basic AI research. Also, Hopkins was very small (4 CS faculty in total), and, unlike universities such as CMU or Illinois, our group did not have opportunities to collaborate with computer systems researchers.
I decided to shift my research interests and marry parallel algorithms and complexity with AI. Together with exceptional PhD students, we produced several foundational results on basic AI algorithms for abstract parallel models of computation. It was a niche area, since parallel algorithms were largely studied by CS theorists who focused on basic graph problems or sorting, not on AI algorithms. Most core AI researchers at that time worked on heuristic search or ad-hoc machine learning and ignored parallel machines (with a few notable exceptions).
As a result, our group faced relatively little competition in this space, and we established many basic theoretical and textbook results that have stood the test of time. Selected results are listed below, and the surprising or fundamental results are highlighted. All the work was theoretical (except the amazing work by Lewis Stiller on the Connection Machine 2).
- Optimal Parallel Term Matching (with Art Delcher) on Shared Memory Parallel Machines (textbook results)
- Lower Bounds on Parallel Constraint Satisfaction (on my own): a fundamental AI constraint propagation procedure previously believed to be highly parallelizable is inherently sequential (textbook result in AI).
- Optimal Parallel Anti-Unification (with Art Delcher) on Shared Memory Parallel Machines (textbook results)
- Parallel Propositional Satisfiability (with John Reif and Deepak Sherlekar)
- Parallel log time Inference in Bayes Tree Networks (with Art Delcher)
- Parallel Chess End Game Analysis (Lewis Stiller PhD Thesis) – a classical result in Computer Chess
- Incremental Parallel Computation (with Art Delcher, a new complexity class highlighting problems that are difficult to solve incrementally)
- Parallel Inference in Bayes Networks (with Art Delcher, Adam Grove and Judea Pearl)
I was very fortunate to work with exceptional PhD students in Parallel AI who walked on water. In particular, Art Delcher became a lifelong friend and collaborator. Lewis Stiller did exceptionally challenging research on chess endgame analysis on a 65,000-processor parallel machine (the Thinking Machines CM-2). He used techniques from computer algebra and group theory, integrated with AI and parallel software development.
We also had a number of very gifted undergraduate students rotating with us and doing projects in parallel AI. They are listed on the eco-system page.
We learned many profound theoretical, technical and practical ideas, frameworks and methods. However, one lesson that particularly stands out is a question posed by Phoebe Sengers, then an undergraduate research student programming the Connection Machine (64,000+ processors) for her undergraduate project: “What would AI look like if it had been pioneered and driven by women?” Historically, the most influential early figures were Marvin Minsky, John McCarthy, Allen Newell and Herb Simon. Phoebe is now a professor at Cornell, working at the intersection of culture and science.
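For readers less familiar with constraint propagation, the sketch below shows the classic sequential AC-3 arc-consistency algorithm in Python. It assumes that the constraint propagation procedure in the lower-bound result above is of the arc-consistency (discrete relaxation) kind; `ac3`, `domains` and `constraints` are illustrative names, not code from the original work. The queue shows the intuition behind sequentiality: removing one value can remove the support for values elsewhere, so deletions cascade along chains of dependent constraints.

```python
from collections import deque


def ac3(domains, constraints):
    """Enforce arc consistency in place.

    domains: dict mapping each variable to a set of candidate values.
    constraints: dict mapping a directed arc (x, y) to a predicate
        allowed(vx, vy); callers list both directions of each binary constraint.
    Returns False if some domain becomes empty (no solution), else True.
    """
    queue = deque(constraints)  # start with every directed arc
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # Values of x with no supporting value left in the domain of y.
        unsupported = {vx for vx in domains[x]
                       if not any(allowed(vx, vy) for vy in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            if not domains[x]:
                return False
            # Shrinking x may remove support for neighbours z of x, so their
            # arcs (z, x) must be re-examined: this is the propagation chain.
            queue.extend(arc for arc in constraints if arc[1] == x and arc[0] != y)
    return True


# Chain X < Y < Z over {1, 2, 3}: pruning ripples from one end to the other.
doms = {v: {1, 2, 3} for v in "XYZ"}
cons = {
    ("X", "Y"): lambda a, b: a < b,
    ("Y", "X"): lambda a, b: a > b,
    ("Y", "Z"): lambda a, b: a < b,
    ("Z", "Y"): lambda a, b: a > b,
}
print(ac3(doms, cons), doms)
```

On the three-variable chain the domains shrink to X = {1}, Y = {2}, Z = {3}, and each pruning step depends on an earlier one.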
Chapter 4: Machine Learning: Theory and Widely Used System Development
(with Steven Salzberg, David Heath, S. Murthy, John Rachlin, David Waltz and others)
I wanted to return to software development and, because of Lewis Stiller’s work, read up on Decision Diagrams and Decision Trees. At the same time, Steven Salzberg joined Hopkins. Steven was an applied machine learning researcher interested in developing and validating ML software. Our collaboration proved very fruitful because I was interested in theoretical and conceptually novel developments, while Steven cared deeply about dissemination, testing and validation.
We collaborated both on systems development and novel theoretical ideas.
The main topics are highlighted below.
- We introduced a novel model for learning (data mining) using limited memory (1991). This development preceded the broad interest in data streaming, which became a major field in theoretical CS and data mining. We proved both upper and lower bounds on the number of rounds needed for mining data. We also suggested a cognitive motivation.
- We pioneered a new model for Learning with a Helpful Teacher, which allows learning from very few examples.
- We developed a widely used open access system for decision tree induction, OC1. This system introduced a number of new ideas in decision tree induction:
- Sorting on attributes to enable scalable learning
- Randomization (prior to Random Forest). For the record, Random Forest randomization was more rigorously conceived and analyzed.
- We implemented a voting approach on multiple decision trees using randomization (also prior to Random Forest).
- With David Waltz we proposed the first kernel based on Bayes Networks (BN), used to implement a k-NN learning approach on the BN-transformed data.
In my view, we do not have enough long-term, sustainable direct collaborations in AI between applied and theoretical research. However, a review of the high-impact literature suggests many successful examples (e.g., McCallum/Lafferty/Pereira, Blum/Mitchell, Freund/Schapire, LISP and many, many more). Many funded projects pay lip service to these theory-practice collaborations but largely leave the participants in their own caves. It takes special people, without fragile egos, who can mutually appreciate the intellectual challenge of doing both theoretical and experimental research at the highest level. At the lowest level, theory is indeed the superior discipline intellectually, because it requires formal proofs at every level of the field. However, at the highest level, experimental research is as magical and elegant as any theory. I personally remain fully committed to both. Still, one needs to be able to selectively lower the bar on either theoretical or empirical methodology to identify novel paths at the intersection of the two. This is one of the great challenges for scientists in these philosophically and methodologically different areas. If we learn how to advance theory and experiments simultaneously, however, the translation to practice can be shortened significantly, with tremendous benefits to society, which is what matters most!
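The “sorting on attributes” bullet above points to a standard trick for making split search scalable: sort the examples by an attribute once, then sweep the sorted list while updating class counts incrementally, so every candidate threshold is scored in a single pass instead of re-scanning the data for each threshold. The Python sketch below shows that idea for axis-parallel splits on binary labels, assuming Gini impurity as the split criterion. `best_threshold` and `gini` are hypothetical names, and OC1’s oblique (multi-attribute) split search, randomized perturbations and voting are not reproduced here.

```python
def gini(positives, count):
    """Gini impurity of a node with `count` examples, `positives` of class 1."""
    p = positives / count
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Best single-attribute split for binary labels (0/1).

    Sorting costs O(n log n); the sweep then scores all candidate
    thresholds in O(n) by moving one example at a time to the left side.
    Returns (threshold, weighted_gini), or (None, inf) if no split exists.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    left_n = left_pos = 0
    best = (None, float("inf"))
    for i in range(n - 1):
        value, label = pairs[i]
        left_n += 1
        left_pos += label
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold separates equal attribute values
        right_n, right_pos = n - left_n, total_pos - left_pos
        score = (left_n * gini(left_pos, left_n)
                 + right_n * gini(right_pos, right_n)) / n
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


print(best_threshold([3.0, 1.0, 4.0, 2.0], [1, 0, 1, 0]))
```

Here the sweep finds the clean split at 2.5 with zero impurity. Because the counts are updated incrementally, the per-threshold cost is constant, which is what makes the approach scale to large training sets.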
Chapter 5: Bayes Networks (Graphical Models) and Biology
We helped conceive and popularize the use of Bayes networks (Graphical Models) in Biology, starting in 1992.
We made two predictions in our 1993–95 papers that provide good motivation for the utility of graphical models in reasoning about biology or modeling biological systems.
“To summarize, scientific analysis of data is an important potential application of Artificial Intelligence (AI) research. We believe that the ultimate data analysis system using AI techniques will have a wide range of tools at its disposal and will adaptively choose various methods.
It will be able to generate simulations automatically and verify the model it constructed with the data generated during these simulations. When the model does not fit the observed results the system will try to explain the source of error, conduct additional experiments, and choose a different model by modifying system parameters. If it needs user assistance, it will produce a simple low-dimensional view of the constructed model and the data. This will allow the user to guide the system toward constructing a new model and/or generating the next set of experiments.”
“We believe that flexibility, efficiency and direct representation of causality in probabilistic networks are important desirable features that make them very strong candidates as a framework for biological modeling systems.”
This quote is from Delcher, A., S. Kasif, H. Goldberg and W. Hsu, “Protein Secondary-Structure Modeling with Probabilistic Networks”, International Conference on Intelligent Systems and Molecular Biology, pp. 109–117, 1993.
Delcher, A., S. Kasif, H. Goldberg and W. Hsu, “Application of Probabilistic Causal Trees to Analysis of Protein Secondary Structure”, Proceedings of the National Conference on Artificial Intelligence, pp. 316–321, July 1993.
Based on Prediction 1, we also proposed a relatively efficient computational framework for in-silico directed evolution (synthetic biology) using graphical models. More specifically, the framework performs mutagenesis (e.g., amino acid substitutions in proteins) and screens for mutations and adaptations that satisfy structural and functional constraints. For each such perturbation (mutation), we must perform efficient inference in the graphical model to assess the probability that a given property of a protein or system was affected by the perturbation. Our procedure makes it possible to COMPILE THE GRAPHICAL MODEL into dynamic data structures and then compute the change in a specific property exponentially faster than with standard algorithms (in logarithmic rather than linear time). This is described in detail in the paper below.
Delcher, A., A. Grove, S. Kasif and J. Pearl, “Logarithmic Time Queries and Updates in Probabilistic Networks”, Journal of Artificial Intelligence Research, Vol. 4, pp. 37–59, 1996.
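The paper above handles general tree-structured networks; the Python sketch below illustrates the flavor of the idea only for the simplest case, a hidden Markov chain, which is a simplifying assumption. The probability of the observed evidence can be written as a product of one small matrix per position. Keeping those matrices in a balanced binary tree of partial products (a segment tree) means that changing the evidence at one position, the analogue of a single in-silico mutation, only requires recomputing the O(log n) products on the path to the root instead of redoing the whole O(n) forward pass. `ChainEvidenceTree`, `update`, `likelihood` and the toy parameters are hypothetical; the paper’s actual construction for trees is not reproduced here.

```python
from itertools import product


def matmul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    k = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


class ChainEvidenceTree:
    """Segment tree of per-position matrices for a hidden Markov chain.

    P(e_1..e_n) = 1^T M_1 M_2 ... M_n 1, with
      M_1[i][j] = init[i] * emit[i][e_1] if i == j else 0
      M_t[i][j] = trans[i][j] * emit[j][e_t]   for t > 1.
    Internal nodes store ordered (left-to-right) products of their children.
    """

    def __init__(self, init, trans, emit, observations):
        self.init, self.trans, self.emit = init, trans, emit
        k = len(init)
        self.size = 1
        while self.size < len(observations):
            self.size *= 2
        identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
        # Unused padding leaves hold the identity, which leaves products unchanged.
        self.tree = [identity] * (2 * self.size)
        for pos, obs in enumerate(observations):
            self.tree[self.size + pos] = self._leaf(pos, obs)
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = matmul(self.tree[2 * node], self.tree[2 * node + 1])

    def _leaf(self, pos, obs):
        k = len(self.init)
        if pos == 0:
            return [[self.init[i] * self.emit[i][obs] if i == j else 0.0
                     for j in range(k)] for i in range(k)]
        return [[self.trans[i][j] * self.emit[j][obs] for j in range(k)]
                for i in range(k)]

    def update(self, pos, obs):
        """Change the evidence at one position; recomputes O(log n) nodes."""
        node = self.size + pos
        self.tree[node] = self._leaf(pos, obs)
        node //= 2
        while node >= 1:
            self.tree[node] = matmul(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2

    def likelihood(self):
        """P(evidence): sum of all entries of the root product."""
        return sum(sum(row) for row in self.tree[1])


def brute_force(init, trans, emit, observations):
    """Reference answer by enumerating every hidden state sequence."""
    total = 0.0
    for states in product(range(len(init)), repeat=len(observations)):
        p = init[states[0]] * emit[states[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= trans[states[t - 1]][states[t]] * emit[states[t]][observations[t]]
        total += p
    return total


init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0, 0, 1]
tree = ChainEvidenceTree(init, trans, emit, obs)
tree.update(2, 1)  # "mutate" the evidence at position 2
obs[2] = 1
print(abs(tree.likelihood() - brute_force(init, trans, emit, obs)) < 1e-12)
```

The brute-force check is exponential in the chain length and is included only to confirm that the incrementally updated tree gives the same answer.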
Chapter 6. Computational Genomics (Human Genome Project, Before and After)
(With Steven Salzberg, Art Delcher, Owen White, Herve Tettelin, many amazing TIGR biologists, Human Genome Project Consortium and many pioneers in computational genomics, Rich Roberts, Charles Cantor, Yu Zheng, Megon Walker, Chunmin Ding, Charles DeLisi, Zhiping Weng and many, many more.)
Selected Results, Ideas, Frameworks:
GLIMMER (one of the most widely used gene finders in bacterial genomes)
MUMMER (widely used comparative genome system, first open access whole genome comparison system for bacteria).
PUBLIC HUMAN GENOME PROJECT (HGP) – with an amazing group of collaborators that include many computational biology pioneers
BAYES NETS FOR ANALYSIS OF SPLICING (with Art Delcher)
Bayes Networks for Genomic Data Integration and Genome Annotation
REMOTE HOMOLOGY PREDICTION USING SIMPLE KERNELS FOR Human Genome Annotation (with Pedro Moreno and others at HP Labs, summer 2000)
MULTIPLEX PCR (for Prenatal Diagnostics of Down Syndrome and Liquid Biopsy).
(With Charles Cantor, John Rachlin, Noga Alon and others.)
Early Network and Genomic Data Integration for Genome Annotation and Gene Function Prediction
(with Rich Roberts and Yu Zheng)
COMPARATIVE ANALYSIS (many papers in human, mouse, model organisms, bacterial genomes).
and more …
We were very lucky to be part of this transformative period, but one quote captures this better than most:
“Fortune favors the prepared mind”, Louis Pasteur
Chapter 7. Computational Genomic Network Systems Biology
(with Rich Roberts, Yu Zheng, Stan Letovsky and many others)
1. Network Based Function Prediction (2002-)
Partners: Stan Letovsky, T.M. Murali, Naoki Nariai, Charles Cantor, Rich Roberts, and many others.
We helped conceive and popularize systematic whole-genome, network-based function prediction. Our work was derived from frameworks previously used in AI, Computer Vision and CS that evolved from Markov Random Fields, Neural Nets (Hopfield Networks), Graph Cuts, Graph Diffusion and Label Propagation in Networks. Today, network-based function prediction is a widely used framework in the field.
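To make the label-propagation idea concrete, here is a minimal Python sketch of one simple variant (harmonic label propagation): proteins with known annotation keep their score fixed at 1 or 0, and every unannotated protein repeatedly takes the average score of its network neighbors until the scores stop changing. The graph, the `propagate` name, the 0.5 starting score and the stopping rule are illustrative assumptions; the published methods (e.g., Markov random field or graph-cut formulations) differ in their details.

```python
def propagate(adjacency, known, max_iter=1000, tol=1e-10):
    """Harmonic label propagation on an undirected, unweighted graph.

    adjacency: dict node -> list of neighbour nodes.
    known: dict node -> fixed score (1.0 = has the function, 0.0 = does not).
    Unlabelled nodes start at 0.5 and are repeatedly replaced by the mean
    of their neighbours' scores; labelled nodes stay clamped.
    """
    if not adjacency:
        return {}
    scores = {node: known.get(node, 0.5) for node in adjacency}
    for _ in range(max_iter):
        new_scores = {}
        for node, neighbours in adjacency.items():
            if node in known or not neighbours:
                new_scores[node] = scores[node]  # clamped or isolated node
            else:
                new_scores[node] = sum(scores[n] for n in neighbours) / len(neighbours)
        change = max(abs(new_scores[n] - scores[n]) for n in adjacency)
        scores = new_scores
        if change < tol:
            break
    return scores


# A path a - b - c - d - e where only the two ends are annotated.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
result = propagate(graph, {"a": 1.0, "e": 0.0})
print({node: round(score, 3) for node, score in result.items()})
```

On the path, the unannotated proteins settle at 0.75, 0.5 and 0.25, so a protein’s score reflects how close it sits, through the network, to proteins known to carry the function.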
2. Network Based Disease Biomarkers for Diabetes and Metabolic Disease
Partners: Zak Kohane, Ron Kahn, Manway Liu, Terrence Wu, Tianxi Cai and others, as part of the I2B2 NIH National Center at Harvard Partners.
3. Multiplex Networks, Biological Context Networks, Multi-node graphs (2004-)
(With Noga Alon, Vera Asodi, Charles Cantor, John Rachlin and others)
4. Regulatory Networks Discovery and Experimental Validation (2001-)
(With Geoff Cooper, John Tullai, Michael Schaffer, Jim Collins, Tim Gardner, J. Faith, Boris Hayette, Esther Rheinbay, Mario Suva, Brad Bernstein and many more.)
Chapter 8. Rich Roberts and the Computational Bridges to Experiments (COMBREX) Project
Chapter 9. I2B2 (Informatics to Bedside) at Harvard Partners
(with Zak Kohane, Tianxi Cai, ME Patti, Ron Kahn, Terrence Wu and many others).
Chapter 10. Joslin Diabetes NIH Center and Regional Systems Biology Core (Diabetes Type 2, Aging)
(With Michael Molla, Jonathan Dreyfuss, Ron Kahn, ME Patti, Allison Goldfine, George King and others)
Based on my experience working with Joslin, and inspired by the quote “Make things as simple as possible but not any simpler”, I find it tempting to develop a theory of wellness that is minimal, supported by evolution and enables direct control.
Chapter 11. AI2BIO
After working for almost 30 years at the intersection of AI and Biology, I believe that the integration of AI and Biology has unlimited potential. This goes well beyond the old arguments about biology-inspired AI (e.g., evolutionary algorithms and neural computation). I am alluding to the transformative methodologies developed in AI that can be used to understand nature. In fact, I made this argument in a recent talk at MIT titled “Should Machines Understand Nature to Pass the Turing Test? Co-evolving AI and Systems/Synthetic Biology”.
AI and Biology (with Rich Roberts)
AI and Ethics (review)
Tracing Predictions to Experimental Ground Truth (With Rich Roberts)
Chapter 12: ?