ITR: COLLABORATIVE RESEARCH: Diagnosis and Assessment of Faults, Misbehavior and Threats in Distributed Systems and Networks

Funding Agency: National Science Foundation, Directorate for Engineering, Division of Electrical and Communications Systems, Information Technology Research for National Priorities (ITR) Project.

Award Number: ECS-0426453.

Principal Investigators: Yannis Paschalidis, Boston University (with C. Hadjicostis, C. Beck, R. Sreenivas at the University of Illinois at Urbana-Champaign, J. Tsitsiklis at MIT, S. Tatikonda at Yale, and K. Thulasiraman at the University of Oklahoma)

Project Summary

The proposed research develops theory and techniques for monitoring and diagnosing faults, hazards or, more generally, functional changes in dynamic systems and networks, under limited and possibly corrupted information. We present a unifying and multifaceted approach to this problem that decomposes the large body of fault diagnosis research into six topics: (i) deterministic fault diagnosis, (ii) model-based probabilistic diagnosis, (iii) adaptive and sequential diagnosis, (iv) distributed system-level diagnosis with communication constraints in wired/wireless networks, (v) fault diagnosis via distributed belief propagation algorithms, and (vi) model-independent diagnosis. The research team will apply its expertise in the areas of fault diagnosis, sequential detection, system-level diagnosis, distributed control, modeling, analysis and performance evaluation, applied probability, graph theory, belief propagation and model reduction to the problem of detecting, identifying and localizing faults and abnormalities in dynamically evolving environments. Beyond its intellectual value, the proposed research program will have broader impacts, as outlined below.

Broader Impact: Networks and networked systems are increasingly solidifying their roles as building blocks of the nation’s economic and social foundation. Numerous emerging commercial, governmental, medical, military and security applications are vitally dependent on these systems, creating a growing need for ensuring that these critical infrastructures are reliable and trustworthy in spite of malicious or non-malicious disruptions. Building trustworthy networked systems using off-the-shelf components and software presents a significant hurdle that needs to be overcome to exploit the full potential of networked systems. The proposed project outlines a synergistic and comprehensive approach to developing scalable methodologies for diagnosing faults, adversarial behavior and threats in complex systems and networks, under uncertain information and possibly in the presence of communication errors and constraints. The successful completion of this project will make a substantial and timely contribution to the National Priority Area of National and Homeland Security (NHS) because of its ramifications in the monitoring, testing, and reliable and secure operation of networked systems, communication networks, and complex digital systems. The development of distributed algorithms for fault diagnosis and the resulting overall enhancement of distributed systems in ways that make them more reliable also contribute to the National Priority Area of Advanced Science and Engineering (ASE). The integration of data models, distributed algorithms, system dynamics, control and decision making is aligned with the technical focus area of data, models and communications (dmc), and the development of critical support mechanisms for reliable operation of complex dynamic systems and networks is aligned with the technical focus area of integration of computing (int).

Intellectual Merit: The intellectual merit of this proposal lies in the synergistic and comprehensive exploration of different dimensions within the broad area of detection and identification of faults or, more generally, abnormal behavior in complex dynamic systems and networks. The ultimate goal is to develop appropriate models and innovative distributed algorithms that integrate and unify techniques from a number of diverse disciplines, including fault diagnosis in discrete event systems, detection and estimation, graph theory and optimization, distributed system-level diagnosis, belief propagation, model reduction and information theory. Apart from advancing the forefront of the various individual approaches to diagnosis, the overarching theme is the integration of these ideas into a well-defined approach that achieves the advantages of both deterministic and probabilistic methodologies via scalable models and algorithms. While extending the frontiers in the broad area of fault diagnosis in complex dynamic systems and networks, this research will at the same time apply these techniques to the design of test platforms for experimenting with distributed fault diagnosis in ad-hoc mobile networks and fault localization in indoor sensor networks.

Educational Impact and Outreach: The main educational goals of this program are two-fold: (i) To develop courses and educational materials that discuss systematic approaches to algorithms and architectures for fault diagnosis and tolerance in complex systems and networks. For example, Web-based lectures on special topics of interest will be established for practicing engineers in the field; a senior/graduate level course on this topic will be developed; a central Web page will be maintained at the University of Illinois to disseminate new results among the members of the team as well as to the broader scientific and research community. (ii) To continue to actively recruit and mentor participants from underrepresented groups in our respective research programs.