Dependability in a connected world: From the very large to the very small

Speaker:  Saurabh Bagchi – West Lafayette, IN, United States
Topic(s):  Networks and Communications


Much of the computational infrastructure that we encounter today, and increasingly rely on for critical applications, is provided by distributed systems, be it the air traffic control system, the smart electric grid, or the cyber-physical systems that embed sentient sensors in our physical spaces. These systems are growing in scale, both in the number of executing elements and in the amount of data they need to process. For example, algorithms for solving complex genomics problems require an ever larger number of executing elements, while the amount of genomics data that they have to process is also increasing rapidly. Another emerging trend is that distributed systems are being built out of heterogeneous components: different kinds of computer platforms and different software platforms. These two trends challenge designers of computer systems to ensure high dependability. Dependability is a broad term that encompasses detection (telling quickly that something is wrong), diagnosis (identifying the root cause of the failure), containment (preventing the failure from propagating through the system), and, in some cases, prediction (anticipating an impending failure so that proactive mitigation actions can be triggered). The dependability mechanisms must not overly intrude on the application or the execution environment, either in terms of performance impact or in terms of the changes required of them.

In this talk, I will briefly describe traditional dependability architectures and, in more detail, newer mechanisms that are being devised to handle the challenges mentioned above. I will give a few case studies where dependability mechanisms have achieved great success (financial transactions, embedded medical devices) and some systems studies where a distressing lack of dependability mechanisms has come to the fore. I will conclude by pointing out lessons in systems design that should apply to the emerging classes of distributed systems.

About this Lecture

Number of Slides:  40
Duration:  50 minutes
Languages Available:  English

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.