Overview

The unprecedented ability to collect massive datasets from large scientific instruments and enterprise data warehouses poses grand challenges for data-intensive computing. At the same time, computing infrastructure is itself undergoing swift changes in both architectures and resource access models. With even smartphones and Raspberry Pis being equipped with multiple CPU cores, distributed computing is the norm rather than the exception.

This confluence of “Big Data” applications with emerging computing infrastructure can lead to transformative scientific and societal advances. However, translating this opportunity into scientific discovery and sustainable cities requires advances in software platforms and middleware. This includes novel programming abstractions to compose distributed applications over new classes of datasets such as streams and dynamic graphs, innovative algorithms that use the potential of such abstractions to scale their techniques, and execution platforms that allow transparent, resilient and efficient usage of distributed computing facilities. Such an integrated framework needs to be tuned as much to the characteristics of the data that it operates upon (e.g., volume, velocity, variety) as to the computing infrastructure that it executes upon (e.g., elasticity, cost, power).

The Distributed Research on Emerging Applications and Machines Lab (DREAM:Lab) focuses on holistic distributed systems research that enables the effective and efficient use of emerging distributed data and computing systems. This research uses scalable software architectures, innovative programming and data abstractions, and algorithms for optimal distributed execution to support data-intensive scientific and engineering applications, which can lead to transformative advances for society.

The DREAM:Lab is housed at the Indian Institute of Science’s Department of Computational and Data Sciences (CDS), a unique inter-disciplinary department in India offering programs on computational and data sciences. The lab explores the verticals of the data science stack, from data-driven applications to Big Data platforms to emerging distributed infrastructure. Prof. Yogesh Simmhan heads the group.

Some of the concepts the lab explores include:

  • System software fabrics offer the equivalent of an “OS for distributed machines”. While Cloud fabrics have used virtualization to manage thousands of servers in data centers efficiently, we are examining the role of containers in supporting light-weight sandboxing of application environments and resource allocation. In particular, we are developing ECHO as an IoT Fabric to offer a manageable interface over the thousands of edge and Fog devices that will be part of IoT deployments.
  • Big Data Platforms & programming abstractions are a core competency of our team.
  • Data Science Algorithms, Applications and Benchmarks: As the scientific and engineering domains contend with an influx of massive data, they offer a valuable context in which to apply the advances made in distributed systems research, as well as a rich space for discovering novel problems that are as yet unaddressed. Distributed algorithms help translate the application requirements to underlying programming and runtime abstractions, and we particularly work on distributed and dynamic graph algorithms. We also investigate benchmarks to validate emerging applications, platforms or machines, such as for stream processing and edge analytics [101. RIoTBench: A Real-time IoT Benchmark for Distributed Stream Processing Platforms, Anshu Shukla, Shilpa Chaturvedi, Yogesh Simmhan, Concurrency and Computation: Practice and Experience, 2017 (To Appear)],[102. Benchmarking Fast Data Platforms for the Aadhaar Biometric Database, Yogesh Simmhan, Anshu Shukla, Arun Verma, Workshop on Big Data Benchmarking (WBDB), 2015]. Smart Cities offer a vast application domain with foundations in Cyber Physical Systems (CPS) and the Internet of Things (IoT). The IISc Smart Campus project aims to validate distributed technologies in the field to make a sustainable impact [103. Towards a Practical Architecture for Internet of Things: An India-centric View, Prasant Misra, Yogesh Simmhan and Jay Warrior, IEEE IoT Newsletter, 2015],[104. An Open Smart City IoT Test Bed: Street Light Poles as Smart City Spines, Amrutur, Rajaraman, Acharya, Ramesh, Joglekar, Sharma, Simmhan, Lele, Mahesh and Sankaran, International Conference on Internet-of-Things Design and Implementation (IoTDI), 2017]. See the Software and Smart Campus project pages for more details.

The research activities of the DREAM:Lab will advance fundamental knowledge on effectively scaling data-driven scientific applications on contemporary and emerging distributed computing infrastructure. Further, the applied nature of this research will translate novel research outcomes into sustainable software prototypes that will help accelerate scientific discovery in critical application domains of national importance. Taking an integrated view across the research stack, from the system to the application, is important. It avoids conducting research in a vacuum, under idealized conditions detached from reality. This is particularly important for systems research due to the fast-changing nature of computing technology and advances in hardware architectures. At the same time, this must not degenerate into building software, systems or applications as an end in themselves, in the absence of tangible research outcomes. To this end, we also collaborate with industry partners such as NetApp and VMware, and with other research groups at the Robert Bosch Center for Cyber Physical Systems and the University of Melbourne. Such practical grounding will also illustrate to students the value of inter-disciplinary research while helping train the research scientists and workforce of the future on advanced technologies.

We acknowledge the support of our current and past sponsors:

  • IISc’s Robert Bosch Center for Cyber Physical Systems (RBCCPS)
  • GoI’s Ministry of Electronics and Information Technology (MeitY)
  • NetApp Inc.
  • Microsoft Azure for Research
  • Tech Mahindra

 

Footnotes