Distributed Machines


The New Normal

Computing infrastructure is undergoing swift changes, both in architectures (e.g., power-efficient ARM processors, GPGPU accelerators) and in resource access models (elastic Cloud computing). Even mobile phones, let alone workstations and servers, are equipped with multiple CPU cores, while public Clouds offer hundreds of Virtual Machines on tap under a pay-as-you-go model. As a consequence, single-processor systems are giving way to distributed computing as the new normal.

These emerging computing platforms have also democratized access to advanced computing resources, lowering the barrier to entry for a large pool of scientists. Thus, uniform “supercomputing” resources are being ably complemented by accelerated and hybrid distributed computing infrastructure that is often cheaper, more power efficient, easier to access and more widely available.

The contemporary space of distributed platforms includes commodity clusters, built from affordable entry-level servers, Ethernet connectivity and spinning disks (or, more recently, SSDs). These may be abstracted using private Cloud fabrics such as OpenStack, or containerized approaches, to offer an on-demand model of resource allocation. They may also have “Big Data” platforms such as Hadoop/MapReduce, Giraph/Pregel and Storm/Spark deployed, either on bare metal or on VMs, to offer a Platform as a Service model. A similar setup is possible in the public/commercial Cloud space, with IaaS Clouds from Amazon AWS and Microsoft Azure.
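To illustrate the data-parallel programming model that such platforms expose, the sketch below expresses a word count against Spark’s RDD API through PySpark. It is indicative only: the local master setting, application name and input lines are placeholders, not a description of any particular deployment.

# Illustrative only: a word count on Spark's RDD API.
# The master setting and input lines below are placeholders.
from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="wordcount-sketch")

lines = sc.parallelize([
    "distributed systems on commodity clusters",
    "clusters of virtual machines in the cloud",
])

counts = (lines.flatMap(lambda line: line.split())   # split each line into words
               .map(lambda word: (word, 1))          # pair each word with a count of 1
               .reduceByKey(lambda a, b: a + b))     # sum the counts per word across partitions

print(counts.collect())
sc.stop()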

Challenges

At the DREAM:Lab, we tackle several challenges that these distributed systems present. One is using elastic resources intelligently while ensuring robust execution in the presence of unreliable commodity hardware. Another is offering deterministic quality of service for time-sensitive applications running in a multi-tenant virtualized environment with non-uniform performance, while trading off cost (in real ₹/$) and energy (kWh) against performance metrics. The latter is made more interesting by innovations such as spot Cloud markets.
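To make the trade-off concrete, here is a minimal sketch that scores resource offerings against a weighted objective over monetary cost, energy and runtime. The offering names, prices, power draws, runtimes and weights are all hypothetical placeholders used purely for illustration, not measured values or a prescribed method.

# A minimal sketch of a cost/energy/performance trade-off using a weighted
# objective. All numbers below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Offering:
    name: str
    price_per_hour: float   # $/hr (a spot offering's price would vary over time)
    power_kw: float         # average power draw in kW
    runtime_hours: float    # expected job runtime on this offering

def score(o: Offering, w_cost=1.0, w_energy=0.5, w_time=2.0) -> float:
    """Lower is better: weighted sum of monetary cost, energy and runtime."""
    cost = o.price_per_hour * o.runtime_hours
    energy = o.power_kw * o.runtime_hours      # kWh
    return w_cost * cost + w_energy * energy + w_time * o.runtime_hours

offerings = [
    Offering("on-demand-large", price_per_hour=0.20, power_kw=0.30, runtime_hours=2.0),
    Offering("spot-large",      price_per_hour=0.06, power_kw=0.30, runtime_hours=2.5),
    Offering("on-demand-small", price_per_hour=0.10, power_kw=0.15, runtime_hours=4.0),
]

best = min(offerings, key=score)
print(f"Chosen offering: {best.name}")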

More interesting distributed systems are emerging in the commodity cluster and Cloud space over the medium to long term, and we are exploring them. In particular, energy efficiency is a key concern for data center operations, and we are seeing significant innovation from server vendors. These range from ARM’s 64-bit low-power processors to AMD’s APUs and Systems on a Chip (SoCs) to Intel’s Near-Threshold Voltage computing. Mobile platforms also offer a tangible computing surface, given the multi-core capabilities of current-generation smartphones. As these systems come to market and are deployed in data centers at scale, they open up valuable opportunities to investigate algorithms, techniques, models and frameworks that make optimal use of such heterogeneous computing resources.
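As one small example of the questions this raises, the sketch below greedily places tasks on whichever machine would finish them earliest, a simple min-completion-time heuristic for heterogeneous resources. The machine names, relative speeds and task sizes are hypothetical placeholders, and the heuristic is only a baseline for illustration, not a technique claimed by the lab.

# A minimal sketch of heterogeneity-aware task placement: assign each task to
# the machine that finishes it earliest. Speeds and task sizes are hypothetical.
machines = {"arm-soc": 1.0, "xeon": 4.0, "gpu-node": 8.0}   # relative speeds
tasks = [8, 3, 5, 2, 9, 4]                                  # work units per task

finish_time = {m: 0.0 for m in machines}
placement = {}

for i, work in enumerate(tasks):
    # completion time of task i on machine m, queued after m's current load
    best_machine = min(machines,
                       key=lambda m: finish_time[m] + work / machines[m])
    finish_time[best_machine] += work / machines[best_machine]
    placement[i] = best_machine

print(placement)
print(f"Makespan: {max(finish_time.values()):.2f}")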