Software cyber-infrastructure bridges the gap between theory (abstractions and algorithms) and its effective execution in practice on real distributed systems. Programming frameworks form critical pieces of the software cyber-infrastructure stack, and help compose and execute real applications at scale on distributed computing resources. For example, MapReduce is a programming abstraction, PageRank is an algorithm that can be mapped to MapReduce, and Apache Hadoop is a framework that executes it on commodity clusters. Empirical validation and benchmarking on such frameworks are also important given the tectonic shifts taking place in the hardware cyber-infrastructure space, with innovative architectures and resource access models. This is where the rubber meets the road! [i]
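To make the abstraction/algorithm/framework distinction concrete, here is a minimal sketch of PageRank phrased as repeated map and reduce steps. It runs in a single process on an in-memory dictionary and does not use Hadoop's API; the function names, the toy graph, and the damping factor of 0.85 are illustrative assumptions, and the sketch assumes every vertex has at least one out-edge and every neighbor is itself a key in the graph.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map: each vertex emits an equal share of its rank to each out-neighbor."""
    for vertex, neighbors in graph.items():
        # Emit a zero contribution so every vertex reaches the reduce step.
        yield vertex, 0.0
        for neighbor in neighbors:
            yield neighbor, ranks[vertex] / len(neighbors)


def reduce_phase(pairs, num_vertices, damping=0.85):
    """Reduce: sum the contributions per vertex and apply the damping factor."""
    sums = defaultdict(float)
    for vertex, contribution in pairs:
        sums[vertex] += contribution
    return {
        vertex: (1.0 - damping) / num_vertices + damping * total
        for vertex, total in sums.items()
    }


def pagerank(graph, iterations=20, damping=0.85):
    """Run a fixed number of map/reduce rounds (hypothetical driver loop)."""
    n = len(graph)
    ranks = {vertex: 1.0 / n for vertex in graph}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), n, damping)
    return ranks


# Toy graph (assumed for illustration): adjacency lists of out-edges.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In a framework such as Hadoop, the same two functions would be expressed as mapper and reducer tasks, and the framework would handle partitioning the graph, shuffling the emitted pairs across machines, and re-running each round; those concerns are exactly what this sketch leaves out.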

In developing frameworks, several classes of research problems emerge. One is scheduling application logic onto distributed computing resources. Such resource allocation decisions are guided by various factors: unique characteristics of the hardware, such as elasticity and specialized accelerators; distinctive QoS needs of the applications, such as throughput, resiliency, cost, and deadline constraints; pricing and resource constraint models, such as spot markets and pay-as-you-go schemes; and constraints and opportunities posed by the programming abstractions, which suggest data partitioning and co-location, disk and network I/O optimizations, and data/task dependencies. Even within the confines of the graph and stream analytics abstractions, these factors offer a rich space for exploration when coupled with the novel features of Cloud computing and other emerging hybrid infrastructure. In particular, scalable computing is a key tenet of these research activities.


[i] “Where the Rubber Meets the Sky: Bridging the Gap between Databases and Science,” Jim Gray, Alexander S. Szalay, MSR-TR-2004-110, October 2004, IEEE Data Engineering Bulletin, December 2004, Vol. 27.4, pp. 3-11.