Scalable Graph Analytics
Vertex-centric distributed graph programing models, such as Google’s Pregel/Apache Giraph, are similar to MapReduce but for large graph datasets. Here, the computation logic is developed from the perspective of a single vertex and executed in a Bulk Synchronous Parallel (BSP) pattern. We have proposed a sub-graph centric distributed processing model for graphs that offers the additional flexibility of using shared memory algorithms to deliver significant performance benefits, as demonstrated in our publications. We operate in a commodity cluster/Cloud environment where the graph is k-way partitioned over its vertices across k machines, and independent tasks operate on subgraphs — weakly connected components — within each partition. The GoFFish software platform is a prototype implementation of this abstraction.
As ongoing research, we are modeling the complexity of such subgraph-centric graph algorithms to help predict runtime behavior and optimize their scheduling and partitioning in distributed environments. We are also extending existing shared-memory graph algorithms to subgraph-centric distributed algorithms to significantly improve their performance.
There is a novel and emerging class of dynamic graphs, such as in social networks where new members join (new vertices), new friends are made (new edges), and messages are posted (vertex and edge values change). We explore the temporal dimensions of changing graphs, that we term time-series graphs, with proposed programming abstractions to develop new time-series graph algorithms, perform graph querying, and scale their storage and analysis on distributed systems. This is an extension to GoFFish.
Our current research topics includes
1. Design and analyze different subgraph centric algorithm on GoFFish platform and develop an empirical model for complexity in terms of computation, network and I/O costs.
2. Efficient scheduling of distributed tasks on different partitions.
3. Designing an efficient graph partitioning strategy for GoFFish.
- Ravikant Dindokar and Yogesh Simmhan, “Elastic Resource Allocation for Non-stationary Distributed Graph Algorithms”, in 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (IEEE/ACM CCGrid 2016),
- Ravikant Dindokar, Neel Choudhury, and Yogesh Simmhan, “Analysis of Subgraph-centric Distributed Shortest Path Algorithm,” in International Workshop on Parallel and Distributed Computing for Large Scale Machine Learning and Big Data Analytics (ParLearning), 2015.
- Yogesh Simmhan, Neel Choudhury, Charith Wickramaarachchi, Alok Kumbhare, Marc Frincu, Cauligi Raghavendra and Viktor Prasanna, Distributed Programming over Time-series Graphs, IEEE International Parallel & Distributed Processing Symposium (IPDPS) , 2015
- Nitin Chandra Badam and Yogesh Simmhan, Subgraph Rank: PageRank for SubgraphCentric Distributed Graph Processing, International Conference on Management of Data (COMAD) , 2014