Temporal Graphs

Interval-centric Model for Distributed Computing over Temporal Graphs

Swapnil Gandhi

Motivation

The rapid growth in web and social networks has led to the emergence of large-scale linked data which have both a large network structure and associated key-value properties. Such property graphs are processed at scale by distributed graph platforms like Apache Giraph and GraphX, and using vertex-centric programming models like Google’s Pregel. Graph datasets in emerging applications domains also have temporal variability, such as community evolution in social networks or traffic flows in transport networks. However, there is limited work on distributed programming abstractions to design algorithms over such temporal graphs, or scalable platforms to process them. Prior works have reduced temporal graphs into static graphs for processing – as a series of graph snapshots, or an expanded structure that introduces vertex replicas for each time point. These are inelegant, inefficient or limited in the algorithms they support.

Contributions

We propose a novel interval-centric model (ICM) of computing to operate over such temporal graphs. Our model combines the benefits of a component-centric message-passing model like Pregel and the functional primitives like Spark, besides surfacing time as a first-class entity, to ease temporal graph processing. ICM supports both Time-Dependent and Time-Independent algorithms uniformly, and out-performs existing baseline abstractions by an order of magnitude.

Software

  • TBD

Publications

  • S. Gandhi, and Y. Simmhan, “An Interval-centric Model for Distributed Computing over Temporal Graphs”, 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, Texas. (To appear)

Online Analytics over Time-evolving Graphs

Swapnil Gandhi

Motivation

Many modern applications in diverse fields of science like social, transport, communication and financial networks generate an enormous amount of linked data, captured at an increasing fine granularity. Often such rich linked data is modeled in-form of a graph and needs to be processed within minutes of its arrival. For Example, an advertiser may want to measure user engagement for an active marketing campaign in real-time. Despite the dynamism exhibited in such emerging applications, systems that enable processing such dynamic data are a relatively unexplored area. Existing graph processing solutions are either designed for static graph processing (Apache Giraph, GraphLab, GraphX), which lacks support for dynamic updates or supports offline batch processing (Kineograph, GraphTau), where point-in-time graph snapshots are processed incrementally. On the other end of spectrum, stream processing engines (Apache Storm, Flink) are capable of processing online data at low latency but lack support for graph abstractions.

Distributed Storage and Querying over Temporal Property Graphs

Shriram R and Swapnil Gandhi

Motivation

<>

Contributions

<>

Software

  • TBD

Publications

  • TBD

Motif-based Streaming Graph Partitioning

Siddharth Jaiswal

Motivation

Partitioning is a very significant problem for large graph datasets. Distributed algorithms leverage the partitioned storage and their performance is correlated to the partitioning itself. Therefore, identifying a good partitioning solution becomes important to allow algorithms to scale through this distributed storage and to improve downstream performance for the algorithm itself. Streaming Partitioning is relevant because it allows partitioning without having to store the entire graph in single machine or distributed storage, thus allowing for better scalability and also immediate consumption by downstream algorithms for real time analytics. Identifying motifs(triangles, squares, etc) as part of the partitioning objective allows every partition to be dense in the number of motifs present on it. This should improve performance of algorithms which depend on locality and density of a graph neighbourhood.

Contributions

  • Partitioning of a streaming graph input in the form of an edge list such that the motifs identified on each partition are maximised along with keeping the vertex replication factor to a minimum.
  • Comparative study against state-of-the art solutions for edge based Streaming Graph Partitioning solutions.

Publications

  • TBD