Second International Workshop on

High Performance Big Graph Data

Management, Analysis, and Mining

November 1, 2015

To be held in conjunction with the

2015 IEEE International Conference on Big Data (IEEE BigData 2015)
Hyatt Regency Santa Clara, Santa Clara, CA, USA

Workshop Description

Modern Big Data increasingly appears in the form of complex graphs and networks. Examples include the physical Internet, the World Wide Web, online social networks, phone networks, and biological networks. In addition to being massive, these graphs are dynamic, noisy, and sometimes transient. They also exhibit all five Vs (Volume, Velocity, Variety, Value, and Veracity) that define Big Data. Many graph-related problems are computationally difficult, so big graph data brings unique challenges, as well as numerous opportunities for researchers to solve problems that are significant to our communities.

Big graph problems are currently solved using several complementary paradigms. Perhaps the most popular approach is to exploit parallelism through specialized algorithms for supercomputers, shared-memory multicore and manycore systems, and heterogeneous CPU-GPU systems. However, because real-world graphs are sparse and highly irregular, very few parallel implementations actually deliver high performance. The major obstacles to scaling and efficiency in current approaches are irregular data dependencies, poor locality, and high synchronization costs. In addition to parallelism, researchers are developing approximation algorithms that use sampling to compress and summarize graph data. Streaming algorithms are also being considered for scenarios where updates arrive too quickly to process the entire graph in a single pass. Further, out-of-core algorithms are necessary for massive graphs that do not fit in the main memory of a typical system. Graph-based solutions apply to problems from many diverse disciplines, including routing and transportation, social networks, bioinformatics, computational science, health care, and security and intelligence analysis.
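As a small illustration of the sampling and streaming ideas mentioned above, the following sketch keeps a fixed-size uniform sample of edges from an edge stream using classic reservoir sampling. It is only an example of the general technique, not a method from any workshop paper; the function name `reservoir_sample_edges` and its parameters are hypothetical.

```python
import random


def reservoir_sample_edges(edge_stream, k, seed=None):
    """Return a uniform random sample of at most k edges from a stream.

    Hypothetical helper for illustration: it reads each edge once and
    stores at most k edges, so memory use does not grow with the stream.
    """
    rng = random.Random(seed)
    sample = []
    for i, edge in enumerate(edge_stream):
        if i < k:
            # Fill the reservoir with the first k edges.
            sample.append(edge)
        else:
            # Replace a stored edge with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = edge
    return sample


# Example: sample 3 edges from a stream of 100 edges on a path graph.
edges = ((u, u + 1) for u in range(100))
print(reservoir_sample_edges(edges, 3, seed=0))
```

The sample can then stand in for the full graph when computing approximate statistics, at the cost of some estimation error.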

This workshop aims to bring together researchers who solve big graph problems under different paradigms, offering a unified platform for sharing their work and exchanging ideas. We are soliciting novel and original research contributions related to big graph data management, analysis, and mining (algorithms, software systems, applications, best practices, performance). Significant work-in-progress papers are also encouraged. Papers can be from any of the following areas, including but not limited to:

Submissions must be at most 8 pages long, including all figures, tables, and references. They must be formatted according to the style files used by the IEEE BigData 2015 conference proceedings. Papers must be submitted online through the workshop submission page by 11:59 pm PDT (Pacific Daylight Time) on September 5, 2015.

Past Workshops

BigGraphs 2014

Important Dates

Keynote

Mohammed J. Zaki
Professor, Dept. of Computer Science and Engineering
Rensselaer Polytechnic Institute

Distributed Graph Mining in Massive Networks
Abstract: Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems like mining frequent subgraphs. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics.

In this talk I will present Arabesque, a distributed data processing platform for implementing graph mining algorithms. Arabesque defines a high-level API that explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions. I will also talk about techniques to scale graph mining methods to web-scale networks with billions of nodes and edges.
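The division of labor described in the abstract, where the platform explores subgraphs and the application only computes outputs and decides whether to extend, can be sketched in a few lines of single-machine Python. This is not Arabesque's actual API: the names `explore`, `is_clique`, and `find_cliques` are hypothetical, and the extension rule used here (only add a neighbor with a larger id than the last vertex) is a simplification that is sufficient for enumerating cliques but not for general patterns.

```python
def explore(adj, keep, process):
    """Hypothetical exploration loop: grow vertex sets one vertex at a time.

    adj     -- dict mapping each vertex to the set of its neighbors
    keep    -- application callback: should this subgraph be kept and extended?
    process -- application callback: compute output for a kept subgraph
    """
    stack = [(v,) for v in sorted(adj)]
    while stack:
        emb = stack.pop()
        if not keep(adj, emb):
            continue  # prune: this subgraph and all its extensions are dropped
        process(emb)
        neighbors = set().union(*(adj[v] for v in emb))
        for u in neighbors:
            # Only extend with larger ids so each vertex set is built once.
            if u > emb[-1]:
                stack.append(emb + (u,))


def is_clique(adj, emb):
    # Every prefix already passed this check, so only the newest vertex
    # needs to be tested against the earlier ones.
    last = emb[-1]
    return all(last in adj[v] for v in emb[:-1])


def find_cliques(adj):
    found = []
    explore(adj, is_clique, found.append)
    return sorted(found)


# Example graph: a triangle 1-2-3 with a pendant edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
for clique in find_cliques(graph):
    print(clique)
```

In this sketch the application code is just `is_clique` and the `found.append` callback, which mirrors the claim that applications need only a handful of lines; distribution, load balancing, and deduplication for general patterns are outside its scope.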

Keynote talk slides

Papers

Guoyao Feng, Xiao Meng, and Khaled Ammar
DISTINGER: A Distributed Graph Data Structure for Massive Dynamic Graph Processing

Shaikh Arifuzzaman, Maleq Khan, and Madhav Marathe
A Fast Parallel Algorithm for Counting Triangles in Graphs using Dynamic Load Balancing

Janani Balaji and Rajshekhar Sunderraman
Scalable Storage Structure for Pattern Matching on Big Graph Data

Olivier Cure, Hubert Naacke, Tenfry Randriamalala, and Bernd Amann
LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs

Harris Lin, Ngot Bui, and Vasant Honavar
Learning Classifiers from Remote RDF Data Stores Augmented with RDFS Subclass Hierarchies

Alireza Rezaei Mahdiraji and Peter Baumann
MQuery: A Query Language for Scientific Meshes

Workshop Organizers

Mohammad Al Hasan
Department of Computer and Information Science
Indiana University - Purdue University
Indianapolis, IN 46202

Kamesh Madduri
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802

Fengguang Song
Department of Computer and Information Science
Indiana University - Purdue University
Indianapolis, IN 46202

Program Committee

Leman Akoglu (Stony Brook University)
Medha Atre (University of Pennsylvania)
Juan Colmenares (Samsung Research America)
Oded Green (Georgia Institute of Technology)
Lian Liu (University of Kentucky)
Mahantesh Halappanavar (Pacific Northwest National Laboratory)
Mohammad Al Hasan (Indiana University - Purdue University)
Kamesh Madduri (The Pennsylvania State University)
Erik Saule (University of North Carolina at Charlotte)
Fengguang Song (Indiana University - Purdue University)
Chen Tian (Huawei Technologies USA)
Stanimire Tomov (University of Tennessee Knoxville)
Yinglong Xia (IBM Research)
Mohammed J. Zaki (Rensselaer Polytechnic Institute)

Contact

Please send email to one of the workshop organizers.