First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining

October 27, 2014

Full-day Workshop at the 2014 IEEE International Conference on Big Data (IEEE BigData 2014)
Hyatt Regency Bethesda, Bethesda, MD, USA

Workshop Description

Modern Big Data increasingly appears in the form of complex graphs and networks. Examples include the physical Internet, the World Wide Web, online social networks, phone networks, and biological networks. In addition to being massive, these graphs are dynamic, noisy, and sometimes transient. They also exhibit all five Vs that define Big Data (Volume, Velocity, Variety, Value, and Veracity). Many graph problems are computationally hard, so big graph data poses unique challenges while also offering researchers numerous opportunities to solve problems of significance to our communities.

Big graph problems are currently solved using several complementary paradigms. Perhaps the most popular approach is to exploit parallelism through specialized algorithms for supercomputers, shared-memory multicore and manycore systems, and heterogeneous CPU-GPU systems. However, because real-world graphs are sparse and highly irregular, very few parallel implementations actually deliver high performance. The major obstacles to scalability and efficiency are irregular data dependencies, poor locality, and the high synchronization costs of current approaches. Beyond parallelism, researchers are developing approximation algorithms that use sampling to compress and summarize graph data. Streaming algorithms are also being considered for scenarios in which updates arrive too rapidly to process the entire graph, even in a single pass. Further, out-of-core algorithms are necessary for massive graphs that do not fit in the main memory of a typical system. Graph-based solutions can address problems from many diverse disciplines, including routing and transportation, social networks, bioinformatics, computational science, health care, and security and intelligence analysis.

This workshop aims to bring together researchers who tackle big graph problems from different paradigms, providing a unified platform for sharing their work and exchanging ideas. We solicit novel and original research contributions related to big graph data management, analysis, and mining (algorithms, software systems, applications, best practices, and performance). Significant work-in-progress papers are also encouraged. Papers may address any of the following areas, including but not limited to:

Submissions must be at most 8 pages long, including all figures, tables, and references. They must be formatted according to the IEEE Computer Society Proceedings manuscript preparation guidelines.

Important Dates

Keynote

Srinivasan Parthasarathy
Professor, Dept. of Computer Science and Engineering and Dept. of Biomedical Informatics
The Ohio State University

Large Scale Data Analytics: Challenges, and the role of Stratified Data Placement
Abstract: With the increasing popularity of XML data stores, social networks, and Web 2.0 and 3.0 applications, complex data formats such as trees and graphs are becoming ubiquitous. Efficiently managing and processing such large, complex data stores on modern computational ecosystems to extract actionable information is daunting. I will begin this talk by discussing some of these challenges. I will then discuss a critical element at the heart of this challenge: the placement, storage, and access of such tera- and peta-scale data. In this work we develop a novel distributed framework to ease the burden on the programmer, and we propose an agile and intelligent placement service layer as a flexible yet unified means to address this challenge. Central to our framework is the notion of stratification, which first groups structurally (or semantically) similar entities into strata. The strata are then partitioned within this ecosystem according to the needs of the application: to maximize locality, balance load, minimize data skew, or even take energy consumption into account. Results on several real-world applications validate the efficacy and efficiency of our approach.

Papers

Amlan Chatterjee, Sridhar Radhakrishnan, and Chandra N. Sekharan
Connecting the dots: Triangle completion and related problems on large data sets using GPUs

Naga Shailaja Dasari, Desh Ranjan, and M. Zubair
ParK: An Efficient Algorithm of k-core Decomposition on Multicore Processors

William Eberle and Lawrence Holder
A Partitioning Approach to Scaling Anomaly Detection in Graph Streams

Ghizlane Echbarthi and Hamamache Kheddouci
Fractional Greedy and Partial Restreaming Partitioning: New Methods for Massive Graph Partitioning

S. M. Faisal, Srinivasan Parthasarathy, and P. Sadayappan
Global graphs: A middleware for large scale graph processing

Ronald Hagan, Charles Phillips, Kai Wang, Gary Rogers, and Michael Langston
Toward an Efficient, Highly Scalable Maximum Clique Solver for Massive Graphs

David Mizell, Kristyn Maschhoff, and Steve Reinhardt
Extending SPARQL with graph functions

Josephine Namayanja and Vandana Janeja
Change Detection in Temporally Evolving Computer Networks: A Big Data Framework

Christian L. Staudt, Yassine Marrakchi, and Henning Meyerhenke
Detecting Communities Around Seed Nodes in Complex Networks

Ichitaro Yamazaki, Theo Mary, Jakub Kurzak, Stanimire Tomov, and Jack Dongarra
Access-averse Framework for Computing Low-rank Matrix Approximations

Angen Zheng, Alexandros Labrinidis, and Panos Chrysanthis
Architecture-Aware Graph Repartitioning for Data-Intensive Scientific Computing

Workshop Organizers

Mohammad Al Hasan
Department of Computer and Information Science
Indiana University-Purdue University
Indianapolis, IN 46202

Kamesh Madduri
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802

Fengguang Song
Department of Computer and Information Science
Indiana University-Purdue University
Indianapolis, IN 46202

Program Committee

Nesreen Ahmed, Purdue University
Medha Atre, University of Pennsylvania
Mohammad Al Hasan, Indiana University-Purdue University
Aydin Buluc, Lawrence Berkeley National Laboratory
Kamesh Madduri, The Pennsylvania State University
David Mizell, YarcData/Cray Inc.
Xia Ning, NEC Laboratories
Siva Rajamanickam, Sandia National Laboratories
Saeed Salem, North Dakota State University
Manu Shantharam, San Diego Supercomputer Center
Fengguang Song, Indiana University-Purdue University
Guangming Tan, Chinese Academy of Sciences
Chen Tian, Huawei Technologies USA
Stanimire Tomov, University of Tennessee, Knoxville
Jeff Vetter, Oak Ridge National Laboratory and Georgia Tech
Daniel Waddington, Samsung Research America
Mohammed J. Zaki, Rensselaer Polytechnic Institute

Contact

Please send email to one of the workshop organizers.