Eighth International Workshop on
High Performance Big Graph Data
Management, Analysis, and Mining
December 16, 2021
To be held in conjunction with the
2021 IEEE International Conference on Big Data (IEEE BigData 2021)
Workshop Description
Modern Big Data increasingly appears in the form of complex graphs and networks. Examples include the physical Internet, the World Wide Web, online social networks, phone networks, and biological networks. In addition to their massive sizes, these graphs are dynamic, noisy, and sometimes transient. They also conform to all five Vs (Volume, Velocity, Variety, Value, and Veracity) that define Big Data. However, many graph-related problems are computationally difficult, so big graph data brings unique challenges, as well as numerous opportunities for researchers to solve problems that are significant to our communities.
Big graph problems are currently solved using several complementary paradigms. Perhaps the most popular approach is to exploit parallelism through specialized algorithms for supercomputers, shared-memory multicore and manycore systems, and heterogeneous CPU-GPU systems. However, since real-world graphs are sparse and highly irregular, very few parallel implementations actually deliver high performance. The major challenges to scaling and efficiency include irregular data dependencies, poor locality, and the high synchronization costs of current approaches. In addition to parallelism, researchers are developing approximation algorithms that use sampling to compress and summarize graph data. Streaming algorithms are also being considered for scenarios where updates arrive too quickly to store and reprocess the entire graph. Further, out-of-core algorithms are necessary for massive graphs that do not fit in the main memory of a typical system. Researchers can use graph-based solutions to solve problems from many diverse disciplines, including routing and transportation, social networks, bioinformatics, computational science, health care, and security and intelligence analysis.
This workshop aims to bring together researchers from different paradigms solving big graph problems under a unified platform for sharing their work and exchanging ideas. We are soliciting novel and original research contributions related to big graph data management, analysis, and mining (algorithms, software systems, applications, best practices, performance). Significant work-in-progress papers are also encouraged. Papers can be from any of the following areas, including but not limited to:
- Graph embeddings and representation learning for graph data
- Graph neural networks
- Deep learning-based models for learning on graph data
- Extreme-scale computing for large tensor, network, and graph problems
- Parallel algorithms for big graph analysis on HPC systems
- Heterogeneous CPU-GPU solutions to solve big graph problems
- Sampling and summarization of large graphs
- Graph algorithms for large-scale scientific computing problems
- Graph clustering, partitioning, and classification methods
- Scalable graph topology measurement: diameter approximation, eigenvalues, triangle and graphlet counting
- Parallel algorithms for computing graph kernels
- Inference on large graph data
- Graph evolution and dynamic graph models
- Graph streams
- Computational methods for visualization of large-scale graphs
- Graph databases, novel querying and indexing strategies for RDF data
- Novel applications of big graph problems in bioinformatics, health care, security, and social networks
- New software systems and runtime systems for big graph data mining
Regular paper submissions must be at most 10 pages long, including all figures, tables, and references. They must be formatted according to the paper submission formatting guidelines provided in the IEEE BigData 2021 Call for Papers. Additionally, we encourage short paper submissions (at most 6 pages) describing new work in progress.
Past Workshops
BigGraphs 2014
BigGraphs 2015
BigGraphs 2016
BigGraphs 2017
BigGraphs 2018
BigGraphs 2019
BigGraphs 2020
Important Dates
Oct 1, 2021 (11:59 pm Anywhere on Earth time): Submission deadline
Nov 3, 2021: Notification of paper acceptance to authors
Nov 15, 2021: Camera-ready submissions due
Dec 16, 2021: Workshop to be held virtually
Keynote
David A. Bader
Distinguished Professor and Director of the Institute for Data Science
New Jersey Institute of Technology
Solving Global Grand Challenges with High Performance Data Analytics
Abstract: Data science aims to solve grand global challenges such as detecting and preventing disease in human populations, revealing community structure in large social networks, protecting our elections from cyber-threats, and improving the resilience of the electric power grid. Unlike traditional applications in computational science and engineering, solving these social problems at scale often raises new challenges because of the sparsity and lack of locality in the data. These challenges call for research on scalable algorithms and architectures, for the development of frameworks for solving these real-world problems on high performance computers, and for improved models that capture the noise and bias inherent in the torrential data streams. In this talk, Bader will discuss the opportunities and challenges in massive data science for applications in social sciences, physical sciences, and engineering.
Bio: David A. Bader is a Distinguished Professor and founder of the Department of Data Science and inaugural Director of the Institute for Data Science at New Jersey Institute of Technology. Prior to this, he served as founding Professor and Chair of the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology. Dr. Bader is a Fellow of the IEEE, AAAS, and SIAM and a recipient of the IEEE Sidney Fernbach Award, and he advises the White House, most recently on the National Strategic Computing Initiative (NSCI) and Future Advanced Computing Ecosystem (FACE). Bader is a leading expert in solving global grand challenges in science, engineering, computing, and data science. His interests are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics. He has co-authored over 300 scholarly papers and has received best paper awards from ISC, IEEE HPEC, and IEEE/ACM SC. Dr. Bader has served as a lead scientist in several DARPA programs including High Productivity Computing Systems (HPCS) with IBM, Ubiquitous High Performance Computing (UHPC) with NVIDIA, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), Hierarchical Identify Verify Exploit (HIVE), and Software-Defined Hardware (SDH). Recently, Bader received an NVIDIA AI Lab (NVAIL) award and a Facebook Research AI Hardware/Software Co-Design award. Dr. Bader is Editor-in-Chief of the ACM Transactions on Parallel Computing and General Co-Chair of IPDPS 2021, and he previously served as Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems. He serves on the leadership team of the Northeast Big Data Innovation Hub as the inaugural chair of the Seed Fund Steering Committee.
In 2021, ROI-NJ recognized Bader on its inaugural list of technology influencers, and in 2012, Bader was the inaugural recipient of University of Maryland’s Electrical and Computer Engineering Distinguished Alumni Award. In 2014, Bader received the Outstanding Senior Faculty Research Award from Georgia Tech. Bader has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor and Director of an NVIDIA GPU Center of Excellence. In 1998, Bader built the first Linux supercomputer that led to a high-performance computing (HPC) revolution. He is a cofounder of the Graph500 List for benchmarking “Big Data” computing platforms. He is recognized as a “RockStar” of High Performance Computing by InsideHPC and as HPCwire’s People to Watch in 2012 and 2014.
Workshop Program
December 16, 2021
9:15 am -- 12:25 pm U.S. Eastern Standard Time
Location: IEEE BigData 2021 virtual platform.
20-minute presentations (a 15-minute talk video will be broadcast, followed by 5 minutes of live Q&A)
9:15 am Opening remarks
Workshop organizers
9:20 am Building Graphs at a Large Scale: Union Find Shuffle
Sai Gopal Thota, Mridul Jain, Sai Kiran Reddy Malikireddy, Pruthvi Raj Eranti, Albin Kuruvilla, and Nishad Kamat
9:45 am Event-based Product Carousel Recommendation with Query-Click Graph
Luyi Ma, Nimesh Sinha, Parth Vajge, Jason H.D. Cho, Sushant Kumar, and Kannan Achan
10:00 am Coffee break
10:15 am Solving Global Grand Challenges with High Performance Data Analytics (keynote)
David A. Bader
11:00 am A Pre-training Oracle for Predicting Distances in Social Networks
Gunjan Mahindre, Rasika Karkare, Randy Paffenroth, and Anura Jayasumana
11:20 am HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks
Avishek Bose, Huichen Yang, William Hsu, and Daniel Andresen
11:40 am Correlation and Pattern Detection in Event Networks
Valerio Bellandi, Paolo Ceravolo, Samira Maghool, Margherita Pindaro, and Stefano Siccardi
12:00 pm Drug Abuse Detection in Twitter-sphere: Graph-Based Approach
Khaled Mohammed Saifuddin, Muhammad Ifte Khairul Islam, and Esra Akbas
12:25 pm Closing remarks
Workshop organizers
Workshop Organizers
Nesreen Ahmed
Intel Labs
Santa Clara, CA 95054
Mohammad Al Hasan
Department of Computer and Information Science
Indiana University - Purdue University
Indianapolis, IN 46202
Shaikh Arifuzzaman
Department of Computer Science
The University of New Orleans
New Orleans, LA 70148
Kamesh Madduri
Department of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802
Contact
Please send email to one of the workshop organizers.